Setting up the SageMaker HyperPod observability add-on
To send metrics from your Amazon SageMaker HyperPod (SageMaker HyperPod) cluster to an Amazon Managed Service for Prometheus workspace, and optionally view them in Amazon Managed Grafana, first complete the following prerequisites, which include attaching managed policies and permissions to your console role.
- Enable AWS IAM Identity Center (IAM Identity Center) to use Amazon Managed Grafana. If IAM Identity Center isn't already enabled in your account, see Getting started with IAM Identity Center. Additionally, create at least one user in IAM Identity Center.
- Add the following policies and permissions to your role.
  - AWS managed policy: AmazonSageMakerHyperPodObservabilityAdminAccess
  - AWS managed policy: AWSGrafanaWorkspacePermissionManagementV2
  - Additional permissions to set up required IAM roles for Amazon Managed Grafana and Amazon Elastic Kubernetes Service add-on access:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CreateRoleAccess",
                "Effect": "Allow",
                "Action": [
                    "iam:CreateRole",
                    "iam:CreatePolicy",
                    "iam:AttachRolePolicy",
                    "iam:ListRoles"
                ],
                "Resource": [
                    "arn:aws:iam::*:role/service-role/AmazonSageMakerHyperPodObservabilityGrafanaAccess*",
                    "arn:aws:iam::*:role/service-role/AmazonSageMakerHyperPodObservabilityAddonAccess*",
                    "arn:aws:iam::*:policy/service-role/HyperPodObservabilityAddonPolicy*",
                    "arn:aws:iam::*:policy/service-role/HyperPodObservabilityGrafanaPolicy*"
                ]
            },
            {
                "Sid": "IAMGrafanaPassRoleAccess",
                "Effect": "Allow",
                "Action": [
                    "iam:PassRole"
                ],
                "Resource": "arn:aws:iam::*:role/service-role/AmazonSageMakerHyperPodObservabilityGrafanaAccess*",
                "Condition": {
                    "StringLike": {
                        "iam:PassedToService": [
                            "grafana.amazonaws.com"
                        ]
                    }
                }
            },
            {
                "Sid": "IAMEKSPassRoleAccess",
                "Effect": "Allow",
                "Action": [
                    "iam:PassRole"
                ],
                "Resource": "arn:aws:iam::*:role/service-role/AmazonSageMakerHyperPodObservabilityAddonAccess*",
                "Condition": {
                    "StringLike": {
                        "iam:PassedToService": [
                            "pods.eks.amazonaws.com"
                        ]
                    }
                }
            },
            {
                "Sid": "IAMGetRoleAccess",
                "Effect": "Allow",
                "Action": "iam:GetRole",
                "Resource": [
                    "arn:aws:iam::*:role/service-role/AmazonSageMakerHyperPodObservabilityAddonAccess*"
                ]
            }
        ]
    }
  - Additional permissions needed to manage IAM Identity Center users for Amazon Managed Grafana:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SSOAccess",
                "Effect": "Allow",
                "Action": [
                    "sso:ListProfileAssociations",
                    "sso-directory:SearchUsers",
                    "sso-directory:SearchGroups",
                    "sso:AssociateProfile",
                    "sso:DisassociateProfile"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }
  - Additional permissions needed to remove and update pod identity association for the add-on:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EKSPodIdentity",
                "Effect": "Allow",
                "Action": [
                    "eks:DeletePodIdentityAssociation",
                    "eks:UpdatePodIdentityAssociation"
                ],
                "Resource": "*"
            }
        ]
    }
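The additional permissions above are given as three separate policy documents. If you would rather attach them as a single custom policy, their statements can be combined, provided every Sid stays unique within the resulting document. The following is a minimal sketch of that merge using only the Python standard library. The helper name `merge_policy_documents` is hypothetical and not part of any AWS SDK, and only two of the three documents are shown to keep the example short.

```python
import json


def merge_policy_documents(*documents):
    """Combine the Statement lists of several IAM policy documents into one.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    merged = {"Version": "2012-10-17", "Statement": []}
    seen_sids = set()
    for document in documents:
        statements = document["Statement"]
        # A policy may hold a single statement object instead of a list.
        if isinstance(statements, dict):
            statements = [statements]
        for statement in statements:
            sid = statement.get("Sid")
            if sid is not None:
                # Sids must be unique within one policy document.
                if sid in seen_sids:
                    raise ValueError(f"duplicate Sid: {sid}")
                seen_sids.add(sid)
            merged["Statement"].append(statement)
    return merged


# Statements copied from the IAM Identity Center and pod identity documents.
sso_access = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SSOAccess",
            "Effect": "Allow",
            "Action": [
                "sso:ListProfileAssociations",
                "sso-directory:SearchUsers",
                "sso-directory:SearchGroups",
                "sso:AssociateProfile",
                "sso:DisassociateProfile",
            ],
            "Resource": ["*"],
        }
    ],
}
pod_identity = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EKSPodIdentity",
            "Effect": "Allow",
            "Action": [
                "eks:DeletePodIdentityAssociation",
                "eks:UpdatePodIdentityAssociation",
            ],
            "Resource": "*",
        }
    ],
}

combined = merge_policy_documents(sso_access, pod_identity)
print(json.dumps(combined, indent=4))
```

The printed JSON can be pasted into the IAM policy editor. Keeping the documents separate is equally valid; merging only reduces the number of policies you manage.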
After you meet the preceding prerequisites, you can install the observability add-on.
To quickly install the observability add-on
- Open the Amazon SageMaker AI console at https://console.aws.amazon.com/sagemaker/.
- Go to your cluster's details page.
- On the Dashboard tab, locate the add-on named HyperPod Monitoring & Observability, and choose Quick install.
To do a custom install of the observability add-on
- Go to your cluster's details page.
- On the Dashboard tab, locate the add-on named HyperPod Monitoring & Observability, and choose Custom install.
- Specify the metrics categories that you want to see. For more information about these metrics categories, see SageMaker HyperPod cluster metrics.
- Specify whether you want to enable Amazon CloudWatch Logs.
- Specify whether you want the service to create a new Amazon Managed Service for Prometheus workspace.
- To view the metrics in Amazon Managed Grafana dashboards, select the check box labeled Use an Amazon Managed Grafana workspace. You can specify your own workspace or let the service create a new one for you.
Note
Amazon Managed Grafana isn't available in all AWS Regions in which Amazon Managed Service for Prometheus is available. However, you can set up a Grafana workspace in any AWS Region where Amazon Managed Grafana is available and configure it to get metrics data from a Prometheus workspace that resides in a different AWS Region. For more information, see Use AWS data source configuration to add Amazon Managed Service for Prometheus as a data source and Connect to Amazon Managed Service for Prometheus and open-source Prometheus data sources.
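To illustrate the cross-Region setup described in the note, the following sketch builds the settings that a Grafana data source would need to query a Prometheus workspace in another Region. The point to notice is that both the endpoint URL and the SigV4 signing Region refer to the Region of the Prometheus workspace, not the Region of the Grafana workspace. The workspace ID, the Region, and the helper names are hypothetical, and the `jsonData` field names follow Grafana's Prometheus data source provisioning format as an assumption; confirm them against the linked topics before relying on them.

```python
def prometheus_workspace_url(region, workspace_id):
    """Return the endpoint URL for an Amazon Managed Service for Prometheus workspace."""
    return f"https://aps-workspaces.{region}.amazonaws.com/workspaces/{workspace_id}/"


def cross_region_data_source(name, prometheus_region, workspace_id):
    """Describe a Grafana data source that targets a workspace in another Region.

    Hypothetical helper; the jsonData keys are assumed, not confirmed by this guide.
    """
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_workspace_url(prometheus_region, workspace_id),
        "jsonData": {
            # Sign requests with SigV4 for the Region that hosts the
            # Prometheus workspace, not the Region that hosts Grafana.
            "sigV4Auth": True,
            "sigV4Region": prometheus_region,
        },
    }


# Hypothetical values: a Prometheus workspace in us-east-1 with a placeholder ID.
data_source = cross_region_data_source(
    name="hyperpod-metrics",
    prometheus_region="us-east-1",
    workspace_id="ws-EXAMPLE",
)
print(data_source["url"])
```

In practice, the AWS data source configuration page in the Amazon Managed Grafana console can fill in these values for you; the sketch only shows which Region each value refers to.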