Dashboard setup
Use the following information to set up the Amazon SageMaker HyperPod Amazon CloudWatch Observability EKS add-on. The add-on gives you a detailed visual dashboard with metrics for your EKS cluster hardware, team allocation, and tasks.
If you run into issues during setup, see Troubleshoot for known solutions.
HyperPod Amazon CloudWatch Observability EKS add-on prerequisites
The following section lists the prerequisites to meet before installing the Amazon EKS Observability add-on.
- Ensure that you have the minimum permission policy for HyperPod cluster administrators, in IAM users for cluster admin.
- Attach the CloudWatchAgentServerPolicy IAM policy to your worker nodes. To do so, enter the following command. Replace my-worker-node-role with the IAM role used by your Kubernetes worker nodes.
    aws iam attach-role-policy \
    --role-name my-worker-node-role \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
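To confirm that the policy is attached, one option is to list the role's attached managed policies and look for the policy ARN. The following is a minimal sketch, assuming a POSIX shell and a configured AWS CLI. The role_has_policy helper is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper (not part of the AWS CLI): succeeds when the given
# IAM role has the given managed policy attached.
role_has_policy() {
  role_name="$1"
  policy_arn="$2"
  # Ask IAM for the names of attached policies whose ARN matches.
  attached=$(aws iam list-attached-role-policies \
    --role-name "$role_name" \
    --query "AttachedPolicies[?PolicyArn=='${policy_arn}'].PolicyName" \
    --output text)
  # An empty result (or "None") means the policy is not attached.
  [ -n "$attached" ] && [ "$attached" != "None" ]
}
```

For example, role_has_policy my-worker-node-role arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy returns success only when the attachment is in place.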
                     
 
             
HyperPod Amazon CloudWatch Observability EKS add-on setup
Use the following options to set up the Amazon SageMaker HyperPod Amazon CloudWatch Observability EKS add-on.
- Setup using the SageMaker AI console
The following permissions are required to set up and visualize the HyperPod task governance dashboard. This section expands upon the permissions listed in IAM users for cluster admin.
To manage task governance, use the sample policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListClusters",
                "sagemaker:DescribeCluster",
                "sagemaker:ListComputeQuotas",
                "sagemaker:CreateComputeQuota",
                "sagemaker:UpdateComputeQuota",
                "sagemaker:DescribeComputeQuota",
                "sagemaker:DeleteComputeQuota",
                "sagemaker:ListClusterSchedulerConfigs",
                "sagemaker:DescribeClusterSchedulerConfig",
                "sagemaker:CreateClusterSchedulerConfig",
                "sagemaker:UpdateClusterSchedulerConfig",
                "sagemaker:DeleteClusterSchedulerConfig",
                "eks:ListAddons",
                "eks:CreateAddon",
                "eks:DescribeAddon",
                "eks:DescribeCluster",
                "eks:DescribeAccessEntry",
                "eks:ListAssociatedAccessPolicies",
                "eks:AssociateAccessPolicy",
                "eks:DisassociateAccessPolicy"
            ],
            "Resource": "*"
        }
    ]
}
             
         
    
 
 
To grant permissions to manage the Amazon CloudWatch Observability EKS add-on and view the HyperPod cluster dashboard through the SageMaker AI console, use the sample policy below:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:ListAddons",
                "eks:CreateAddon",
                "eks:UpdateAddon",
                "eks:DescribeAddon",
                "eks:DescribeAddonVersions",
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode",
                "sagemaker:ListClusterNodes",
                "sagemaker:ListClusters",
                "sagemaker:ListComputeQuotas",
                "sagemaker:DescribeComputeQuota",
                "sagemaker:ListClusterSchedulerConfigs",
                "sagemaker:DescribeClusterSchedulerConfig",
                "eks:DescribeCluster",
                "cloudwatch:GetMetricData",
                "eks:AccessKubernetesApi"
            ],
            "Resource": "*"
        }
    ]
}
             
         
    
 
 
Navigate to the Dashboard tab in the SageMaker HyperPod console to install the Amazon CloudWatch Observability EKS add-on. To ensure task governance related metrics are included in the Dashboard, select the Kueue metrics checkbox. Enabling Kueue metrics incurs CloudWatch Metrics charges after the free-tier limit is reached. For more information, see Metrics in Amazon CloudWatch Pricing.
                         
- Setup using the EKS AWS CLI
Use the following EKS AWS CLI command to install the add-on. Replace cluster-name with the name of your EKS cluster and "configuration json" with your configuration values:
    aws eks create-addon --cluster-name cluster-name \
    --addon-name amazon-cloudwatch-observability \
    --configuration-values "configuration json"
The following is an example of the configuration values JSON:
{
    "agent": {
        "config": {
            "logs": {
                "metrics_collected": {
                    "kubernetes": {
                        "kueue_container_insights": true,
                        "enhanced_container_insights": true
                    },
                    "application_signals": { }
                }
            },
            "traces": {
                "traces_collected": {
                    "application_signals": { }
                }
            }
        }
    }
}
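Quoting a JSON document inline on the command line is error-prone, so one option is to save the configuration values to a file and pass that file with the AWS CLI file:// prefix. The following is a minimal sketch, assuming a POSIX shell. The file path and the install_observability_addon wrapper are hypothetical names introduced here for illustration.

```shell
# Write the configuration values to a file. The path is an illustrative choice.
config_file="${TMPDIR:-/tmp}/cw-observability-config.json"
cat > "$config_file" <<'EOF'
{
    "agent": {
        "config": {
            "logs": {
                "metrics_collected": {
                    "kubernetes": {
                        "kueue_container_insights": true,
                        "enhanced_container_insights": true
                    },
                    "application_signals": { }
                }
            },
            "traces": {
                "traces_collected": {
                    "application_signals": { }
                }
            }
        }
    }
}
EOF

# Hypothetical wrapper: $1 is the EKS cluster name, $2 is the config file path.
install_observability_addon() {
  aws eks create-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --configuration-values "file://$2"
}
```

To install the add-on, run install_observability_addon cluster-name "$config_file". Keeping the JSON in a file also lets you check it with a JSON validator before the call.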
                         
- Setup using the EKS Console UI
  - Navigate to the EKS console.
  - Choose your cluster.
  - Choose Add-ons.
  - Find the Amazon CloudWatch Observability add-on and install version 2.4.0 or later.
  - Include the following JSON in Configuration values:
{
    "agent": {
        "config": {
            "logs": {
                "metrics_collected": {
                    "kubernetes": {
                        "kueue_container_insights": true,
                        "enhanced_container_insights": true
                    },
                    "application_signals": { }
                }
            },
            "traces": {
                "traces_collected": {
                    "application_signals": { }
                }
            }
        }
    }
}
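To check the version requirement from a terminal, one option is to read the installed add-on version and compare it against 2.4.0. The following is a sketch under stated assumptions: it assumes version strings shaped like v2.4.0-eksbuild.1 and a sort command that supports -V (GNU coreutils). Both helper names are hypothetical.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Strips a leading "v" and any "-suffix" (for example "-eksbuild.1") first.
version_at_least() {
  have=${1#v}; have=${have%%-*}
  want=${2#v}; want=${want%%-*}
  # With version sort, the smaller of the two versions comes first.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# Hypothetical helper: prints the installed add-on version for cluster $1.
installed_addon_version() {
  aws eks describe-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --query 'addon.addonVersion' \
    --output text
}
```

For example, version_at_least "$(installed_addon_version cluster-name)" 2.4.0 returns success when the installed version meets the requirement.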
                                 
 
                         
                
After the EKS Observability add-on is installed successfully, you can view your EKS cluster metrics under the Dashboard tab of the HyperPod console.
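Before opening the dashboard, you can confirm from the command line that the installation finished by checking the status that EKS reports for the add-on. The following is a minimal sketch, assuming a POSIX shell and a configured AWS CLI. The addon_status and addon_is_active helpers are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: prints the add-on status reported by EKS for cluster $1.
addon_status() {
  aws eks describe-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --query 'addon.status' \
    --output text
}

# Hypothetical helper: succeeds only when the add-on status is ACTIVE.
addon_is_active() {
  [ "$(addon_status "$1")" = "ACTIVE" ]
}
```

For example, addon_is_active cluster-name returns success when the add-on is ready. Any other status, such as CREATING or CREATE_FAILED, means the dashboard metrics might not be available yet.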