

# HyperPod in Studio
<a name="sagemaker-hyperpod-studio"></a>

You can launch machine learning workloads on Amazon SageMaker HyperPod clusters and view HyperPod cluster information in Amazon SageMaker Studio. The increased visibility into cluster details and hardware metrics can help your team identify the right candidate for your pre-training or fine-tuning workloads. 

A set of commands is available to help you get started when you launch Studio IDEs on a HyperPod cluster. You can work on your training scripts, use Docker containers for the training scripts, and submit jobs to the cluster, all from within the Studio IDEs. The following sections describe how to set this up, how to discover clusters and monitor their tasks, how to view cluster information, and how to connect to HyperPod clusters from IDEs within Studio.

**Topics**
+ [Setting up HyperPod in Studio](sagemaker-hyperpod-studio-setup.md)
+ [HyperPod tabs in Studio](sagemaker-hyperpod-studio-tabs.md)
+ [Connecting to HyperPod clusters and submitting tasks to clusters](sagemaker-hyperpod-studio-open.md)
+ [Troubleshooting](sagemaker-hyperpod-studio-troubleshoot.md)

# Setting up HyperPod in Studio
<a name="sagemaker-hyperpod-studio-setup"></a>

To access your clusters through Amazon SageMaker Studio, you need to set them up according to your cluster orchestrator. In the following sections, choose the setup that matches your orchestrator.

The instructions assume that you already have a cluster. For information on the cluster orchestrators and how to set up a cluster, start with the HyperPod orchestrator pages:
+  [Orchestrating SageMaker HyperPod clusters with Slurm](sagemaker-hyperpod-slurm.md) 
+  [Orchestrating SageMaker HyperPod clusters with Amazon EKS](sagemaker-hyperpod-eks.md) 

**Topics**
+ [Setting up a Slurm cluster in Studio](sagemaker-hyperpod-studio-setup-slurm.md)
+ [Setting up an Amazon EKS cluster in Studio](sagemaker-hyperpod-studio-setup-eks.md)

# Setting up a Slurm cluster in Studio
<a name="sagemaker-hyperpod-studio-setup-slurm"></a>

The following instructions describe how to set up a HyperPod Slurm cluster in Studio.

1. Create a domain or have one ready. For information on creating a domain, see [Guide to getting set up with Amazon SageMaker AI](gs.md).

1. (Optional) Create and attach a custom FSx for Lustre volume to your domain. 

   1. Ensure that your FSx for Lustre file system is in the same VPC as your intended domain and in one of the subnets that the domain uses.

   1. You can follow the instructions in [Adding a custom file system to a domain](domain-custom-file-system.md). 

1. (Optional) We recommend that you add tags to your clusters for a smoother workflow. For information on how to add tags using the SageMaker AI console, see [Edit a SageMaker HyperPod cluster](sagemaker-hyperpod-operate-slurm-console-ui.md#sagemaker-hyperpod-operate-slurm-console-ui-edit-clusters).

   1. Link your FSx for Lustre file system to your cluster. This helps you identify the file system when you launch your Studio spaces. To do so, add the following tag to your cluster, where `fs-id` is the FSx for Lustre file system ID. 

      Tag Key = “`hyperpod-cluster-filesystem`”, Tag Value = “`fs-id`”.

   1. Link your [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) workspace to your cluster. Studio uses this tag to link directly to your Grafana workspace from your cluster. To do so, add the following tag to your cluster, where `ws-id` is your Grafana workspace ID.

      Tag Key = “`grafana-workspace`”, Tag Value = “`ws-id`”.

1. Add the following permissions to your execution role. 

   For information on SageMaker AI execution roles and how to edit them, see [Understanding domain space permissions and execution roles](execution-roles-and-spaces.md). 

   To learn how to attach policies to an IAM user or group, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "ssm:StartSession",
                   "ssm:TerminateSession"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "sagemaker:CreateCluster",
                   "sagemaker:ListClusters"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "cloudwatch:PutMetricData",
                   "cloudwatch:GetMetricData"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "sagemaker:DescribeCluster",
                   "sagemaker:DescribeClusterNode",
                   "sagemaker:ListClusterNodes",
                   "sagemaker:UpdateCluster",
                   "sagemaker:UpdateClusterSoftware"
               ],
               "Resource": "arn:aws:sagemaker:us-east-1:111122223333:cluster/*"
           }
       ]
   }
   ```

1. Add a tag to this IAM role with Tag Key = “`SSMSessionRunAs`” and Tag Value = “`os user`”, where `os user` is the same user that you set up for the Slurm cluster. This lets you manage access to SageMaker HyperPod clusters at an IAM role or user level by using the Run As feature in [AWS Systems Manager Agent (SSM Agent)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html). With this feature, each SSM session starts as the operating system (OS) user associated with the IAM role or user. 

   For information on how to add tags to your execution role, see [Tag IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags_roles.html).

1. [Turn on Run As support for Linux and macOS managed nodes](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-preferences-run-as.html). The Run As setting is account-wide and is required for all SSM sessions to start successfully.

1. (Optional) [Restrict task view in Studio for Slurm clusters](#sagemaker-hyperpod-studio-setup-slurm-restrict-tasks-view). For information on viewable tasks in Studio, see [Tasks](sagemaker-hyperpod-studio-tabs.md#sagemaker-hyperpod-studio-tabs-tasks).

In Amazon SageMaker Studio, you can then view your clusters in **HyperPod clusters** (under **Compute**).
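The tagging steps in the procedure above can also be scripted with the AWS CLI instead of the console. The following is a minimal sketch, not a definitive implementation: the function names `tag_hyperpod_cluster` and `tag_run_as_user` are hypothetical helpers introduced here, and the cluster ARN, role name, and tag values that you pass in are placeholders for your own resources. It relies on the `aws sagemaker add-tags` and `aws iam tag-role` commands.

```shell
# Sketch: tag a HyperPod cluster and an execution role from the AWS CLI.
# The function names are hypothetical; every argument is a placeholder value.

# Add one tag to a HyperPod cluster.
# Usage: tag_hyperpod_cluster <cluster-arn> <tag-key> <tag-value>
tag_hyperpod_cluster() {
  aws sagemaker add-tags \
    --resource-arn "$1" \
    --tags "Key=$2,Value=$3"
}

# Tag the execution role with the OS user that Run As sessions should use.
# Usage: tag_run_as_user <role-name> <os-user>
tag_run_as_user() {
  aws iam tag-role \
    --role-name "$1" \
    --tags "Key=SSMSessionRunAs,Value=$2"
}
```

For example, `tag_hyperpod_cluster <cluster-arn> hyperpod-cluster-filesystem fs-id` adds the file system tag, `tag_hyperpod_cluster <cluster-arn> grafana-workspace ws-id` adds the Grafana tag, and `tag_run_as_user <role-name> <os-user>` adds the Run As tag to the execution role.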

## Restrict task view in Studio for Slurm clusters
<a name="sagemaker-hyperpod-studio-setup-slurm-restrict-tasks-view"></a>

You can restrict users so that they see only the Slurm tasks that they are authorized to view, without requiring manual input of namespaces or additional permission checks. The restriction is applied based on the users’ IAM role, providing a streamlined and secure user experience. The following section provides information on how to restrict task view in Studio for Slurm clusters. For information on viewable tasks in Studio, see [Tasks](sagemaker-hyperpod-studio-tabs.md#sagemaker-hyperpod-studio-tabs-tasks). 

All Studio users can view, manage, and interact with all Slurm cluster tasks by default. To restrict this, you can manage access to SageMaker HyperPod clusters at an IAM role or user level by using the **Run As** feature in [AWS Systems Manager Agent (SSM Agent)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html).

You can do this by tagging IAM roles with specific identifiers, such as their username or group. When a user accesses Studio, the Session Manager uses the Run As feature to execute commands as a specific Slurm user account that matches their IAM role tags. The Slurm configuration can be set up to limit task visibility based on the user account. The Studio UI will automatically filter tasks visible to that specific user account when commands are executed through the Run As feature. Once set up, each user assuming the role with the specified identifiers will have those Slurm tasks filtered based on the Slurm configuration. For information on how to add tags to your execution role, see [Tag IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags_roles.html).
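This page doesn't name the Slurm setting that limits task visibility. One standard Slurm mechanism is the `PrivateData` parameter in `slurm.conf`; for example, `PrivateData=jobs` prevents users from viewing jobs that belong to other users. Assuming your cluster relies on that parameter (an assumption, not something this page confirms), the following sketch prints the value that the Slurm controller currently reports. The function name `show_slurm_private_data` is hypothetical.

```shell
# Sketch: print the PrivateData setting reported by the Slurm controller.
# Assumes PrivateData is the mechanism used to limit job visibility on your cluster.
# Run this on a node where scontrol is available, such as the controller node.
show_slurm_private_data() {
  scontrol show config | grep -i '^PrivateData'
}
```

If the reported value doesn't include `jobs`, users connected through Run As can still list jobs owned by other Slurm users.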

# Setting up an Amazon EKS cluster in Studio
<a name="sagemaker-hyperpod-studio-setup-eks"></a>

The following instructions describe how to set up an Amazon EKS cluster in Studio.

1. Create a domain or have one ready. For information on creating a domain, see [Guide to getting set up with Amazon SageMaker AI](gs.md).

1. Add the following permissions to your execution role. 

   For information on SageMaker AI execution roles and how to edit them, see [Understanding domain space permissions and execution roles](execution-roles-and-spaces.md). 

   To learn how to attach policies to an IAM user or group, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "DescribeHyperpodClusterPermissions",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:DescribeCluster"
               ],
               "Resource": "arn:aws:sagemaker:us-east-1:111122223333:cluster/cluster-name"
           },
           {
               "Effect": "Allow",
               "Action": "ec2:Describe*",
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "ecr:CompleteLayerUpload",
                   "ecr:GetAuthorizationToken",
                   "ecr:UploadLayerPart",
                   "ecr:InitiateLayerUpload",
                   "ecr:BatchCheckLayerAvailability",
                   "ecr:PutImage"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "cloudwatch:PutMetricData",
                   "cloudwatch:GetMetricData"
               ],
               "Resource": "*"
           },
           {
               "Sid": "UseEksClusterPermissions",
               "Effect": "Allow",
               "Action": [
                   "eks:DescribeCluster",
                   "eks:AccessKubernetesApi",
                   "eks:DescribeAddon"
               ],
               "Resource": "arn:aws:eks:us-east-1:111122223333:cluster/cluster-name"
           },
           {
               "Sid": "ListClustersPermission",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:ListClusters"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "ssm:StartSession",
                   "ssm:TerminateSession"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

1. [Grant IAM users access to Kubernetes with EKS access entries](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html).

   1. Navigate to the Amazon EKS cluster associated with your HyperPod cluster.

   1. Choose the **Access** tab and [create an access entry](https://docs.aws.amazon.com/eks/latest/userguide/creating-access-entries.html) for the execution role you created. 

      1. In step 1, select the execution role you created above in the **IAM principal** dropdown.

      1. In step 2, select a policy name and the access scope that you want users to have. 

1. (Optional) For a smoother experience, we recommend that you add tags to your clusters. For information on how to add tags using the SageMaker AI console, see [Edit a SageMaker HyperPod cluster](sagemaker-hyperpod-operate-slurm-console-ui.md#sagemaker-hyperpod-operate-slurm-console-ui-edit-clusters).

   1. Link your [Amazon Managed Grafana](https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Service-Grafana.html) workspace to your cluster. Studio uses this tag to link directly to your Grafana workspace from your cluster. To do so, add the following tag to your cluster, where `ws-id` is your Grafana workspace ID.

      Tag Key = “`grafana-workspace`”, Tag Value = “`ws-id`”.

1. (Optional) [Restrict task view in Studio for EKS clusters](#sagemaker-hyperpod-studio-setup-eks-restrict-tasks-view). For information on viewable tasks in Studio, see [Tasks](sagemaker-hyperpod-studio-tabs.md#sagemaker-hyperpod-studio-tabs-tasks).
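The access entry step in the procedure above can also be done with the AWS CLI. The following is a minimal sketch, assuming you want cluster-wide scope: `grant_eks_access` is a hypothetical helper name, and the cluster name, execution role ARN, and access policy ARN are placeholders that you supply. It uses the `aws eks create-access-entry` and `aws eks associate-access-policy` commands.

```shell
# Sketch: create an EKS access entry for the Studio execution role and attach an access policy.
# grant_eks_access is a hypothetical helper name; all arguments are placeholders.
# Usage: grant_eks_access <eks-cluster-name> <execution-role-arn> <access-policy-arn>
grant_eks_access() {
  # Step 1: register the execution role as an IAM principal on the EKS cluster.
  aws eks create-access-entry \
    --cluster-name "$1" \
    --principal-arn "$2" || return 1

  # Step 2: associate the chosen access policy with cluster-wide scope.
  aws eks associate-access-policy \
    --cluster-name "$1" \
    --principal-arn "$2" \
    --policy-arn "$3" \
    --access-scope type=cluster
}
```

An access policy ARN has the form `arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy`. To limit the policy to specific namespaces instead of the whole cluster, the access scope can be written as `type=namespace,namespaces=namespace-name`.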

## Restrict task view in Studio for EKS clusters
<a name="sagemaker-hyperpod-studio-setup-eks-restrict-tasks-view"></a>

You can restrict Kubernetes namespace permissions for users, so that they will only have access to view tasks belonging to a specified namespace. The following provides information on how to restrict the task view in Studio for EKS clusters. For information on viewable tasks in Studio, see [Tasks](sagemaker-hyperpod-studio-tabs.md#sagemaker-hyperpod-studio-tabs-tasks). 

By default, users can see all EKS cluster tasks. You can restrict users’ visibility of EKS cluster tasks to specified namespaces, so that users can access the resources they need while you maintain strict access controls.

Once the restriction is applied, you need to provide the namespace to the users assuming the role. Studio displays the jobs of a namespace only after the user enters, in the **Tasks** tab, a namespace that they have permissions to view. 

The following configuration allows administrators to grant specific, limited access to data scientists for viewing tasks within the cluster. This configuration grants the following permissions:
+ List and get pods
+ List and get events
+ Get Custom Resource Definitions (CRDs)

YAML Configuration

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pods-events-crd-cluster-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "list"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pods-events-crd-cluster-role-binding
subjects:
- kind: Group
  name: pods-events-crd-cluster-level
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: pods-events-crd-cluster-role
  apiGroup: rbac.authorization.k8s.io
```

1. Save the YAML configuration to a file named `cluster-role.yaml`.

1. Apply the configuration using [`kubectl`](https://kubernetes.io/docs/reference/kubectl/):

   ```
   kubectl apply -f cluster-role.yaml
   ```

1. Verify the configuration:

   ```
   kubectl get clusterrole pods-events-crd-cluster-role
   kubectl get clusterrolebinding pods-events-crd-cluster-role-binding
   ```

1. Assign users to the `pods-events-crd-cluster-level` group through your identity provider or IAM.
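If you manage cluster access with EKS access entries, one way to perform the group assignment in the last step is to set the Kubernetes groups on the access entry of the execution role. The following is a sketch under that assumption: `set_access_entry_groups` is a hypothetical helper name, the cluster name and role ARN are placeholders, and the access entry is assumed to exist already. It uses the `aws eks update-access-entry` command.

```shell
# Sketch: set the Kubernetes groups on an existing EKS access entry.
# set_access_entry_groups is a hypothetical helper name; arguments are placeholders.
# Usage: set_access_entry_groups <eks-cluster-name> <principal-arn> <group> [<group> ...]
set_access_entry_groups() {
  cluster="$1"
  principal="$2"
  shift 2
  # The remaining arguments are passed as the list of Kubernetes group names.
  aws eks update-access-entry \
    --cluster-name "$cluster" \
    --principal-arn "$principal" \
    --kubernetes-groups "$@"
}
```

For example, `set_access_entry_groups <eks-cluster-name> <execution-role-arn> pods-events-crd-cluster-level` maps the role to the group that the `ClusterRoleBinding` above refers to.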

# HyperPod tabs in Studio
<a name="sagemaker-hyperpod-studio-tabs"></a>

In Amazon SageMaker Studio, you can navigate to **HyperPod clusters** (under **Compute**) and view your list of clusters. Each displayed cluster includes information such as tasks, hardware metrics, settings, and metadata details. This visibility can help your team identify the right candidate for your pre-training or fine-tuning workloads. The following sections describe each type of information.

## Tasks
<a name="sagemaker-hyperpod-studio-tabs-tasks"></a>

Amazon SageMaker HyperPod provides a view of your cluster tasks. Tasks are operations or jobs that are sent to the cluster. These can be machine learning operations, like training, running experiments, or inference. The following section provides information on your HyperPod cluster tasks.

In Amazon SageMaker Studio, you can navigate to one of your clusters in **HyperPod clusters** (under **Compute**) and view the **Tasks** information on your cluster. If you are having any issues with viewing tasks, see [Troubleshooting](sagemaker-hyperpod-studio-troubleshoot.md).

The task table includes:

------
#### [ For Slurm clusters ]

For Slurm clusters, the tasks currently in the Slurm job scheduler queue are shown in the table. The information shown for each task includes the task name, status, job ID, partition, run time, nodes, created by, and actions.

For a list and details about past jobs, use the [`sacct`](https://slurm.schedmd.com/sacct.html) command in a JupyterLab or Code Editor terminal. The `sacct` command shows *historical information* about jobs that have *finished* in the system. It provides accounting information, including job resource usage such as memory, and exit status. 

By default, all Studio users can view, manage, and interact with all available Slurm tasks. To restrict the tasks that Studio users can view, see [Restrict task view in Studio for Slurm clusters](sagemaker-hyperpod-studio-setup-slurm.md#sagemaker-hyperpod-studio-setup-slurm-restrict-tasks-view).
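As an illustration of the `sacct` usage described above, the following sketch lists past jobs with their state, elapsed time, peak memory, and exit code. The function name `list_past_jobs` is hypothetical, and the selected columns are only one reasonable choice. It assumes that Slurm accounting is enabled on the cluster.

```shell
# Sketch: list jobs recorded by Slurm accounting since a given start time.
# list_past_jobs is a hypothetical helper name.
# Usage: list_past_jobs <start-time>   (for example: list_past_jobs now-7days)
list_past_jobs() {
  sacct \
    --starttime "$1" \
    --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode
}
```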

------
#### [ For Amazon EKS clusters ]

For Amazon EKS clusters, Kubeflow (PyTorch, MPI, TensorFlow) tasks are shown in the table. PyTorch tasks are shown by default. You can filter for PyTorch, MPI, and TensorFlow tasks under **Task type**. The information that is shown for each task includes the task name, status, namespace, priority class, and creation time. 

By default, all users can view jobs across all namespaces. To restrict the viewable Kubernetes namespaces available to Studio users, see [Restrict task view in Studio for EKS clusters](sagemaker-hyperpod-studio-setup-eks.md#sagemaker-hyperpod-studio-setup-eks-restrict-tasks-view). If a user cannot view the tasks and is asked to provide a namespace, they need to get that information from the administrator. 

------

## Metrics
<a name="sagemaker-hyperpod-studio-tabs-metrics"></a>

Amazon SageMaker HyperPod provides a view of your Slurm or Amazon EKS cluster utilization metrics. The following provides information on your HyperPod cluster metrics. 

For Amazon EKS clusters, you need to install the Amazon CloudWatch Observability EKS add-on to view the following metrics. For more information, see [Install the Amazon CloudWatch Observability EKS add-on](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-addon.html).

In Amazon SageMaker Studio, you can navigate to one of your clusters in **HyperPod clusters** (under **Compute**) and view the **Metrics** details on your cluster. Metrics provides a comprehensive view of cluster utilization metrics, including hardware, team, and task metrics. This includes compute availability and usage, team allocation and utilization, and task run and wait time information. 
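For Amazon EKS clusters, the add-on mentioned above can be installed and checked from the AWS CLI. The following is a minimal sketch: the function names are hypothetical, the cluster name is a placeholder, and the IAM prerequisites described in the linked installation guide are assumed to be in place already.

```shell
# Sketch: install the Amazon CloudWatch Observability EKS add-on and check its status.
# Both function names are hypothetical; the cluster name argument is a placeholder.

# Usage: install_cloudwatch_observability_addon <eks-cluster-name>
install_cloudwatch_observability_addon() {
  aws eks create-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability
}

# Print only the status field of the add-on.
# Usage: cloudwatch_observability_addon_status <eks-cluster-name>
cloudwatch_observability_addon_status() {
  aws eks describe-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --query 'addon.status' \
    --output text
}
```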

## Settings
<a name="sagemaker-hyperpod-studio-tabs-settings"></a>

Amazon SageMaker HyperPod provides a view of your cluster settings. The following provides information on your HyperPod cluster settings.

In Amazon SageMaker Studio you can navigate to one of your clusters in **HyperPod clusters** (under **Compute**) and view the **Settings** information on your cluster. The information includes the following:
+ **Instances** details, including instance ID, status, instance type, and instance group
+ **Instance groups** details, including instance group name, type, counts, and compute information
+ **Orchestration** details, including the orchestrator, version, and certificate authority
+ **Cluster resiliency** details
+ **Security** details, including subnets and security groups

## Details
<a name="sagemaker-hyperpod-studio-tabs-details"></a>

Amazon SageMaker HyperPod provides a view of your cluster metadata details. The following paragraph provides information on how to get your HyperPod cluster details.

In Amazon SageMaker Studio, you can navigate to one of your clusters in **HyperPod clusters** (under **Compute**) and view the **Details** on your cluster. This includes the tags, logs, and metadata.

# Connecting to HyperPod clusters and submitting tasks to clusters
<a name="sagemaker-hyperpod-studio-open"></a>

You can launch machine learning workloads on HyperPod clusters from Amazon SageMaker Studio IDEs. When you launch Studio IDEs on a HyperPod cluster, a set of commands is available to help you get started. You can work on your training scripts, use Docker containers for the training scripts, and submit jobs to the cluster, all from within the Studio IDEs. The following section provides information on how to connect your cluster to Studio IDEs.

In Amazon SageMaker Studio you can navigate to one of your clusters in **HyperPod clusters** (under **Compute**) and view your list of clusters. You can connect your cluster to an IDE listed under **Actions**. 

You can also choose your custom file system from the list of options. For information on how to get this set up, see [Setting up HyperPod in Studio](sagemaker-hyperpod-studio-setup.md).

Alternatively, you can create a space and launch an IDE using the AWS CLI. Use the following commands to do so. The following example creates a `Private` `JupyterLab` space for `user-profile-name` with the `fs-id` FSx for Lustre file system attached.

1. Create a space using the [`create-space`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/create-space.html) AWS CLI command. Replace `domain-id` and `space-name` with your own values.

   ```
   aws sagemaker create-space \
   --region your-region \
   --domain-id domain-id \
   --space-name space-name \
   --ownership-settings "OwnerUserProfileName=user-profile-name" \
   --space-sharing-settings "SharingType=Private" \
   --space-settings "AppType=JupyterLab,CustomFileSystems=[{FSxLustreFileSystem={FileSystemId=fs-id}}]"
   ```

1. Create the app using the [`create-app`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/create-app.html) AWS CLI command.

   ```
   aws sagemaker create-app \
   --region your-region \
   --domain-id domain-id \
   --space-name space-name \
   --app-type JupyterLab \
   --app-name default \
   --resource-spec "InstanceType=instance-type,SageMakerImageArn=image-arn"
   ```

Once you have your applications open, you can submit tasks directly to the clusters you are connected to. 

# Troubleshooting
<a name="sagemaker-hyperpod-studio-troubleshoot"></a>

The following section lists troubleshooting solutions for HyperPod in Studio.

**Topics**
+ [Tasks tab](#sagemaker-hyperpod-studio-troubleshoot-tasks)
+ [Metrics tab](#sagemaker-hyperpod-studio-troubleshoot-metrics)

## Tasks tab
<a name="sagemaker-hyperpod-studio-troubleshoot-tasks"></a>

If you get the error “Custom Resource Definition (CRD) is not configured on the cluster” in the **Tasks** tab:
+ Grant `EKSAdminViewPolicy` and `ClusterAccessRole` policies to your domain execution role. 

  For information on how to add tags to your execution role, see [Tag IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags_roles.html).

  To learn how to attach policies to an IAM user or group, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

If the tasks grid for Slurm clusters doesn’t stop loading in the **Tasks** tab:
+ Ensure that `RunAs` is enabled in your [AWS Session Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html) preferences and that the role you are using has the `SSMSessionRunAs` tag attached. 
  + To enable `RunAs`, navigate to the **Preferences** tab in the [Systems Manager console](https://console.aws.amazon.com/systems-manager/session-manager). 
  + For more information, see [Turn on Run As support for Linux and macOS managed nodes](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-preferences-run-as.html). 
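To confirm that the role carries the `SSMSessionRunAs` tag, you can read the tag value back with the AWS CLI. The following is a minimal sketch: `show_run_as_user` is a hypothetical helper name, and the role name is a placeholder. It uses `aws iam list-role-tags` with a `--query` filter.

```shell
# Sketch: print the value of the SSMSessionRunAs tag on an IAM role.
# show_run_as_user is a hypothetical helper name; the role name is a placeholder.
# Usage: show_run_as_user <role-name>
show_run_as_user() {
  aws iam list-role-tags \
    --role-name "$1" \
    --query "Tags[?Key=='SSMSessionRunAs'].Value" \
    --output text
}
```

If the command prints nothing, the tag is missing from the role. If it prints a user name, check that the same OS user exists on the Slurm cluster.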

For restricted task view in Studio for EKS clusters:
+ If your execution role doesn’t have permissions to list namespaces for EKS clusters:
  + See [Restrict task view in Studio for EKS clusters](sagemaker-hyperpod-studio-setup-eks.md#sagemaker-hyperpod-studio-setup-eks-restrict-tasks-view).
+ If users are experiencing issues with access for EKS clusters:

  1. Verify that RBAC is enabled by running the following `kubectl` command.

     ```
     kubectl api-versions | grep rbac
     ```

     This should return `rbac.authorization.k8s.io/v1`.

  1. Check if the `ClusterRole` and `ClusterRoleBinding` exist by running the following commands.

     ```
     kubectl get clusterrole pods-events-crd-cluster-role
     kubectl get clusterrolebinding pods-events-crd-cluster-role-binding
     ```

  1. Verify user group membership. Ensure the user is correctly assigned to the `pods-events-crd-cluster-level` group in your identity provider or IAM.
+ If a user can't see any resources:
  + Verify group membership and ensure that the `ClusterRoleBinding` is correctly applied.
+ If users can see resources in all namespaces:
  + If namespace restriction is required, consider using `Role` and `RoleBinding` instead of `ClusterRole` and `ClusterRoleBinding`.
+ If the configuration appears correct but permissions aren't applied:
  + Check whether any `NetworkPolicies` or `PodSecurityPolicies` are interfering with access.
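The `Role` and `RoleBinding` alternative mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a configuration taken from this guide: the function name `namespaced_view_manifest` and the resource names `pods-events-role` and `pods-events-role-binding` are hypothetical, and the namespace and group are arguments that you supply. The manifest grants `get` and `list` on pods and events in a single namespace. Custom Resource Definitions are cluster-scoped, so the CRD rule can't move into a `Role` and would stay in a `ClusterRole`.

```shell
# Sketch: print a namespace-scoped Role and RoleBinding for viewing pods and events.
# namespaced_view_manifest and the resource names in the manifest are hypothetical.
# Usage: namespaced_view_manifest <namespace> <group> | kubectl apply -f -
namespaced_view_manifest() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pods-events-role
  namespace: $1
rules:
- apiGroups: [""]
  resources: ["pods", "events"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pods-events-role-binding
  namespace: $1
subjects:
- kind: Group
  name: $2
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pods-events-role
  apiGroup: rbac.authorization.k8s.io
EOF
}
```

Because the function only prints the manifest, you can review the output before piping it to `kubectl apply -f -`.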

## Metrics tab
<a name="sagemaker-hyperpod-studio-troubleshoot-metrics"></a>

If no Amazon CloudWatch metrics are displayed in the **Metrics** tab:
+ The **Metrics** section of HyperPod cluster details uses CloudWatch to fetch the data. To see the metrics in this section, you need to have enabled [Cluster and task observability](sagemaker-hyperpod-eks-cluster-observability-cluster.md). Contact your administrator to configure metrics.