Creating SageMaker HyperPod clusters using AWS CloudFormation templates - Amazon SageMaker AI

You can create SageMaker HyperPod clusters using the CloudFormation templates for HyperPod. You must install the AWS CLI before you proceed.
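As a quick way to confirm the prerequisite, the following sketch checks that the AWS CLI is installed and that credentials resolve. The function name `check_aws_cli` is a hypothetical helper introduced here for illustration. It assumes that you have already configured credentials and a default Region for the CLI.

```shell
# Hypothetical helper: confirm that the AWS CLI is on PATH and usable.
check_aws_cli() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found; install it before continuing" >&2
    return 1
  fi
  # Print the installed CLI version.
  aws --version
  # Print the account and identity that the CLI is configured to use.
  aws sts get-caller-identity --output json
}

# Usage:
#   check_aws_cli
```

If `aws sts get-caller-identity` returns an error, the CLI is installed but credentials are probably missing or expired.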

Configure resources in the console and deploy using CloudFormation

You can configure resources using the AWS Management Console and deploy using the CloudFormation templates.

Follow these steps.

  1. Follow the instructions in Creating a SageMaker HyperPod cluster with Amazon EKS orchestration to configure the AWS resources that you need to create your cluster.

  2. At the end of the Create cluster page, choose Download CloudFormation template parameters. This opens the Using the configuration file to create the cluster using the AWS CLI window on the right side of the page.

  3. In the Using the configuration file to create the cluster using the AWS CLI window, choose Download configuration parameters file. The file is downloaded to your machine. You can edit the JSON configuration file to suit your needs, or leave it as is if no changes are required.

  4. Run the create-stack AWS CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. In the command, replace params.json with the path to the parameters file that you downloaded.

    aws cloudformation create-stack --stack-name my-stack --template-url https://aws-sagemaker-hyperpod-cluster-setup.amazonaws.com/templates-slurm/main-stack-slurm-based-template.yaml --parameters file://params.json --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

  5. To view the status of resource provisioning, navigate to the CloudFormation console.

    After cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster's status is displayed in the Status column.

  6. After the status of the cluster changes to InService, you can log in to the cluster nodes. To access the cluster nodes and start running ML workloads, see Jobs on SageMaker HyperPod clusters.
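The console checks in the last two steps can also be approximated from the command line. The following is a minimal sketch, not part of the official procedure. The function names are hypothetical helpers, `my-stack` matches the stack name used in the command above, and the cluster name that you pass to `cluster_status` is whatever name your parameters file gives the cluster.

```shell
# Block until stack creation finishes. Returns non-zero if creation
# fails or the waiter times out.
wait_for_stack() {
  aws cloudformation wait stack-create-complete --stack-name "$1"
}

# Print the current stack status, such as CREATE_IN_PROGRESS or CREATE_COMPLETE.
stack_status() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Print the HyperPod cluster status. The steps above wait for InService.
cluster_status() {
  aws sagemaker describe-cluster --cluster-name "$1" \
    --query 'ClusterStatus' --output text
}

# List resources that failed to create, with the reason CloudFormation recorded.
failed_events() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output table
}

# Usage (the cluster name is a placeholder):
#   wait_for_stack my-stack && cluster_status my-cluster-name
#   failed_events my-stack
```

If the stack ends in a rollback state, `failed_events` is usually the fastest way to see which resource caused it. Because the main template may deploy nested stacks, you might need to run it against a nested stack's name as well.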

Configure and deploy resources using CloudFormation

You can configure and deploy resources using the CloudFormation templates for SageMaker HyperPod.

Follow these steps.

  1. Download a CloudFormation template for SageMaker HyperPod from the sagemaker-hyperpod-cluster-setup GitHub repository.

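  2. Run the create-stack AWS CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. In the command, replace the template URL placeholder with the URL of your template, and replace params.json with the path to your parameters file.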

    aws cloudformation create-stack --stack-name my-stack --template-url URL_of_the_file_that_contains_the_template_body --parameters file://params.json --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

  3. To view the status of resource provisioning, navigate to the CloudFormation console.

    After cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster's status is displayed in the Status column.

  4. After the status of the cluster changes to InService, you can log in to the cluster nodes.
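Because this path does not start from a downloaded parameters file, it can help to see the shape that `--parameters file://params.json` expects and to check which parameters a template declares before you deploy. The sketch below is illustrative only. `write_example_params` and `validate_template` are hypothetical helper names, and `ExampleParameterKey` and `example-value` are placeholders, not real parameters of the HyperPod templates.

```shell
# Write a skeleton parameters file: a JSON array of
# ParameterKey/ParameterValue pairs. The key and value are placeholders;
# replace them with parameters that your template actually declares.
write_example_params() {
  cat > "$1" <<'EOF'
[
  {
    "ParameterKey": "ExampleParameterKey",
    "ParameterValue": "example-value"
  }
]
EOF
}

# Ask CloudFormation to check the template and print the parameter keys
# that it declares.
validate_template() {
  aws cloudformation validate-template --template-url "$1" \
    --query 'Parameters[].ParameterKey' --output text
}

# Usage:
#   write_example_params params.json
#   validate_template URL_of_the_file_that_contains_the_template_body
```

Comparing the keys that `validate_template` prints with the keys in your parameters file is a simple way to catch misspelled or missing parameters before you run create-stack.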