Create a SageMaker HyperPod cluster on training plans using the SageMaker API or the AWS CLI
To use SageMaker training plans for your Amazon SageMaker HyperPod cluster, specify the ARN of the training
plan you want to use in the TrainingPlanArn parameter of the ClusterInstanceGroupSpecification when calling the CreateCluster API operation.
Ensure that a subnet in the Availability Zone (AZ) of your plan is included in the
VpcConfig of your cluster configuration. You can retrieve the
AvailabilityZone of a training plan from the response of a DescribeTrainingPlan API call.
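The AZ lookup and subnet check described above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names are hypothetical, and it assumes the AZ is reported under ReservedCapacitySummaries in the describe-training-plan response, so check the query path against the actual output in your account.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical helpers, and the plan name
# and VPC ID passed to them are placeholders supplied by the caller.

# Print the Availability Zone(s) reserved by a training plan.
# Assumption: the AZ appears under ReservedCapacitySummaries in the response.
training_plan_azs() {
  local plan_name="$1"
  aws sagemaker describe-training-plan \
    --training-plan-name "$plan_name" \
    --query 'ReservedCapacitySummaries[].AvailabilityZone' \
    --output text
}

# Print the IDs of the subnets in the given VPC that are located in the given AZ.
subnets_in_az() {
  local vpc_id="$1" az="$2"
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=${vpc_id}" "Name=availability-zone,Values=${az}" \
    --query 'Subnets[].SubnetId' \
    --output text
}
```

For example, running training_plan_azs with your plan name and then passing the result to subnets_in_az with your VPC ID lists candidate subnets; at least one of them should appear in the VpcConfig you pass to CreateCluster.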
The following sample illustrates how to create a new SageMaker HyperPod cluster and provide an
instance group with a training plan in the --instance-groups option of the
create-cluster AWS CLI command.
# Create a cluster
aws sagemaker create-cluster \
    --cluster-name cluster-name \
    --instance-groups '[
        {
            "InstanceCount": 1,
            "InstanceGroupName": "controller-nodes",
            "InstanceType": "ml.t3.xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1
        },
        {
            "InstanceCount": 2,
            "InstanceGroupName": "worker-nodes",
            "InstanceType": "ml.p4d.24xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1,
            "TrainingPlanArn": "training_plan_arn"
        }
    ]'
For information about how to create a HyperPod cluster using the AWS CLI, see
create-cluster.
After creating the cluster, you can verify that your instance group was properly assigned
capacity from the training plan by calling the DescribeCluster API.
aws sagemaker describe-cluster --cluster-name cluster-name
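To narrow the describe-cluster output to the fields relevant to this check, a JMESPath query can be added. The sketch below is illustrative: the function name is hypothetical, and it assumes each entry of InstanceGroups in the response carries TrainingPlanArn and TrainingPlanStatus fields alongside the instance counts.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is a hypothetical helper.

# Show, per instance group, the current and target instance counts together
# with the training plan ARN and status reported for that group.
# Assumption: TrainingPlanArn and TrainingPlanStatus appear in each
# InstanceGroups entry of the describe-cluster response.
instance_group_training_plans() {
  local cluster_name="$1"
  aws sagemaker describe-cluster \
    --cluster-name "$cluster_name" \
    --query 'InstanceGroups[].[InstanceGroupName,CurrentCount,TargetCount,TrainingPlanArn,TrainingPlanStatus]' \
    --output table
}
```

Calling instance_group_training_plans with your cluster name prints one row per instance group; the worker-nodes row should show the ARN of the plan you specified.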