


# Create a SageMaker HyperPod cluster on a training plan using the SageMaker API or the AWS CLI
<a name="use-training-plan-for-hyperpod-creation-using-api-cli-sdk"></a>

To use a SageMaker training plan with your Amazon SageMaker HyperPod cluster, specify the ARN of the training plan in the [`TrainingPlanArn`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html#sagemaker-Type-ClusterInstanceGroupSpecification-TrainingPlanArn) parameter of the [`ClusterInstanceGroupSpecification`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html) when you call the [`CreateCluster`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCluster.html) API operation.

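Make sure that the subnet associated with the plan's Availability Zone is included in the `VpcConfig` of your cluster configuration. You can find the training plan's `AvailabilityZone` in the response of the [`DescribeTrainingPlan`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingPlan.html) API call.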
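One way to check this before creating the cluster is to compare the plan's Availability Zone with that of the subnet you intend to use. The following is a minimal sketch, not an official procedure. The function names are hypothetical, and the `--query` paths assume that the zone is reported under `ReservedCapacitySummaries` and that the plan has a single reserved capacity.

```shell
# Sketch only: helper names are hypothetical, and the --query paths assume the
# Availability Zone is reported under ReservedCapacitySummaries[0].

# Print the Availability Zone of the plan's first reserved capacity.
# $1: training plan name
training_plan_az() {
  aws sagemaker describe-training-plan \
    --training-plan-name "$1" \
    --query 'ReservedCapacitySummaries[0].AvailabilityZone' \
    --output text
}

# Print the Availability Zone of a subnet.
# $1: subnet ID
subnet_az() {
  aws ec2 describe-subnets \
    --subnet-ids "$1" \
    --query 'Subnets[0].AvailabilityZone' \
    --output text
}

# Succeed only when the subnet is in the plan's Availability Zone.
# $1: training plan name, $2: subnet ID
subnet_matches_plan() {
  local plan_zone subnet_zone
  plan_zone="$(training_plan_az "$1")" || return 2
  subnet_zone="$(subnet_az "$2")" || return 2
  # With --output text, the AWS CLI prints "None" for a missing value.
  [ -n "$plan_zone" ] && [ "$plan_zone" != "None" ] && [ "$plan_zone" = "$subnet_zone" ]
}
```

For example, `subnet_matches_plan {{training-plan-name}} {{subnet-id}} && echo "subnet is in the plan's Availability Zone"` prints the message only when the two zones match.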

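The following example shows how to create a new SageMaker HyperPod cluster and assign a training plan to an instance group through the `--instance-groups` attribute of the `create-cluster` AWS CLI command.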

```
# Create a cluster with a controller group and a worker group.
# The worker group draws its capacity from the training plan.
# The JSON is single-quoted, so it needs no line continuations inside it.
aws sagemaker create-cluster \
  --cluster-name {{cluster-name}} \
  --instance-groups '[
        {
            "InstanceCount": {{1}},
            "InstanceGroupName": "{{controller-nodes}}",
            "InstanceType": "{{ml.t3.xlarge}}",
            "LifeCycleConfig": {"SourceS3Uri": "{{source_s3_uri}}", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::{{customer_account_id}}:role/{{execution_role}}",
            "ThreadsPerCore": {{1}}
        },
        {
            "InstanceCount": {{2}},
            "InstanceGroupName": "{{worker-nodes}}",
            "InstanceType": "{{ml.p4d.24xlarge}}",
            "LifeCycleConfig": {"SourceS3Uri": "{{source_s3_uri}}", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::{{customer_account_id}}:role/{{execution_role}}",
            "ThreadsPerCore": {{1}},
            "TrainingPlanArn": "{{training_plan_arn}}"
        }]'
```
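Multi-line JSON passed inline on the command line is sensitive to shell quoting. As an alternative, the sketch below writes a worker instance group to a file that can then be passed with the AWS CLI `file://` prefix. The function name, file name, group name, instance count, and instance type are illustrative assumptions, not values prescribed by the service.

```shell
# Sketch only: the function name and the literal values below are assumptions
# chosen for illustration.

# Write a one-element instance group array that references a training plan.
# $1: output file, $2: training plan ARN, $3: execution role ARN,
# $4: S3 URI of the lifecycle scripts
write_worker_group() {
  cat > "$1" <<EOF
[
  {
    "InstanceCount": 2,
    "InstanceGroupName": "worker-nodes",
    "InstanceType": "ml.p4d.24xlarge",
    "LifeCycleConfig": {"SourceS3Uri": "$4", "OnCreate": "on_create.sh"},
    "ExecutionRole": "$3",
    "ThreadsPerCore": 1,
    "TrainingPlanArn": "$2"
  }
]
EOF
}
```

After calling `write_worker_group instance-groups.json {{training_plan_arn}} {{execution_role_arn}} {{source_s3_uri}}`, the file can be supplied as `--instance-groups file://instance-groups.json` in the `create-cluster` command.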

For information about creating a HyperPod cluster with the AWS CLI, see [create-cluster](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-cluster.html).

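After creating the cluster, you can call the `DescribeCluster` API to verify that the instance group has been correctly allocated capacity from the training plan.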

```
aws sagemaker describe-cluster --cluster-name {{cluster-name}}
```
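The full response can be long, so it may help to narrow it to the fields that relate to the training plan. The following sketch assumes that each entry of `InstanceGroups` in the response reports `TrainingPlanArn` and `TrainingPlanStatus`. The function names are hypothetical.

```shell
# Sketch only: function names are hypothetical, and the --query path assumes
# each instance group reports TrainingPlanArn and TrainingPlanStatus.

# Print one tab-separated row per instance group:
# name, current count, target count, training plan ARN, training plan status.
# $1: cluster name
instance_group_plans() {
  aws sagemaker describe-cluster \
    --cluster-name "$1" \
    --query 'InstanceGroups[].[InstanceGroupName,CurrentCount,TargetCount,TrainingPlanArn,TrainingPlanStatus]' \
    --output text
}

# Succeed when at least one instance group references the given plan ARN.
# $1: cluster name, $2: training plan ARN
group_uses_plan() {
  instance_group_plans "$1" | grep -qF -- "$2"
}
```

For example, `group_uses_plan {{cluster-name}} {{training_plan_arn}}` returns a zero exit status when some instance group in the cluster lists that training plan.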