ClusterInstanceGroupDetails
Details of an instance group in a SageMaker HyperPod cluster.
Contents
- ActiveOperations
-
A map indicating active operations currently in progress for the instance group of a SageMaker HyperPod cluster. When a scaling operation is in progress, this map contains the key Scaling with value 1.
Type: String to integer map
Valid Keys: Scaling
Valid Range: Minimum value of 1.
Required: No
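As a minimal sketch of how a client might read this field, the helper below checks an instance group dictionary for an in-progress scaling operation. The function name is_scaling and the sample dictionary are illustrative assumptions, not part of the API.

```python
from typing import Any, Mapping


def is_scaling(instance_group: Mapping[str, Any]) -> bool:
    """Return True if the ActiveOperations map reports a scaling operation."""
    # ActiveOperations is optional, so treat a missing map as "no operations".
    active_ops = instance_group.get("ActiveOperations") or {}
    # The documented key is "Scaling", and its minimum value is 1.
    return active_ops.get("Scaling", 0) >= 1


# Hypothetical response fragment, for illustration only.
group = {"InstanceGroupName": "worker-group-1", "ActiveOperations": {"Scaling": 1}}
print(is_scaling(group))
```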
- ActiveSoftwareUpdateConfig
-
The configuration to use when updating the AMI versions.
Type: DeploymentConfiguration object
Required: No
- CapacityRequirements
-
The instance capacity requirements for the instance group.
Type: ClusterCapacityRequirements object
Required: No
- CurrentCount
-
The number of instances that are currently in the instance group of a SageMaker HyperPod cluster.
Type: Integer
Valid Range: Minimum value of 0.
Required: No
- CurrentImageId
-
The ID of the Amazon Machine Image (AMI) currently in use by the instance group.
Type: String
Length Constraints: Minimum length of 7. Maximum length of 21.
Pattern: ami-[0-9a-fA-F]{8,17}|default
Required: No
- DesiredImageId
-
The ID of the Amazon Machine Image (AMI) desired for the instance group.
Type: String
Length Constraints: Minimum length of 7. Maximum length of 21.
Pattern: ami-[0-9a-fA-F]{8,17}|default
Required: No
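The length constraints and pattern shared by CurrentImageId and DesiredImageId can be checked on the client side. The sketch below is one possible way to do that, and it also compares the two fields to guess whether an image change is still pending. The helper names is_valid_image_id and image_update_pending are hypothetical, and treating "the two IDs differ" as "an update is pending" is an assumption rather than documented behavior.

```python
import re
from typing import Any, Mapping

# Pattern copied from the field reference above.
AMI_ID_PATTERN = re.compile(r"ami-[0-9a-fA-F]{8,17}|default")


def is_valid_image_id(image_id: str) -> bool:
    """Check the documented length bounds (7 to 21) and the pattern."""
    return 7 <= len(image_id) <= 21 and AMI_ID_PATTERN.fullmatch(image_id) is not None


def image_update_pending(instance_group: Mapping[str, Any]) -> bool:
    """Assumption: differing current and desired IDs mean a change is pending."""
    current = instance_group.get("CurrentImageId")
    desired = instance_group.get("DesiredImageId")
    # Both fields are optional, so only compare when both are present.
    return current is not None and desired is not None and current != desired
```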
- ExecutionRole
-
The execution role for the instance group to assume.
Type: String
Length Constraints: Minimum length of 20. Maximum length of 2048.
Pattern: arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
- InstanceGroupName
-
The name of the instance group of a SageMaker HyperPod cluster.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 63.
Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9])*
Required: No
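To make the naming rule concrete: the pattern allows letters and digits separated by hyphens, and it rejects names that begin or end with a hyphen. The following sketch applies the documented length bounds and pattern. The function name is_valid_instance_group_name is a hypothetical helper, not part of any SDK.

```python
import re

# Pattern copied from the field reference above.
INSTANCE_GROUP_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def is_valid_instance_group_name(name: str) -> bool:
    """Check the documented length bounds (1 to 63) and the pattern."""
    if not 1 <= len(name) <= 63:
        return False
    # fullmatch ensures the whole name, not just a prefix, fits the pattern.
    return INSTANCE_GROUP_NAME_PATTERN.fullmatch(name) is not None
```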
- InstanceStorageConfigs
-
The additional storage configurations for the instances in the SageMaker HyperPod cluster instance group.
Type: Array of ClusterInstanceStorageConfig objects
Array Members: Minimum number of 0 items. Maximum number of 2 items.
Required: No
- InstanceType
-
The instance type of the instance group of a SageMaker HyperPod cluster.
Type: String
Valid Values: ml.p4d.24xlarge | ml.p4de.24xlarge | ml.p5.48xlarge | ml.p5.4xlarge | ml.p6e-gb200.36xlarge | ml.trn1.32xlarge | ml.trn1n.32xlarge | ml.g5.xlarge | ml.g5.2xlarge | ml.g5.4xlarge | ml.g5.8xlarge | ml.g5.12xlarge | ml.g5.16xlarge | ml.g5.24xlarge | ml.g5.48xlarge | ml.c5.large | ml.c5.xlarge | ml.c5.2xlarge | ml.c5.4xlarge | ml.c5.9xlarge | ml.c5.12xlarge | ml.c5.18xlarge | ml.c5.24xlarge | ml.c5n.large | ml.c5n.2xlarge | ml.c5n.4xlarge | ml.c5n.9xlarge | ml.c5n.18xlarge | ml.m5.large | ml.m5.xlarge | ml.m5.2xlarge | ml.m5.4xlarge | ml.m5.8xlarge | ml.m5.12xlarge | ml.m5.16xlarge | ml.m5.24xlarge | ml.t3.medium | ml.t3.large | ml.t3.xlarge | ml.t3.2xlarge | ml.g6.xlarge | ml.g6.2xlarge | ml.g6.4xlarge | ml.g6.8xlarge | ml.g6.16xlarge | ml.g6.12xlarge | ml.g6.24xlarge | ml.g6.48xlarge | ml.gr6.4xlarge | ml.gr6.8xlarge | ml.g6e.xlarge | ml.g6e.2xlarge | ml.g6e.4xlarge | ml.g6e.8xlarge | ml.g6e.16xlarge | ml.g6e.12xlarge | ml.g6e.24xlarge | ml.g6e.48xlarge | ml.p5e.48xlarge | ml.p5en.48xlarge | ml.p6-b200.48xlarge | ml.trn2.3xlarge | ml.trn2.48xlarge | ml.c6i.large | ml.c6i.xlarge | ml.c6i.2xlarge | ml.c6i.4xlarge | ml.c6i.8xlarge | ml.c6i.12xlarge | ml.c6i.16xlarge | ml.c6i.24xlarge | ml.c6i.32xlarge | ml.m6i.large | ml.m6i.xlarge | ml.m6i.2xlarge | ml.m6i.4xlarge | ml.m6i.8xlarge | ml.m6i.12xlarge | ml.m6i.16xlarge | ml.m6i.24xlarge | ml.m6i.32xlarge | ml.r6i.large | ml.r6i.xlarge | ml.r6i.2xlarge | ml.r6i.4xlarge | ml.r6i.8xlarge | ml.r6i.12xlarge | ml.r6i.16xlarge | ml.r6i.24xlarge | ml.r6i.32xlarge | ml.i3en.large | ml.i3en.xlarge | ml.i3en.2xlarge | ml.i3en.3xlarge | ml.i3en.6xlarge | ml.i3en.12xlarge | ml.i3en.24xlarge | ml.m7i.large | ml.m7i.xlarge | ml.m7i.2xlarge | ml.m7i.4xlarge | ml.m7i.8xlarge | ml.m7i.12xlarge | ml.m7i.16xlarge | ml.m7i.24xlarge | ml.m7i.48xlarge | ml.r7i.large | ml.r7i.xlarge | ml.r7i.2xlarge | ml.r7i.4xlarge | ml.r7i.8xlarge | ml.r7i.12xlarge | ml.r7i.16xlarge | ml.r7i.24xlarge | ml.r7i.48xlarge
Required: No
- KubernetesConfig
-
The Kubernetes configuration for the instance group that contains labels and taints to be applied for the nodes in this instance group.
Type: ClusterKubernetesConfigDetails object
Required: No
- LifeCycleConfig
-
Details of LifeCycle configuration for the instance group.
Type: ClusterLifeCycleConfig object
Required: No
- MinCount
-
The minimum number of instances that must be available in the instance group of a SageMaker HyperPod cluster before it transitions to InService status.
Type: Integer
Valid Range: Minimum value of 0. Maximum value of 6758.
Required: No
- OnStartDeepHealthChecks
-
A list of the deep health checks to perform when the cluster instance group is created or updated.
Type: Array of strings
Array Members: Minimum number of 1 item. Maximum number of 2 items.
Valid Values: InstanceStress | InstanceConnectivity
Required: No
- OverrideVpcConfig
-
The customized Amazon VPC configuration at the instance group level that overrides the default Amazon VPC configuration of the SageMaker HyperPod cluster.
Type: VpcConfig object
Required: No
- ScheduledUpdateConfig
-
The configuration object of the schedule that SageMaker follows when updating the AMI.
Type: ScheduledUpdateConfig object
Required: No
- SoftwareUpdateStatus
-
Status of the last software update request.
Status transitions follow these possible sequences:
  - Pending -> InProgress -> Succeeded
  - Pending -> InProgress -> RollbackInProgress -> RollbackComplete
  - Pending -> InProgress -> RollbackInProgress -> Failed
Type: String
Valid Values: Pending | InProgress | Succeeded | Failed | RollbackInProgress | RollbackComplete
Required: No
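The three sequences listed for SoftwareUpdateStatus imply a small state machine. The sketch below encodes it as a transition table, which a polling client could use to detect unexpected jumps or to decide when to stop polling. The table is derived only from the listed sequences, and the names ALLOWED_TRANSITIONS, is_valid_transition, and is_terminal are illustrative assumptions.

```python
from typing import Dict, FrozenSet

# Transition table derived from the three documented sequences.
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "Pending": frozenset({"InProgress"}),
    "InProgress": frozenset({"Succeeded", "RollbackInProgress"}),
    "RollbackInProgress": frozenset({"RollbackComplete", "Failed"}),
    # Statuses that end every listed sequence have no successors here.
    "Succeeded": frozenset(),
    "RollbackComplete": frozenset(),
    "Failed": frozenset(),
}


def is_valid_transition(previous: str, current: str) -> bool:
    """Return True if `current` may directly follow `previous`."""
    return current in ALLOWED_TRANSITIONS.get(previous, frozenset())


def is_terminal(status: str) -> bool:
    """Return True for a known status with no further transitions."""
    return status in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[status]
```

For example, a client that last saw InProgress and now sees RollbackInProgress is on the second or third sequence, and it can stop polling once is_terminal returns True.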
- Status
-
The current status of the cluster instance group.
  - InService: The instance group is active and healthy.
  - Creating: The instance group is being provisioned.
  - Updating: The instance group is being updated.
  - Failed: The instance group has failed to provision or is no longer healthy.
  - Degraded: The instance group is degraded, meaning that some instances have failed to provision or are no longer healthy.
  - Deleting: The instance group is being deleted.
Type: String
Valid Values: InService | Creating | Updating | Failed | Degraded | SystemUpdating | Deleting
Required: No
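A client that polls the instance group often needs to decide whether to keep waiting, proceed, or raise an alert. The sketch below groups the valid Status values into three buckets for that purpose. The bucket names and the grouping itself are assumptions made for illustration. In particular, SystemUpdating is not described above, and treating it as an in-progress state is a guess.

```python
# Assumed grouping of the documented Status values, for illustration only.
IN_PROGRESS_STATUSES = frozenset({"Creating", "Updating", "SystemUpdating", "Deleting"})
NEEDS_ATTENTION_STATUSES = frozenset({"Failed", "Degraded"})


def classify_status(status: str) -> str:
    """Map a Status value to 'ready', 'in_progress', or 'needs_attention'."""
    if status == "InService":
        return "ready"
    if status in IN_PROGRESS_STATUSES:
        return "in_progress"
    if status in NEEDS_ATTENTION_STATUSES:
        return "needs_attention"
    # Fail loudly on values this sketch does not know about.
    raise ValueError(f"Unrecognized instance group status: {status!r}")
```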
- TargetCount
-
The number of instances you specified to add to the instance group of a SageMaker HyperPod cluster.
Type: Integer
Valid Range: Minimum value of 0. Maximum value of 6758.
Required: No
- TargetStateCount
-
Represents the number of running nodes using the desired image ID.
  - During software update operations: This count shows the number of nodes running on the desired image ID. If a rollback occurs, the current image ID and desired image ID (both included in the describe cluster response) swap values. The TargetStateCount then shows the number of nodes running on the newly designated desired image ID (which was previously the current image ID).
  - During simultaneous scaling and software update operations: This count shows the number of instances running on the desired image ID, including any new instances created as part of the scaling request. New nodes are always created using the desired image ID, so TargetStateCount reflects the total count of nodes running on the desired image ID, even during rollback scenarios.
Type: Integer
Valid Range: Minimum value of 0. Maximum value of 6758.
Required: No
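One way to use TargetStateCount is as a rough progress indicator for a software update. The sketch below divides it by CurrentCount to estimate the fraction of nodes already on the desired image ID. Using CurrentCount as the denominator is an assumption, not something the reference states, and the helper name update_progress is hypothetical.

```python
from typing import Any, Mapping, Optional


def update_progress(instance_group: Mapping[str, Any]) -> Optional[float]:
    """Estimate the fraction of nodes running the desired image ID.

    Returns None when either count is missing or the group has no instances.
    """
    current_count = instance_group.get("CurrentCount")
    target_state_count = instance_group.get("TargetStateCount")
    # Both fields are optional; also avoid dividing by zero for empty groups.
    if not current_count or target_state_count is None:
        return None
    # Clamp to 1.0 in case the two counts are briefly out of sync.
    return min(target_state_count / current_count, 1.0)
```

Because a rollback swaps the current and desired image IDs, the fraction returned by this sketch would then describe progress toward the previous image rather than the new one.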
- ThreadsPerCore
-
The value you specified for ThreadsPerCore in CreateCluster to enable or disable multithreading. For instance types that support multithreading, you can specify 1 to disable multithreading and 2 to enable it. For more information, see the reference table of CPU cores and threads per CPU core per instance type in the Amazon Elastic Compute Cloud User Guide.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 2.
Required: No
- TrainingPlanArn
-
The Amazon Resource Name (ARN) of the training plan associated with this cluster instance group.
For more information about how to reserve GPU capacity for your SageMaker HyperPod clusters using Amazon SageMaker Training Plan, see CreateTrainingPlan.
Type: String
Length Constraints: Minimum length of 50. Maximum length of 2048.
Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:training-plan/.*
Required: No
- TrainingPlanStatus
-
The current status of the training plan associated with this cluster instance group.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 63.
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: