AWS::SageMaker::Cluster
Creates an Amazon SageMaker HyperPod cluster. SageMaker HyperPod is a capability of SageMaker for creating and managing persistent clusters for developing large machine learning models, such as large language models (LLMs) and diffusion models. To learn more, see Amazon SageMaker HyperPod in the Amazon SageMaker Developer Guide.
Syntax
To declare this entity in your CloudFormation template, use the following syntax:
JSON
{
  "Type" : "AWS::SageMaker::Cluster",
  "Properties" : {
      "AutoScaling" : ClusterAutoScalingConfig,
      "ClusterName" : String,
      "ClusterRole" : String,
      "InstanceGroups" : [ ClusterInstanceGroup, ... ],
      "NodeProvisioningMode" : String,
      "NodeRecovery" : String,
      "Orchestrator" : Orchestrator,
      "RestrictedInstanceGroups" : [ ClusterRestrictedInstanceGroup, ... ],
      "Tags" : [ Tag, ... ],
      "TieredStorageConfig" : TieredStorageConfig,
      "VpcConfig" : VpcConfig
    }
}
YAML
Type: AWS::SageMaker::Cluster
Properties:
  AutoScaling: ClusterAutoScalingConfig
  ClusterName: String
  ClusterRole: String
  InstanceGroups:
    - ClusterInstanceGroup
  NodeProvisioningMode: String
  NodeRecovery: String
  Orchestrator: Orchestrator
  RestrictedInstanceGroups:
    - ClusterRestrictedInstanceGroup
  Tags:
    - Tag
  TieredStorageConfig: TieredStorageConfig
  VpcConfig: VpcConfig
Properties
AutoScaling-
The autoscaling configuration for the cluster. Enables automatic scaling of cluster nodes based on workload demand using a Karpenter-based system.
Required: No
Type: ClusterAutoScalingConfig
Update requires: No interruption
ClusterName-
The name of the SageMaker HyperPod cluster.
Required: No
Type: String
Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$
Minimum: 1
Maximum: 63
Update requires: Replacement
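The length and pattern constraints above can be checked at stack-creation time by declaring them on a template parameter. The following is a minimal sketch, not a definitive template: the parameter name HyperPodClusterName and the logical ID ExampleCluster are hypothetical, and all other cluster properties are omitted.

```yaml
Parameters:
  HyperPodClusterName:          # hypothetical parameter name
    Type: String
    MinLength: 1
    MaxLength: 63
    # Same pattern as documented for ClusterName.
    AllowedPattern: '^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$'
    ConstraintDescription: Up to 63 letters, digits, or hyphens, starting and ending with a letter or digit.

Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      # Changing this value replaces the cluster (Update requires: Replacement).
      ClusterName: !Ref HyperPodClusterName
```

Because ClusterName requires replacement on update, changing the parameter value on an existing stack creates a new cluster rather than renaming the current one.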
ClusterRole-
The Amazon Resource Name (ARN) of the IAM role that HyperPod assumes to perform cluster autoscaling operations. This role must have permissions for sagemaker:BatchAddClusterNodes and sagemaker:BatchDeleteClusterNodes. The role is required only when autoscaling is enabled and HyperPod performs the autoscaling operations.
Required: No
Type: String
Pattern: ^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$
Minimum: 20
Maximum: 2048
Update requires: No interruption
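As an illustration of the two permissions named above, the sketch below attaches them to an existing role through a managed policy and passes that role's ARN as ClusterRole. The parameter name AutoScalingRoleName and the logical IDs are hypothetical, the wildcard Resource is a simplifying assumption, and the role's trust policy is not shown because this page does not specify it.

```yaml
Parameters:
  AutoScalingRoleName:          # hypothetical: name of an existing IAM role
    Type: String

Resources:
  AutoScalingNodePolicy:        # hypothetical logical ID
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Roles:
        - !Ref AutoScalingRoleName
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sagemaker:BatchAddClusterNodes
              - sagemaker:BatchDeleteClusterNodes
            Resource: '*'       # assumption: narrow this to the cluster ARN where possible

  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      # Builds an ARN of the documented form arn:aws...:iam::<account>:role/<name>.
      ClusterRole: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${AutoScalingRoleName}'
```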
InstanceGroups-
The instance groups of the SageMaker HyperPod cluster. To delete an instance group, remove it from the array.
Required: No
Type: Array of ClusterInstanceGroup
Minimum: 1
Update requires: No interruption
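The sketch below shows what removing an instance group from the array might look like in practice. The sub-property names (InstanceGroupName, InstanceType, InstanceCount, ExecutionRole, LifeCycleConfig, SourceS3Uri, OnCreate) are not defined on this page; they are assumed from the ClusterInstanceGroup type and should be confirmed against its reference. All parameter names, group names, and the script name are hypothetical.

```yaml
Parameters:
  NodeInstanceType:             # hypothetical parameters supplying environment-specific values
    Type: String
  NodeExecutionRoleArn:
    Type: String
  LifecycleScriptsS3Uri:
    Type: String

Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      InstanceGroups:
        - InstanceGroupName: controller
          InstanceType: !Ref NodeInstanceType
          InstanceCount: 1
          ExecutionRole: !Ref NodeExecutionRoleArn
          LifeCycleConfig:
            SourceS3Uri: !Ref LifecycleScriptsS3Uri
            OnCreate: on_create.sh
        # Deleting this list item and updating the stack removes the "workers" group.
        - InstanceGroupName: workers
          InstanceType: !Ref NodeInstanceType
          InstanceCount: 2
          ExecutionRole: !Ref NodeExecutionRoleArn
          LifeCycleConfig:
            SourceS3Uri: !Ref LifecycleScriptsS3Uri
            OnCreate: on_create.sh
```

Since InstanceGroups updates with no interruption, the remaining group in this sketch would be expected to stay in place when the other is removed.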
NodeProvisioningMode-
The mode for provisioning nodes in the cluster. You can specify the following modes:
- Continuous: Scaling behavior that enables 1) concurrent operation execution within instance groups, 2) continuous retry mechanisms for failed operations, 3) enhanced customer visibility into cluster events through detailed event streams, and 4) partial provisioning capabilities. Your clusters and instance groups remain InService while scaling. This mode is only supported for EKS orchestrated clusters.
Required: No
Type: String
Allowed values: Continuous
Update requires: No interruption
NodeRecovery-
Specifies whether to enable or disable the automatic node recovery feature of SageMaker HyperPod. Available values are Automatic for enabling and None for disabling.
Required: No
Type: String
Allowed values: Automatic | None
Update requires: No interruption
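One way to expose this setting per environment is a parameter restricted to the two allowed values. This is a sketch only: the parameter name NodeRecoveryMode and the logical ID are hypothetical, and the parameter default is a choice made for this example, not a documented service default.

```yaml
Parameters:
  NodeRecoveryMode:             # hypothetical parameter name
    Type: String
    Default: Automatic          # assumption: default chosen for this sketch
    AllowedValues:
      - Automatic
      - 'None'                  # quoted to make clear this is the literal string None

Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      # Can be changed on an existing stack (Update requires: No interruption).
      NodeRecovery: !Ref NodeRecoveryMode
```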
Orchestrator-
The orchestrator type for the SageMaker HyperPod cluster. Currently, 'eks' is the only available option.
Required: No
Type: Orchestrator
Update requires: No interruption
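The following sketch pairs an EKS orchestrator with the Continuous provisioning mode, which this page describes as supported only for EKS orchestrated clusters. The nested Eks and ClusterArn names are not defined on this page; they are assumed from the Orchestrator type and should be confirmed against its reference. The parameter name EksClusterArn and the logical ID are hypothetical.

```yaml
Parameters:
  EksClusterArn:                # hypothetical: ARN of an existing Amazon EKS cluster
    Type: String

Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      Orchestrator:
        Eks:                    # assumed sub-property names, see the note above
          ClusterArn: !Ref EksClusterArn
      # Only supported together with an EKS orchestrator.
      NodeProvisioningMode: Continuous
```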
RestrictedInstanceGroups-
The specialized instance groups for training models like Amazon Nova to be created in the SageMaker HyperPod cluster.
Required: No
Type: Array of ClusterRestrictedInstanceGroup
Minimum: 1
Update requires: No interruption
Tags-
A tag object that consists of a key and an optional value, used to manage metadata for SageMaker AWS resources.
You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags.
For more information on adding metadata to your AWS resources with tagging, see Tagging AWS resources. For advice on best practices for managing AWS resources with tagging, see Tagging Best Practices: Implement an Effective AWS Resource Tagging Strategy.
Required: No
Type: Array of Tag
Maximum: 50
Update requires: No interruption
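A tag list is written as an array of key and value pairs. In the sketch below, the logical ID and the tag keys and values are hypothetical placeholders.

```yaml
Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      Tags:                     # at most 50 entries
        - Key: project          # hypothetical keys and values
          Value: llm-pretraining
        - Key: team
          Value: ml-platform
```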
TieredStorageConfig-
The configuration for managed tier checkpointing on the HyperPod cluster. When enabled, this feature uses a multi-tier storage approach for storing model checkpoints, providing faster checkpoint operations and improved fault tolerance across cluster nodes.
Required: No
Type: TieredStorageConfig
Update requires: No interruption
VpcConfig-
Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. For more information, see Give SageMaker Access to Resources in your Amazon VPC.
Required: No
Type: VpcConfig
Update requires: Replacement
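A VPC configuration is typically supplied from existing network resources. In the sketch below, the Subnets and SecurityGroupIds names are not defined on this page; they are assumed from the VpcConfig type and should be confirmed against its reference. The parameter names and logical ID are hypothetical.

```yaml
Parameters:
  ClusterSubnetIds:             # hypothetical parameter names
    Type: List<AWS::EC2::Subnet::Id>
  ClusterSecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>

Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      # Changing VpcConfig replaces the cluster (Update requires: Replacement).
      VpcConfig:
        Subnets: !Ref ClusterSubnetIds
        SecurityGroupIds: !Ref ClusterSecurityGroupIds
```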
Return values
Ref
Fn::GetAtt
The Fn::GetAtt intrinsic function returns a value for a specified attribute of this type. The following are the available attributes and sample return values.
For more information about using the Fn::GetAtt intrinsic function, see Fn::GetAtt.
ClusterArn-
The Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.
ClusterStatus-
The status of the SageMaker HyperPod cluster.
CreationTime-
The time at which the SageMaker HyperPod cluster was created.
FailureMessage-
The failure message of the SageMaker HyperPod cluster.
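The attributes listed above can be surfaced as stack outputs with Fn::GetAtt. This is a minimal sketch: the logical ID ExampleCluster, the cluster name, and the output names are hypothetical, and the cluster's other properties are omitted.

```yaml
Resources:
  ExampleCluster:               # hypothetical logical ID
    Type: AWS::SageMaker::Cluster
    Properties:
      ClusterName: example-hyperpod-cluster

Outputs:
  ClusterArn:
    Description: ARN of the SageMaker HyperPod cluster
    Value: !GetAtt ExampleCluster.ClusterArn
  ClusterStatus:
    Description: Status of the cluster
    Value: !GetAtt ExampleCluster.ClusterStatus
  CreationTime:
    Description: Time at which the cluster was created
    Value: !GetAtt ExampleCluster.CreationTime
```

FailureMessage can be read the same way; it is left out here on the assumption that it may have no value for a healthy cluster.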