AWS::SageMaker::Cluster TieredStorageConfig - AWS CloudFormation

AWS::SageMaker::Cluster TieredStorageConfig

Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints to memory at high frequency and periodically persists them to durable storage, such as Amazon S3.

Syntax

To declare this entity in your CloudFormation template, use the following syntax:

JSON

{
  "InstanceMemoryAllocationPercentage" : Integer,
  "Mode" : String
}

Properties

InstanceMemoryAllocationPercentage

The percentage of cluster memory to allocate for checkpointing, specified as an integer from 0 to 100.

Required: No

Type: Integer

Minimum: 0

Maximum: 100

Update requires: No interruption

Mode

Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.

Required: Yes

Type: String

Allowed values: Enable | Disable

Update requires: No interruption
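
Example

The following fragment is a minimal sketch of a TieredStorageConfig value that turns on managed tier checkpointing. The value 20 for InstanceMemoryAllocationPercentage is an illustrative assumption, not a recommended setting. Because that property is optional, it can also be omitted. The fragment shows only the property type itself; where it is placed within the AWS::SageMaker::Cluster resource is not shown here.

```json
{
  "InstanceMemoryAllocationPercentage": 20,
  "Mode": "Enable"
}
```

To turn the feature off, set Mode to Disable. As described above, this removes the memory management daemon from the cluster.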