Class: Aws::SageMaker::Types::ClusterTieredStorageConfig

Inherits:
  Struct < Object
Defined in:
gems/aws-sdk-sagemaker/lib/aws-sdk-sagemaker/types.rb

Overview

Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints at high frequency to memory and periodically persists them to durable storage, like Amazon S3.
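The overview describes two tiers: frequent checkpoint saves to memory and periodic persistence to durable storage. The sketch below is a toy Ruby model of that idea only. It is not how the HyperPod memory management daemon works, and the class name, step intervals, and restore policy are all hypothetical.

```ruby
# Toy two-tier checkpointing model. Hypothetical: NOT the HyperPod daemon.
# A fast in-memory tier holds the latest checkpoint; a durable tier
# (standing in for something like Amazon S3) receives periodic copies.
class ToyTieredCheckpointer
  attr_reader :memory_tier, :durable_tier

  def initialize(memory_every:, durable_every:)
    @memory_every = memory_every   # hypothetical: save to memory every N steps
    @durable_every = durable_every # hypothetical: persist every M steps
    @memory_tier = nil             # only the most recent checkpoint
    @durable_tier = []             # every persisted checkpoint, oldest first
  end

  def on_step(step, state)
    @memory_tier = { step: step, state: state } if (step % @memory_every).zero?
    # Persist a copy of the latest in-memory checkpoint on durable steps.
    @durable_tier << @memory_tier.dup if (step % @durable_every).zero? && @memory_tier
  end

  # Restore from memory when it is available; otherwise (e.g. the memory
  # tier was lost) fall back to the newest durable copy.
  def restore(memory_lost: false)
    return @durable_tier.last if memory_lost

    @memory_tier || @durable_tier.last
  end
end

ckpt = ToyTieredCheckpointer.new(memory_every: 1, durable_every: 5)
(1..12).each { |step| ckpt.on_step(step, "weights@#{step}") }
ckpt.restore[:step]                    # => 12
ckpt.restore(memory_lost: true)[:step] # => 10
```

In this toy, restoring from memory loses at most one memory interval of progress, while a durable-only restore can lose up to a full persistence interval (here, steps 11 and 12).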

Constant Summary

SENSITIVE = []

Instance Attribute Summary

  • #instance_memory_allocation_percentage ⇒ Integer
  • #mode ⇒ String

Instance Attribute Details

#instance_memory_allocation_percentage ⇒ Integer

The percentage of cluster memory to allocate for checkpointing, expressed as an integer.

Returns:

  • (Integer)
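To make the percentage concrete, the helper below converts it into an amount of memory. It assumes the percentage is applied to each instance's memory; the method name and the 2,048 GiB and 1,024 GiB instance sizes are hypothetical inputs, not figures from this page.

```ruby
# Hypothetical helper: memory set aside for checkpoints on one instance.
# Assumption: instance_memory_allocation_percentage applies per instance.
def checkpoint_memory_gib(instance_memory_gib, instance_memory_allocation_percentage)
  unless instance_memory_allocation_percentage.is_a?(Integer)
    raise ArgumentError, "instance_memory_allocation_percentage must be an Integer"
  end

  instance_memory_gib * instance_memory_allocation_percentage / 100.0
end

checkpoint_memory_gib(2048, 20) # => 409.6
checkpoint_memory_gib(1024, 25) # => 256.0
```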


# File 'gems/aws-sdk-sagemaker/lib/aws-sdk-sagemaker/types.rb', line 5874

class ClusterTieredStorageConfig < Struct.new(
  :mode,
  :instance_memory_allocation_percentage)
  SENSITIVE = []
  include Aws::Structure
end
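The struct above can be exercised without the aws-sdk-sagemaker gem by swapping Aws::Structure for a small stand-in. HypotheticalStructure below is an assumption that only imitates hash-style construction; the real module in aws-sdk-core may behave differently.

```ruby
# Stand-in for Aws::Structure (hypothetical, for illustration only):
# assigns each key/value pair of a Hash to the matching struct member.
module HypotheticalStructure
  def initialize(values = {})
    values.each { |key, value| self[key] = value }
  end
end

# Mirrors the member list of the SDK type shown above.
class ClusterTieredStorageConfig < Struct.new(
  :mode,
  :instance_memory_allocation_percentage)
  SENSITIVE = []
  include HypotheticalStructure
end

config = ClusterTieredStorageConfig.new(mode: "Enable", instance_memory_allocation_percentage: 20)
config.mode                           # => "Enable"
config.members                        # => [:mode, :instance_memory_allocation_percentage]
ClusterTieredStorageConfig::SENSITIVE # => []
```

Because a Struct's members are fixed, an unknown key such as percent: raises NameError in this stand-in instead of being silently ignored.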

#mode ⇒ String

Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.

Returns:

  • (String)
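The helper below sketches how the two modes might be expressed as request-shaped settings. It assumes the SDK accepts a plain Hash with these keys wherever a ClusterTieredStorageConfig is expected (for example, as a tiered_storage_config: parameter on a cluster create or update call); that parameter name and the helper itself are hypothetical and should be checked against the Aws::SageMaker::Client reference.

```ruby
# Hypothetical helper: build settings for ClusterTieredStorageConfig.
# The only mode values documented on this page are "Enable" and "Disable".
VALID_TIERED_STORAGE_MODES = %w[Enable Disable].freeze

def tiered_storage_config(mode, instance_memory_allocation_percentage: nil)
  unless VALID_TIERED_STORAGE_MODES.include?(mode)
    raise ArgumentError,
          "mode must be one of #{VALID_TIERED_STORAGE_MODES.join(', ')}, got #{mode.inspect}"
  end

  config = { mode: mode }
  # Include the percentage only when one is given; whether the API
  # requires it for a given mode is not stated on this page.
  unless instance_memory_allocation_percentage.nil?
    unless instance_memory_allocation_percentage.is_a?(Integer)
      raise ArgumentError, "instance_memory_allocation_percentage must be an Integer"
    end
    config[:instance_memory_allocation_percentage] = instance_memory_allocation_percentage
  end
  config
end

enable_settings  = tiered_storage_config("Enable", instance_memory_allocation_percentage: 20)
disable_settings = tiered_storage_config("Disable")
```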

