/AWS1/CL_SGMCLUSTTIEREDSTRGCFG
Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints to memory at high frequency and periodically persists them to durable storage, such as Amazon S3.
CONSTRUCTOR
IMPORTING
Required arguments:
iv_mode TYPE /AWS1/SGMCLUSTERCONFIGMODE
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.
Optional arguments:
iv_instmemoryallocpercentage TYPE /AWS1/SGMCLSTINSTMEMALLOCPER00
The percentage (integer) of cluster memory to allocate for checkpointing.
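As a minimal sketch, assuming the AWS SDK for SAP ABAP is installed in the system, the constructor can be called with the NEW operator. The variable names are hypothetical, and the 20 percent allocation is an illustrative value rather than a recommendation. The mode strings Enable and Disable follow the description of iv_mode above.

```abap
" Enable managed tier checkpointing and reserve 20 percent of memory
" for checkpoints (illustrative value only).
DATA(lo_tiered_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg(
  iv_mode                      = 'Enable'
  iv_instmemoryallocpercentage = 20 ).

" Disable the feature; the optional percentage argument is omitted.
DATA(lo_disabled_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg(
  iv_mode = 'Disable' ).
```

Because iv_instmemoryallocpercentage is optional, a configuration that only turns the feature off can pass iv_mode alone.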
Queryable Attributes
Mode
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_MODE() | Getter for MODE, with configurable default |
| ASK_MODE() | Getter for MODE that raises an exception if the field has no value |
| HAS_MODE() | Determine if MODE has a value |
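The three accessors differ only in how they treat a missing value. The sketch below checks HAS_MODE before calling GET_MODE, and wraps ASK_MODE in a TRY block. It assumes that ASK_MODE raises /aws1/cx_rt_value_missing when the field is empty; verify the exception class in the method signature on your system. The variable names are hypothetical.

```abap
" Hypothetical configuration object used for illustration.
DATA(lo_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg( iv_mode = 'Enable' ).

" Read the mode only if it has been set.
IF lo_cfg->has_mode( ) = abap_true.
  DATA(lv_mode) = lo_cfg->get_mode( ).
ENDIF.

" ASK_MODE signals a missing value with an exception
" (exception class name is an assumption).
TRY.
    DATA(lv_required_mode) = lo_cfg->ask_mode( ).
  CATCH /aws1/cx_rt_value_missing.
    " Handle the missing mode here.
ENDTRY.
```

HAS_MODE plus GET_MODE suits optional handling, while ASK_MODE fits code paths where a missing mode should be treated as an error.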
InstanceMemoryAllocationPercentage
The percentage (integer) of cluster memory to allocate for checkpointing.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_INSTMEMORYALLOCPERCAGE() | Getter for INSTMEMORYALLOCPERCENTAGE, with configurable default |
| ASK_INSTMEMORYALLOCPERCAGE() | Getter for INSTMEMORYALLOCPERCENTAGE that raises an exception if the field has no value |
| HAS_INSTMEMORYALLOCPERCAGE() | Determine if INSTMEMORYALLOCPERCENTAGE has a value |
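Since the allocation percentage is optional, a minimal sketch for reading it checks HAS_INSTMEMORYALLOCPERCAGE first. The 25 percent value and the variable names are hypothetical and chosen only for illustration.

```abap
" Hypothetical configuration with an explicit memory allocation.
DATA(lo_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg(
  iv_mode                      = 'Enable'
  iv_instmemoryallocpercentage = 25 ).

" Read the percentage only when it was supplied.
IF lo_cfg->has_instmemoryallocpercage( ) = abap_true.
  DATA(lv_percentage) = lo_cfg->get_instmemoryallocpercage( ).
ENDIF.
```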