AWS SDK Version 3 for .NET
API Reference


Replaces specific nodes within a SageMaker HyperPod cluster with new hardware. BatchReplaceClusterNodes terminates the specified instances and provisions replacement instances on fresh hardware. The Amazon Machine Image (AMI) and instance configuration remain the same.

This operation is useful for recovering from hardware failures or persistent issues that cannot be resolved through a reboot.

  • Data Loss Warning: Replacing nodes destroys all instance volumes, including both root and secondary volumes. All data stored on these volumes will be permanently lost and cannot be recovered.

  • To safeguard your work, back up your data to Amazon S3 or an FSx for Lustre file system before invoking the API on a worker node group. This will help prevent any potential data loss from the instance root volume. For more information about backup, see Use the backup script provided by SageMaker HyperPod.

  • If you want to invoke this API on an existing cluster, you'll first need to patch the cluster by running the UpdateClusterSoftware API. For more information about patching a cluster, see Update the SageMaker HyperPod platform software of a cluster.

  • You can replace up to 25 nodes in a single request.
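Because a single request accepts at most 25 nodes, a larger set of node IDs has to be split across several requests. The following is a minimal sketch of that splitting step only. NodeBatching and SplitIntoBatches are hypothetical names introduced for illustration; they are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

public static class NodeBatching
{
    // Limit stated in the list above: at most 25 nodes per request.
    public const int MaxNodesPerRequest = 25;

    // Splits node IDs into consecutive batches of at most batchSize items.
    public static List<List<string>> SplitIntoBatches(
        IReadOnlyList<string> nodeIds, int batchSize = MaxNodesPerRequest)
    {
        if (nodeIds == null) throw new ArgumentNullException(nameof(nodeIds));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<string>>();
        for (int start = 0; start < nodeIds.Count; start += batchSize)
        {
            // The last batch may be smaller than batchSize.
            int count = Math.Min(batchSize, nodeIds.Count - start);
            var batch = new List<string>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(nodeIds[start + i]);
            }
            batches.Add(batch);
        }
        return batches;
    }
}
```

Each resulting batch could then be sent as its own request. Since replacement destroys instance volumes, it is probably safer to review each batch before submitting it than to loop over all of them unattended.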

Note:

This is an asynchronous operation using the standard naming convention for .NET 4.5 or higher. For .NET 3.5 the operation is implemented as a pair of methods using the standard naming convention of BeginBatchReplaceClusterNodes and EndBatchReplaceClusterNodes.

Namespace: Amazon.SageMaker
Assembly: AWSSDK.SageMaker.dll
Version: 3.x.y.z

Syntax

C#
public abstract Task<BatchReplaceClusterNodesResponse> BatchReplaceClusterNodesAsync(
    BatchReplaceClusterNodesRequest request,
    CancellationToken cancellationToken
)

Parameters

request
Type: Amazon.SageMaker.Model.BatchReplaceClusterNodesRequest

Container for the necessary parameters to execute the BatchReplaceClusterNodes service method.

cancellationToken
Type: System.Threading.CancellationToken

A cancellation token that can be used by other objects or threads to receive notice of cancellation.
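As an illustration of how the two parameters fit together, here is a minimal sketch that builds a request and passes a token that cancels after a timeout. It assumes the method is called through the IAmazonSageMaker interface and that the request exposes ClusterName and NodeIds properties; those property names are assumptions not stated on this page and should be checked against the BatchReplaceClusterNodesRequest reference. ReplaceNodesSketch and ReplaceWithTimeoutAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Amazon.SageMaker;
using Amazon.SageMaker.Model;

public static class ReplaceNodesSketch
{
    // Hypothetical helper: sends one replace request and gives up waiting after 'timeout'.
    public static async Task<BatchReplaceClusterNodesResponse> ReplaceWithTimeoutAsync(
        IAmazonSageMaker client,
        string clusterName,
        List<string> nodeIds,
        TimeSpan timeout)
    {
        // The token source cancels itself once the timeout elapses.
        using (var cts = new CancellationTokenSource(timeout))
        {
            var request = new BatchReplaceClusterNodesRequest
            {
                ClusterName = clusterName, // assumed property name
                NodeIds = nodeIds          // assumed property name
            };

            // Both parameters from the signature above: the request and the token.
            return await client.BatchReplaceClusterNodesAsync(request, cts.Token);
        }
    }
}
```

If the token fires before the call completes, the awaited task would normally throw an OperationCanceledException. Cancellation only stops the client from waiting; it should not be assumed to undo a request the service has already accepted.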

Return Value
Type: System.Threading.Tasks.Task<Amazon.SageMaker.Model.BatchReplaceClusterNodesResponse>

The response from the BatchReplaceClusterNodes service method, as returned by SageMaker.

Exceptions

Exception                    Condition
ResourceNotFoundException    The resource being accessed was not found.
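A minimal sketch of handling the exception listed above, assuming the call goes through the IAmazonSageMaker interface. ReplaceNodesErrorHandlingSketch and TryReplaceAsync are hypothetical names, and returning null on failure is just one possible design choice for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Amazon.SageMaker;
using Amazon.SageMaker.Model;

public static class ReplaceNodesErrorHandlingSketch
{
    // Hypothetical helper: returns null when the service reports that the resource was not found.
    public static async Task<BatchReplaceClusterNodesResponse> TryReplaceAsync(
        IAmazonSageMaker client,
        BatchReplaceClusterNodesRequest request,
        CancellationToken cancellationToken)
    {
        try
        {
            return await client.BatchReplaceClusterNodesAsync(request, cancellationToken);
        }
        catch (ResourceNotFoundException ex)
        {
            // Raised when the resource named in the request cannot be found.
            Console.Error.WriteLine("Resource not found: " + ex.Message);
            return null;
        }
    }
}
```

Other failures, such as throttling or validation errors, are deliberately left uncaught here so that they propagate to the caller.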

Version Information

.NET:
Supported in: 8.0 and newer, Core 3.1

.NET Standard:
Supported in: 2.0

.NET Framework:
Supported in: 4.5 and newer
