IAmazonSageMaker.BatchRebootClusterNodes Method (BatchRebootClusterNodesRequest)

Reboots specific nodes within a SageMaker HyperPod cluster using a soft recovery mechanism. BatchRebootClusterNodes performs a graceful reboot of the specified nodes by calling the Amazon Elastic Compute Cloud RebootInstances API, which attempts to cleanly shut down the operating system before restarting the instance.

This operation is useful for recovering from transient issues or applying certain configuration changes that require a restart.

Rebooting a node may cause temporary service interruption for workloads running on that node. Ensure your workloads can handle node restarts or use appropriate scheduling to minimize impact.
You can reboot up to 25 nodes in a single request.
For SageMaker HyperPod clusters using the Slurm workload manager, ensure rebooting nodes will not disrupt critical cluster operations.
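Because a single request accepts at most 25 nodes, a caller with a longer list has to split it before calling the operation. The following is a minimal sketch of one way to do that, not part of the SDK: the NodeBatcher class and its SplitIntoBatches method are hypothetical helper names introduced here for illustration, and only the limit of 25 comes from this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the AWS SDK) that splits a list of node IDs
// into groups small enough for one BatchRebootClusterNodes request.
public static class NodeBatcher
{
    // Per-request limit stated in this documentation.
    public const int MaxNodesPerRequest = 25;

    public static List<List<string>> SplitIntoBatches(
        IList<string> nodeIds, int batchSize = MaxNodesPerRequest)
    {
        if (nodeIds == null)
            throw new ArgumentNullException(nameof(nodeIds));
        if (batchSize < 1 || batchSize > MaxNodesPerRequest)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<string>>();
        for (int start = 0; start < nodeIds.Count; start += batchSize)
        {
            // The last batch may be smaller than batchSize.
            int count = Math.Min(batchSize, nodeIds.Count - start);
            var batch = new List<string>(count);
            for (int i = 0; i < count; i++)
                batch.Add(nodeIds[start + i]);
            batches.Add(batch);
        }
        return batches;
    }
}
```

With 60 node IDs, this sketch produces three batches of 25, 25, and 10 IDs, each of which could be sent in its own request. Sending the batches one at a time, rather than all at once, is also a simple way to limit how many nodes are restarting simultaneously.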

Note:

For .NET Core this operation is only available in asynchronous form. Please refer to BatchRebootClusterNodesAsync.

Namespace: Amazon.SageMaker
Assembly: AWSSDK.SageMaker.dll
Version: 3.x.y.z

Syntax

public abstract BatchRebootClusterNodesResponse BatchRebootClusterNodes(
    BatchRebootClusterNodesRequest request
)

Parameters

request: Type: Amazon.SageMaker.Model.BatchRebootClusterNodesRequest

Container for the necessary parameters to execute the BatchRebootClusterNodes service method.

Return Value

Type: BatchRebootClusterNodesResponse

The response from the BatchRebootClusterNodes service method, as returned by SageMaker.

Exceptions

Exception	Condition
ResourceNotFoundException	The resource being accessed was not found.
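
The following sketch shows how a call to this method and the handling of ResourceNotFoundException might look on .NET Framework, where the synchronous form is available. It is an illustration under stated assumptions rather than an official sample: the request properties ClusterName and NodeIds and the response properties Successful and Failed are assumed from the shape of the underlying service API and are not described on this page, and the RebootExample class is a hypothetical name. It requires the AWSSDK.SageMaker package and valid AWS credentials.

```csharp
using System;
using System.Collections.Generic;
using Amazon.SageMaker;
using Amazon.SageMaker.Model;

// Hypothetical wrapper class, introduced only for this illustration.
public static class RebootExample
{
    public static void RebootNodes(
        IAmazonSageMaker client, string clusterName, List<string> nodeIds)
    {
        // ClusterName and NodeIds are assumed property names (see lead-in).
        var request = new BatchRebootClusterNodesRequest
        {
            ClusterName = clusterName,
            NodeIds = nodeIds // at most 25 entries per request
        };

        try
        {
            BatchRebootClusterNodesResponse response =
                client.BatchRebootClusterNodes(request);

            // Successful and Failed are assumed response properties;
            // treat a null list as empty.
            int succeeded = response.Successful?.Count ?? 0;
            int failed = response.Failed?.Count ?? 0;
            Console.WriteLine(
                "Reboot requested: {0} succeeded, {1} failed.", succeeded, failed);
        }
        catch (ResourceNotFoundException ex)
        {
            // Thrown when the resource being accessed is not found.
            Console.WriteLine("Resource not found: {0}", ex.Message);
        }
    }
}
```

On .NET Core, the same pattern would apply to BatchRebootClusterNodesAsync with await, since the synchronous method is not available there.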

Version Information

.NET Framework:
Supported in: 4.5 and newer, 3.5
