start_cluster_health_check
- SageMaker.Client.start_cluster_health_check(**kwargs)
Start deep health checks for a SageMaker HyperPod cluster. Use the DescribeClusterNode API to track the progress of the deep health checks. Unhealthy nodes are automatically rebooted or replaced. For details, see Resilience-related Kubernetes labels by SageMaker HyperPod.
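A minimal sketch of starting deep health checks on specific instances and then polling DescribeClusterNode until each node has finished its check. It assumes `client` is a boto3 SageMaker client (or any object with the same two methods), that `describe_cluster_node` reports the node state under `NodeDetails.InstanceStatus.Status`, and that `DeepHealthCheckInProgress` is the status shown while a check runs. The helper name `run_deep_health_check`, the "seen in progress, then left it" completion rule, the poll interval, and the timeout are all hypothetical choices, not part of the API. Nodes that get replaced may come back under a new instance ID, which this sketch does not handle.

```python
import time
from typing import Dict, List

# Assumption: the status DescribeClusterNode reports while a deep health
# check is running on a node.
IN_PROGRESS_STATUS = "DeepHealthCheckInProgress"


def run_deep_health_check(
    client,
    cluster_name: str,
    instance_group: str,
    instance_ids: List[str],
    checks: List[str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> Dict[str, str]:
    """Hypothetical helper: start deep health checks, then poll each node.

    A node counts as finished once it has been observed in the in-progress
    state and later reports any other status. Returns a mapping of
    instance ID to the last status seen for it.
    """
    client.start_cluster_health_check(
        ClusterName=cluster_name,
        DeepHealthCheckConfigurations=[
            {
                "InstanceGroupName": instance_group,
                "InstanceIds": instance_ids,
                "DeepHealthChecks": checks,
            }
        ],
    )

    deadline = time.monotonic() + timeout_seconds
    seen_in_progress = set()  # nodes observed while their check was running
    statuses: Dict[str, str] = {}
    pending = set(instance_ids)
    while pending:
        # Iterate over a sorted copy so pending can be modified in the loop.
        for node_id in sorted(pending):
            response = client.describe_cluster_node(
                ClusterName=cluster_name, NodeId=node_id
            )
            status = response["NodeDetails"]["InstanceStatus"]["Status"]
            statuses[node_id] = status
            if status == IN_PROGRESS_STATUS:
                seen_in_progress.add(node_id)
            elif node_id in seen_in_progress:
                pending.discard(node_id)  # check started and has now ended
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"still waiting on {sorted(pending)}")
            time.sleep(poll_seconds)
    return statuses
```

With boto3, pass `boto3.client("sagemaker")` as `client`. Injecting the client instead of creating it inside the helper keeps the polling logic testable with a stand-in object.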
See also: AWS API Documentation
Request Syntax
response = client.start_cluster_health_check(
    ClusterName='string',
    DeepHealthCheckConfigurations=[
        {
            'InstanceGroupName': 'string',
            'InstanceIds': [
                'string',
            ],
            'DeepHealthChecks': [
                'InstanceStress'|'InstanceConnectivity',
            ]
        },
    ]
)
- Parameters:
ClusterName (string) –
[REQUIRED]
The string name or the Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.
DeepHealthCheckConfigurations (list) –
[REQUIRED]
A list of configurations containing instance group names, EC2 instance IDs, and deep health checks to perform.
(dict) –
The configuration of deep health checks for an instance group.
Note
Overlapping deep health check configurations will be merged into a single operation.
InstanceGroupName (string) – [REQUIRED]
The name of the instance group.
InstanceIds (list) –
A list of Amazon Elastic Compute Cloud (EC2) instance IDs on which to perform deep health checks.
Note
Omit this field to perform deep health checks on all instances in the instance group.
(string) –
DeepHealthChecks (list) – [REQUIRED]
A list of deep health checks to perform. Valid values are InstanceStress and InstanceConnectivity.
(string) –
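The parameter rules above can be captured in a small request builder. In this sketch (the functions `group_config` and `build_health_check_request` and all example names are hypothetical), `InstanceIds` is left out when no IDs are given so the checks cover the whole instance group, and check names other than the two values in the request syntax are rejected. Requiring the lists to be non-empty is an assumption based on their [REQUIRED] marking; the builder performs no service-side checks, so it cannot tell whether an instance group or instance ID actually exists.

```python
from typing import Dict, List, Optional, Sequence

# The two check names listed in the request syntax.
VALID_CHECKS = {"InstanceStress", "InstanceConnectivity"}


def group_config(
    instance_group: str,
    checks: Sequence[str],
    instance_ids: Optional[Sequence[str]] = None,
) -> Dict:
    """Build one DeepHealthCheckConfigurations entry."""
    if not instance_group:
        raise ValueError("InstanceGroupName is required")
    if not checks:  # assumption: an empty list is not a useful request
        raise ValueError("DeepHealthChecks must contain at least one check")
    unknown = set(checks) - VALID_CHECKS
    if unknown:
        raise ValueError(f"unsupported deep health checks: {sorted(unknown)}")
    config = {"InstanceGroupName": instance_group, "DeepHealthChecks": list(checks)}
    if instance_ids:  # omit the key entirely to target the whole instance group
        config["InstanceIds"] = list(instance_ids)
    return config


def build_health_check_request(cluster: str, configs: List[Dict]) -> Dict:
    """Assemble keyword arguments for client.start_cluster_health_check."""
    if not cluster:
        raise ValueError("ClusterName is required")
    if not configs:  # assumption: at least one configuration is expected
        raise ValueError("DeepHealthCheckConfigurations must not be empty")
    return {"ClusterName": cluster, "DeepHealthCheckConfigurations": configs}


if __name__ == "__main__":
    request = build_health_check_request(
        "my-cluster",  # hypothetical cluster name (an ARN also works)
        [
            # Whole instance group (InstanceIds omitted), both checks.
            group_config("worker-group-1", ["InstanceStress", "InstanceConnectivity"]),
            # A single instance in another group.
            group_config("worker-group-2", ["InstanceStress"], ["i-0123456789abcdef0"]),
        ],
    )
    print(request)
    # With boto3: boto3.client("sagemaker").start_cluster_health_check(**request)
```

Per the note on configurations, if two entries overlap, the service merges them into a single operation, so the builder does not try to deduplicate them itself.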
- Return type:
dict
- Returns:
Response Syntax
{ 'ClusterArn': 'string' }
Response Structure
(dict) –
ClusterArn (string) –
The Amazon Resource Name (ARN) of the SageMaker HyperPod cluster on which the deep health checks were initiated.
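Because `ClusterName` accepts either a name or an ARN, the returned `ClusterArn` can be passed back to this operation unchanged. If you need its individual fields, ARNs follow the general layout `arn:partition:service:region:account-id:resource`. The hypothetical helper below splits on that layout; an example value such as `arn:aws:sagemaker:us-west-2:123456789012:cluster/abcd1234` is an illustrative placeholder, not a real response.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its fields after the leading 'arn' prefix.

    The resource part may itself contain ':' or '/', so split at most
    five times and keep the remainder intact.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])
```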
Exceptions
SageMaker.Client.exceptions.ResourceNotFound
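In boto3, modeled errors such as ResourceNotFound are exposed as classes on `client.exceptions` and are subclasses of `botocore.exceptions.ClientError`. A minimal sketch of catching it, for example when the cluster name is misspelled, follows; the wrapper name `start_checks_or_none` is hypothetical, and any other error (throttling, validation failures) propagates unchanged.

```python
from typing import List, Optional


def start_checks_or_none(client, cluster_name: str, configs: List[dict]) -> Optional[str]:
    """Start deep health checks and return the cluster ARN, or None when the
    service reports ResourceNotFound."""
    try:
        response = client.start_cluster_health_check(
            ClusterName=cluster_name,
            DeepHealthCheckConfigurations=configs,
        )
    except client.exceptions.ResourceNotFound:
        # For a real boto3 client, the caught error's
        # response["Error"]["Message"] carries the service's explanation.
        return None
    return response["ClusterArn"]
```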