BatchAddClusterNodes
Adds nodes to a HyperPod cluster by incrementing the target count for one or more instance groups.
This operation returns a unique NodeLogicalId for each node being added, which can be used to track the provisioning status of the node. This API provides a safer alternative to UpdateCluster for scaling operations because it avoids unintended configuration changes.
Note
This API is only supported for clusters that use Continuous as the NodeProvisioningMode.
Request Syntax
{
   "ClientToken": "string",
   "ClusterName": "string",
   "NodesToAdd": [
      {
         "IncrementTargetCountBy": number,
         "InstanceGroupName": "string"
      }
   ]
}
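The request body is plain JSON, so it can be assembled from a mapping of instance group names to node counts. The following is a minimal sketch, not an SDK call: the helper name build_batch_add_request and the values "my-cluster", "worker-group-1", and "worker-group-2" are hypothetical. Although ClientToken is optional, the sketch always sets one (a random UUID) so that a retry can reuse it.

```python
import json
import uuid


def build_batch_add_request(cluster_name, increments, client_token=None):
    """Build a BatchAddClusterNodes request body (hypothetical helper).

    increments maps an instance group name to the number of nodes to add.
    """
    body = {
        "ClusterName": cluster_name,
        "NodesToAdd": [
            {"IncrementTargetCountBy": count, "InstanceGroupName": group}
            for group, count in increments.items()
        ],
    }
    # A UUID4 string is 36 printable ASCII characters, which fits the
    # 64-character limit and the [\x21-\x7E]+ pattern for ClientToken.
    body["ClientToken"] = client_token or str(uuid.uuid4())
    return body


request = build_batch_add_request(
    "my-cluster", {"worker-group-1": 2, "worker-group-2": 1}
)
print(json.dumps(request, indent=2))
```

Keep the returned dictionary (including its ClientToken) if you might need to resend the same request.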
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- ClientToken
-
A unique, case-sensitive identifier that you provide to ensure the idempotency of the request. This token is valid for 8 hours. If you retry the request with the same client token and the same parameters within this timeframe, the API returns the same set of NodeLogicalIds with their latest status.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 64.
Pattern:
[\x21-\x7E]+
Required: No
- ClusterName
-
The name or Amazon Resource Name (ARN) of the HyperPod cluster to which you want to add nodes.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 256.
Pattern:
(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})
Required: Yes
- NodesToAdd
-
A list of instance groups and the number of nodes to add to each. You can specify up to 5 instance groups in a single request, with a maximum of 50 nodes total across all instance groups.
Type: Array of AddClusterNodeSpecification objects
Array Members: Minimum number of 1 item. Maximum number of 5 items.
Required: Yes
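The constraints listed above can be checked locally before a request is sent. The sketch below is an illustration that assumes the request body is a Python dictionary shaped like the request syntax; the function name validate_request is hypothetical. It reuses the documented patterns and limits (at most 5 instance groups, at most 50 nodes in total). Constraints on the individual AddClusterNodeSpecification fields are not listed in this section, so the sketch does not check them.

```python
import re

# Patterns copied from the ClientToken and ClusterName descriptions.
CLIENT_TOKEN_RE = re.compile(r"[\x21-\x7E]+")
CLUSTER_NAME_RE = re.compile(
    r"(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})"
    r"|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})"
)
MAX_GROUPS = 5
MAX_TOTAL_NODES = 50


def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    token = body.get("ClientToken")
    # The token is optional; when present, the pattern requires 1+ printable chars.
    if token is not None and (len(token) > 64 or not CLIENT_TOKEN_RE.fullmatch(token)):
        problems.append("ClientToken must be 1-64 printable ASCII characters")

    name = body.get("ClusterName")
    if not name:
        problems.append("ClusterName is required")
    elif len(name) > 256 or not CLUSTER_NAME_RE.fullmatch(name):
        problems.append("ClusterName must be a valid cluster name or cluster ARN")

    nodes = body.get("NodesToAdd") or []
    if not 1 <= len(nodes) <= MAX_GROUPS:
        problems.append(f"NodesToAdd must contain 1 to {MAX_GROUPS} instance groups")

    # The 50-node limit applies to the sum across all instance groups.
    total = sum(spec.get("IncrementTargetCountBy", 0) for spec in nodes)
    if total > MAX_TOTAL_NODES:
        problems.append(f"at most {MAX_TOTAL_NODES} nodes per request, got {total}")
    return problems


print(validate_request({
    "ClusterName": "my-cluster",
    "NodesToAdd": [{"InstanceGroupName": "worker-group-1", "IncrementTargetCountBy": 60}],
}))
```

The service remains the authority on validation; a local check like this only catches obvious mistakes earlier.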
Response Syntax
{
"Failed": [
{
"ErrorCode": "string",
"FailedCount": number,
"InstanceGroupName": "string",
"Message": "string"
}
],
"Successful": [
{
"InstanceGroupName": "string",
"NodeLogicalId": "string",
"Status": "string"
}
]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- Failed
-
A list of errors that occurred during the node addition operation. Each entry includes the instance group name, error code, number of failed additions, and an error message.
Type: Array of BatchAddClusterNodesError objects
- Successful
-
A list of nodes that were successfully added to the cluster. Each entry includes a NodeLogicalId, which you can use to track the node's provisioning status with DescribeClusterNode, along with the instance group name and the current status of the node. The NodeLogicalId is unique within a cluster and does not change when the underlying instance is replaced.
Type: Array of NodeAdditionResult objects
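Because the response has separate Successful and Failed lists, a caller will likely want to handle both. The sketch below groups the returned NodeLogicalIds by instance group and totals FailedCount per group. The function name summarize_response is hypothetical, and the sample response is made up: the IDs, group names, Status, and ErrorCode values are placeholders, since the possible values are not listed in this section.

```python
def summarize_response(response):
    """Group NodeLogicalIds and failed node counts by instance group."""
    added = {}
    for result in response.get("Successful", []):
        group = result["InstanceGroupName"]
        added.setdefault(group, []).append(result["NodeLogicalId"])

    failed = {}
    for error in response.get("Failed", []):
        group = error["InstanceGroupName"]
        # FailedCount is the number of failed node additions in this group.
        failed[group] = failed.get(group, 0) + error["FailedCount"]
    return added, failed


# Hypothetical response; every value here is a placeholder.
sample = {
    "Successful": [
        {"InstanceGroupName": "worker-group-1", "NodeLogicalId": "node-a", "Status": "<status>"},
        {"InstanceGroupName": "worker-group-1", "NodeLogicalId": "node-b", "Status": "<status>"},
    ],
    "Failed": [
        {"InstanceGroupName": "worker-group-2", "ErrorCode": "<error-code>",
         "FailedCount": 1, "Message": "<message>"},
    ],
}
added, failed = summarize_response(sample)
print(added)   # {'worker-group-1': ['node-a', 'node-b']}
print(failed)  # {'worker-group-2': 1}
```

Each NodeLogicalId in added can then be passed to DescribeClusterNode to follow that node's provisioning status.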
Errors
For information about the errors that are common to all actions, see Common Errors.
- ResourceLimitExceeded
-
You have exceeded a SageMaker resource limit. For example, you might have created too many training jobs.
HTTP Status Code: 400
- ResourceNotFound
-
The resource being accessed was not found.
HTTP Status Code: 400
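The ClientToken behavior described earlier makes it reasonable to retry a call whose outcome is unclear, as long as the same token and parameters are reused. The sketch below illustrates that pattern under several assumptions: send is a hypothetical function that submits the request body, ApiError is a hypothetical exception carrying the service error code, and the sketch treats ResourceLimitExceeded and ResourceNotFound as non-retryable because resending identical parameters is unlikely to change the result. Which other errors are safe to retry is not specified in this section.

```python
import time
import uuid

# Error codes from the list above that this sketch does not retry.
NON_RETRYABLE = {"ResourceLimitExceeded", "ResourceNotFound"}


class ApiError(Exception):
    """Hypothetical exception type carrying the service error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def add_nodes_with_retry(send, body, attempts=3, delay_seconds=1.0):
    """Call a hypothetical send(body), reusing one ClientToken on every attempt.

    Within the token's 8-hour validity window, a retry with the same token and
    parameters returns the same NodeLogicalIds instead of a new set.
    """
    body = dict(body)
    body.setdefault("ClientToken", str(uuid.uuid4()))
    for attempt in range(1, attempts + 1):
        try:
            return send(body)
        except ApiError as err:
            if err.code in NON_RETRYABLE or attempt == attempts:
                raise
            # Simple linear backoff between attempts.
            time.sleep(delay_seconds * attempt)
```

Generating the token once, outside the retry loop, is the important design choice: a fresh token on each attempt would make every retry a new request.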
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: