/AWS1/IF_SGM=>BATCHREBOOTCLUSTERNODES()

About BatchRebootClusterNodes

Reboots specific nodes within a SageMaker HyperPod cluster using a soft recovery mechanism. BatchRebootClusterNodes performs a graceful reboot of the specified nodes by calling the Amazon Elastic Compute Cloud RebootInstances API, which attempts to cleanly shut down the operating system before restarting the instance.

This operation is useful for recovering from transient issues or applying certain configuration changes that require a restart.

  • Rebooting a node may cause a temporary service interruption for workloads running on that node. Ensure your workloads can tolerate node restarts, or schedule reboots to minimize impact.

  • You can reboot up to 25 nodes in a single request.

  • For SageMaker HyperPod clusters using the Slurm workload manager, ensure rebooting nodes will not disrupt critical cluster operations.
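Because a single request accepts at most 25 nodes, a longer list of instance IDs has to be split across several calls. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes that lo_sgm is an already created client of type /aws1/if_sgm, that lt_all_ids is a hypothetical string table holding the instance IDs, and that my-cluster is a placeholder cluster name.

```abap
" Sketch only. Assumed to exist already (hypothetical names):
"   lo_sgm     TYPE REF TO /aws1/if_sgm  - SageMaker client
"   lt_all_ids TYPE string_table         - instance IDs to reboot
CONSTANTS lc_max_per_request TYPE i VALUE 25.
DATA lt_batch TYPE /aws1/cl_sgmclusternodeids_w=>tt_clusternodeids.

LOOP AT lt_all_ids INTO DATA(lv_id).
  " Save the loop index first: APPEND overwrites sy-tabix.
  DATA(lv_index) = sy-tabix.
  APPEND NEW /aws1/cl_sgmclusternodeids_w( lv_id ) TO lt_batch.

  " Send the batch once it is full or the last ID has been added.
  " 'my-cluster' is a placeholder cluster name.
  IF lines( lt_batch ) = lc_max_per_request OR lv_index = lines( lt_all_ids ).
    DATA(lo_result) = lo_sgm->batchrebootclusternodes(
      iv_clustername = |my-cluster|
      it_nodeids     = lt_batch ).
    " Check lo_result->get_failed( ) here before sending the next batch.
    CLEAR lt_batch.
  ENDIF.
ENDLOOP.
```

Sending the batches one after another also makes it easier to pause between calls, so that only part of the cluster is restarting at any time.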

Method Signature

METHODS /AWS1/IF_SGM~BATCHREBOOTCLUSTERNODES
  IMPORTING
    !IV_CLUSTERNAME TYPE /AWS1/SGMCLUSTERNAMEORARN OPTIONAL
    !IT_NODEIDS TYPE /AWS1/CL_SGMCLUSTERNODEIDS_W=>TT_CLUSTERNODEIDS OPTIONAL
    !IT_NODELOGICALIDS TYPE /AWS1/CL_SGMCLSTNODELOGICALI00=>TT_CLUSTERNODELOGICALIDLIST OPTIONAL
  RETURNING
    VALUE(OO_OUTPUT) TYPE REF TO /aws1/cl_sgmbtcrebootclstnod01
  RAISING
    /AWS1/CX_SGMRESOURCENOTFOUND
    /AWS1/CX_SGMCLIENTEXC
    /AWS1/CX_SGMSERVEREXC
    /AWS1/CX_RT_TECHNICAL_GENERIC
    /AWS1/CX_RT_SERVICE_GENERIC.
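
The exception classes in the RAISING clause can be handled with an ordinary TRY block. The sketch below is illustrative only: ZDEMO is a hypothetical SDK profile ID, my-cluster is a placeholder cluster name, and the grouping of the CATCH blocks assumes that the more specific classes should be caught before the generic runtime ones.

```abap
" Sketch only: 'ZDEMO' and 'my-cluster' are placeholders.
DATA lv_msg TYPE string.

TRY.
    DATA(lo_session) = /aws1/cl_rt_session_aws=>create( 'ZDEMO' ).
    DATA(lo_sgm)     = /aws1/cl_sgm_factory=>create( lo_session ).

    DATA(lo_result) = lo_sgm->batchrebootclusternodes(
      iv_clustername = |my-cluster|
      it_nodeids     = VALUE /aws1/cl_sgmclusternodeids_w=>tt_clusternodeids(
        ( NEW /aws1/cl_sgmclusternodeids_w( |i-0123456789abcdef0| ) ) ) ).

  CATCH /aws1/cx_sgmresourcenotfound INTO DATA(lo_not_found).
    " The cluster or a referenced resource could not be found.
    lv_msg = lo_not_found->get_text( ).
  CATCH /aws1/cx_sgmclientexc /aws1/cx_sgmserverexc INTO DATA(lo_service).
    " Other client-side or server-side errors reported by the service.
    lv_msg = lo_service->get_text( ).
  CATCH /aws1/cx_rt_technical_generic /aws1/cx_rt_service_generic INTO DATA(lo_generic).
    " Generic SDK runtime errors.
    lv_msg = lo_generic->get_text( ).
ENDTRY.
```

A call that returns without raising an exception can still contain per-node failures, so the failed lists in the result object should be checked as well.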

IMPORTING

Required arguments:

iv_clustername TYPE /AWS1/SGMCLUSTERNAMEORARN

The name or Amazon Resource Name (ARN) of the SageMaker HyperPod cluster containing the nodes to reboot.

Optional arguments:

it_nodeids TYPE /AWS1/CL_SGMCLUSTERNODEIDS_W=>TT_CLUSTERNODEIDS

A list of EC2 instance IDs to reboot using soft recovery. You can specify between 1 and 25 instance IDs.

  • Provide NodeIds, NodeLogicalIds, or both. At least one of the two is required.

  • Each instance ID must follow the pattern i- followed by 17 hexadecimal characters (for example, i-0123456789abcdef0).
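
The instance ID pattern can be checked locally before the request is sent. The following sketch is a hypothetical pre-check, not part of the SDK. It assumes a string table lt_candidates as input and treats only lowercase hexadecimal digits as valid, following the example ID above.

```abap
" Sketch only. lt_candidates (TYPE string_table) is a hypothetical input.
DATA lt_nodeids TYPE /aws1/cl_sgmclusternodeids_w=>tt_clusternodeids.
DATA lt_invalid TYPE string_table.

LOOP AT lt_candidates INTO DATA(lv_id).
  " matches( ) tests the whole string, so no ^ or $ anchors are needed.
  " Lowercase hex digits are an assumption based on the example ID.
  IF matches( val = lv_id regex = 'i-[0-9a-f]{17}' ).
    APPEND NEW /aws1/cl_sgmclusternodeids_w( lv_id ) TO lt_nodeids.
  ELSE.
    " Keep malformed IDs aside so they can be reported to the caller.
    APPEND lv_id TO lt_invalid.
  ENDIF.
ENDLOOP.
```

On newer ABAP releases, where POSIX regular expressions are flagged as obsolete, the pcre argument of matches( ) can likely be used with the same pattern instead of regex.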

it_nodelogicalids TYPE /AWS1/CL_SGMCLSTNODELOGICALI00=>TT_CLUSTERNODELOGICALIDLIST

A list of logical node IDs to reboot using soft recovery. You can specify between 1 and 25 logical node IDs.

The NodeLogicalId is a unique identifier that persists throughout the node's lifecycle and can be used to track nodes that are still being provisioned and don't yet have an EC2 instance ID assigned.

  • This parameter is only supported for clusters using Continuous as the NodeProvisioningMode. For clusters using the default provisioning mode, use NodeIds instead.

  • Provide NodeIds, NodeLogicalIds, or both. At least one of the two is required.

RETURNING

oo_output TYPE REF TO /AWS1/CL_SGMBTCREBOOTCLSTNOD01


Examples

Syntax Example

This is an example of the syntax for calling the method. It includes every possible argument and initializes every possible value. The data provided is not necessarily semantically accurate: for example, the value "string" may be provided for something that is intended to be an instance ID, and in some cases two arguments may be mutually exclusive. The example shows the ABAP syntax for creating the various data structures.

DATA(lo_result) = lo_client->batchrebootclusternodes(
  it_nodeids = VALUE /aws1/cl_sgmclusternodeids_w=>tt_clusternodeids(
    ( new /aws1/cl_sgmclusternodeids_w( |string| ) )
  )
  it_nodelogicalids = VALUE /aws1/cl_sgmclstnodelogicali00=>tt_clusternodelogicalidlist(
    ( new /aws1/cl_sgmclstnodelogicali00( |string| ) )
  )
  iv_clustername = |string|
).

This is an example of reading all possible response values.

IF lo_result IS NOT INITIAL.
  " Nodes, identified by instance ID, reported as successful
  LOOP AT lo_result->get_successful( ) INTO DATA(lo_successful).
    IF lo_successful IS NOT INITIAL.
      DATA(lv_clusternodeid) = lo_successful->get_value( ).
    ENDIF.
  ENDLOOP.
  " Nodes, identified by instance ID, reported as failed
  LOOP AT lo_result->get_failed( ) INTO DATA(lo_failed).
    IF lo_failed IS NOT INITIAL.
      lv_clusternodeid = lo_failed->get_nodeid( ).
      DATA(lv_errorcode) = lo_failed->get_errorcode( ).
      DATA(lv_message) = lo_failed->get_message( ).
    ENDIF.
  ENDLOOP.
  " Nodes, identified by logical ID, reported as failed
  LOOP AT lo_result->get_failednodelogicalids( ) INTO DATA(lo_failed_logical).
    IF lo_failed_logical IS NOT INITIAL.
      DATA(lv_clusternodelogicalid) = lo_failed_logical->get_nodelogicalid( ).
      lv_errorcode = lo_failed_logical->get_errorcode( ).
      lv_message = lo_failed_logical->get_message( ).
    ENDIF.
  ENDLOOP.
  " Nodes, identified by logical ID, reported as successful
  LOOP AT lo_result->get_successfulnodelogicalids( ) INTO DATA(lo_successful_logical).
    IF lo_successful_logical IS NOT INITIAL.
      lv_clusternodelogicalid = lo_successful_logical->get_value( ).
    ENDIF.
  ENDLOOP.
ENDIF.