Create a NodeClass
Important
You must start with 0 nodes in your instance group and let Karpenter handle the autoscaling. If you start with more than 0 nodes, Karpenter will scale them down to 0.
A node class (NodeClass) defines infrastructure-level settings that apply
to groups of nodes in your Amazon EKS cluster, including network configuration, storage
settings, and resource tagging. A HyperPodNodeClass is a custom
NodeClass that maps to pre-created instance groups in SageMaker HyperPod,
defining constraints around which instance types and Availability Zones are supported
for Karpenter's autoscaling decisions.
Considerations for creating a node class
- You can specify up to 10 instance groups in a NodeClass.
- If you choose to delete an instance group, we recommend removing it from your NodeClass before deleting it from your HyperPod cluster. If an instance group is deleted while it is still used in a NodeClass, the NodeClass will be marked as not Ready for provisioning and won't be used for subsequent scaling operations until the instance group is removed from the NodeClass.
- When you remove instance groups from a NodeClass, Karpenter detects drift on the nodes it managed in those instance groups and disrupts them according to your disruption budget controls.
- Subnets used by an instance group must belong to the same Availability Zone. Subnets are specified either with OverrideVpcConfig at the instance group level or at the cluster level; the cluster's VpcConfig is used by default.
- Only on-demand capacity is supported at this time. Instance groups with training plan or reserved capacity are not supported.
- Instance groups with deep health checks (DHC) are not supported. A DHC takes around 60-90 minutes to complete, and pods remain in a pending state during that time, which can cause over-provisioning.
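As an example of the recommended deletion order, the following sketch removes an instance group from a NodeClass before that group is deleted from the HyperPod cluster. The NodeClass name (sample-nc) and the list index of the entry being removed are assumptions to adapt to your own resources:

```shell
# Remove the second entry (index 1) from spec.instanceGroups of the
# NodeClass "sample-nc" using a JSON patch. Adjust the name and index
# to match the instance group you plan to delete.
kubectl patch hyperpodnodeclass sample-nc --type=json \
  -p '[{"op": "remove", "path": "/spec/instanceGroups/1"}]'
```

After the patch, allow Karpenter to disrupt the affected nodes before deleting the instance group from the HyperPod cluster itself.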
The following steps cover how to create a NodeClass.
1. Create a YAML file (for example, nodeclass.yaml) with your NodeClass configuration.
2. Apply the configuration to your cluster using kubectl.
3. Reference the NodeClass in your NodePool configuration.
Here's a sample NodeClass that uses the ml.c5.xlarge and ml.c5.4xlarge instance types:

```yaml
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: sample-nc
spec:
  instanceGroups:
    # Names of instance groups in the HyperPod cluster.
    # Each instance group must be pre-created.
    # MaxItems: 10
    - auto-c5-xaz1
    - auto-c5-4xaz2
```
Apply the configuration:

```shell
kubectl apply -f nodeclass.yaml
```
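Provisioning won't use the NodeClass until its Ready condition is True. If you'd rather block until that happens than poll, kubectl wait can do so; the timeout value here is an arbitrary choice:

```shell
# Block until the Ready condition of the NodeClass is True,
# or fail after 5 minutes.
kubectl wait hyperpodnodeclass/sample-nc --for=condition=Ready --timeout=300s
```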
Monitor the NodeClass status to ensure the Ready condition in status is set to True:

```shell
kubectl get hyperpodnodeclass sample-nc -o yaml
```

```yaml
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  creationTimestamp: "<timestamp>"
  name: sample-nc
  uid: <resource-uid>
spec:
  instanceGroups:
    - auto-c5-xaz1
    - auto-c5-4xaz2
status:
  conditions:
    # True when all instance groups in the spec are present in the
    # SageMaker cluster, false otherwise
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: InstanceGroupReady
      status: "True"
      type: InstanceGroupReady
    # True if the subnets of the instance groups are discoverable,
    # false otherwise
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: SubnetsReady
      status: "True"
      type: SubnetsReady
    # True when all dependent resources (InstanceGroup, Subnets) are Ready
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: Ready
      status: "True"
      type: Ready
  instanceGroups:
    - instanceTypes:
        - ml.c5.xlarge
      name: auto-c5-xaz1
      subnets:
        - id: <subnet-id>
          zone: <availability-zone-a>
          zoneId: <zone-id-a>
    - instanceTypes:
        - ml.c5.4xlarge
      name: auto-c5-4xaz2
      subnets:
        - id: <subnet-id>
          zone: <availability-zone-b>
          zoneId: <zone-id-b>
```
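Once the NodeClass is Ready, step 3 wires it into a NodePool through nodeClassRef. The following is a minimal sketch assuming the standard Karpenter v1 NodePool schema; the NodePool name and the requirements block are illustrative, not taken from this guide:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: sample-np             # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:           # points Karpenter at the HyperpodNodeClass above
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: sample-nc
      requirements:           # illustrative: restrict to the sample's instance types
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["ml.c5.xlarge", "ml.c5.4xlarge"]
```

Karpenter then provisions nodes only from the instance groups and Availability Zones that the referenced NodeClass allows.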