

# Custom Amazon Machine Images (AMIs) for SageMaker HyperPod clusters
<a name="hyperpod-custom-ami-support"></a>

Using base Amazon Machine Images (AMIs) provided and made public by Amazon SageMaker HyperPod, you can build custom AMIs. With a custom AMI, you can create specialized environments for AI workloads with pre-configured software stacks, driver customizations, proprietary dependencies, and security agents. This capability eliminates the need for complex post-launch bootstrapping using lifecycle configuration scripts.

With custom AMIs, you can standardize environments across different stages, shorten startup times, and keep full control over your runtime environment while still benefiting from SageMaker HyperPod's optimized base runtime, infrastructure capabilities, and scaling advantages.

You can build upon the SageMaker HyperPod performance-tuned base images by adding security agents, compliance tools, and specialized libraries while preserving all the distributed training benefits. This capability removes the previously required choice between infrastructure optimization and organizational security policies.

The custom AMI experience integrates seamlessly with established enterprise security workflows. Security teams build hardened images using SageMaker HyperPod's public AMIs as a base, and AI platform teams can specify these custom AMIs when creating or updating clusters through the SageMaker HyperPod APIs. The APIs validate image compatibility, handle necessary permissions, and maintain backwards compatibility so existing workflows continue functioning. Organizations with stringent security protocols can eliminate the error-prone alternative of installing security agents at runtime through lifecycle scripts. By aligning with enterprise security practices rather than forcing organizations to adapt their protocols to SageMaker HyperPod's limitations, custom AMIs remove a common barrier to adoption for security-conscious organizations running critical AI workloads.

For release notes on updates to the public AMIs, see [Public AMI releases](sagemaker-hyperpod-release-public-ami.md). To learn how to get started with building a custom AMI and using it in your HyperPod clusters, see the following topics.

**Topics**
+ [Build a custom AMI](hyperpod-custom-ami-how-to.md)
+ [Cluster management with custom AMIs](hyperpod-custom-ami-cluster-management.md)

# Build a custom AMI
<a name="hyperpod-custom-ami-how-to"></a>

This page explains how to build a custom Amazon Machine Image (AMI) using Amazon SageMaker HyperPod base AMIs. You begin by selecting a base AMI, and then you create your own customized AMI using any of the common methods for creating new images, such as the AWS CLI.

## Select a SageMaker HyperPod base AMI
<a name="hyperpod-custom-ami-select-base"></a>

You can select a SageMaker HyperPod base AMI through one of the following methods.

### AWS console selection
<a name="hyperpod-custom-ami-console-selection"></a>

You can select public SageMaker HyperPod AMIs through the AWS console or by using the `DescribeImages` API call. SageMaker HyperPod AMIs are public and visible in every AWS account. You can find them in the Amazon EC2 AMI catalog by applying a filter to search for public AMIs owned by Amazon.

To find SageMaker HyperPod AMIs in the console:

1. Sign in to the Amazon EC2 console.

1. In the left navigation pane, choose **AMIs**.

1. For the **Image type** dropdown, select **Public images**.

1. In the search bar filters, set the **Owner alias** filter to **amazon**.

1. Search for AMIs whose names are prefixed with **HyperPod EKS** and select the AMI (preferably the latest) that fits your use case. For instance, you can choose between an AMI for Kubernetes 1.31 and one for Kubernetes 1.30.
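
The same lookup can be scripted with the `DescribeImages` API. The following is a minimal sketch rather than an official procedure: the function name `latest_hyperpod_base_ami` is hypothetical, and the default name filter `HyperPod EKS*` is an assumption based on the name prefix mentioned in the console steps, so adjust it to match the AMI names you see in your Region.

```shell
# Hypothetical helper: print the ID of the most recently created public AMI
# owned by Amazon whose name matches a pattern.
# Usage: latest_hyperpod_base_ami <region> [name-pattern]
latest_hyperpod_base_ami() (
    region="$1"
    # Assumed pattern, taken from the "HyperPod EKS" prefix in the console steps.
    pattern="${2:-HyperPod EKS*}"
    aws ec2 describe-images \
        --region "$region" \
        --owners amazon \
        --filters "Name=name,Values=$pattern" "Name=state,Values=available" \
        --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
        --output text
)

# Example: latest_hyperpod_base_ami us-east-1
```

Sorting by `CreationDate` only approximates "latest release". If several Kubernetes versions match the pattern, narrow the pattern before relying on the result.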

### Fetch latest public AMI ID through the AWS CLI
<a name="hyperpod-custom-ami-cli-fetch"></a>

If you always want to use the latest released public AMI, it is more efficient to read the public SageMaker HyperPod SSM parameter, which contains the ID of the latest AMI released by SageMaker HyperPod.

The following example shows how to retrieve the latest AMI ID using the AWS CLI:

```
aws ssm get-parameter \
  --name "/aws/service/sagemaker-hyperpod/ami/x86_64/eks-1.31-amazon-linux-2/latest/ami-id" \
  --region us-east-1 \
  --query "Parameter.Value" \
  --output text
```

**Note**  
Replace the Kubernetes version in the parameter name as required. For example, to use Kubernetes 1.30, use the following parameter: `/aws/service/sagemaker-hyperpod/ami/x86_64/eks-1.30-amazon-linux-2/latest/ami-id`.
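
If you look up AMIs for more than one Kubernetes version, it can help to build the parameter name from the version. The following sketch uses hypothetical function names and assumes that the parameter path follows the layout shown in the `get-parameter` command above, with only the Kubernetes version changing.

```shell
# Hypothetical helpers for reading the public SSM parameter.

# Build the parameter name for a Kubernetes version such as 1.30 or 1.31.
# The path layout is assumed from the get-parameter example in this section.
hyperpod_ami_parameter() {
    printf '/aws/service/sagemaker-hyperpod/ami/x86_64/eks-%s-amazon-linux-2/latest/ami-id\n' "$1"
}

# Print the latest base AMI ID for a Kubernetes version in a Region.
# Usage: hyperpod_latest_ami_id <kubernetes-version> <region>
hyperpod_latest_ami_id() {
    aws ssm get-parameter \
        --name "$(hyperpod_ami_parameter "$1")" \
        --region "$2" \
        --query "Parameter.Value" \
        --output text
}

# Example: BASE_AMI_ID=$(hyperpod_latest_ami_id 1.31 us-east-1)
```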

## Build your custom AMI
<a name="hyperpod-custom-ami-build"></a>

After you select a SageMaker HyperPod public AMI, use it as the base AMI to build your own custom AMI with one of the following methods. This list is not exhaustive: you can use any method you prefer for building AMIs, and SageMaker HyperPod does not recommend a specific one.
+ **AWS Management Console**: You can launch an Amazon EC2 instance using the SageMaker HyperPod AMI, make desired customizations, and then create an AMI from that instance.
+ **AWS CLI**: You can also use the `aws ec2 create-image` command to create an AMI from an existing Amazon EC2 instance after performing the customization.
+ **HashiCorp Packer**: Packer is an open-source tool from HashiCorp that enables you to create identical machine images for multiple platforms from a single source configuration. It supports creating AMIs for AWS, as well as images for other cloud providers and virtualization platforms.
+ **Image Builder**: EC2 Image Builder is a fully managed AWS service that makes it easier to automate the creation, maintenance, validation, sharing, and deployment of Linux or Windows Server images. For more information, see the [EC2 Image Builder User Guide](https://docs.aws.amazon.com/imagebuilder/latest/userguide/what-is-image-builder.html).

### Build a custom AMI with customer managed AWS KMS encryption
<a name="hyperpod-custom-ami-build-kms"></a>

The following sections describe how to build a custom AMI with a customer managed AWS KMS key to encrypt your HyperPod cluster volumes. For more information about customer managed keys in HyperPod and granting the required IAM and KMS key policy permissions, see [Customer managed AWS KMS key encryption for SageMaker HyperPod](smcluster-cmk.md). If you plan to use a custom AMI that is encrypted with a customer managed key, ensure that you also encrypt your HyperPod cluster's Amazon EBS root volume with the same key.

#### AWS CLI example: Create a new AMI using EC2 Image Builder and a HyperPod base image
<a name="hyperpod-custom-ami-cli-example"></a>

The following example shows how to create an AMI using Image Builder with AWS KMS encryption:

```
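# Replace <component-arn> with the ARN of an Image Builder component that applies your customizations.
aws imagebuilder create-image-recipe \
    --name "hyperpod-custom-recipe" \
    --semantic-version "1.0.0" \
    --parent-image "<hyperpod-base-image-id>" \
    --components componentArn="<component-arn>" \
    --block-device-mappings 'deviceName=/dev/xvda,ebs={volumeSize=100,volumeType=gp3,encrypted=true,kmsKeyId=arn:aws:kms:us-east-1:111122223333:key/key-id,deleteOnTermination=true}'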
```

#### Amazon EC2 console: Create a new AMI from an Amazon EC2 instance
<a name="hyperpod-custom-ami-console-example"></a>

To create an AMI from an Amazon EC2 instance using the Amazon EC2 console:

1. Right-click on your customized Amazon EC2 instance and choose **Create Image**.

1. In the **Encryption** section, select **Encrypt snapshots**.

1. Select your KMS key from the dropdown. For example: `arn:aws:kms:us-east-2:111122223333:key/<your-kms-key-id>` or use the key alias: `alias/<your-hyperpod-key>`.

#### AWS CLI example: Create a new AMI from an Amazon EC2 instance
<a name="hyperpod-custom-ami-cli-create-image"></a>

Use the `aws ec2 create-image` command with AWS KMS encryption:

```
aws ec2 create-image \
    --instance-id "<instance-id>" \
    --name "MyCustomHyperPodAMI" \
    --description "Custom HyperPod AMI" \
    --block-device-mappings '[
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "Encrypted": true,
                "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/key-id",
                "VolumeType": "gp2"
            }
        }
    ]'
```
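
The `create-image` command returns as soon as the AMI is registered, while the image is still in the `pending` state. The following sketch wraps the call and waits until the image is available before printing its ID. The function name `create_encrypted_ami` is hypothetical, and the `/dev/xvda` device name is carried over from the example above, so verify it against your instance.

```shell
# Hypothetical helper: create an encrypted AMI from a customized instance,
# wait until it is available, and print its ID.
# Usage: create_encrypted_ami <instance-id> <ami-name> <kms-key-arn>
create_encrypted_ami() (
    instance_id="$1"
    ami_name="$2"
    kms_key="$3"
    # Root device name assumed to be /dev/xvda, as in the example above.
    mappings=$(printf '[{"DeviceName":"/dev/xvda","Ebs":{"Encrypted":true,"KmsKeyId":"%s"}}]' "$kms_key")

    ami_id=$(aws ec2 create-image \
        --instance-id "$instance_id" \
        --name "$ami_name" \
        --block-device-mappings "$mappings" \
        --query ImageId \
        --output text) || exit 1

    # Block until the image leaves the pending state.
    aws ec2 wait image-available --image-ids "$ami_id" || exit 1
    printf '%s\n' "$ami_id"
)

# Example: CUSTOM_AMI_ID=$(create_encrypted_ami i-0123456789abcdef0 MyCustomHyperPodAMI "$KMS_KEY_ARN")
```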

# Cluster management with custom AMIs
<a name="hyperpod-custom-ami-cluster-management"></a>

After the custom AMI is built, you can use it for creating or updating an Amazon SageMaker HyperPod cluster. You can also scale up or add instance groups that use the new AMI.

## Permissions required for cluster operations
<a name="hyperpod-custom-ami-permissions"></a>

Add the following permissions to the cluster admin user who operates and configures SageMaker HyperPod clusters. The following policy example includes the minimum set of permissions for cluster administrators to run the SageMaker HyperPod core APIs and manage SageMaker HyperPod clusters with custom AMIs.

The following policy includes AMI and AMI EBS snapshot sharing permissions through the `ModifyImageAttribute` and `ModifySnapshotAttribute` API actions. To scope down the sharing permissions, take the following steps:
+ Tag the AMI and its snapshots to control which resources can be shared. For example, tag the AMI with `AllowSharing` set to `true`.
+ Add a condition key to the policy so that sharing is allowed only for AMIs that carry those tags.
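
The tagging step can be sketched with the AWS CLI as follows. The function name `tag_ami_for_sharing` is hypothetical, and the sketch assumes that every snapshot referenced by the AMI's block device mappings should carry the same tag as the AMI.

```shell
# Hypothetical helper: tag an AMI and its backing EBS snapshots so that a
# tag-conditioned policy allows sharing them.
# Usage: tag_ami_for_sharing <ami-id>
tag_ami_for_sharing() (
    ami_id="$1"
    # Look up the snapshots that back the AMI's EBS volumes.
    snapshot_ids=$(aws ec2 describe-images \
        --image-ids "$ami_id" \
        --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' \
        --output text) || exit 1

    # $snapshot_ids is intentionally unquoted so each ID becomes its own argument.
    aws ec2 create-tags \
        --resources "$ami_id" $snapshot_ids \
        --tags Key=AllowSharing,Value=true
)

# Example: tag_ami_for_sharing ami-0123456789abcdef0
```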

The following scoped-down policy allows sharing only for AMIs that are tagged with `AllowSharing` set to `true`.

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/your-execution-role-name"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateCluster",
                "sagemaker:DeleteCluster",
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode",
                "sagemaker:ListClusterNodes",
                "sagemaker:ListClusters",
                "sagemaker:UpdateCluster",
                "sagemaker:UpdateClusterSoftware",
                "sagemaker:BatchDeleteClusterNodes",
                "eks:DescribeCluster",
                "eks:CreateAccessEntry",
                "eks:DescribeAccessEntry",
                "eks:DeleteAccessEntry",
                "eks:AssociateAccessPolicy",
                "iam:CreateServiceLinkedRole",
                "ec2:DescribeImages",
                "ec2:DescribeSnapshots"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:ModifyImageAttribute",
                "ec2:ModifySnapshotAttribute"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/AllowSharing": "true"
                }
            }
        }
    ]
}
```

------

**Important**  
If you plan to use an encrypted custom AMI, then make sure that your KMS key meets the permissions described in [Customer managed AWS KMS key encryption for SageMaker HyperPod](smcluster-cmk.md). Additionally, ensure that your custom AMI's KMS key is also used to encrypt your cluster's Amazon EBS root volume.

## Create a cluster
<a name="hyperpod-custom-ami-api-create"></a>

You can specify your custom AMI in the `ImageId` field for the `CreateCluster` operation.

The following examples show how to create a cluster with a custom AMI, both with and without an AWS KMS customer managed key for encrypting the cluster volumes.

------
#### [ Standard example ]

The following example shows how to create a cluster with a custom AMI.

```
aws sagemaker create-cluster \
   --cluster-name <exampleClusterName> \
   --orchestrator 'Eks={ClusterArn=<eks_cluster_arn>}' \
   --node-provisioning-mode Continuous \
   --instance-groups '{
   "InstanceGroupName": "<exampleGroupName>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 2,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ImageId": "<your_custom_ami>",
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "InstanceStorageConfigs": [
   
        {
            "EbsVolumeConfig": {
                "VolumeSizeInGB": 200
            }
        }
   ]
}' --vpc-config '{
   "SecurityGroupIds": ["<security_group>"],
   "Subnets": ["<subnet>"]
}'
```

------
#### [ Customer managed key example ]

The following example shows how to create a cluster with a custom AMI while specifying your own AWS KMS customer managed key for encrypting the cluster's Amazon EBS volumes. It is possible to specify different customer managed keys for the root volume and the instance storage volume. If you don't use customer managed keys in the `InstanceStorageConfigs` field, then an AWS owned KMS key is used to encrypt the volumes. If you use different keys for the root volume and secondary instance storage volumes, then set the required KMS key policies on both of your keys.

```
aws sagemaker create-cluster \
   --cluster-name <exampleClusterName> \
   --orchestrator 'Eks={ClusterArn=<eks_cluster_arn>}' \
   --node-provisioning-mode Continuous \
   --instance-groups '{
   "InstanceGroupName": "<exampleGroupName>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 2,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ImageId": "<your_custom_ami>",
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "InstanceStorageConfigs": [
            {
                "EbsVolumeConfig": {
                    "RootVolume": true,
                    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/key-id"
                }
            },
            {
                "EbsVolumeConfig": {
                    "VolumeSizeInGB": 100,
                    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/key-id"
                }
            }
   ]
}' --vpc-config '{
   "SecurityGroupIds": ["<security_group>"],
   "Subnets": ["<subnet>"]
}'
```

------
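
Quoting multi-line JSON inside a shell command is easy to get wrong, because the AWS CLI parses the value as strict JSON: comments and Python-style literals such as `True` are rejected. One option is to generate the instance group definition into a file and pass it with the CLI's `file://` prefix. The following is a sketch under that approach. The function name `instance_groups_json` is hypothetical, and the instance type and count are copied from the examples above.

```shell
# Hypothetical helper: print a one-element instance group list as strict JSON.
# Usage: instance_groups_json <group-name> <ami-id> <role-arn> <kms-key-arn> <lifecycle-s3-uri>
instance_groups_json() {
    cat <<EOF
[
  {
    "InstanceGroupName": "$1",
    "InstanceType": "ml.c5.2xlarge",
    "InstanceCount": 2,
    "LifeCycleConfig": {
      "SourceS3Uri": "$5",
      "OnCreate": "on_create_noop.sh"
    },
    "ImageId": "$2",
    "ExecutionRole": "$3",
    "ThreadsPerCore": 1,
    "InstanceStorageConfigs": [
      { "EbsVolumeConfig": { "RootVolume": true, "VolumeKmsKeyId": "$4" } },
      { "EbsVolumeConfig": { "VolumeSizeInGB": 100, "VolumeKmsKeyId": "$4" } }
    ]
  }
]
EOF
}

# Example:
#   instance_groups_json my-group "$CUSTOM_AMI_ID" "$ROLE_ARN" "$KMS_KEY_ARN" s3://amzn-s3-demo-bucket > instance-groups.json
#   Then pass --instance-groups file://instance-groups.json to aws sagemaker create-cluster.
```

Keeping the definition in a file also makes it easier to review and reuse the same instance group settings across `create-cluster` and `update-cluster` calls.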

## Update the cluster software
<a name="hyperpod-custom-ami-api-update"></a>

To update an existing instance group on your cluster with your custom AMI, use the `UpdateClusterSoftware` operation and specify your custom AMI in the `ImageId` field. Unless you specify the name of an instance group in your request, the new image is applied to all instance groups in your cluster.

The following example shows how to update a cluster's platform software with a custom AMI:

```
aws sagemaker update-cluster-software \
   --cluster-name <exampleClusterName> \
   --instance-groups InstanceGroupName=<instanceGroupToUpdate> \
   --image-id <customAmiId>
```

## Scale up an instance group
<a name="hyperpod-custom-ami-scale-up"></a>

The following examples show how to scale up an instance group for a cluster using a custom AMI, both with and without using an AWS KMS customer managed key for encryption.

------
#### [ Standard example ]

The following example shows how to scale up an instance group with a custom AMI.

```
aws sagemaker update-cluster \
    --cluster-name <exampleClusterName> \
    --instance-groups '[{
   "InstanceGroupName": "<exampleGroupName>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 2,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "ImageId": "<your_custom_ami>"
}]'
```

------
#### [ Customer managed key example ]

The following example shows how to update and scale up your cluster with a custom AMI while specifying your own AWS KMS customer managed key for encrypting the cluster's Amazon EBS volumes. It is possible to specify different customer managed keys for the root volume and the instance storage volume. If you don't use customer managed keys in the `InstanceStorageConfigs` field, then an AWS owned KMS key is used to encrypt the volumes. If you use different keys for the root volume and secondary instance storage volumes, then set the required KMS key policies on both of your keys.

```
aws sagemaker update-cluster \
    --cluster-name <exampleClusterName> \
    --instance-groups '[{
   "InstanceGroupName": "<exampleGroupName>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 2,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "ImageId": "<your_custom_ami>",
   "InstanceStorageConfigs": [
            {
                "EbsVolumeConfig": {
                    "RootVolume": true,
                    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/key-id"
                }
            },
            {
                "EbsVolumeConfig": {
                    "VolumeSizeInGB": 100,
                    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/key-id"
                }
            }
   ]
}]'
```

------

## Add an instance group
<a name="hyperpod-custom-ami-add-instance-group"></a>

The following example shows how to add an instance group to a cluster using a custom AMI:

```
aws sagemaker update-cluster \
   --cluster-name "<exampleClusterName>" \
   --instance-groups '{
   "InstanceGroupName": "<exampleGroupName>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 2,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "ImageId": "<your_custom_ami>"
}' '{
   "InstanceGroupName": "<exampleGroupName2>",
   "InstanceType": "ml.c5.2xlarge",
   "InstanceCount": 1,
   "LifeCycleConfig": {
      "SourceS3Uri": "<s3://amzn-s3-demo-bucket>",
      "OnCreate": "on_create_noop.sh"
   },
   "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>",
   "ThreadsPerCore": 1,
   "ImageId": "<your_custom_ami>"
}'
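```

After you add or scale an instance group, you might want to confirm that its nodes came up. The following sketch counts the nodes that report a `Running` status. The function name `count_running_nodes` is hypothetical. The `--instance-group-name-contains` filter matches substrings, so the count can include nodes from other groups with similar names.

```shell
# Hypothetical helper: count nodes with a Running status in matching instance groups.
# Usage: count_running_nodes <cluster-name> <instance-group-name>
count_running_nodes() {
    aws sagemaker list-cluster-nodes \
        --cluster-name "$1" \
        --instance-group-name-contains "$2" \
        --query 'ClusterNodeSummaries[].InstanceStatus.Status' \
        --output text | tr '\t' '\n' | grep -c '^Running$'
}

# Example: count_running_nodes my-cluster my-group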
```