Setting up your HyperPod clusters for model deployment
This guide provides you with a comprehensive setup guide for enabling inference capabilities on Amazon SageMaker HyperPod clusters. The following steps help you set up the infrastructure, permissions, and operators required to support machine learning engineers in deploying and managing inference endpoints.
Prerequisites
Before proceeding, verify that your AWS credentials are properly configured and have the necessary permissions. Verify that you've created a HyperPod cluster with Creating a SageMaker HyperPod cluster .
-
Configure kubectl to connect to the newly created HyperPod cluster orchestrated by Amazon EKS cluster. Specify the Region and HyperPod cluster name.
export HYPERPOD_CLUSTER_NAME=<hyperpod-cluster-name> export REGION=<region> # S3 bucket where tls certificates will be uploaded BUCKET_NAME="<Enter name of your s3 bucket>" # This should be bucket name, not URI export EKS_CLUSTER_NAME=$(aws --region $REGION sagemaker describe-cluster --cluster-name $HYPERPOD_CLUSTER_NAME \ --query 'Orchestrator.Eks.ClusterArn' --output text | \ cut -d'/' -f2) aws eks update-kubeconfig --name $EKS_CLUSTER_NAME --region $REGION
-
Set default env variables.
LB_CONTROLLER_POLICY_NAME="AWSLoadBalancerControllerIAMPolicy-$HYPERPOD_CLUSTER_NAME" LB_CONTROLLER_ROLE_NAME="aws-load-balancer-controller-$HYPERPOD_CLUSTER_NAME" S3_MOUNT_ACCESS_POLICY_NAME="S3MountpointAccessPolicy-$HYPERPOD_CLUSTER_NAME" S3_CSI_ROLE_NAME="SM_HP_S3_CSI_ROLE-$HYPERPOD_CLUSTER_NAME" KEDA_OPERATOR_POLICY_NAME="KedaOperatorPolicy-$HYPERPOD_CLUSTER_NAME" KEDA_OPERATOR_ROLE_NAME="keda-operator-role-$HYPERPOD_CLUSTER_NAME" PRESIGNED_URL_ACCESS_POLICY_NAME="PresignedUrlAccessPolicy-$HYPERPOD_CLUSTER_NAME" HYPERPOD_INFERENCE_ACCESS_POLICY_NAME="HyperpodInferenceAccessPolicy-$HYPERPOD_CLUSTER_NAME" HYPERPOD_INFERENCE_ROLE_NAME="HyperpodInferenceRole-$HYPERPOD_CLUSTER_NAME" HYPERPOD_INFERENCE_SA_NAME="hyperpod-inference-service-account" HYPERPOD_INFERENCE_SA_NAMESPACE="kube-system" JUMPSTART_GATED_ROLE_NAME="JumpstartGatedRole-$HYPERPOD_CLUSTER_NAME" FSX_CSI_ROLE_NAME="AmazonEKSFSxLustreCSIDriverFullAccess-$HYPERPOD_CLUSTER_NAME"
-
Extract the Amazon EKS cluster name from the cluster ARN, update the local kubeconfig, and verify connectivity by listing all pods across namespaces.
kubectl get pods --all-namespaces
-
(Optional) Install the NVIDIA device plugin to enable GPU support on the cluster.
#Install nvidia device plugin kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml # Verify that GPUs are visible to k8s kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia.com/gpu
Prepare your environment for inference operator installation
Now that your HyperPod cluster is configured, the next step is to install the inference operator. The inference operator is a Kubernetes operator that enables deployment and management of machine learning inference endpoints on your Amazon EKS cluster.
Complete the next critical preparation steps to ensure your Amazon EKS cluster has the proper security configurations and supporting infrastructure components. This involves configuring IAM roles and security policies for cross-service authentication, installing the AWS Load Balancer Controller for ingress management, setting up Amazon S3 and Amazon FSx CSI drivers for persistent storage access, and deploying KEDA and cert-manager for autoscaling and certificate management capabilities.
-
Gather essential AWS resource identifiers and ARNs required for configuring service integrations between EKS, SageMaker, and IAM components.
%%bash -x export ACCOUNT_ID=$(aws --region $REGION sts get-caller-identity --query 'Account' --output text) export OIDC_ID=$(aws --region $REGION eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5) export EKS_CLUSTER_ROLE=$(aws eks --region $REGION describe-cluster --name $EKS_CLUSTER_NAME --query 'cluster.roleArn' --output text)
-
Associate an IAM OIDCidentity provider with your EKS cluster.
eksctl utils associate-iam-oidc-provider --region=$REGION --cluster=$EKS_CLUSTER_NAME --approve
-
Create the trust policy and permission policy JSON documents required for the HyperPod inference operator IAM role. These policies enable secure cross-service communication between EKS, SageMaker, and other AWS services.
bash # Create trust policy JSON cat << EOF > trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "sagemaker.amazonaws.com" ] }, "Action": "sts:AssumeRole" }, { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringLike": { "oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com", "oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:*:*" } } } ] } EOF # Create permission policy JSON cat << EOF > permission-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::$BUCKET_NAME" "arn:aws:s3:::$BUCKET_NAME/*" ] }, { "Effect": "Allow", "Action": [ "ecr:GetAuthorizationToken" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:GetRepositoryPolicy", "ecr:DescribeRepositories", "ecr:ListImages", "ecr:DescribeImages", "ecr:BatchGetImage", "ecr:GetLifecyclePolicy", "ecr:GetLifecyclePolicyPreview", "ecr:ListTagsForResource", "ecr:DescribeImageScanFindings" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ec2:AssignPrivateIpAddresses", "ec2:AttachNetworkInterface", "ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface", "ec2:DescribeInstances", "ec2:DescribeTags", "ec2:DescribeNetworkInterfaces", "ec2:DescribeInstanceTypes", "ec2:DescribeSubnets", "ec2:DetachNetworkInterface", "ec2:DescribeDhcpOptions", "ec2:ModifyNetworkInterfaceAttribute", "ec2:UnassignPrivateIpAddresses", "ec2:CreateTags", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroups", "ec2:DescribeVolumes", "ec2:DescribeVolumesModifications", "ec2:DescribeVpcs" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "eks:Describe*", "eks:List*", "eks:AssociateAccessPolicy", "eks:AccessKubernetesApi", "eks-auth:AssumeRoleForPodIdentity" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "elasticloadbalancing:Create*", "elasticloadbalancing:Describe*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "sagemaker:CreateModel", "sagemaker:DescribeModel", "sagemaker:DeleteModel", "sagemaker:ListModels", "sagemaker:CreateEndpointConfig", "sagemaker:DescribeEndpointConfig", "sagemaker:DeleteEndpointConfig", "sagemaker:CreateEndpoint", "sagemaker:DeleteEndpoint", "sagemaker:DescribeEndpoint", "sagemaker:UpdateEndpoint", "sagemaker:ListTags", "sagemaker:EnableClusterInference", "sagemaker:DescribeClusterInference", "sagemaker:DescribeHubContent" ], "Resource": "arn:aws:sagemaker:$REGION:*:*" }, { "Effect": "Allow", "Action": [ "fsx:DescribeFileSystems" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "acm:ImportCertificate", "acm:DeleteCertificate" ], "Resource": "arn:aws:acm:$REGION:$ACCOUNT_ID:certificate/*" }, { "Sid": "AllowPassRoleToSageMaker", "Effect": "Allow", "Action": [ "iam:PassRole" ], "Resource": "arn:aws:iam::$ACCOUNT_ID:role/$HYPERPOD_INFERENCE_ROLE_NAME", "Condition": { "StringEquals": { "iam:PassedToService": "sagemaker.amazonaws.com" } } }, { "Sid": "CloudWatchEMFPermissions", "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData", "logs:PutLogEvents", "logs:DescribeLogStreams", "logs:DescribeLogGroups", "logs:CreateLogStream", "logs:CreateLogGroup" ], "Resource": "*" } ] } EOF
-
Create execution Role for the inference operator.
aws iam create-policy --policy-name $HYPERPOD_INFERENCE_ACCESS_POLICY_NAME --policy-document file://permission-policy.json export policy_arn="arn:aws:iam::${ACCOUNT_ID}:policy/$HYPERPOD_INFERENCE_ACCESS_POLICY_NAME" # Create the IAM role eksctl create iamserviceaccount --approve --role-only --name=$HYPERPOD_INFERENCE_SA_NAME --namespace=$HYPERPOD_INFERENCE_SA_NAMESPACE --cluster=$EKS_CLUSTER_NAME --attach-policy-arn=$policy_arn --role-name=$HYPERPOD_INFERENCE_ROLE_NAME --region=$REGION
aws iam create-role --role-name $HYPERPOD_INFERENCE_ROLE_NAME --assume-role-policy-document file://trust-policy.json aws iam put-role-policy --role-name $HYPERPOD_INFERENCE_ROLE_NAME --policy-name InferenceOperatorInlinePolicy --policy-document file://permission-policy.json
-
Download and create the IAM policy required for the AWS Load Balancer Controller to manage Application Load Balancers and Network Load Balancers in your EKS cluster.
%%bash -x export ALBController_IAM_POLICY_NAME=HyperPodInferenceALBControllerIAMPolicy curl -o AWSLoadBalancerControllerIAMPolicy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.13.0/docs/install/iam_policy.json aws iam create-policy --policy-name $ALBController_IAM_POLICY_NAME --policy-document file://AWSLoadBalancerControllerIAMPolicy.json
-
Create an IAM service account that links the Kubernetes service account with the IAM policy, enabling the AWS Load Balancer Controller to assume the necessary AWS permissions through IRSA (IAM Roles for Service Accounts).
%%bash -x export ALB_POLICY_ARN="arn:aws:iam::$ACCOUNT_ID:policy/$ALBController_IAM_POLICY_NAME" # Create IAM service account with gathered values eksctl create iamserviceaccount \ --approve \ --override-existing-serviceaccounts \ --name=aws-load-balancer-controller \ --namespace=kube-system \ --cluster=$EKS_CLUSTER_NAME \ --attach-policy-arn=$ALB_POLICY_ARN \ --region=$REGION # Print the values for verification echo "Cluster Name: $EKS_CLUSTER_NAME" echo "Region: $REGION" echo "Policy ARN: $ALB_POLICY_ARN"
-
Apply Tags (
kubernetes.io.role/elb
) to all subnets in the EKS cluster (both public and private).export VPC_ID=$(aws --region $REGION eks describe-cluster --name $EKS_CLUSTER_NAME --query 'cluster.resourcesVpcConfig.vpcId' --output text) # Add Tags aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=${VPC_ID}" "Name=map-public-ip-on-launch,Values=true" \ --query 'Subnets[*].SubnetId' --output text | \ tr '\t' '\n' | \ xargs -I{} aws ec2 create-tags --resources {} --tags Key=kubernetes.io/role/elb,Value=1 # Verify Tags are added aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=${VPC_ID}" "Name=map-public-ip-on-launch,Values=true" \ --query 'Subnets[*].SubnetId' --output text | \ tr '\t' '\n' | xargs -n1 -I{} aws ec2 describe-tags --filters "Name=resource-id,Values={}" "Name=key,Values=kubernetes.io/role/elb" --query "Tags[0].Value" --output text
-
Create a Namespace for KEDA and the Cert Manager.
kubectl create namespace keda kubectl create namespace cert-manager
-
Create an Amazon S3 VPC endpoint.
aws ec2 create-vpc-endpoint \ --vpc-id ${VPC_ID} \ --vpc-endpoint-type Gateway \ --service-name "com.amazonaws.${REGION}.s3" \ --route-table-ids $(aws ec2 describe-route-tables --filters "Name=vpc-id,Values=${VPC_ID}" --query 'RouteTables[].Associations[].RouteTableId' --output text | tr ' ' '\n' | sort -u | tr '\n' ' ')
-
Configure S3 storage access:
-
Create an IAM policy that grants the necessary S3 permissions for using Mountpoint for Amazon S3, which enables file system access to S3 buckets from within the cluster.
%%bash -x cat <<EOF> s3accesspolicy.json { "Version": "2012-10-17", "Statement": [ { "Sid": "MountpointFullBucketAccess", "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" ] }, { "Sid": "MountpointFullObjectAccess", "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:AbortMultipartUpload", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" ] } ] } EOF aws iam create-policy \ --policy-name S3MountpointAccessPolicy \ --policy-document file://s3accesspolicy.json cat <<EOF>> s3accesstrustpolicy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::$ACCOUNT_ID:oidc-provider/oidc.eks.$REGION.amazonaws.com/id/${OIDC_ID}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "oidc.eks.$REGION.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com", "oidc.eks.$REGION.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:kube-system:${s3-csi-driver-sa}" } } } ] } EOF aws iam create-role --role-name $S3_CSI_ROLE_NAME --assume-role-policy-document file://s3accesstrustpolicy.json aws iam attach-role-policy --role-name $S3_CSI_ROLE_NAME --policy-arn "arn:aws:iam::$ACCOUNT_ID:policy/S3MountpointAccessPolicy"
-
(Optional) Create an IAM service account for the Amazon S3 CSI driver. The Amazon S3 CSI driver requires an IAM service account with appropriate permissions to mount S3 buckets as persistent volumes in your EKS cluster. This step creates the necessary IAM role and Kubernetes service account with the required S3 access policy.
%%bash -x export S3_CSI_ROLE_NAME="SM_HP_S3_CSI_ROLE-$REGION" export S3_CSI_POLICY_ARN=$(aws iam list-policies --query 'Policies[?PolicyName==`S3MountpointAccessPolicy`]' | jq '.[0].Arn' | tr -d '"') eksctl create iamserviceaccount \ --name s3-csi-driver-sa \ --namespace kube-system \ --cluster $EKS_CLUSTER_NAME \ --attach-policy-arn $S3_CSI_POLICY_ARN \ --approve \ --role-name $S3_CSI_ROLE_NAME \ --region $REGION kubectl label serviceaccount s3-csi-driver-sa app.kubernetes.io/component=csi-driver app.kubernetes.io/instance=aws-mountpoint-s3-csi-driver app.kubernetes.io/managed-by=EKS app.kubernetes.io/name=aws-mountpoint-s3-csi-driver -n kube-system --overwrite
-
(Optional) Install the Amazon S3 CSI driver add-on. This driver enables your pods to mount S3 buckets as persistent volumes, providing direct access to S3 storage from within your Kubernetes workloads.
%%bash -x export S3_CSI_ROLE_ARN=$(aws iam get-role --role-name $S3_CSI_ROLE_NAME --query 'Role.Arn' --output text) eksctl create addon --name aws-mountpoint-s3-csi-driver --cluster $EKS_CLUSTER_NAME --service-account-role-arn $S3_CSI_ROLE_ARN --force
-
(Optional) Create a Persistent Volume Claim (PVC) for S3 storage. This PVC enables your pods to request and use S3 storage as if it were a traditional file system.
%%bash -x cat <<EOF> pvc_s3.yaml apiVersion: v1 kind: PersistentVolumeClaim metadata: name: s3-claim spec: accessModes: - ReadWriteMany # supported options: ReadWriteMany / ReadOnlyMany storageClassName: "" # required for static provisioning resources: requests: storage: 1200Gi # ignored, required volumeName: s3-pv EOF kubectl apply -f pvc_s3.yaml
-
-
(Optional) Configure FSx storage access. Create an IAM service account for the Amazon FSx CSI driver. This service account will be used by the FSx CSI driver to interact with the Amazon FSx service on behalf of your cluster.
%%bash -x eksctl create iamserviceaccount \ --name fsx-csi-controller-sa \ --namespace kube-system \ --cluster $EKS_CLUSTER_NAME \ --attach-policy-arn arn:aws:iam::aws:policy/AmazonFSxFullAccess \ --approve \ --role-name FSXLCSI-${EKS_CLUSTER_NAME}-${REGION} \ --region $REGION
Create the KEDA operator role
-
Create the trust policy and permissions policy.
# Create trust policy cat <<EOF > /tmp/keda-trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::$ACCOUNT_ID:oidc-provider/oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringLike": { "oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID:sub": "system:serviceaccount:kube-system:keda-operator", "oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID:aud": "sts.amazonaws.com" } } } ] } EOF # Create permissions policy cat <<EOF > /tmp/keda-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "aps:QueryMetrics", "aps:GetLabels", "aps:GetSeries", "aps:GetMetricMetadata" ], "Resource": "*" } ] } EOF # Create the role aws iam create-role \ --role-name keda-operator-role \ --assume-role-policy-document file:///tmp/keda-trust-policy.json # Create the policy KEDA_POLICY_ARN=$(aws iam create-policy \ --policy-name KedaOperatorPolicy \ --policy-document file:///tmp/keda-policy.json \ --query 'Policy.Arn' \ --output text) # Attach the policy to the role aws iam attach-role-policy \ --role-name keda-operator-role \ --policy-arn $KEDA_POLICY_ARN
-
If you're using gated models, create an IAM role to access the gated models.
-
Create an IAM policy.
%%bash -s $REGION cat <<EOF> /tmp/presignedurl-policy.json { "Version": "2012-10-17", "Statement": [ { "Sid": "CreatePresignedUrlAccess", "Effect": "Allow", "Action": [ "sagemaker:CreateHubContentPresignedUrls" ], "Resource": [ "arn:aws:sagemaker:$1:aws:hub/SageMakerPublicHub", "arn:aws:sagemaker:$1:aws:hub-content/SageMakerPublicHub/*/*" ] } ] } EOF aws iam create-policy --policy-name PresignedUrlAccessPolicy --policy-document file:///tmp/presignedurl-policy.json JUMPSTART_GATED_ROLE_NAME="JumpstartGatedRole-${REGION}-${HYPERPOD_CLUSTER_NAME}" cat <<EOF > /tmp/trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::$ACCOUNT_ID:oidc-provider/oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringLike": { "oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID:sub": "system:serviceaccount:*:$HYPERPOD_INFERENCE_SA_NAME", "oidc.eks.$REGION.amazonaws.com/id/$OIDC_ID:aud": "sts.amazonaws.com" } } }, { "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF
-
Create an IAM role.
# Create the role using existing trust policy aws iam create-role \ --role-name $JUMPSTART_GATED_ROLE_NAME \ --assume-role-policy-document file:///tmp/trust-policy.json # Attach the existing PresignedUrlAccessPolicy to the role aws iam attach-role-policy \ --role-name $JUMPSTART_GATED_ROLE_NAME \ --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/PresignedUrlAccessPolicy
JUMPSTART_GATED_ROLE_ARN_LIST= !aws iam get-role --role-name=$JUMPSTART_GATED_ROLE_NAME --query "Role.Arn" --output text JUMPSTART_GATED_ROLE_ARN = JUMPSTART_GATED_ROLE_ARN_LIST[0] !echo $JUMPSTART_GATED_ROLE_ARN
-
Add
SageMakerFullAccess
policy to the execution role.aws iam attach-role-policy --role-name=$HYPERPOD_INFERENCE_ROLE_NAME --policy-arn=arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
-
Install the inference operator
-
Install the HyperPod inference operator. This step gathers the required AWS resource identifiers and generates the Helm installation command with the appropriate configuration parameters.
Access the helm chart from https://github.com/aws/sagemaker-hyperpod-cli/tree/main/helm_chart
. git clone https://github.com/aws/sagemaker-hyperpod-cli cd sagemaker-hyperpod-cli cd helm_chart/HyperPodHelmChart
%%bash -x HYPERPOD_INFERENCE_ROLE_ARN=$(aws iam get-role --role-name=$HYPERPOD_INFERENCE_ROLE_NAME --query "Role.Arn" --output text) echo $HYPERPOD_INFERENCE_ROLE_ARN S3_CSI_ROLE_ARN=$(aws iam get-role --role-name=$S3_CSI_ROLE_NAME --query "Role.Arn" --output text) echo $S3_CSI_ROLE_ARN HYPERPOD_CLUSTER_ARN=$(aws sagemaker describe-cluster --cluster-name $HYPERPOD_CLUSTER_NAME --query "ClusterArn") # Verify values echo "Cluster Name: $EKS_CLUSTER_NAME" echo "Execution Role: $HYPERPOD_INFERENCE_ROLE_ARN" echo "Hyperpod ARN: $HYPERPOD_CLUSTER_ARN" # Run the the HyperPod inference operator installation. helm install hyperpod-inference-operator charts/inference-operator -n kube-system \ --set region=$REGION \ --set eksClusterName=$EKS_CLUSTER_NAME \ --set hyperpodClusterArn=$HYPERPOD_CLUSTER_ARN \ --set executionRoleArn=$HYPERPOD_INFERENCE_ROLE_ARN \ --set s3.serviceAccountRoleArn=$S3_CSI_ROLE_ARN \ --set s3.node.serviceAccount.create=false \ --set keda.podIdentity.aws.irsa.roleArn="arn:aws:iam::$ACCOUNT_ID:role/keda-operator-role" \ --set tlsCertificateS3Bucket="s3://$BUCKET_NAME" \ --set alb.region=$REGION \ --set alb.clusterName=$EKS_CLUSTER_NAME \ --set alb.vpcId=$VPC_ID # For JumpStart Gated Model usage, Add # --set jumpstartGatedModelDownloadRoleArn=$UMPSTART_GATED_ROLE_ARN
-
Configure the service account annotations for IAM integration. This annotation enables the operator's service account to assume the necessary IAM permissions for managing inference endpoints and interacting with AWS services.
%%bash -x EKS_CLUSTER_ROLE_NAME=$(echo $EKS_CLUSTER_ROLE | sed 's/.*\///') # Annotate service account kubectl annotate serviceaccount hyperpod-inference-operator-controller-manager \ -n hyperpod-inference-system \ eks.amazonaws.com/role-arn=arn:aws:iam::${ACCOUNT_ID}:role/${EKS_CLUSTER_ROLE_NAME} \ --overwrite
Verify the inference operator is working
-
Create a model deployment configuration file. This creates a Kubernetes manifest file that defines a JumpStart model deployment for the HyperPod inference operator. The configuration specifies how to deploy a pre-trained model from Amazon SageMaker JumpStart as an inference endpoint on your Amazon EKS cluster.
cat <<EOF>> simple_model_install.yaml --- apiVersion: inference.sagemaker.aws.amazon.com/v1alpha1 kind: JumpStartModel metadata: name: testing-deployment-bert namespace: default spec: model: modelId: "huggingface-eqa-bert-base-cased" sageMakerEndpoint: name: "hp-inf-ep-for-testing" server: instanceType: "ml.c5.2xlarge" environmentVariables: - name: SAMPLE_ENV_VAR value: "sample_value" maxDeployTimeInSeconds: 1800 EOF
-
Deploy the model and clean up the configuration file. This step creates the JumpStart model resource and removes the temporary configuration file to maintain a clean workspace.
%%bash -x kubectl create -f simple_model_install.yaml rm -rfv simple_model_install.yaml
-
Check if the model is installed and running. This verification ensures that the operator can successfully assume AWS permissions for managing inference endpoints.
%%bash # Get the service account details kubectl get serviceaccount -n hyperpod-inference-system # Check if the service account has the AWS annotations kubectl describe serviceaccount hyperpod-inference-operator-controller-manager -n hyperpod-inference-system
-
Write the model input file. This creates a JSON input file containing sample data to test the deployed model's question-answering capabilities.
%%writefile demo-input.json {"question" :"what is the name of the planet?","context" : "earth"}
-
Invoke the SageMaker endpoint to perform load testing to validate the inference endpoint's performance and reliability.
%%bash #!/bin/bash for i in {1..1000} do echo "Invocation #$i" aws sagemaker-runtime invoke-endpoint \ --endpoint-name testing-deployment-jumpstart-9 \ --region {REGION} \ --body fileb://demo-input.json \ --content-type application/list-text \ --accept application/json \ "demoout_${i}.json" # Add a small delay to prevent throttling (optional) #sleep 0.5 rm -f "demoout_${i}.json" done