Deploy models from JumpStart using kubectl
The following steps show you how to deploy a JumpStart model to a HyperPod cluster using kubectl.
The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or SageMaker Notebook Instances. Each code block represents a notebook cell that should be executed sequentially. The interactive elements, including model discovery tables and status monitoring commands, are optimized for the notebook interface and may not function properly in other environments. Ensure you have access to a notebook environment with the necessary AWS permissions before proceeding.
Prerequisites
Ensure you've set up inference capabilities on your Amazon SageMaker HyperPod clusters. For more information, see Setting up your HyperPod clusters for model deployment.
Setup and configuration
Select your Region
Choose the Region where your HyperPod cluster is deployed and where you want to run your inference workloads. You can also add other customizations to the sagemaker_client.
region_name = "<REGION>"

import boto3
from botocore.config import Config

# Configure retry options
boto3_config = Config(
    retries={
        'max_attempts': 10,  # Maximum number of retry attempts
        'mode': 'adaptive'   # Use adaptive mode for exponential backoff
    }
)

sagemaker_client = boto3.client("sagemaker", region_name=region_name, config=boto3_config)
Choose your model and cluster
- View all SageMaker public hub models and HyperPod clusters.
interactive_view(get_all_public_hub_model_data(sagemaker_client))
interactive_view(get_all_cluster_data(sagemaker_client))
- Configure the model ID and cluster name you've selected into the variables below.
Note
Check with your cluster admin to ensure permissions are granted for this notebook execution role. To check which execution role you are using, run:

!aws sts get-caller-identity --query "Arn"

# Change the model_id based on your requirement. A list of model IDs is available in step 1 of this notebook.
# Proprietary models are not supported.
model_id = "<insert model id here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_public_hub_model_is_not_proprietary

validate_public_hub_model_is_not_proprietary(sagemaker_client, model_id)
# Select the cluster name where you want to deploy the model. A list of clusters is available in step 1 of this notebook.
cluster_name = "<insert cluster name here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_cluster_can_support_public_hub_model
from sagemaker.hyperpod.inference.notebook_utils import get_public_hub_model_compatible_instances

validate_cluster_can_support_public_hub_model(sagemaker_client, model_id, cluster_name)
interactive_view(get_public_hub_model_compatible_instances(sagemaker_client, model_id))
# Select the instance type that is relevant for your model deployment and exists within the selected cluster.
instance_type = "ml.g5.8xlarge"
- Confirm with the cluster admin which namespace you are permitted to use. The admin should have created a hyperpod-inference service account in your namespace.
cluster_namespace = "default"
Configure the S3 bucket name
Configure the S3 bucket for certificates. This bucket needs a folder named "certificates" where the TLS certificates will be uploaded, and it must be in the same Region defined above.
# Set the S3 bucket name where TLS certificates will be stored for secure model communication
certificate_bucket = "<insert bucket name here>"
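Optionally, you can confirm that the bucket is in the selected Region and that the certificates folder exists before deploying. The following is a minimal sketch using the standard boto3 S3 client; it assumes the notebook role has s3:GetBucketLocation and s3:PutObject permissions on the bucket.

import boto3

s3_client = boto3.client("s3", region_name=region_name)

# get_bucket_location returns None for us-east-1, otherwise the Region name
bucket_region = s3_client.get_bucket_location(Bucket=certificate_bucket)["LocationConstraint"] or "us-east-1"
assert bucket_region == region_name, f"Bucket is in {bucket_region}, expected {region_name}"

# Create the "certificates/" prefix if it does not already exist (S3 folders are just key prefixes)
s3_client.put_object(Bucket=certificate_bucket, Key="certificates/")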
import yaml
from datetime import datetime

# Get current time in a format suitable for the endpoint name
current_time = datetime.now().strftime("%Y%m%d-%H%M%S")
sagemaker_endpoint_name = f"{model_id}-{current_time}"

def generate_jumpstart_model_yaml(model_id, model_version, namespace, instance_type, output_file_path, certificate_bucket):
    """
    Generate a JumpStartModel YAML file with the provided parameters.

    Args:
        model_id (str): The model ID
        model_version (str): The model version
        namespace (str): The namespace
        instance_type (str): The instance type
        output_file_path (str): Path where the YAML file will be saved
        certificate_bucket (str): S3 bucket where TLS certificates will be stored
    """
    # Create the YAML structure
    tlsCertificateOutputS3Uri = "s3://" + certificate_bucket + "/certificates/"
    model_config = {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "JumpStartModel",
        "metadata": {
            "name": model_id,
            "namespace": namespace
        },
        "spec": {
            "sageMakerEndpoint": {
                "name": sagemaker_endpoint_name
            },
            "model": {
                "modelHubName": "SageMakerPublicHub",
                "modelId": model_id,
                # modelVersion is optional
                "modelVersion": model_version
                # acceptEula is optional, set value to True when using a gated model
            },
            "server": {
                "instanceType": instance_type
            },
            "tlsConfig": {
                "tlsCertificateOutputS3Uri": tlsCertificateOutputS3Uri
            }
        }
    }

    # Write to YAML file
    with open(output_file_path, 'w') as file:
        yaml.dump(model_config, file, default_flow_style=False)

    print(f"YAML file created successfully at: {output_file_path}")
# Import JumpStart utilities to retrieve model specifications and version information
from sagemaker.jumpstart import utils
from sagemaker.jumpstart.enums import JumpStartScriptScope

model_specs = utils.verify_model_region_and_return_specs(
    model_id,
    "*",
    JumpStartScriptScope.INFERENCE,
    region=region_name
)
model_version = model_specs.version
# Generate the output filename for the Kubernetes YAML configuration
output_file_path = f"jumpstart-model-{model_id}.yaml"

generate_jumpstart_model_yaml(
    model_id=model_id,
    model_version=model_version,
    namespace=cluster_namespace,
    instance_type=instance_type,
    output_file_path=output_file_path,
    certificate_bucket=certificate_bucket
)

import os
os.environ["JUMPSTART_YAML_FILE_PATH"] = output_file_path
os.environ["MODEL_ID"] = model_id
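Optionally, print the generated manifest to confirm the model ID, endpoint name, and TLS output URI before applying it with kubectl.

# Preview the generated JumpStartModel manifest
with open(output_file_path) as f:
    print(f.read())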
Deploy your model
Update your Kubernetes configuration and deploy your model
- Retrieve the EKS cluster that orchestrates your HyperPod cluster. The command returns the cluster ARN; the EKS cluster name is the final segment of the ARN.
!aws sagemaker describe-cluster --cluster-name $cluster_name --query "Orchestrator.Eks.ClusterArn"
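If you prefer to capture the cluster name programmatically instead of copying it from the output above, a sketch like the following works; it assumes the EKS cluster name is the final path segment of the ARN returned by describe-cluster.

# Derive the EKS cluster name from the ARN returned by describe-cluster
eks_cluster_arn = sagemaker_client.describe_cluster(ClusterName=cluster_name)["Orchestrator"]["Eks"]["ClusterArn"]
eks_cluster_name = eks_cluster_arn.split("/")[-1]
print(eks_cluster_name)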
- Configure kubectl to connect to the EKS cluster.
!aws eks update-kubeconfig --name "<insert name of eks cluster from above>" --region $region_name
- Deploy your JumpStart model.
!kubectl apply -f $JUMPSTART_YAML_FILE_PATH
Monitor the status of your model deployment
- Ensure that the model is successfully deployed.
!kubectl describe JumpStartModel $model_id -n $cluster_namespace
- Ensure that the endpoint is successfully created.
!kubectl describe SageMakerEndPointRegistration $sagemaker_endpoint_name -n $cluster_namespace
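You can also wait on the SageMaker endpoint itself. The following sketch polls describe_endpoint until the endpoint is in service; the describe call fails until the HyperPod operator has actually created the endpoint, so that case is treated as "not created yet".

import time
from botocore.exceptions import ClientError

# Poll the SageMaker endpoint status until it reaches a terminal state
while True:
    try:
        status = sagemaker_client.describe_endpoint(EndpointName=sagemaker_endpoint_name)["EndpointStatus"]
    except ClientError:
        status = "NotCreatedYet"
    print(f"Endpoint status: {status}")
    if status in ("InService", "Failed"):
        break
    time.sleep(30)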
Invoke your model endpoint
You can programmatically retrieve example payloads from the JumpStartModel object.
import boto3

# Example request payload for the model
prompt = "{\"inputs\": \"What is AWS SageMaker?\"}"

runtime_client = boto3.client('sagemaker-runtime', region_name=region_name)
response = runtime_client.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType="application/json",
    Body=prompt
)
print(response["Body"].read().decode())
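The prompt above is hardcoded. As noted, example payloads can also be retrieved programmatically; the sketch below assumes a recent SageMaker Python SDK that exposes JumpStartModel.retrieve_all_examples(), and that each returned payload carries a content type and body.

from sagemaker.jumpstart.model import JumpStartModel

# Retrieve the example payloads published with the model (assumes a recent SageMaker Python SDK)
js_model = JumpStartModel(model_id=model_id, model_version=model_version, region=region_name)
for example in js_model.retrieve_all_examples():
    print(example.content_type)
    print(example.body)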
Manage your deployment
Clean up resources
Delete your JumpStart model deployment once you no longer need it.
!kubectl delete JumpStartModel $model_id -n $cluster_namespace
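To confirm the cleanup finished, you can check that the custom resource is gone and that the endpoint no longer appears in SageMaker; a minimal check, assuming the same variables as above:

!kubectl get JumpStartModel -n $cluster_namespace

# The endpoint should disappear from the SageMaker endpoint list once deletion completes
!aws sagemaker list-endpoints --query "Endpoints[?EndpointName=='$sagemaker_endpoint_name']" --region $region_name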
Troubleshooting
Use these debugging commands if your deployment isn't working as expected.
- Check the status of the Kubernetes deployment. This command inspects the underlying Kubernetes deployment object that manages the pods running your model. Use this to troubleshoot pod scheduling, resource allocation, and container startup issues.
!kubectl describe deployment $model_id -n $cluster_namespace
- Check the status of your JumpStart model resource. This command examines the custom JumpStartModel resource that manages the high-level model configuration and deployment lifecycle. Use this to troubleshoot model-specific issues like configuration errors or SageMaker endpoint creation problems.
!kubectl describe JumpStartModel $model_id -n $cluster_namespace
- Check the status of all Kubernetes objects. This command provides a comprehensive overview of all related Kubernetes resources in your namespace. Use this for a quick health check to see the overall state of pods, services, deployments, and custom resources associated with your model deployment.
!kubectl get pods,svc,deployment,JumpStartModel,sagemakerendpointregistration -n $cluster_namespace
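If the pods are running but the model fails to load or serve requests, the container logs are usually the next place to look. This assumes the deployment created by the operator is named after the model ID, as in the describe command above.

# Tail recent logs from the pods behind the model deployment
!kubectl logs deployment/$model_id -n $cluster_namespace --all-containers --tail=100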