Deploy models from JumpStart using kubectl
The following steps show you how to deploy a JumpStart model to a HyperPod cluster using kubectl.
The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or SageMaker Notebook Instances. Each code block represents a notebook cell that should be executed sequentially. The interactive elements, including model discovery tables and status monitoring commands, are optimized for the notebook interface and may not function properly in other environments. Ensure you have access to a notebook environment with the necessary AWS permissions before proceeding.
Prerequisites
Ensure you've set up inference capabilities on your Amazon SageMaker HyperPod clusters. For more information, see Setting up your HyperPod clusters for model deployment.
Setup and configuration
Select your Region
Choose the Region where your HyperPod cluster is deployed and where you want to run your inference workloads. You can also add other customizations to the sagemaker_client.
region_name = "<REGION>"

import boto3
from botocore.config import Config

# Configure retry options
boto3_config = Config(
    retries={
        'max_attempts': 10,  # Maximum number of retry attempts
        'mode': 'adaptive'   # Use adaptive mode for exponential backoff
    }
)

sagemaker_client = boto3.client("sagemaker", region_name=region_name, config=boto3_config)
Choose your model and cluster
- View all SageMaker public hub models and HyperPod clusters.
interactive_view(get_all_public_hub_model_data(sagemaker_client))
interactive_view(get_all_cluster_data(sagemaker_client))
- Configure the model ID and cluster name you've selected into the variables below.
Note
Check with your cluster admin to ensure permissions are granted for this notebook execution role. To check which execution role you are using, run:

!aws sts get-caller-identity --query "Arn"

# Change the model_id based on your requirement. A list of model IDs is available in step 1 of this notebook.
# Proprietary models are not supported.
model_id = "<insert model id here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_public_hub_model_is_not_proprietary

validate_public_hub_model_is_not_proprietary(sagemaker_client, model_id)
# Select the cluster name where you want to deploy the model. A list of clusters is available in step 1 of this notebook.
cluster_name = "<insert cluster name here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_cluster_can_support_public_hub_model
from sagemaker.hyperpod.inference.notebook_utils import get_public_hub_model_compatible_instances

validate_cluster_can_support_public_hub_model(sagemaker_client, model_id, cluster_name)
interactive_view(get_public_hub_model_compatible_instances(sagemaker_client, model_id))
# Select the instance type that is relevant for your model deployment and exists within the selected cluster.
instance_type = "ml.g5.8xlarge"
- Confirm with the cluster admin which namespace you are permitted to use. The admin should have created a hyperpod-inference service account in your namespace.
cluster_namespace = "default"
Configure the S3 bucket name
Configure the S3 bucket for certificates. This bucket needs a folder named "certificates" where the TLS certificates will be uploaded, and it must be in the same Region defined above.
# Set the S3 bucket name where TLS certificates will be stored for secure model communication
certificate_bucket = "<insert bucket name here>"
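Optionally, you can confirm that the bucket is in the selected Region and that the certificates folder exists before deploying. The following is a minimal sketch using the standard boto3 S3 client; it assumes the notebook role has s3:GetBucketLocation and s3:PutObject permissions on the bucket.

import boto3

s3_client = boto3.client("s3", region_name=region_name)

# get_bucket_location returns None for us-east-1, otherwise the Region name
bucket_region = s3_client.get_bucket_location(Bucket=certificate_bucket)["LocationConstraint"] or "us-east-1"
assert bucket_region == region_name, f"Bucket is in {bucket_region}, expected {region_name}"

# Create the "certificates/" prefix if it does not already exist (S3 folders are just key prefixes)
s3_client.put_object(Bucket=certificate_bucket, Key="certificates/")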
import yaml
from datetime import datetime

# Get current time in a format suitable for the endpoint name
current_time = datetime.now().strftime("%Y%m%d-%H%M%S")
sagemaker_endpoint_name = f"{model_id}-{current_time}"

def generate_jumpstart_model_yaml(model_id, model_version, namespace, instance_type, output_file_path, certificate_bucket):
    """
    Generate a JumpStartModel YAML file with the provided parameters.

    Args:
        model_id (str): The model ID
        model_version (str): The model version
        namespace (str): The namespace
        instance_type (str): The instance type
        output_file_path (str): Path where the YAML file will be saved
        certificate_bucket (str): S3 bucket where TLS certificates will be stored
    """
    # Create the YAML structure
    tlsCertificateOutputS3Uri = "s3://" + certificate_bucket + "/certificates/"
    model_config = {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "JumpStartModel",
        "metadata": {
            "name": model_id,
            "namespace": namespace
        },
        "spec": {
            "sageMakerEndpoint": {
                "name": sagemaker_endpoint_name
            },
            "model": {
                "modelHubName": "SageMakerPublicHub",
                "modelId": model_id,
                # modelVersion is optional
                "modelVersion": model_version
                # acceptEula is optional, set value to True when using a gated model
            },
            "server": {
                "instanceType": instance_type
            },
            "tlsConfig": {
                "tlsCertificateOutputS3Uri": tlsCertificateOutputS3Uri
            }
        }
    }

    # Write to YAML file
    with open(output_file_path, 'w') as file:
        yaml.dump(model_config, file, default_flow_style=False)

    print(f"YAML file created successfully at: {output_file_path}")
# Import JumpStart utilities to retrieve model specifications and version information
from sagemaker.jumpstart import utils
from sagemaker.jumpstart.enums import JumpStartScriptScope

model_specs = utils.verify_model_region_and_return_specs(
    model_id,
    "*",
    JumpStartScriptScope.INFERENCE,
    region=region_name
)
model_version = model_specs.version
# Generate the output filename for the Kubernetes YAML configuration
output_file_path = f"jumpstart-model-{model_id}.yaml"

generate_jumpstart_model_yaml(
    model_id=model_id,
    model_version=model_version,
    namespace=cluster_namespace,
    instance_type=instance_type,
    output_file_path=output_file_path,
    certificate_bucket=certificate_bucket
)

import os
os.environ["JUMPSTART_YAML_FILE_PATH"] = output_file_path
os.environ["MODEL_ID"] = model_id
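Optionally, print the generated manifest to confirm the model ID, endpoint name, and TLS output URI before applying it with kubectl.

# Preview the generated JumpStartModel manifest
with open(output_file_path) as f:
    print(f.read())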
Deploy your model
Update your Kubernetes configuration and deploy your model
- Retrieve the EKS cluster that orchestrates your HyperPod cluster. The command returns the cluster ARN; the EKS cluster name is the final segment of the ARN.
!aws sagemaker describe-cluster --cluster-name $cluster_name --query "Orchestrator.Eks.ClusterArn"
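If you prefer to capture the cluster name programmatically instead of copying it from the output above, a sketch like the following works; it assumes the EKS cluster name is the final path segment of the ARN returned by describe-cluster.

# Derive the EKS cluster name from the ARN returned by describe-cluster
eks_cluster_arn = sagemaker_client.describe_cluster(ClusterName=cluster_name)["Orchestrator"]["Eks"]["ClusterArn"]
eks_cluster_name = eks_cluster_arn.split("/")[-1]
print(eks_cluster_name)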
- Configure kubectl to connect to the EKS cluster.
!aws eks update-kubeconfig --name "<insert name of eks cluster from above>" --region $region_name
- Deploy your JumpStart model.
!kubectl apply -f $JUMPSTART_YAML_FILE_PATH
Monitor the status of your model deployment
- Ensure that the model is successfully deployed.
!kubectl describe JumpStartModel $model_id -n $cluster_namespace
- Ensure that the endpoint is successfully created.
!kubectl describe SageMakerEndPointRegistration $sagemaker_endpoint_name -n $cluster_namespace
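You can also wait on the SageMaker endpoint itself. The following sketch polls describe_endpoint until the endpoint is in service; the describe call fails until the HyperPod operator has actually created the endpoint, so that case is treated as "not created yet".

import time
from botocore.exceptions import ClientError

# Poll the SageMaker endpoint status until it reaches a terminal state
while True:
    try:
        status = sagemaker_client.describe_endpoint(EndpointName=sagemaker_endpoint_name)["EndpointStatus"]
    except ClientError:
        status = "NotCreatedYet"
    print(f"Endpoint status: {status}")
    if status in ("InService", "Failed"):
        break
    time.sleep(30)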
Invoke your model endpoint
You can programmatically retrieve example payloads from the JumpStartModel object.
import boto3

# Example request payload for the model
prompt = "{\"inputs\": \"What is AWS SageMaker?\"}"

runtime_client = boto3.client('sagemaker-runtime', region_name=region_name)
response = runtime_client.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType="application/json",
    Body=prompt
)
print(response["Body"].read().decode())
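The prompt above is hardcoded. As noted, example payloads can also be retrieved programmatically; the sketch below assumes a recent SageMaker Python SDK that exposes JumpStartModel.retrieve_all_examples(), and that each returned payload carries a content type and body.

from sagemaker.jumpstart.model import JumpStartModel

# Retrieve the example payloads published with the model (assumes a recent SageMaker Python SDK)
js_model = JumpStartModel(model_id=model_id, model_version=model_version, region=region_name)
for example in js_model.retrieve_all_examples():
    print(example.content_type)
    print(example.body)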
Manage your deployment
Clean up resources
Delete your JumpStart model deployment once you no longer need it.
!kubectl delete JumpStartModel $model_id -n $cluster_namespace
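To confirm the cleanup finished, you can check that the custom resource is gone and that the endpoint no longer appears in SageMaker; a minimal check, assuming the same variables as above:

!kubectl get JumpStartModel -n $cluster_namespace

# The endpoint should disappear from the SageMaker endpoint list once deletion completes
!aws sagemaker list-endpoints --query "Endpoints[?EndpointName=='$sagemaker_endpoint_name']" --region $region_name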
Troubleshooting
Use these debugging commands if your deployment isn't working as expected.
- Check the status of the Kubernetes deployment. This command inspects the underlying Kubernetes deployment object that manages the pods running your model. Use this to troubleshoot pod scheduling, resource allocation, and container startup issues.
!kubectl describe deployment $model_id -n $cluster_namespace
- Check the status of your JumpStart model resource. This command examines the custom JumpStartModel resource that manages the high-level model configuration and deployment lifecycle. Use this to troubleshoot model-specific issues like configuration errors or SageMaker endpoint creation problems.
!kubectl describe JumpStartModel $model_id -n $cluster_namespace
- Check the status of all Kubernetes objects. This command provides a comprehensive overview of all related Kubernetes resources in your namespace. Use this for a quick health check to see the overall state of pods, services, deployments, and custom resources associated with your model deployment.
!kubectl get pods,svc,deployment,JumpStartModel,sagemakerendpointregistration -n $cluster_namespace
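If the pods are running but the model fails to load or serve requests, the container logs are usually the next place to look. This assumes the deployment created by the operator is named after the model ID, as in the describe command above.

# Tail recent logs from the pods behind the model deployment
!kubectl logs deployment/$model_id -n $cluster_namespace --all-containers --tail=100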