
Deploy custom fine-tuned models from Amazon S3 and Amazon FSx using kubectl

The following steps show how to deploy models stored on Amazon S3 or Amazon FSx to an Amazon SageMaker HyperPod cluster using kubectl.

The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or SageMaker Notebook Instances. Each code block represents a notebook cell that should be run in sequence. Interactive elements, including model discovery tables and status monitoring commands, are optimized for the notebook interface and might not work properly in other environments. Make sure that you have access to a notebook environment with the necessary AWS permissions before proceeding.

Prerequisites

Verify that you've set up inference capabilities on your Amazon SageMaker HyperPod clusters. For more information, see Setting up your HyperPod clusters for model deployment.

Setup and configuration

Replace all placeholder values with your actual resource identifiers.

Initialize your cluster name. This identifies the HyperPod cluster where your model will be deployed.


# Specify your hyperpod cluster name here
hyperpod_cluster_name="<Hyperpod_cluster_name>"

# NOTE: For sample deployment, we use g5.8xlarge for deepseek-r1 1.5b model which has sufficient memory and GPU
instance_type="ml.g5.8xlarge"
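
Later cells on this page reference `region_name`, `boto3_config`, and `tls_certificate_s3_location` without defining them. A minimal sketch of a setup cell follows. Every value in it is an assumption rather than part of the original instructions: the Region is chosen to match the us-east-2 container image and JumpStart bucket used further down, the certificate location is a placeholder to replace, and the retry settings are only illustrative.

```python
# Assumed setup cell: later cells use these names, but this page never defines them.
region_name = "us-east-2"  # assumption: the Region of your HyperPod cluster

# Assumption: an S3 URI you control, used for the TLS certificate output.
tls_certificate_s3_location = "s3://<your-tls-bucket>/certificates"

try:
    from botocore.config import Config
    # Illustrative retry settings; adjust or drop them as needed.
    boto3_config = Config(retries={"max_attempts": 5, "mode": "standard"})
except ImportError:
    # Keeps this cell runnable where boto3 is missing; later cells still need boto3.
    boto3_config = None
```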

Initialize your cluster namespace. Your cluster admin should have already created a HyperPod inference service account in your namespace.
```
cluster_namespace="<namespace>"
```
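
Because a forgotten placeholder only surfaces later as a failed deployment, it can help to check the values before generating any YAML. The following is a minimal sketch, not part of the original instructions; the helper name `unreplaced_placeholders` and the example values are hypothetical.

```python
import re

# Matches angle-bracket placeholders such as "<namespace>" anywhere in a value.
PLACEHOLDER = re.compile(r"<[^<>]+>")

def unreplaced_placeholders(settings):
    """Return the sorted names whose values still contain a "<placeholder>"."""
    return sorted(
        name for name, value in settings.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    )

# Example with one value filled in and one still a placeholder.
example = {
    "hyperpod_cluster_name": "my-cluster",  # hypothetical value
    "cluster_namespace": "<namespace>",
}
print(unreplaced_placeholders(example))  # prints ['cluster_namespace']
```

In the notebook, you could pass a dictionary built from `hyperpod_cluster_name` and `cluster_namespace` and stop if the returned list is not empty.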

Define the helper method to create YAML files for deployment

The following helper function generates the Kubernetes YAML configuration files needed to deploy your model. It creates different YAML structures depending on whether the model is stored on Amazon S3 or Amazon FSx, and handles the storage-specific configuration automatically. You use this function in the next sections to generate the deployment files for your chosen storage backend.


import yaml

def generate_inferenceendpointconfig_yaml(deployment_name, model_id, namespace, instance_type, output_file_path, region, tls_certificate_s3_location, model_location, sagemaker_endpoint_name, fsxFileSystemId="", isFsx=False, s3_bucket=None):
    """
    Generate an InferenceEndpointConfig YAML file for S3 or FSx storage with the provided parameters.

    Args:
        deployment_name (str): The deployment name
        model_id (str): The model ID
        namespace (str): The namespace
        instance_type (str): The instance type
        output_file_path (str): Path where the YAML file will be saved
        region (str): Region where bucket exists
        tls_certificate_s3_location (str): S3 location for TLS certificate
        model_location (str): Location of the model
        sagemaker_endpoint_name (str): Name of the SageMaker endpoint
        fsxFileSystemId (str): FSx filesystem ID (optional)
        isFsx (bool): Whether to use FSx storage (optional)
        s3_bucket (str): S3 bucket where model exists (optional, only needed when isFsx is False)
    """

    # Create the YAML structure
    model_config = {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "InferenceEndpointConfig",
        "metadata": {
            "name": deployment_name,
            "namespace": namespace
        },
        "spec": {
            "modelName": model_id,
            "endpointName": sagemaker_endpoint_name,  
            "invocationEndpoint": "invocations",
            "instanceType": instance_type,
            "modelSourceConfig": {},
            "worker": {
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": 1,
                    },
                    "requests": {
                        "nvidia.com/gpu": 1,
                        "cpu": "30000m",
                        "memory": "100Gi"
                    }
                },
                "image": "763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi2.3.1-gpu-py311-cu124-ubuntu22.04-v2.0",
                "modelInvocationPort": {
                    "containerPort": 8080,
                    "name": "http"
                },
                "modelVolumeMount": {
                    "name": "model-weights",
                    "mountPath": "/opt/ml/model"
                },
                "environmentVariables": [
                    {
                        "name": "HF_MODEL_ID",
                        "value": "/opt/ml/model"
                    },
                    {
                        "name": "SAGEMAKER_PROGRAM",
                        "value": "inference.py",
                    },
                    {
                        "name": "SAGEMAKER_SUBMIT_DIRECTORY",
                        "value": "/opt/ml/model/code",   
                    },
                    {
                        "name": "MODEL_CACHE_ROOT",
                        "value": "/opt/ml/model"
                    },
                    {
                        "name": "SAGEMAKER_ENV",
                        "value": "1",
                    }
                ]
            },
            "tlsConfig": {
                "tlsCertificateOutputS3Uri": tls_certificate_s3_location,
            }
        },
    }

    if (not isFsx):
        if s3_bucket is None:
            raise ValueError("s3_bucket is required when isFsx is False")
        model_config["spec"]["modelSourceConfig"] = {
            "modelSourceType": "s3",
            "s3Storage": {
                "bucketName": s3_bucket,
                "region": region,
            },
            "modelLocation": model_location
        }
    else:
        model_config["spec"]["modelSourceConfig"] = {
            "modelSourceType": "fsx",
            "fsxStorage": {
                "fileSystemId": fsxFileSystemId,
            },
            "modelLocation": model_location
        }
    

    # Write to YAML file
    with open(output_file_path, 'w') as file:
        yaml.dump(model_config, file, default_flow_style=False)

    print(f"YAML file created successfully at: {output_file_path}")
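
The only part of the helper that differs between the two storage backends is `modelSourceConfig`. As a sketch for inspecting the two shapes without writing a file, the branch can be isolated in a small pure function. The function name `build_model_source_config` and the identifiers in the example calls are hypothetical; the keys mirror those in the helper above.

```python
import json

def build_model_source_config(model_location, is_fsx=False, s3_bucket=None,
                              region=None, fsx_file_system_id=None):
    """Return the modelSourceConfig mapping for S3 or FSx storage."""
    if is_fsx:
        return {
            "modelSourceType": "fsx",
            "fsxStorage": {"fileSystemId": fsx_file_system_id},
            "modelLocation": model_location,
        }
    if s3_bucket is None:
        raise ValueError("s3_bucket is required when is_fsx is False")
    return {
        "modelSourceType": "s3",
        "s3Storage": {"bucketName": s3_bucket, "region": region},
        "modelLocation": model_location,
    }

# Hypothetical identifiers, for illustration only.
s3_cfg = build_model_source_config(
    "deepseek15b", s3_bucket="example-bucket", region="us-east-2")
fsx_cfg = build_model_source_config(
    "deepseek-1-5b", is_fsx=True, fsx_file_system_id="fs-0123456789abcdef0")
print(json.dumps(s3_cfg, indent=2))
print(json.dumps(fsx_cfg, indent=2))
```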

Deploy your model from Amazon S3 or Amazon FSx

Stage the model to Amazon S3

Create an Amazon S3 bucket to store your model artifacts. The S3 bucket must be in the same Region as your HyperPod cluster.


import boto3
import botocore.exceptions

s3_client = boto3.client('s3', region_name=region_name, config=boto3_config)
base_name = "hyperpod-inference-s3-beta"

def get_account_id():
    sts = boto3.client('sts')
    return sts.get_caller_identity()["Account"]

account_id = get_account_id()
s3_bucket = f"{base_name}-{account_id}"

try:
    s3_client.create_bucket(
        Bucket=s3_bucket,
        CreateBucketConfiguration={"LocationConstraint": region_name}
    )
    print(f"Bucket '{s3_bucket}' is created successfully.")
except botocore.exceptions.ClientError as e:
    error_code = e.response["Error"]["Code"]
    if error_code in ("BucketAlreadyExists", "BucketAlreadyOwnedByYou"):
        print(f"Bucket '{s3_bucket}' already exists. Skipping creation.")
    else:
        raise  # Re-raise unexpected exceptions
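
One caveat with the cell above: Amazon S3 rejects `us-east-1` as a `LocationConstraint`, so for that Region `CreateBucketConfiguration` has to be left out. A minimal sketch of building the arguments conditionally follows; the helper name `create_bucket_kwargs` and the bucket name are hypothetical.

```python
def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for s3_client.create_bucket()."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

print(create_bucket_kwargs("example-bucket", "us-east-2"))
print(create_bucket_kwargs("example-bucket", "us-east-1"))
```

In the notebook, the call would then read `s3_client.create_bucket(**create_bucket_kwargs(s3_bucket, region_name))`.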

Generate the deployment YAML to deploy the model from the S3 bucket.


import os
from datetime import datetime

# Get current time in format suitable for endpoint name
current_time = datetime.now().strftime("%Y%m%d%H%M%S")
model_id = "deepseek15b" ## Can be a name of your choice
deployment_name = f"{model_id}-{current_time}"
model_location = "deepseek15b" ## This is the folder in your S3 bucket where the model is located
sagemaker_endpoint_name=f"{model_id}-{current_time}"

output_file_path=f"inferenceendpointconfig-s3-model-{model_id}.yaml"
generate_inferenceendpointconfig_yaml(
    deployment_name=deployment_name,
    model_id=model_id,
    model_location=model_location,
    namespace=cluster_namespace,
    instance_type=instance_type,
    output_file_path=output_file_path,
    sagemaker_endpoint_name=sagemaker_endpoint_name,
    s3_bucket=s3_bucket,
    region=region_name,
    tls_certificate_s3_location=tls_certificate_s3_location
)


os.environ["INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH"]=output_file_path
os.environ["MODEL_ID"]=model_id
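
The generated name is used both as a Kubernetes object name and as a SageMaker endpoint name, so it has to satisfy both sets of naming rules. The following pre-check is a sketch under assumptions: it takes Kubernetes object names to be lowercase DNS-1123 subdomains of at most 253 characters, and SageMaker endpoint names to be alphanumerics and hyphens of at most 63 characters. Confirm these limits against the current service documentation before relying on them. The helper name `make_deployment_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed naming rules; confirm against current Kubernetes and SageMaker documentation.
K8S_NAME = re.compile(r"^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$")     # DNS-1123 subdomain style
SAGEMAKER_NAME = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")  # alphanumerics and hyphens

def make_deployment_name(model_id, now=None):
    """Build '<model_id>-<timestamp>' and reject names that break the assumed rules."""
    now = now or datetime.now()
    name = f"{model_id}-{now.strftime('%Y%m%d%H%M%S')}"
    if len(name) > 253 or not K8S_NAME.match(name):
        raise ValueError(f"{name!r} is not a valid Kubernetes object name")
    if len(name) > 63 or not SAGEMAKER_NAME.match(name):
        raise ValueError(f"{name!r} is not a valid SageMaker endpoint name")
    return name

print(make_deployment_name("deepseek15b", datetime(2025, 6, 24, 4, 30, 33)))
# prints: deepseek15b-20250624043033
```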

Stage the model to Amazon FSx

(Optional) Create an FSx volume. This step is optional because you might already have an existing FSx file system with the same VPC, security group, and subnet ID as the HyperPod cluster that you want to use.


import time

import boto3

# Initialize the subnet ID and security group for FSx. These should be the same as those of the HyperPod cluster.
SUBNET_ID = "<HyperPod-subnet-id>"
SECURITY_GROUP_ID = "<HyperPod-security-group-id>"

# Configuration
CONFIG = {
    'SUBNET_ID': SUBNET_ID,
    'SECURITY_GROUP_ID': SECURITY_GROUP_ID,
    'STORAGE_CAPACITY': 1200,
    'DEPLOYMENT_TYPE': 'PERSISTENT_2',
    'THROUGHPUT': 250,
    'COMPRESSION_TYPE': 'LZ4',
    'LUSTRE_VERSION': '2.15'
}

JUMPSTART_MODEL_LOCATION_ON_S3 = "s3://jumpstart-cache-prod-us-east-2/deepseek-llm/deepseek-llm-r1-distill-qwen-1-5b/artifacts/inference-prepack/v2.0.0/"

# Create FSx client
fsx = boto3.client('fsx')

# Create FSx for Lustre file system
response = fsx.create_file_system(
    FileSystemType='LUSTRE',
    FileSystemTypeVersion=CONFIG['LUSTRE_VERSION'],
    StorageCapacity=CONFIG['STORAGE_CAPACITY'],
    SubnetIds=[CONFIG['SUBNET_ID']],
    SecurityGroupIds=[CONFIG['SECURITY_GROUP_ID']],
    LustreConfiguration={
        'DeploymentType': CONFIG['DEPLOYMENT_TYPE'],
        'PerUnitStorageThroughput': CONFIG['THROUGHPUT'],
        'DataCompressionType': CONFIG['COMPRESSION_TYPE'],
    }
)

# Get the file system ID
file_system_id = response['FileSystem']['FileSystemId']

print(f"Creating FSx filesystem with ID: {file_system_id}")
print(f"In subnet: {CONFIG['SUBNET_ID']}")
print(f"With security group: {CONFIG['SECURITY_GROUP_ID']}")

# Wait for the file system to become available
while True:
    response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
    status = response['FileSystems'][0]['Lifecycle']
    if status == 'AVAILABLE':
        break
    print(f"Waiting for file system to become available... Current status: {status}")
    time.sleep(30)

dns_name = response['FileSystems'][0]['DNSName']
mount_name = response['FileSystems'][0]['LustreConfiguration']['MountName']

# Print the file system details
print("\nFile System Details:")
print(f"File System ID: {file_system_id}")
print(f"DNS Name: {dns_name}")
print(f"Mount Name: {mount_name}")
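
The wait loop above polls indefinitely and does not stop if the file system ends up in a failed state. The following is a minimal sketch of a bounded poll, assuming `FAILED` is the lifecycle value to abort on and that 30 minutes is an acceptable upper limit; the helper name `wait_for_lifecycle` is hypothetical, and the demo uses a fake status source instead of a real FSx call.

```python
import time

def wait_for_lifecycle(get_status, target="AVAILABLE", failed=("FAILED",),
                       timeout_s=1800, interval_s=30, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"File system entered {status} state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout_s} seconds")
        sleep(interval_s)

# Demo with a fake status source; no AWS call is made here.
fake_statuses = iter(["CREATING", "CREATING", "AVAILABLE"])
print(wait_for_lifecycle(lambda: next(fake_statuses), interval_s=0))  # prints AVAILABLE
```

In the notebook, `get_status` could be a lambda that calls `fsx.describe_file_systems(FileSystemIds=[file_system_id])` and returns `['FileSystems'][0]['Lifecycle']`.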

(Optional) Mount FSx and copy data from S3 to FSx. This step is optional because your model data might already exist on the FSx file system. It is required only if you want to copy data from S3 to FSx.

Note

If you are using your own FSx file system instead of the one created in the previous step, replace the values of file_system_id, dns_name, and mount_name with those of your file system.


## NOTE: Replace values of file_system_id, dns_name, and mount_name with your FSx in case you are not using the FSx filesystem from the previous step and using your own FSx filesystem.

# file_system_id = response['FileSystems'][0]['FileSystemId']
# dns_name = response['FileSystems'][0]['DNSName']
# mount_name = response['FileSystems'][0]['LustreConfiguration']['MountName']
# print(f"File System ID: {file_system_id}")
# print(f"DNS Name: {dns_name}")
# print(f"Mount Name: {mount_name}")



# FSx file system details
mount_point = f'/mnt/fsx_{file_system_id}'  # This will create something like /mnt/fsx_fs-0123456789abcdef0

print(f"Creating mount point at: {mount_point}")

# Create mount directory if it doesn't exist
!sudo mkdir -p {mount_point}

# Mount the FSx Lustre file system
mount_command = f"sudo mount -t lustre {dns_name}@tcp:/{mount_name} {mount_point}"
!{mount_command}

# Verify the mount
!df -h | grep fsx

print(f"File system mounted at {mount_point}")

!sudo chmod 777 {mount_point}

!aws s3 cp $JUMPSTART_MODEL_LOCATION_ON_S3 $mount_point/deepseek-1-5b --recursive

!ls $mount_point

!sudo umount {mount_point}

!sudo rm -rf {mount_point}

Generate the deployment YAML to deploy the model from FSx.


import os
from datetime import datetime

# Get current time in format suitable for endpoint name
current_time = datetime.now().strftime("%Y%m%d%H%M%S")
model_id = "deepseek15b" ## Can be a name of your choice
deployment_name = f"{model_id}-{current_time}"
model_location = "deepseek-1-5b" ## This is the folder on your FSx file system where the model is located
sagemaker_endpoint_name=f"{model_id}-{current_time}"

output_file_path=f"inferenceendpointconfig-fsx-model-{model_id}.yaml"
generate_inferenceendpointconfig_yaml(
    deployment_name=deployment_name,
    model_id=model_id,
    model_location=model_location,
    namespace=cluster_namespace,
    instance_type=instance_type,
    output_file_path=output_file_path,
    region=region_name,
    tls_certificate_s3_location=tls_certificate_s3_location,
    sagemaker_endpoint_name=sagemaker_endpoint_name,
    fsxFileSystemId=file_system_id,
    isFsx=True
)


os.environ["INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH"]=output_file_path
os.environ["MODEL_ID"]=model_id

Deploy the model to your cluster

Get the Amazon EKS cluster name from the HyperPod cluster ARN for kubectl authentication.


cluster_arn = !aws sagemaker describe-cluster --cluster-name $hyperpod_cluster_name --query "Orchestrator.Eks.ClusterArn" --region $region_name
cluster_name = cluster_arn[0].strip('"').split('/')[-1]
print(cluster_name)
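
The one-line parse above returns whatever follows the last slash, even when the CLI call failed and printed an error instead of an ARN. A slightly stricter parse might look like the sketch below. The helper name `eks_cluster_name_from_arn`, the account ID, and the cluster name are hypothetical; the ARN layout `arn:aws:eks:<region>:<account>:cluster/<name>` is the standard EKS cluster ARN form.

```python
def eks_cluster_name_from_arn(arn):
    """Extract the cluster name from an EKS cluster ARN, or raise ValueError."""
    arn = arn.strip().strip('"')
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "eks":
        raise ValueError(f"Not an EKS ARN: {arn!r}")
    kind, _, name = parts[5].partition("/")
    if kind != "cluster" or not name:
        raise ValueError(f"Not an EKS cluster ARN: {arn!r}")
    return name

# Hypothetical ARN, quoted the way the AWS CLI prints a JSON string.
print(eks_cluster_name_from_arn(
    '"arn:aws:eks:us-east-2:111122223333:cluster/example-eks-cluster"'))
# prints: example-eks-cluster
```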

Configure kubectl to authenticate with the HyperPod EKS cluster using AWS credentials.
```
!aws eks update-kubeconfig --name $cluster_name --region $region_name
```

Deploy your InferenceEndpointConfig model.


!kubectl apply -f $INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH

Verify the deployment status

Check whether the model deployed successfully.


!kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace

The command returns output similar to the following:

Name:                             deepseek15b-20250624043033
Reason:                           NativeDeploymentObjectFound
Status:
  Conditions:
    Last Transition Time:  2025-07-10T18:39:51Z
    Message:               Deployment, ALB Creation or SageMaker endpoint registration creation for model is in progress
    Reason:                InProgress
    Status:                True
    Type:                  DeploymentInProgress
    Last Transition Time:  2025-07-10T18:47:26Z
    Message:               Deployment and SageMaker endpoint registration for model have been created successfully
    Reason:                Success
    Status:                True
    Type:                  DeploymentComplete
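
To check the condition programmatically instead of reading the describe output, one option is to fetch the resource with `kubectl get ... -o json` and inspect its conditions. The sketch below covers only the inspection step and assumes the JSON follows the usual Kubernetes convention of a `status.conditions` list with lowercase `type` and `status` keys; that layout is an assumption inferred from the output above, not confirmed for this resource. The helper name `condition_is_true` is hypothetical.

```python
def condition_is_true(resource, condition_type):
    """Return True if the resource reports condition_type with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == condition_type and c.get("status") == "True"
        for c in conditions
    )

# Hand-written sample mirroring the describe output above (assumed JSON layout).
sample = {
    "status": {
        "conditions": [
            {"type": "DeploymentInProgress", "status": "True", "reason": "InProgress"},
            {"type": "DeploymentComplete", "status": "True", "reason": "Success"},
        ]
    }
}
print(condition_is_true(sample, "DeploymentComplete"))  # prints True
```

In the notebook, `resource` would come from `json.loads` applied to the joined output of `!kubectl get InferenceEndpointConfig $deployment_name -n $cluster_namespace -o json`.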

Check that the endpoint was created successfully.


!kubectl describe SageMakerEndpointRegistration $sagemaker_endpoint_name -n $cluster_namespace

The command returns output similar to the following:

Name:         deepseek15b-20250624043033
Namespace:    ns-team-a
Kind:         SageMakerEndpointRegistration

Status:
  Conditions:
    Last Transition Time:  2025-06-24T04:33:42Z
    Message:               Endpoint created.
    Status:                True
    Type:                  EndpointCreated
    State:                 CreationCompleted

Test the deployed endpoint to verify that it works correctly. This step confirms that your model deployed successfully and can process inference requests.


import boto3

prompt = "{\"inputs\": \"How tall is Mt Everest?\"}"

runtime_client = boto3.client('sagemaker-runtime', region_name=region_name, config=boto3_config)
response = runtime_client.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType="application/json",
    Body=prompt
)
print(response["Body"].read().decode())

[{"generated_text":"As of the last update in July 2024, Mount Everest stands at a height of **8,850 meters** (29,029 feet) above sea level. The exact elevation can vary slightly due to changes caused by tectonic activity and the melting of ice sheets."}]
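
Hand-escaped JSON strings are easy to get wrong, so building the request body with `json.dumps` and parsing the reply with `json.loads` is a safer pattern. The following sketch assumes the response keeps the shape shown above, a list with one object that has a `generated_text` key. The helper names and the sample reply text are hypothetical.

```python
import json

def build_request_body(question):
    """Serialize the prompt into the JSON body sent to the endpoint."""
    return json.dumps({"inputs": question})

def extract_generated_text(raw_body):
    """Pull the generated text out of a response shaped like the sample above."""
    payload = json.loads(raw_body)
    return payload[0]["generated_text"]

body = build_request_body("How tall is Mt Everest?")
print(body)  # prints {"inputs": "How tall is Mt Everest?"}

# Hypothetical reply in the same shape as the sample output above.
sample_response = '[{"generated_text": "example answer"}]'
print(extract_generated_text(sample_response))  # prints example answer
```

In the notebook, `body` would be passed as `Body=body` to `invoke_endpoint`, and `extract_generated_text` would receive `response["Body"].read().decode()`.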

Manage your deployment

When you finish testing your deployment, use the following commands to clean up your resources.

Note

Verify that you no longer need the deployed model or stored data before proceeding.

Clean up your resources

Delete the inference deployment and associated Kubernetes resources. This stops the running model containers and removes the SageMaker endpoint.
```
!kubectl delete inferenceendpointconfig.inference.sagemaker.aws.amazon.com/$deployment_name -n $cluster_namespace
```

(Optional) Delete the FSx volume.


try:
    # Delete the file system
    response = fsx.delete_file_system(
        FileSystemId=file_system_id
    )
    
    print(f"Deleting FSx filesystem: {file_system_id}")
    
    # Optional: Wait for deletion to complete
    while True:
        try:
            response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
            status = response['FileSystems'][0]['Lifecycle']
            print(f"Current status: {status}")
            time.sleep(30)
        except fsx.exceptions.FileSystemNotFound:
            print("File system deleted successfully")
            break
            
except Exception as e:
    print(f"Error deleting file system: {str(e)}")

Verify that the cleanup completed successfully.


# Check that Kubernetes resources are removed
!kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace

# Verify SageMaker endpoint is deleted (should return error or empty)
!aws sagemaker describe-endpoint --endpoint-name $sagemaker_endpoint_name --region $region_name

Troubleshooting

Check the Kubernetes deployment status.


!kubectl describe deployment $deployment_name -n $cluster_namespace

Check the InferenceEndpointConfig status to see the high-level deployment state and any configuration issues.
```
!kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace
```
Check the status of all Kubernetes objects. Get a comprehensive view of all related Kubernetes resources in your namespace. This gives you a quick overview of what is running and what might be missing.
```
!kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace
```
