Deploy custom fine-tuned models from Amazon S3 and Amazon FSx using kubectl - Amazon SageMaker AI

The following steps show you how to deploy models stored in Amazon S3 or Amazon FSx to an Amazon SageMaker HyperPod cluster using kubectl.

The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or a SageMaker Notebook Instance. Each code block represents a notebook cell that should be executed in order. Interactive elements, including model discovery tables and status monitoring commands, are optimized for the notebook interface and may not function properly in other environments. Ensure that you have access to a notebook environment with the required AWS permissions before proceeding.
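
The cells below also reference a few common imports and configuration variables (region_name, boto3_config, and tls_certificate_s3_location) that are typically defined at the top of the notebook. A minimal setup cell along these lines works; the values shown here are placeholders and assumptions, so adjust them to your environment:

    import os
    import time
    from datetime import datetime

    import boto3
    import botocore
    import botocore.config
    import yaml

    # Placeholders -- replace with your own values. The Region must match your
    # HyperPod cluster; tls_certificate_s3_location is the S3 URI where the TLS
    # certificate from your inference setup is stored.
    region_name = "us-east-2"
    boto3_config = botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"})
    tls_certificate_s3_location = "s3://<certificate-bucket>/certificates/"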

Prerequisites

Verify that you have set up inference capabilities on your Amazon SageMaker HyperPod cluster. For more information, see Set up your HyperPod clusters for model deployment.
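
As a quick sanity check (a sketch, assuming kubectl is already pointed at your cluster and that your admin created the hyperpod-inference service account referenced later in this topic), you can confirm the inference CRDs and service account are present:

    # The inference operator setup installs CRDs such as InferenceEndpointConfig
    !kubectl get crd | grep -i sagemaker

    # Service account your cluster admin should have created in your namespace
    !kubectl get serviceaccount hyperpod-inference -n <namespace>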

Setup and configuration

Replace all placeholder values with your actual resource identifiers.

  1. Initialize your cluster name. This identifies the HyperPod cluster where your model will be deployed.

    # Specify your HyperPod cluster name here
    hyperpod_cluster_name="<Hyperpod_cluster_name>"

    # NOTE: For the sample deployment, we use ml.g5.8xlarge for the DeepSeek-R1 1.5B
    # model, which has sufficient memory and GPU
    instance_type="ml.g5.8xlarge"
  2. Initialize your cluster namespace. Your cluster admin should have already created a hyperpod-inference service account in your namespace.

    cluster_namespace="<namespace>"
  3. Define a helper method to create the YAML file for the deployment.

    The following helper function generates the Kubernetes YAML configuration file required to deploy your model. The function creates a different YAML structure depending on whether your model is stored in Amazon S3 or Amazon FSx, handling the storage-specific configuration automatically. You'll use this function in the following sections to generate the deployment file for your preferred storage backend.

    def generate_inferenceendpointconfig_yaml(deployment_name, model_id, namespace, instance_type,
                                              output_file_path, region, tls_certificate_s3_location,
                                              model_location, sagemaker_endpoint_name,
                                              fsxFileSystemId="", isFsx=False, s3_bucket=None):
        """
        Generate an InferenceEndpointConfig YAML file with the provided parameters.

        Args:
            deployment_name (str): The deployment name
            model_id (str): The model ID
            namespace (str): The namespace
            instance_type (str): The instance type
            output_file_path (str): Path where the YAML file will be saved
            region (str): Region where the bucket exists
            tls_certificate_s3_location (str): S3 location for the TLS certificate
            model_location (str): Location of the model
            sagemaker_endpoint_name (str): Name of the SageMaker endpoint
            fsxFileSystemId (str): FSx file system ID (optional)
            isFsx (bool): Whether to use FSx storage (optional)
            s3_bucket (str): S3 bucket where the model exists (optional, only needed when isFsx is False)
        """
        # Create the YAML structure
        model_config = {
            "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
            "kind": "InferenceEndpointConfig",
            "metadata": {
                "name": deployment_name,
                "namespace": namespace
            },
            "spec": {
                "modelName": model_id,
                "endpointName": sagemaker_endpoint_name,
                "invocationEndpoint": "invocations",
                "instanceType": instance_type,
                "modelSourceConfig": {},
                "worker": {
                    "resources": {
                        "limits": {
                            "nvidia.com/gpu": 1,
                        },
                        "requests": {
                            "nvidia.com/gpu": 1,
                            "cpu": "30000m",
                            "memory": "100Gi"
                        }
                    },
                    "image": "763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi2.3.1-gpu-py311-cu124-ubuntu22.04-v2.0",
                    "modelInvocationPort": {
                        "containerPort": 8080,
                        "name": "http"
                    },
                    "modelVolumeMount": {
                        "name": "model-weights",
                        "mountPath": "/opt/ml/model"
                    },
                    "environmentVariables": [
                        {"name": "HF_MODEL_ID", "value": "/opt/ml/model"},
                        {"name": "SAGEMAKER_PROGRAM", "value": "inference.py"},
                        {"name": "SAGEMAKER_SUBMIT_DIRECTORY", "value": "/opt/ml/model/code"},
                        {"name": "MODEL_CACHE_ROOT", "value": "/opt/ml/model"},
                        {"name": "SAGEMAKER_ENV", "value": "1"}
                    ]
                },
                "tlsConfig": {
                    "tlsCertificateOutputS3Uri": tls_certificate_s3_location,
                }
            },
        }

        if not isFsx:
            if s3_bucket is None:
                raise ValueError("s3_bucket is required when isFsx is False")
            model_config["spec"]["modelSourceConfig"] = {
                "modelSourceType": "s3",
                "s3Storage": {
                    "bucketName": s3_bucket,
                    "region": region,
                },
                "modelLocation": model_location
            }
        else:
            model_config["spec"]["modelSourceConfig"] = {
                "modelSourceType": "fsx",
                "fsxStorage": {
                    "fileSystemId": fsxFileSystemId,
                },
                "modelLocation": model_location
            }

        # Write to YAML file
        with open(output_file_path, 'w') as file:
            yaml.dump(model_config, file, default_flow_style=False)
        print(f"YAML file created successfully at: {output_file_path}")

Deploy your model from Amazon S3 or Amazon FSx

Stage the model to Amazon S3
  1. Create an Amazon S3 bucket to store your model artifacts. The S3 bucket must be in the same AWS Region as your HyperPod cluster.

    s3_client = boto3.client('s3', region_name=region_name, config=boto3_config)

    base_name = "hyperpod-inference-s3-beta"

    def get_account_id():
        sts = boto3.client('sts')
        return sts.get_caller_identity()["Account"]

    account_id = get_account_id()
    s3_bucket = f"{base_name}-{account_id}"

    try:
        s3_client.create_bucket(
            Bucket=s3_bucket,
            CreateBucketConfiguration={"LocationConstraint": region_name}
        )
        print(f"Bucket '{s3_bucket}' is created successfully.")
    except botocore.exceptions.ClientError as e:
        error_code = e.response["Error"]["Code"]
        if error_code in ("BucketAlreadyExists", "BucketAlreadyOwnedByYou"):
            print(f"Bucket '{s3_bucket}' already exists. Skipping creation.")
        else:
            raise  # Re-raise unexpected exceptions
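
    The deployment step below assumes your model artifacts already exist under the model_location prefix in this bucket. If you still need to stage them, a copy along these lines works (a sketch; the source location is a placeholder, so point it at wherever your model artifacts actually live):

    # Hypothetical staging step -- copy model artifacts into the bucket under the
    # prefix that model_location will reference in the next step
    !aws s3 cp <source-model-location> s3://{s3_bucket}/deepseek15b --recursive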
  2. Get the deployment YAML to deploy the model from the S3 bucket data.

    # Get current time in format suitable for endpoint name
    current_time = datetime.now().strftime("%Y%m%d%H%M%S")

    model_id = "deepseek15b"        ## Can be a name of your choice
    deployment_name = f"{model_id}-{current_time}"
    model_location = "deepseek15b"  ## This is the folder in your S3 bucket where the model is located
    sagemaker_endpoint_name = f"{model_id}-{current_time}"
    output_file_path = f"inferenceendpointconfig-s3-model-{model_id}.yaml"

    generate_inferenceendpointconfig_yaml(
        deployment_name=deployment_name,
        model_id=model_id,
        model_location=model_location,
        namespace=cluster_namespace,
        instance_type=instance_type,
        output_file_path=output_file_path,
        sagemaker_endpoint_name=sagemaker_endpoint_name,
        s3_bucket=s3_bucket,
        region=region_name,
        tls_certificate_s3_location=tls_certificate_s3_location
    )

    os.environ["INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH"] = output_file_path
    os.environ["MODEL_ID"] = model_id
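
    Before applying it in a later step, you can inspect the generated configuration file:

    !cat {output_file_path}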
Stage the model to Amazon FSx
  1. (Optional) Create an FSx volume. This step is optional because you might already have an existing FSx file system, with the same VPC, security group, and subnet ID as your HyperPod cluster, that you want to use.

    # Initialize the subnet ID and security group for FSx.
    # These should be the same as those of the HyperPod cluster.
    SUBNET_ID = "<HyperPod-subnet-id>"
    SECURITY_GROUP_ID = "<HyperPod-security-group-id>"

    # Configuration
    CONFIG = {
        'SUBNET_ID': SUBNET_ID,
        'SECURITY_GROUP_ID': SECURITY_GROUP_ID,
        'STORAGE_CAPACITY': 1200,
        'DEPLOYMENT_TYPE': 'PERSISTENT_2',
        'THROUGHPUT': 250,
        'COMPRESSION_TYPE': 'LZ4',
        'LUSTRE_VERSION': '2.15'
    }

    JUMPSTART_MODEL_LOCATION_ON_S3 = "s3://jumpstart-cache-prod-us-east-2/deepseek-llm/deepseek-llm-r1-distill-qwen-1-5b/artifacts/inference-prepack/v2.0.0/"

    # Create FSx client
    fsx = boto3.client('fsx')

    # Create FSx for Lustre file system
    response = fsx.create_file_system(
        FileSystemType='LUSTRE',
        FileSystemTypeVersion=CONFIG['LUSTRE_VERSION'],
        StorageCapacity=CONFIG['STORAGE_CAPACITY'],
        SubnetIds=[CONFIG['SUBNET_ID']],
        SecurityGroupIds=[CONFIG['SECURITY_GROUP_ID']],
        LustreConfiguration={
            'DeploymentType': CONFIG['DEPLOYMENT_TYPE'],
            'PerUnitStorageThroughput': CONFIG['THROUGHPUT'],
            'DataCompressionType': CONFIG['COMPRESSION_TYPE'],
        }
    )

    # Get the file system ID
    file_system_id = response['FileSystem']['FileSystemId']
    print(f"Creating FSx filesystem with ID: {file_system_id}")
    print(f"In subnet: {CONFIG['SUBNET_ID']}")
    print(f"With security group: {CONFIG['SECURITY_GROUP_ID']}")

    # Wait for the file system to become available
    while True:
        response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
        status = response['FileSystems'][0]['Lifecycle']
        if status == 'AVAILABLE':
            break
        print(f"Waiting for file system to become available... Current status: {status}")
        time.sleep(30)

    dns_name = response['FileSystems'][0]['DNSName']
    mount_name = response['FileSystems'][0]['LustreConfiguration']['MountName']

    # Print the file system details
    print("\nFile System Details:")
    print(f"File System ID: {file_system_id}")
    print(f"DNS Name: {dns_name}")
    print(f"Mount Name: {mount_name}")
  2. (Optional) Mount FSx and copy data from Amazon S3 to FSx. This step is optional because your model data might already exist in the FSx file system. It's only required if you want to copy data from S3 to FSx.

    Note

    Replace the values of file_system_id, dns_name, and mount_name with those of your own FSx file system if you are not using the FSx file system from the previous step.

    ## NOTE: Replace the values of file_system_id, dns_name, and mount_name with those of
    ## your own FSx filesystem if you are not using the FSx filesystem from the previous step.
    # file_system_id = response['FileSystems'][0]['FileSystemId']
    # dns_name = response['FileSystems'][0]['DNSName']
    # mount_name = response['FileSystems'][0]['LustreConfiguration']['MountName']
    # print(f"File System ID: {file_system_id}")
    # print(f"DNS Name: {dns_name}")
    # print(f"Mount Name: {mount_name}")

    # FSx file system details
    mount_point = f'/mnt/fsx_{file_system_id}'  # This creates something like /mnt/fsx_fs-0123456789abcdef0
    print(f"Creating mount point at: {mount_point}")

    # Create mount directory if it doesn't exist
    !sudo mkdir -p {mount_point}

    # Mount the FSx Lustre file system
    mount_command = f"sudo mount -t lustre {dns_name}@tcp:/{mount_name} {mount_point}"
    !{mount_command}

    # Verify the mount
    !df -h | grep fsx
    print(f"File system mounted at {mount_point}")

    !sudo chmod 777 {mount_point}

    # Copy the model artifacts from S3 into the FSx filesystem
    !aws s3 cp $JUMPSTART_MODEL_LOCATION_ON_S3 $mount_point/deepseek-1-5b --recursive
    !ls $mount_point

    # Unmount and remove the temporary mount point
    !sudo umount {mount_point}
    !sudo rm -rf {mount_point}
  3. Get the deployment YAML to deploy the model from the FSx data.

    # Get current time in format suitable for endpoint name
    current_time = datetime.now().strftime("%Y%m%d%H%M%S")

    model_id = "deepseek15b"          ## Can be a name of your choice
    deployment_name = f"{model_id}-{current_time}"
    model_location = "deepseek-1-5b"  ## This is the folder on your FSx file system where the model is located
    sagemaker_endpoint_name = f"{model_id}-{current_time}"
    output_file_path = f"inferenceendpointconfig-fsx-model-{model_id}.yaml"

    generate_inferenceendpointconfig_yaml(
        deployment_name=deployment_name,
        model_id=model_id,
        model_location=model_location,
        namespace=cluster_namespace,
        instance_type=instance_type,
        output_file_path=output_file_path,
        region=region_name,
        tls_certificate_s3_location=tls_certificate_s3_location,
        sagemaker_endpoint_name=sagemaker_endpoint_name,
        fsxFileSystemId=file_system_id,
        isFsx=True
    )

    os.environ["INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH"] = output_file_path
    os.environ["MODEL_ID"] = model_id
Deploy the model to your cluster
  1. Get the Amazon EKS cluster name from the HyperPod cluster ARN for kubectl authentication.

    cluster_arn = !aws sagemaker describe-cluster --cluster-name $hyperpod_cluster_name --query "Orchestrator.Eks.ClusterArn" --region $region_name
    cluster_name = cluster_arn[0].strip('"').split('/')[-1]
    print(cluster_name)
  2. Configure kubectl to authenticate with the HyperPod EKS cluster using your AWS credentials.

    !aws eks update-kubeconfig --name $cluster_name --region $region_name
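
    As an optional connectivity check, you can confirm that kubectl can reach the cluster before deploying:

    !kubectl get nodes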
  3. Deploy your model's InferenceEndpointConfig.

    !kubectl apply -f $INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH
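
    The operator now creates the Kubernetes Deployment, load balancer, and SageMaker endpoint registration for you. While that proceeds, you can optionally watch the pods come up in your namespace:

    !kubectl get pods -n $cluster_namespace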

Verify your deployment status

  1. Check whether the model deployed successfully.

    !kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace

    The command returns output similar to the following:

    Name:                             deepseek15b-20250624043033
    Reason:                           NativeDeploymentObjectFound
    Status:
      Conditions:
        Last Transition Time:  2025-07-10T18:39:51Z
        Message:               Deployment, ALB Creation or SageMaker endpoint registration creation for model is in progress
        Reason:                InProgress
        Status:                True
        Type:                  DeploymentInProgress
        Last Transition Time:  2025-07-10T18:47:26Z
        Message:               Deployment and SageMaker endpoint registration for model have been created successfully
        Reason:                Success
        Status:                True
        Type:                  DeploymentComplete
  2. Check whether the endpoint was created successfully.

    !kubectl describe SageMakerEndpointRegistration $sagemaker_endpoint_name -n $cluster_namespace

    The command returns output similar to the following:

    Name:         deepseek15b-20250624043033
    Namespace:    ns-team-a
    Kind:         SageMakerEndpointRegistration
    
    Status:
      Conditions:
        Last Transition Time:  2025-06-24T04:33:42Z
        Message:               Endpoint created.
        Status:                True
        Type:                  EndpointCreated
        State:                 CreationCompleted
  3. Test the deployed endpoint to verify that it's working correctly. This step confirms that your model deployed successfully and can process inference requests.

    import boto3

    prompt = "{\"inputs\": \"How tall is Mt Everest?\"}"

    runtime_client = boto3.client('sagemaker-runtime', region_name=region_name, config=boto3_config)

    response = runtime_client.invoke_endpoint(
        EndpointName=sagemaker_endpoint_name,
        ContentType="application/json",
        Body=prompt
    )
    print(response["Body"].read().decode())
    [{"generated_text":"As of the last update in July 2024, Mount Everest stands at a height of **8,850 meters** (29,029 feet) above sea level. The exact elevation can vary slightly due to changes caused by tectonic activity and the melting of ice sheets."}]

Manage your deployment

When you're finished testing your deployment, use the following commands to clean up your resources.

Note

Verify that you no longer need the deployed model or the stored data before proceeding.

Clean up your resources
  1. Delete the inference deployment and associated Kubernetes resources. This stops the running model containers and removes the SageMaker endpoint.

    !kubectl delete inferenceendpointconfig.inference.sagemaker.aws.amazon.com/$deployment_name -n $cluster_namespace
  2. (Optional) Delete the FSx volume.

    try:
        # Delete the file system
        response = fsx.delete_file_system(
            FileSystemId=file_system_id
        )
        print(f"Deleting FSx filesystem: {file_system_id}")

        # Optional: Wait for deletion to complete
        while True:
            try:
                response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
                status = response['FileSystems'][0]['Lifecycle']
                print(f"Current status: {status}")
                time.sleep(30)
            except fsx.exceptions.FileSystemNotFound:
                print("File system deleted successfully")
                break
    except Exception as e:
        print(f"Error deleting file system: {str(e)}")
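
    Similarly, if you created the S3 bucket solely for this walkthrough, you can remove it as well (a sketch; this permanently deletes the bucket and every object in it):

    # Empty and delete the staging bucket created earlier (irreversible)
    s3_resource = boto3.resource('s3', region_name=region_name)
    bucket = s3_resource.Bucket(s3_bucket)
    bucket.objects.all().delete()
    bucket.delete()
    print(f"Bucket '{s3_bucket}' deleted.")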
  3. Verify that the cleanup was successful.

    # Check that Kubernetes resources are removed
    !kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace

    # Verify the SageMaker endpoint is deleted (should return an error or nothing)
    !aws sagemaker describe-endpoint --endpoint-name $sagemaker_endpoint_name --region $region_name
Troubleshooting
  1. Check the Kubernetes deployment status.

    !kubectl describe deployment $deployment_name -n $cluster_namespace
  2. Check the InferenceEndpointConfig status to see the high-level deployment state and any configuration issues.

    !kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace
  3. Check the status of all Kubernetes objects. This gives you a comprehensive view of all related Kubernetes resources in your namespace and a quick picture of what's running and what might be missing.

    !kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace
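
    If a resource is missing or a pod is unhealthy, namespace events and container logs usually show why. For example (a sketch; adjust the tail length as needed):

    # Recent events often surface scheduling, volume, or image-pull failures
    !kubectl get events -n $cluster_namespace --sort-by=.metadata.creationTimestamp | tail -20

    # Tail the model container logs from the deployment
    !kubectl logs deployment/$deployment_name -n $cluster_namespace --tail=50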