Deploy custom fine-tuned models from Amazon S3 and Amazon FSx using kubectl
The following steps show how to use kubectl to deploy models stored on Amazon S3 or Amazon FSx to an Amazon SageMaker HyperPod cluster.
The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or a SageMaker notebook instance. Each code block represents a notebook cell that should be executed in order. Interactive elements, including model discovery tables and status monitoring commands, are optimized for the notebook interface and might not work correctly in other environments. Before proceeding, make sure you have access to a notebook environment with the required AWS permissions.
Prerequisites
Verify that you have set up inference capabilities on your Amazon SageMaker HyperPod cluster. For more information, see Set up your HyperPod clusters for model deployment.
Setup and configuration
Replace all placeholder values with your actual resource identifiers.
- Initialize your cluster name. This identifies the HyperPod cluster where your model will be deployed.

# Specify your HyperPod cluster name here
hyperpod_cluster_name="<Hyperpod_cluster_name>"

# NOTE: For the sample deployment, we use g5.8xlarge for the DeepSeek-R1 1.5B model, which has sufficient memory and GPU
instance_type="ml.g5.8xlarge"
- Initialize your cluster namespace. Your cluster admin should already have created a hyperpod-inference service account in your namespace.

cluster_namespace="<namespace>"
- Define a helper method to create the YAML file for deployment.

The following helper function generates the Kubernetes YAML configuration file required to deploy your model. It builds a different YAML structure depending on whether your model is stored on Amazon S3 or Amazon FSx, automatically handling the storage-specific configuration. In the sections that follow, you will use this function to generate the deployment file for your chosen storage backend.
import yaml

def generate_inferenceendpointconfig_yaml(deployment_name, model_id, namespace, instance_type,
                                          output_file_path, region, tls_certificate_s3_location,
                                          model_location, sagemaker_endpoint_name,
                                          fsxFileSystemId="", isFsx=False, s3_bucket=None):
    """
    Generate an InferenceEndpointConfig YAML file with the provided parameters.

    Args:
        deployment_name (str): The deployment name
        model_id (str): The model ID
        namespace (str): The namespace
        instance_type (str): The instance type
        output_file_path (str): Path where the YAML file will be saved
        region (str): Region where the bucket exists
        tls_certificate_s3_location (str): S3 location for the TLS certificate
        model_location (str): Location of the model
        sagemaker_endpoint_name (str): Name of the SageMaker endpoint
        fsxFileSystemId (str): FSx file system ID (optional)
        isFsx (bool): Whether to use FSx storage (optional)
        s3_bucket (str): S3 bucket where the model exists (optional, only needed when isFsx is False)
    """
    # Create the YAML structure
    model_config = {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "InferenceEndpointConfig",
        "metadata": {
            "name": deployment_name,
            "namespace": namespace
        },
        "spec": {
            "modelName": model_id,
            "endpointName": sagemaker_endpoint_name,
            "invocationEndpoint": "invocations",
            "instanceType": instance_type,
            "modelSourceConfig": {},
            "worker": {
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": 1,
                    },
                    "requests": {
                        "nvidia.com/gpu": 1,
                        "cpu": "30000m",
                        "memory": "100Gi"
                    }
                },
                "image": "763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi2.3.1-gpu-py311-cu124-ubuntu22.04-v2.0",
                "modelInvocationPort": {
                    "containerPort": 8080,
                    "name": "http"
                },
                "modelVolumeMount": {
                    "name": "model-weights",
                    "mountPath": "/opt/ml/model"
                },
                "environmentVariables": [
                    {"name": "HF_MODEL_ID", "value": "/opt/ml/model"},
                    {"name": "SAGEMAKER_PROGRAM", "value": "inference.py"},
                    {"name": "SAGEMAKER_SUBMIT_DIRECTORY", "value": "/opt/ml/model/code"},
                    {"name": "MODEL_CACHE_ROOT", "value": "/opt/ml/model"},
                    {"name": "SAGEMAKER_ENV", "value": "1"}
                ]
            },
            "tlsConfig": {
                "tlsCertificateOutputS3Uri": tls_certificate_s3_location,
            }
        },
    }

    if not isFsx:
        if s3_bucket is None:
            raise ValueError("s3_bucket is required when isFsx is False")
        model_config["spec"]["modelSourceConfig"] = {
            "modelSourceType": "s3",
            "s3Storage": {
                "bucketName": s3_bucket,
                "region": region,
            },
            "modelLocation": model_location
        }
    else:
        model_config["spec"]["modelSourceConfig"] = {
            "modelSourceType": "fsx",
            "fsxStorage": {
                "fileSystemId": fsxFileSystemId,
            },
            "modelLocation": model_location
        }

    # Write to YAML file
    with open(output_file_path, 'w') as file:
        yaml.dump(model_config, file, default_flow_style=False)

    print(f"YAML file created successfully at: {output_file_path}")
Deploy your model from Amazon S3 or Amazon FSx
Deploy the model to your cluster
- Get the Amazon EKS cluster name from the HyperPod cluster ARN for kubectl authentication.

cluster_arn = !aws sagemaker describe-cluster --cluster-name $hyperpod_cluster_name --query "Orchestrator.Eks.ClusterArn" --region $region_name
cluster_name = cluster_arn[0].strip('"').split('/')[-1]
print(cluster_name)
- Configure kubectl to authenticate with the HyperPod EKS cluster using your AWS credentials.

!aws eks update-kubeconfig --name $cluster_name --region $region_name
- Deploy your InferenceEndpointConfig model.

!kubectl apply -f $INFERENCE_ENDPOINT_CONFIG_YAML_FILE_PATH
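The cluster-name extraction in the first step above is plain string handling: the EKS cluster name is the final path segment of the ARN. A standalone sketch with a hypothetical ARN shows what it does.

```python
# Standalone sketch of the ARN-to-name extraction above, using a
# hypothetical cluster ARN (the CLI returns the value wrapped in quotes).
cluster_arn = '"arn:aws:eks:us-east-2:123456789012:cluster/hyperpod-eks-cluster"'
cluster_name = cluster_arn.strip('"').split('/')[-1]
print(cluster_name)  # hyperpod-eks-cluster
```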
Verify the status of your deployment
- Check whether the model deployed successfully.

!kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace
This command returns output similar to the following:
Name:    deepseek15b-20250624043033
Reason:  NativeDeploymentObjectFound
Status:
  Conditions:
    Last Transition Time:  2025-07-10T18:39:51Z
    Message:               Deployment, ALB Creation or SageMaker endpoint registration creation for model is in progress
    Reason:                InProgress
    Status:                True
    Type:                  DeploymentInProgress
    Last Transition Time:  2025-07-10T18:47:26Z
    Message:               Deployment and SageMaker endpoint registration for model have been created successfully
    Reason:                Success
    Status:                True
    Type:                  DeploymentComplete
- Check whether the endpoint was created successfully.

!kubectl describe SageMakerEndpointRegistration $sagemaker_endpoint_name -n $cluster_namespace
This command returns output similar to the following:
Name:       deepseek15b-20250624043033
Namespace:  ns-team-a
Kind:       SageMakerEndpointRegistration
Status:
  Conditions:
    Last Transition Time:  2025-06-24T04:33:42Z
    Message:               Endpoint created.
    Status:                True
    Type:                  EndpointCreated
  State:                   CreationCompleted
- Test the deployed endpoint to verify that it is working correctly. This step confirms that your model deployed successfully and can handle inference requests.

import boto3

prompt = "{\"inputs\": \"How tall is Mt Everest?\"}"
runtime_client = boto3.client('sagemaker-runtime', region_name=region_name, config=boto3_config)
response = runtime_client.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType="application/json",
    Body=prompt
)
print(response["Body"].read().decode())
[{"generated_text":"As of the last update in July 2024, Mount Everest stands at a height of **8,850 meters** (29,029 feet) above sea level. The exact elevation can vary slightly due to changes caused by tectonic activity and the melting of ice sheets."}]
Manage your deployment
When you have finished testing your deployment, use the following commands to clean up your resources.
Note

Verify that you no longer need the deployed model or the stored data before continuing.
Clean up your resources
- Delete the inference deployment and the associated Kubernetes resources. This stops the running model containers and removes the SageMaker endpoint.

!kubectl delete inferenceendpointconfig.inference.sagemaker.aws.amazon.com/$deployment_name
- (Optional) Delete the FSx volume.

import time

# Assumes the boto3 FSx client (fsx) and file_system_id were defined in earlier cells.
try:
    # Delete the file system
    response = fsx.delete_file_system(
        FileSystemId=file_system_id
    )
    print(f"Deleting FSx filesystem: {file_system_id}")

    # Optional: Wait for deletion to complete
    while True:
        try:
            response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
            status = response['FileSystems'][0]['Lifecycle']
            print(f"Current status: {status}")
            time.sleep(30)
        except fsx.exceptions.FileSystemNotFound:
            print("File system deleted successfully")
            break
except Exception as e:
    print(f"Error deleting file system: {str(e)}")
- Verify that the cleanup completed successfully.

# Check that the Kubernetes resources are removed
!kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace

# Verify that the SageMaker endpoint is deleted (should return an error or an empty result)
!aws sagemaker describe-endpoint --endpoint-name $sagemaker_endpoint_name --region $region_name
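The FSx deletion step above polls describe_file_systems until a not-found error signals that the file system is gone. The pattern can be followed offline with a stub standing in for the FSx client; everything below is hypothetical and makes no AWS calls.

```python
# Offline sketch of the deletion-polling pattern used above. A stub stands
# in for the FSx client: after two "DELETING" responses it raises the
# not-found error that ends the loop.
class FileSystemNotFound(Exception):
    pass

def wait_for_deletion(lookup, max_polls=10):
    """Call lookup() repeatedly until it raises FileSystemNotFound."""
    for polls in range(1, max_polls + 1):
        try:
            lookup()
        except FileSystemNotFound:
            return polls  # number of checks it took
    raise TimeoutError("resource still present after max_polls checks")

responses = ["DELETING", "DELETING"]

def lookup():
    if not responses:
        raise FileSystemNotFound()
    return responses.pop(0)

print(wait_for_deletion(lookup))  # 3
```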
Troubleshooting
- Check the Kubernetes deployment status.

!kubectl describe deployment $deployment_name -n $cluster_namespace
- Check the InferenceEndpointConfig status for the high-level deployment state and any configuration issues.

!kubectl describe InferenceEndpointConfig $deployment_name -n $cluster_namespace
- Check the status of all Kubernetes objects. This comprehensive view of all related Kubernetes resources in the namespace gives you a quick overview of what is running and what might be missing.

!kubectl get pods,svc,deployment,InferenceEndpointConfig,sagemakerendpointregistration -n $cluster_namespace