Deploy models from JumpStart using kubectl

The following steps show you how to deploy a JumpStart model to a HyperPod cluster using kubectl.

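The following instructions contain code cells and commands designed to run in a Jupyter notebook environment, such as Amazon SageMaker Studio or a SageMaker notebook instance. Each code block represents a notebook cell, and the cells must be run in order. Interactive elements, including the model discovery tables and the status monitoring commands, are optimized for the notebook interface and might not work properly in other environments. Before you continue, make sure that you have access to a notebook environment with the necessary AWS permissions.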

Prerequisites

Make sure that you have set up inference capabilities on your Amazon SageMaker HyperPod cluster. For more information, see Setting up your HyperPod clusters for model deployment.

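Setup and configuration

Select your Region

Choose the Region where your HyperPod cluster is deployed and where you want to run your inference workloads. You can also add other customizations to sagemaker_client.


# Replace <REGION> with your AWS Region code, for example "us-west-2"
region_name = "<REGION>"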

import boto3
from botocore.config import Config
# Configure retry options
boto3_config = Config(
    retries={
        'max_attempts': 10,  # Maximum number of retry attempts
        'mode': 'adaptive'   # Adaptive mode: standard retries with backoff, plus client-side rate limiting
    }
)

sagemaker_client=boto3.client("sagemaker", region_name=region_name, config=boto3_config)

Choose your model and cluster

View all SageMaker public hub models and HyperPod clusters.


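# interactive_view, get_all_public_hub_model_data, and get_all_cluster_data must be imported
# before this cell runs. This page does not show the import; they are assumed to live in
# sagemaker.hyperpod.inference.notebook_utils alongside the validators used later.
interactive_view(get_all_public_hub_model_data(sagemaker_client))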

interactive_view(get_all_cluster_data(sagemaker_client))

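Set the model ID and cluster name that you selected in the variables below.

Note

Check with your cluster admin that permissions have been granted to this notebook's execution role. You can run !aws sts get-caller-identity --query "Arn" to see which execution role you are using.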


# Change the model_id based on your requirement. A list of model IDs is available in step 1 of this notebook.
# Proprietary models are not supported
model_id = "<insert model id here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_public_hub_model_is_not_proprietary

validate_public_hub_model_is_not_proprietary(sagemaker_client, model_id)


# Select the cluster name where you want to deploy the model. List of clusters is available in step 1 of this notebook.
cluster_name = "<insert cluster name here>"

from sagemaker.hyperpod.inference.notebook_utils import validate_cluster_can_support_public_hub_model
from sagemaker.hyperpod.inference.notebook_utils import get_public_hub_model_compatible_instances
validate_cluster_can_support_public_hub_model(sagemaker_client, model_id, cluster_name)
interactive_view(get_public_hub_model_compatible_instances(sagemaker_client, model_id))


# Select the instance that is relevant for your model deployment and exists within the selected cluster.
instance_type = "ml.g5.8xlarge"

Confirm with your cluster admin which namespace you are permitted to use. The admin should already have created a HyperPod inference service account in that namespace.
```
cluster_namespace = "default"
```

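Configure the S3 bucket name

Configure the name of the S3 bucket for certificates. The bucket must contain a folder named "certificates", where the certificates will be uploaded. The bucket must also be in the same Region that you defined above.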


# Set the S3 bucket name where TLS certificates will be stored for secure model communication
certificate_bucket = "<insert bucket name here>"
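The YAML generator later in this section builds the certificate location as "s3://" + certificate_bucket + "/certificates/", so the variable is expected to hold a bare bucket name. The following is a minimal sketch of a guard for that assumption. The helper name certificate_output_uri is hypothetical and not part of any SDK.

```python
def certificate_output_uri(bucket_name: str, folder: str = "certificates") -> str:
    """Return the S3 URI under which TLS certificates are written.

    Hypothetical helper: mirrors the string concatenation used by the
    YAML generator, but rejects values that would yield a malformed URI.
    """
    name = bucket_name.strip()
    if not name:
        raise ValueError("Bucket name must not be empty")
    if name.startswith("s3://") or "/" in name:
        raise ValueError("Pass the bare bucket name, without 's3://' or a key prefix")
    return f"s3://{name}/{folder}/"


# "my-cert-bucket" is a placeholder bucket name
print(certificate_output_uri("my-cert-bucket"))
```

Running the check before generating the manifest surfaces a mistyped bucket value in the notebook rather than later, when the deployment tries to write certificates.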


import yaml
from datetime import datetime

# Get current time in format suitable for endpoint name
current_time = datetime.now().strftime("%Y%m%d-%H%M%S")
sagemaker_endpoint_name=f"{model_id}-{current_time}"
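SageMaker endpoint names are limited to 63 characters and may contain only letters, digits, and hyphens, so a long model ID combined with the timestamp suffix can produce a name that is rejected. The sketch below shows one way to build a compliant name. The helper make_endpoint_name is hypothetical, and the truncation strategy is an illustrative choice rather than something this guide prescribes.

```python
import re
from datetime import datetime

# SageMaker limit on endpoint name length
MAX_ENDPOINT_NAME_LENGTH = 63


def make_endpoint_name(model_id: str, now: datetime) -> str:
    """Build '<model-id>-<timestamp>' that fits the endpoint name rules."""
    suffix = now.strftime("%Y%m%d-%H%M%S")
    # Replace every run of characters outside [A-Za-z0-9-] with one hyphen
    prefix = re.sub(r"[^A-Za-z0-9-]+", "-", model_id)
    # Leave room for the joining hyphen and the timestamp, then drop
    # leading/trailing hyphens so the name starts with a letter or digit
    prefix = prefix[: MAX_ENDPOINT_NAME_LENGTH - len(suffix) - 1].strip("-")
    return f"{prefix}-{suffix}" if prefix else suffix


# "example-model-id" is a placeholder model ID
print(make_endpoint_name("example-model-id", datetime.now()))
```

If you adopt a helper like this, assign its result to sagemaker_endpoint_name so that the rest of the notebook picks up the same value.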


def generate_jumpstart_model_yaml(model_id, model_version, namespace, instance_type, output_file_path, certificate_bucket):
    """
    Generate a JumpStartModel YAML file with the provided parameters.

    Args:
        model_id (str): The model ID
        model_version (str): The model version
        namespace (str): The namespace
        instance_type (str): The instance type
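        output_file_path (str): Path where the YAML file will be saved
        certificate_bucket (str): Name of the S3 bucket that stores TLS certificates

    Note: the endpoint name is read from the notebook-level variable
    sagemaker_endpoint_name, which must be defined before this function is called.
    """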

    # Create the YAML structure
    tlsCertificateOutputS3Uri = "s3://" + certificate_bucket + "/certificates/"
    model_config = {
        "apiVersion": "inference.sagemaker.aws.amazon.com/v1alpha1",
        "kind": "JumpStartModel",
        "metadata": {
            "name": model_id,
            "namespace": namespace
        },
        "spec": {
            "sageMakerEndpoint": {
                "name": sagemaker_endpoint_name
            },
            "model": {
                "modelHubName": "SageMakerPublicHub",
                "modelId": model_id,
                # modelVersion is optional
                "modelVersion": model_version
                # acceptEula is optional, set value to True when using a gated model
            },
            "server": {
                "instanceType": instance_type
            },
            "tlsConfig": {
                "tlsCertificateOutputS3Uri": tlsCertificateOutputS3Uri
            }
        }
    }

    # Write to YAML file
    with open(output_file_path, 'w') as file:
        yaml.dump(model_config, file, default_flow_style=False)

    print(f"YAML file created successfully at: {output_file_path}")
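The comments in the function above note that modelVersion is optional and that acceptEula should be set to True for gated models, but the generated manifest never sets acceptEula. The following sketch shows how the model portion of the spec could be assembled with both optional fields. The helper build_model_spec is hypothetical, and placing acceptEula under spec.model is inferred from where the comment sits in the source, so confirm the field location against the JumpStartModel resource definition on your cluster.

```python
import json


def build_model_spec(model_id, model_version=None, accept_eula=False):
    """Return the 'model' section of a JumpStartModel spec as a dict."""
    model = {
        "modelHubName": "SageMakerPublicHub",
        "modelId": model_id,
    }
    # modelVersion is optional, so only include it when a version is given
    if model_version is not None:
        model["modelVersion"] = model_version
    # acceptEula is only needed for gated models (assumed location: spec.model)
    if accept_eula:
        model["acceptEula"] = True
    return model


# "example-gated-model" is a placeholder model ID
print(json.dumps(build_model_spec("example-gated-model", accept_eula=True), indent=2))
```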


# Import JumpStart utilities to retrieve model specifications and version information
from sagemaker.jumpstart import utils
from sagemaker.jumpstart.enums import JumpStartScriptScope

# "*" requests the latest available version of the model
model_specs = utils.verify_model_region_and_return_specs(
    model_id, "*", JumpStartScriptScope.INFERENCE, region=region_name
)
model_version = model_specs.version


# Generate the output filename for the Kubernetes YAML configuration
output_file_path=f"jumpstart-model-{model_id}.yaml"
generate_jumpstart_model_yaml(
    model_id=model_id,
    model_version=model_version,
    namespace=cluster_namespace,
    instance_type=instance_type,
    output_file_path=output_file_path,
    certificate_bucket=certificate_bucket
)

import os
os.environ["JUMPSTART_YAML_FILE_PATH"]=output_file_path
os.environ["MODEL_ID"]=model_id

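Deploy your model

Update your Kubernetes configuration and deploy your model

Retrieve the ARN of the EKS cluster that orchestrates your HyperPod cluster. The EKS cluster name is the part of the ARN after cluster/.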


!aws sagemaker describe-cluster --cluster-name $cluster_name --query "Orchestrator.Eks.ClusterArn"
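The command returns the full EKS cluster ARN, while the next step needs only the cluster name. A minimal sketch for extracting it, assuming the standard EKS ARN layout arn:aws:eks:region:account-id:cluster/name; the function name eks_cluster_name_from_arn and the sample ARN are illustrative only.

```python
def eks_cluster_name_from_arn(cluster_arn: str) -> str:
    """Extract the EKS cluster name from an EKS cluster ARN."""
    # The CLI prints the value as a quoted JSON string, so strip
    # surrounding whitespace and double quotes first
    arn = cluster_arn.strip().strip('"')
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "eks":
        raise ValueError(f"Not an EKS ARN: {cluster_arn!r}")
    resource = parts[5]
    if not resource.startswith("cluster/") or resource == "cluster/":
        raise ValueError(f"ARN does not identify a cluster: {cluster_arn!r}")
    return resource[len("cluster/"):]


# Sample ARN with a placeholder account ID and cluster name
sample_arn = '"arn:aws:eks:us-west-2:111122223333:cluster/my-eks-cluster"'
print(eks_cluster_name_from_arn(sample_arn))
```

You can store the result in a notebook variable and reference it in the update-kubeconfig command instead of pasting the name by hand.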

Configure kubectl to connect to the EKS cluster.


!aws eks update-kubeconfig --name "<insert name of eks cluster from above>" --region $region_name

Deploy your JumpStart model.


!kubectl apply -f $JUMPSTART_YAML_FILE_PATH

Monitor the status of your model deployment

Verify that the model was deployed successfully.


!kubectl describe JumpStartModel $model_id -n $cluster_namespace

Verify that the endpoint was created successfully.


!kubectl describe SageMakerEndPointRegistration $sagemaker_endpoint_name -n $cluster_namespace
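Instead of re-running the describe commands by hand, a notebook cell can poll until the endpoint is ready. The sketch below is one possible approach and makes two assumptions that this guide does not confirm: that the endpoint registered by the cluster is visible through the SageMaker DescribeEndpoint API, and that it already exists when polling starts (DescribeEndpoint raises an error for an unknown endpoint). The function name wait_for_endpoint and the default timings are hypothetical.

```python
import time


def wait_for_endpoint(client, endpoint_name, timeout_s=1800.0, poll_s=30.0, sleep=time.sleep):
    """Poll DescribeEndpoint until the endpoint is InService.

    client is a boto3 SageMaker client (or any object with a compatible
    describe_endpoint method). Raises RuntimeError if the endpoint reports
    Failed, and TimeoutError if it is not ready within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} entered Failed status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_s} seconds")
        sleep(poll_s)
```

In the notebook, this would be called as wait_for_endpoint(sagemaker_client, sagemaker_endpoint_name). The sleep parameter exists only so that the loop can be exercised without real delays.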

Invoke your model endpoint

You can programmatically retrieve example payloads from the JumpStartModel object.


import boto3

# JSON request body; the braces must be balanced for the payload to parse
prompt = "{\"inputs\": \"What is AWS SageMaker?\"}"

runtime_client = boto3.client('sagemaker-runtime', region_name=region_name)
response = runtime_client.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType="application/json",
    Body=prompt
)
print(response["Body"].read().decode())
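Hand-escaped JSON strings are easy to get wrong, for example through a stray brace or an unescaped quotation mark in the prompt. A small sketch that builds the request body with the json module and decodes the response bytes; the helper names build_payload and parse_response are hypothetical, and the only request field used is the "inputs" key shown in the example above.

```python
import json


def build_payload(prompt_text: str) -> str:
    """Serialize a prompt into the JSON request body used above."""
    return json.dumps({"inputs": prompt_text})


def parse_response(raw_body: bytes):
    """Decode the bytes read from the response body and parse them as JSON."""
    return json.loads(raw_body.decode("utf-8"))


payload = build_payload('What is "AWS SageMaker"?')
print(payload)
```

The resulting string can be passed as Body to invoke_endpoint, and parse_response(response["Body"].read()) returns the parsed result, provided that the model container responds with JSON.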

Manage your deployment

Clean up resources

Delete your JumpStart model deployment when you no longer need it.


!kubectl delete JumpStartModel $model_id -n $cluster_namespace

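Troubleshooting

Use these debugging commands if your deployment isn't working as expected.

Check the status of the Kubernetes deployment. This command inspects the underlying Kubernetes deployment object that manages the pods running your model. Use it to troubleshoot pod scheduling, resource allocation, and container startup issues.
```
!kubectl describe deployment $model_id -n $cluster_namespace
```
Check the status of your JumpStart model resource. This command examines the custom JumpStartModel resource that manages the high-level model configuration and deployment lifecycle. Use it to troubleshoot model-specific issues such as configuration errors or SageMaker endpoint creation problems.
```
!kubectl describe JumpStartModel $model_id -n $cluster_namespace
```
Check the status of all Kubernetes objects. This command gives a comprehensive overview of all related Kubernetes resources in your namespace. Use it for a quick check of the overall state of the pods, services, deployments, and custom resources associated with your model deployment.
```
!kubectl get pods,svc,deployment,JumpStartModel,sagemakerendpointregistration -n $cluster_namespace
```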
