
Deploy generative AI inference recommendations

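When a recommendation job completes, each recommendation includes a deployment-ready configuration. You can deploy the selected configuration to a SageMaker AI inference endpoint either with a single action in SageMaker AI Studio or programmatically through the API.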

About deployment configurations

Each recommendation in the job response contains a DeploymentConfiguration object with the following information:

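ImageUri: The container image URI optimized for the recommended instance type.
InstanceType: The instance type recommended for the deployment.
InstanceCount: The number of instances required to meet the performance goal.
CopyCountPerInstance: The number of model copies to run on each instance. When the value is greater than 1, multiple copies of the model are loaded on each instance to increase throughput.
EnvironmentVariables: The environment variables set for optimal performance, such as the tensor parallel size and the maximum sequence length.
S3: The S3 channel references for the model artifacts, including the optimized model output.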
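
As a rough illustration of how InstanceCount and CopyCountPerInstance relate, the following sketch computes the total number of model copies behind an endpoint, assuming that total is simply the instance count multiplied by the copies per instance. The dictionary values are hypothetical placeholders rather than output from a real job, and the helper name total_model_copies is an illustrative assumption, not part of any SageMaker API.

```python
# Hypothetical DeploymentConfiguration values for illustration only;
# a real recommendation job returns its own values for these fields.
example_deploy_config = {
    "InstanceType": "ml.g5.12xlarge",
    "InstanceCount": 2,
    "CopyCountPerInstance": 2,
}


def total_model_copies(deploy_config: dict) -> int:
    """Return instances x copies per instance, treating a missing field as 1."""
    instance_count = deploy_config.get("InstanceCount") or 1
    copies_per_instance = deploy_config.get("CopyCountPerInstance") or 1
    return instance_count * copies_per_instance


print(total_model_copies(example_deploy_config))
```

A figure like this can help when comparing recommendations that reach the same throughput goal with different mixes of instances and copies.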

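Deploy using the API

To deploy a recommendation programmatically, create a SageMaker AI model and endpoint from the recommendation's model package. Each recommendation includes a ModelDetails object that contains the model package ARN and the inference specification name. This is the simplest deployment path because the model package already contains the container image, environment variables, and model artifact channels.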



import boto3

client = boto3.client("sagemaker", region_name="us-west-2")

# Get the recommendation from a completed job
response = client.describe_ai_recommendation_job(
    AIRecommendationJobName="my-recommendation-job"
)

# Select a recommendation (e.g., the first one)
recommendation = response["Recommendations"][0]
model_details = recommendation["ModelDetails"]
deploy_config = recommendation["DeploymentConfiguration"]

# Create a model from the model package.
# The model package already contains the container image, environment
# variables, and S3 data channels (base model + optimization artifacts).
model_name = "my-recommended-model"
container_def = {
    "ModelPackageName": model_details["ModelPackageArn"],
}
# If the recommendation uses a named inference specification (e.g., for
# a specific optimization variant), specify it so SageMaker selects the
# correct container and instance configuration from the model package.
if model_details.get("InferenceSpecificationName"):
    container_def["InferenceSpecificationName"] = model_details["InferenceSpecificationName"]

client.create_model(
    ModelName=model_name,
    PrimaryContainer=container_def,
    ExecutionRoleArn="arn:aws:iam::111122223333:role/ExampleRole",
)

# Create an endpoint configuration
endpoint_config_name = "my-recommended-endpoint-config"
production_variant = {
    "VariantName": "AllTraffic",
    "ModelName": model_name,
    "InstanceType": deploy_config["InstanceType"],
    "InitialInstanceCount": deploy_config.get("InstanceCount", 1),
}
copy_count = deploy_config.get("CopyCountPerInstance")
if copy_count and copy_count > 1:
    production_variant["InferenceAmiVersion"] = "al2-ami-sagemaker-inference-gpu-2"
    production_variant["RoutingConfig"] = {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"}

client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[production_variant],
)

# Create the endpoint
endpoint_name = "my-recommended-endpoint"
client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)
print(f"Endpoint {endpoint_name} is being created.")

After you create the endpoint, you can use the DescribeEndpoint API to monitor its status until it reaches InService.



import time

while True:
    response = client.describe_endpoint(EndpointName=endpoint_name)
    status = response["EndpointStatus"]
    print(f"Endpoint status: {status}")
    if status in ("InService", "Failed"):
        break
    time.sleep(60)
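
The loop above polls without a time limit and exits the same way for InService and Failed. The following is a minimal sketch of a variant that adds a timeout and raises on failure. The function name wait_for_endpoint, its parameters, and the default intervals are assumptions chosen for illustration; the sketch relies only on the describe_endpoint call and the EndpointStatus and FailureReason response fields.

```python
import time


def wait_for_endpoint(client, endpoint_name, timeout_seconds=3600, poll_seconds=60):
    """Poll describe_endpoint until InService; raise on Failed or on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return response
        if status == "Failed":
            # FailureReason is only present when the endpoint has failed.
            reason = response.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

With the client and endpoint_name from the deployment example, this would be called as wait_for_endpoint(client, endpoint_name). The boto3 SageMaker client also offers a built-in waiter, obtained with client.get_waiter("endpoint_in_service"), which may be preferable when custom error handling is not needed.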

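Deploy from SageMaker AI Studio

You can also deploy a recommended configuration directly from SageMaker AI Studio in a single action. In SageMaker AI Studio, navigate to the completed recommendation job, review the recommendations and their performance metrics, and select the configuration that you want to deploy.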
