# SageMaker AI endpoint parameters for large model inference
<a name="large-model-inference-hosting"></a>

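You can customize the following parameters to support low-latency large model inference (LMI) with SageMaker AI.
+  **Maximum size of the Amazon EBS volume for an instance (`VolumeSizeInGB`)**: If your model is larger than 30 GB and you are using an instance without a local disk, set this parameter to a value slightly larger than the size of your model.
+  **Health check timeout quota (`ContainerStartupHealthCheckTimeoutInSeconds`)**: If your container is configured correctly but the CloudWatch logs show a health check timeout, increase this quota so that the container has enough time to respond to health checks.
+  **Model download timeout quota (`ModelDataDownloadTimeoutInSeconds`)**: If your model is larger than 40 GB, increase this quota so that there is enough time to download the model from Amazon S3 to the instance.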
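
The size thresholds above can be turned into starting values for a production variant. The following is a minimal sketch, not an official sizing rule: the helper name `suggest_lmi_variant_settings`, the 20% volume headroom, and the assumed download throughput of 100 MB/s are all hypothetical choices made for illustration. The clamping bounds (512 GB for the volume, 60 to 3600 seconds for the timeout) reflect our understanding of the API limits, so verify them against the `ProductionVariant` API reference.

```python
import math


def suggest_lmi_variant_settings(model_size_gb, instance_has_local_disk,
                                 headroom_pct=20, assumed_download_mb_per_s=100):
    """Hypothetical helper: suggest ProductionVariant values from a model's size.

    The 30 GB and 40 GB thresholds follow the guidance above; the headroom and
    throughput defaults are illustrative assumptions, not AWS recommendations.
    """
    settings = {}

    # Enlarge the EBS volume only for models over 30 GB on instances without a local disk.
    if model_size_gb > 30 and not instance_has_local_disk:
        volume_gb = math.ceil(model_size_gb * (100 + headroom_pct) / 100)
        settings["VolumeSizeInGB"] = min(512, volume_gb)  # assumed API maximum

    # Raise the download timeout only for models over 40 GB.
    if model_size_gb > 40:
        # Estimated copy time at the assumed throughput, doubled as a safety margin.
        estimate_s = math.ceil(2 * model_size_gb * 1024 / assumed_download_mb_per_s)
        settings["ModelDataDownloadTimeoutInSeconds"] = max(60, min(3600, estimate_s))

    return settings
```

For example, `suggest_lmi_variant_settings(70, False)` suggests an 84 GB volume and a 1434-second download timeout. You could merge the returned dictionary into a production variant such as the one in the snippet that follows. The health check timeout is left out on purpose because the guidance ties it to observed CloudWatch timeouts rather than to model size.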

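The following code snippet shows how to set the parameters above programmatically. Replace the placeholder values in the example, such as `aws-region`, `endpoint-name`, and `instance-type`, with your own information.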

```
import boto3

aws_region = "aws-region"
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = "endpoint-name"

# Create an endpoint config name.
endpoint_config_name = "endpoint-config-name"

# The name of the model that you want to host.
model_name = "the-name-of-your-model"

instance_type = "instance-type"

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": instance_type, # Specify the compute instance type.
            "InitialInstanceCount": 1, # Number of instances to launch initially.
            "VolumeSizeInGB": 256, # Specify the size of the Amazon EBS volume.
            "ModelDataDownloadTimeoutInSeconds": 1800, # Specify the model download timeout in seconds.
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800, # Specify the health check timeout in seconds.
        },
    ],
)

sagemaker_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
```
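
Because large models can take many minutes to download and pass health checks, it can help to wait for the endpoint to finish creating and then inspect why it failed, if it did. The following is a minimal polling sketch that assumes only the `describe_endpoint` call of a boto3 SageMaker client. The function name `wait_for_endpoint` and its poll interval and poll count defaults are hypothetical. boto3 also provides an `endpoint_in_service` waiter, which you could use instead.

```python
import time

# Endpoint states that mean creation or an update is still in progress.
TRANSITIONAL_STATES = ("Creating", "Updating", "SystemUpdating")


def wait_for_endpoint(client, endpoint_name, poll_seconds=60, max_polls=120,
                      sleep=time.sleep):
    """Hypothetical helper: poll DescribeEndpoint until the endpoint settles.

    Returns a (status, failure_reason) tuple. failure_reason is None unless
    DescribeEndpoint reports a FailureReason (for example, after a timeout).
    """
    for _ in range(max_polls):
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status not in TRANSITIONAL_STATES:
            return status, description.get("FailureReason")
        sleep(poll_seconds)
    raise TimeoutError(
        f"Endpoint {endpoint_name} is still transitioning after {max_polls} polls")
```

After `create_endpoint`, a call such as `status, reason = wait_for_endpoint(sagemaker_client, endpoint_name)` returns `"InService"` on success. If it returns `"Failed"`, the reason together with the CloudWatch logs can indicate whether `ModelDataDownloadTimeoutInSeconds` or `ContainerStartupHealthCheckTimeoutInSeconds` needs to be raised.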

For more information about the keys in `ProductionVariants`, see [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProductionVariant.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProductionVariant.html).
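
If an endpoint already exists, you change these values by creating a new endpoint configuration and pointing the endpoint at it with `update_endpoint`. The following is a minimal sketch that assumes the `describe_endpoint`, `describe_endpoint_config`, `create_endpoint_config`, and `update_endpoint` calls of a boto3 SageMaker client. The helper names `variants_with_overrides` and `redeploy_with_overrides` are hypothetical. The sketch carries over only `ProductionVariants`, so other endpoint configuration settings (for example, data capture or KMS settings) would need to be copied separately.

```python
import copy


def variants_with_overrides(variants, overrides):
    """Hypothetical helper: return copies of the variant dicts with overrides applied."""
    updated = []
    for variant in variants:
        new_variant = copy.deepcopy(variant)  # leave the caller's data untouched
        new_variant.update(overrides)
        updated.append(new_variant)
    return updated


def redeploy_with_overrides(client, endpoint_name, new_config_name, overrides):
    """Hypothetical helper: copy the endpoint's current variants into a new
    endpoint config with overrides, then update the endpoint to use it."""
    current_config = client.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]
    variants = client.describe_endpoint_config(
        EndpointConfigName=current_config)["ProductionVariants"]
    client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=variants_with_overrides(variants, overrides),
    )
    client.update_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=new_config_name)
    return new_config_name
```

For example, passing `{"ContainerStartupHealthCheckTimeoutInSeconds": 1800}` as `overrides` raises the health check timeout for every variant on the endpoint.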

For examples of how to achieve low-latency inference with large models, see [Generative AI Inference Examples on Amazon SageMaker AI](https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main) in the aws-samples GitHub repository.