翻訳は機械翻訳により提供されています。提供された翻訳内容と英語版の間で齟齬、不一致または矛盾がある場合、英語版が優先します。

# マルチモデルエンドポイントを作成する
<a name="create-multi-model-endpoint"></a>

SageMaker AI コンソールまたは を使用して AWS SDK for Python (Boto) 、マルチモデルエンドポイントを作成できます。CPU または GPU ベースのエンドポイントをコンソールから作成するには、以下のセクションのコンソールの手順を参照してください。を使用してマルチモデルエンドポイントを作成する場合は AWS SDK for Python (Boto)、次のセクションの CPU または GPU の手順を使用します。CPU と GPU のワークフローは似ていますが、コンテナの要件など、いくつかの違いがあります。

**Topics**
+ [マルチモデルエンドポイントを作成する (コンソール)](#create-multi-model-endpoint-console)
+ [で CPUs を使用してマルチモデルエンドポイントを作成する AWS SDK for Python (Boto3)](#create-multi-model-endpoint-sdk-cpu)
+ [で GPUs を使用してマルチモデルエンドポイントを作成する AWS SDK for Python (Boto3)](#create-multi-model-endpoint-sdk-gpu)

## マルチモデルエンドポイントを作成する (コンソール)
<a name="create-multi-model-endpoint-console"></a>

CPU と GPU ベースのマルチモデルエンドポイントの両方をコンソールから作成できます。以下の手順に従って SageMaker AI コンソールからマルチモデルエンドポイントを作成します。

**マルチモデルエンドポイントを作成するには (コンソール)**

1. Amazon SageMaker AI コンソール ([https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/)) を開きます。

1. [**モデル**] を選択し、[**推論**] グループから [**モデルの作成**] を選択します。

1. [**モデル名**] に名前を入力します。

1. **[IAM ロール]** で、`AmazonSageMakerFullAccess` IAM ポリシーがアタッチされた IAM ロールを選択するか作成します。

1.  **[コンテナの定義]** セクションの、**[モデルアーティファクトと推論イメージオプションの提供]** で **[複数のモデルの使用]** を選択します。  
![\[モデルの作成ページで [複数のモデルを使用する] を選択するセクション。\]](http://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/images/mme-create-model-ux-2.PNG)

1. **[推論コンテナイメージ]** には、目的のコンテナイメージの Amazon ECR パスを入力します。

   GPU モデルの場合は、NVIDIA Triton Inference Server を基盤とするコンテナを使用する必要があります。GPU ベースのエンドポイントで動作するコンテナイメージのリストについては、「[NVIDIA Triton Inference Containers (SM support only)](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only)」を参照してください。NVIDIA Triton Inference Server の詳細については、「[Use Triton Inference Server with SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/triton.html)」を参照してください。

1. **[モデルの作成]** を選択します。

1. 単一モデルエンドポイントの場合と同様に、マルチモデルエンドポイントをデプロイします。手順については、「[SageMaker AI ホスティングサービスにモデルをデプロイする](ex1-model-deployment.md#ex1-deploy-model)」を参照してください。

## で CPUs を使用してマルチモデルエンドポイントを作成する AWS SDK for Python (Boto3)
<a name="create-multi-model-endpoint-sdk-cpu"></a>

以下のセクションを使用して、CPU インスタンスベースのマルチモデルエンドポイントを作成します。マルチモデルエンドポイントは、モデルが 1 つのエンドポイントを作成する場合と同様に Amazon SageMaker AI の [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) API、[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config) API、[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint) API を使って作成しますが、相違点が 2 つあります。モデルコンテナを定義するときに、新しい `Mode` パラメータ値 `MultiModel` を渡す必要があります。また、1 つのモデルをデプロイするときは 1 つのモデルアーティファクトへのパスを渡しますが、代わりに、モデルアーティファクトが配置される Amazon S3 のプレフィックスを指定する `ModelDataUrl` フィールドを渡す必要があります。

SageMaker AI を使って複数の XGBoost モデルをエンドポイントにデプロイするサンプルノートブックについては、[マルチモデルエンドポイント (XGBoost) のサンプルノートブック](https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.html)を参照してください。

ここでは、CPU ベースのマルチモデルエンドポイントを作成するためにそのサンプルで使用される主要な手順について概説しています。

**モデルをデプロイするには (AWS SDK for Python (Boto 3))**

1. マルチモデルエンドポイントのデプロイをサポートするイメージを含んだコンテナを取得します。マルチモデルエンドポイントをサポートする組み込みアルゴリズムとフレームワークコンテナのリストについては、「[マルチモデルエンドポイントでサポートされるアルゴリズム、フレームワーク、インスタンス](multi-model-support.md)」を参照してください。この例では、組み込みのアルゴリズムの [K 最近傍 (k-NN) アルゴリズム](k-nearest-neighbors.md) を使います。[SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/v2.html) のユーティリティ関数 `image_uris.retrieve()` を呼び出して、組み込みの K 最近傍アルゴリズムイメージのアドレスを取得します。

   ```
   import sagemaker
   region = sagemaker_session.boto_region_name
   image = sagemaker.image_uris.retrieve("knn",region=region)
   container = { 
                 'Image':        image,
                 'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                 'Mode':         'MultiModel'
               }
   ```

1.  AWS SDK for Python (Boto3) SageMaker AI クライアントを取得し、このコンテナを使用するモデルを作成します。

   ```
   import boto3
   sagemaker_client = boto3.client('sagemaker')
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [container])
   ```

1. (オプション) シリアル推論パイプラインを使用している場合、パイプラインに含める追加のコンテナを取得し、`CreateModel` の `Containers` 引数に含めます。

   ```
   preprocessor_container = { 
                  'Image': '<ACCOUNT_ID>.dkr.ecr.<REGION_NAME>.amazonaws.com/<PREPROCESSOR_IMAGE>:<TAG>'
               }
   
   multi_model_container = { 
                 'Image': '<ACCOUNT_ID>.dkr.ecr.<REGION_NAME>.amazonaws.com/<IMAGE>:<TAG>',
                 'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                 'Mode':         'MultiModel'
               }
   
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [preprocessor_container, multi_model_container]
               )
   ```
**注記**  
シリアル推論パイプラインで使用できるマルチモデル対応エンドポイントは 1 つだけです。

1. (オプション) モデルのキャッシュによる利点がないユースケースの場合は、`MultiModelConfig` パラメータの `ModelCacheSetting` フィールドの値を `Disabled` に設定し、`create_model` 呼び出しの `Container` 引数に含めます。デフォルトでは、`ModelCacheSetting` フィールドの値は `Enabled` です。

   ```
   container = { 
                   'Image': image, 
                   'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                   'Mode': 'MultiModel' 
                   'MultiModelConfig': {
                           // Default value is 'Enabled'
                           'ModelCacheSetting': 'Disabled'
                   }
              }
   
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [container]
               )
   ```

1. モデルのマルチモデルエンドポイントを設定します。少なくとも 2 つのインスタンスでエンドポイントを設定することをお勧めします。これにより、SageMaker AI は複数のアベイラビリティーゾーン全体で可用性の高い予測のセットをモデルに提供できます。

   ```
   response = sagemaker_client.create_endpoint_config(
                   EndpointConfigName = '<ENDPOINT_CONFIG_NAME>',
                   ProductionVariants=[
                        {
                           'InstanceType':        'ml.m4.xlarge',
                           'InitialInstanceCount': 2,
                           'InitialVariantWeight': 1,
                           'ModelName':            '<MODEL_NAME>',
                           'VariantName':          'AllTraffic'
                         }
                   ]
              )
   ```
**注記**  
シリアル推論パイプラインで使用できるマルチモデル対応エンドポイントは 1 つだけです。

1. `EndpointName` および `EndpointConfigName` パラメータを使用してマルチモデルエンドポイントを作成します。

   ```
   response = sagemaker_client.create_endpoint(
                 EndpointName       = '<ENDPOINT_NAME>',
                 EndpointConfigName = '<ENDPOINT_CONFIG_NAME>')
   ```

## で GPUs を使用してマルチモデルエンドポイントを作成する AWS SDK for Python (Boto3)
<a name="create-multi-model-endpoint-sdk-gpu"></a>

次のセクションを使用して、GPU ベースのマルチモデルエンドポイントを作成します。マルチモデルエンドポイントは、モデルが 1 つのエンドポイントを作成する場合と同様に Amazon SageMaker AI の [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) API、[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config) API、[https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint) API を使って作成しますが、いくつかの相違点があります。モデルコンテナを定義するときに、新しい `Mode` パラメータ値 `MultiModel` を渡す必要があります。また、1 つのモデルをデプロイするときは 1 つのモデルアーティファクトへのパスを渡しますが、代わりに、モデルアーティファクトが配置される Amazon S3 のプレフィックスを指定する `ModelDataUrl` フィールドを渡す必要があります。GPU ベースのマルチモデルエンドポイントでは、GPU インスタンスでの実行に最適化された NVIDIA Triton Inference Server のコンテナも使用する必要があります。GPU ベースのエンドポイントで動作するコンテナイメージのリストについては、「[NVIDIA Triton Inference Containers (SM support only)](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only)」を参照してください。

GPU ベースのマルチモデルエンドポイントを作成する方法を示すノートブックの例については、「[Run mulitple deep learning models on GPUs with Amazon SageMaker AI Multi-model endpoints (MME)](https://github.com/aws/amazon-sagemaker-examples/blob/main/multi-model-endpoints/mme-on-gpu/cv/resnet50_mme_with_gpu.ipynb)」を参照してください。

ここでは、GPU ベースのマルチモデルエンドポイントを作成するための主要な手順について概説しています。

**モデルをデプロイするには (AWS SDK for Python (Boto 3))**

1. コンテナイメージを定義します。ResNet モデル用の GPU サポートを備えたマルチモデルエンドポイントを作成するには、[NVIDIA Triton Server イメージ](https://docs.aws.amazon.com/sagemaker/latest/dg/triton.html)を使用するコンテナを定義します。このコンテナはマルチモデルエンドポイントをサポートし、GPU インスタンスでの実行に最適化されています。[SageMaker AI Python SDK](https://sagemaker.readthedocs.io/en/stable/v2.html) のユーティリティ関数 `image_uris.retrieve()` を呼び出して、イメージのアドレスを取得します。例えば、次のようになります。

   ```
   import sagemaker
   region = sagemaker_session.boto_region_name
   
   // Find the sagemaker-tritonserver image at 
   // https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/resnet50/triton_resnet50.ipynb
   // Find available tags at https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only
   
   image = "<ACCOUNT_ID>.dkr.ecr.<REGION_NAME>.amazonaws.com/sagemaker-tritonserver:<TAG>".format(
       account_id=account_id_map[region], region=region
   )
   
   container = { 
                 'Image':        image,
                 'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                 'Mode':         'MultiModel',
                 "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "resnet"},
               }
   ```

1.  AWS SDK for Python (Boto3) SageMaker AI クライアントを取得し、このコンテナを使用するモデルを作成します。

   ```
   import boto3
   sagemaker_client = boto3.client('sagemaker')
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [container])
   ```

1. (オプション) シリアル推論パイプラインを使用している場合、パイプラインに含める追加のコンテナを取得し、`CreateModel` の `Containers` 引数に含めます。

   ```
   preprocessor_container = { 
                  'Image': '<ACCOUNT_ID>.dkr.ecr.<REGION_NAME>.amazonaws.com/<PREPROCESSOR_IMAGE>:<TAG>'
               }
   
   multi_model_container = { 
                 'Image': '<ACCOUNT_ID>.dkr.ecr.<REGION_NAME>.amazonaws.com/<IMAGE>:<TAG>',
                 'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                 'Mode':         'MultiModel'
               }
   
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [preprocessor_container, multi_model_container]
               )
   ```
**注記**  
シリアル推論パイプラインで使用できるマルチモデル対応エンドポイントは 1 つだけです。

1. (オプション) モデルのキャッシュによる利点がないユースケースの場合は、`MultiModelConfig` パラメータの `ModelCacheSetting` フィールドの値を `Disabled` に設定し、`create_model` 呼び出しの `Container` 引数に含めます。デフォルトでは、`ModelCacheSetting` フィールドの値は `Enabled` です。

   ```
   container = { 
                   'Image': image, 
                   'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
                   'Mode': 'MultiModel' 
                   'MultiModelConfig': {
                           // Default value is 'Enabled'
                           'ModelCacheSetting': 'Disabled'
                   }
              }
   
   response = sagemaker_client.create_model(
                 ModelName        = '<MODEL_NAME>',
                 ExecutionRoleArn = role,
                 Containers       = [container]
               )
   ```

1. モデルの GPU ベースのマルチモデルエンドポイントを設定します。可用性を高め、キャッシュヒット率を高めるために、エンドポイントに複数のインスタンスを設定することをお勧めします。

   ```
   response = sagemaker_client.create_endpoint_config(
                   EndpointConfigName = '<ENDPOINT_CONFIG_NAME>',
                   ProductionVariants=[
                        {
                           'InstanceType':        'ml.g4dn.4xlarge',
                           'InitialInstanceCount': 2,
                           'InitialVariantWeight': 1,
                           'ModelName':            '<MODEL_NAME>',
                           'VariantName':          'AllTraffic'
                         }
                   ]
              )
   ```

1. `EndpointName` および `EndpointConfigName` パラメータを使用してマルチモデルエンドポイントを作成します。

   ```
   response = sagemaker_client.create_endpoint(
                 EndpointName       = '<ENDPOINT_NAME>',
                 EndpointConfigName = '<ENDPOINT_CONFIG_NAME>')
   ```