# Use AWS Inferentia instances with Amazon EKS for machine learning
<a name="inferentia-support"></a>

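This topic describes how to create an Amazon EKS cluster with nodes running [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) instances and, optionally, deploy a sample application. Amazon EC2 Inf1 instances are powered by [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/) chips, which are custom built by AWS to provide high-performance, low-cost inference in the cloud. Machine learning models are deployed to containers using [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/), a specialized software development kit (SDK) consisting of a compiler, a runtime, and profiling tools that optimize machine learning inference performance on Inferentia chips. AWS Neuron supports popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet.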

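**Note**  
Neuron device logical IDs must be contiguous. If a Pod that requests multiple Neuron devices is scheduled on an `inf1.6xlarge` or `inf1.24xlarge` instance type (both have more than one Neuron device), the Pod fails to start when the Kubernetes scheduler selects non-contiguous device IDs. For more information, see [Device logical IDs must be contiguous](https://github.com/aws/aws-neuron-sdk/issues/110) on GitHub.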
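
To make the contiguity requirement concrete, the following is a minimal sketch of the check, not code taken from the Neuron device plugin. The helper name `are_contiguous` is hypothetical, and the sketch assumes that device IDs are plain integers.

```python
def are_contiguous(device_ids):
    """Return True if the IDs form an unbroken run such as [2, 3, 4]."""
    ids = sorted(device_ids)
    if not ids:
        return False
    # A run of unique IDs is contiguous when it spans exactly len(ids) - 1 steps.
    return len(set(ids)) == len(ids) and ids[-1] - ids[0] == len(ids) - 1


print(are_contiguous([0, 1]))  # True: adjacent devices
print(are_contiguous([0, 2]))  # False: device 1 is skipped
```

A Pod that is assigned devices 0 and 2 corresponds to the second case, which is the situation the note describes.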

## Prerequisites
<a name="inferentia-prerequisites"></a>
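+ Have `eksctl` installed on your computer. If you don't have it installed, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.
+ Have `kubectl` installed on your computer. For more information, see [Set up `kubectl` and `eksctl`](install-kubectl.md).
+ (Optional) Have `python3` installed on your computer. If you don't have it installed, see [Python downloads](https://www.python.org/downloads/) for installation instructions.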
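
As a quick way to confirm the prerequisites, the sketch below looks up each tool on your `PATH` with the standard library. The helper name `missing_tools` is hypothetical, and the check only confirms that an executable with that name exists; it does not verify versions.

```python
import shutil


def missing_tools(tools=("eksctl", "kubectl", "python3")):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All prerequisite tools were found.")
```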

## Create a cluster
<a name="create-cluster-inferentia"></a>

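1. Create a cluster with Inf1 Amazon EC2 instance nodes. You can replace *inf1.2xlarge* with any [Inf1 instance type](https://aws.amazon.com/ec2/instance-types/inf1/). The `eksctl` utility detects that you are launching a node group with an `Inf1` instance type and starts your nodes using one of the Amazon EKS optimized accelerated Amazon Linux AMIs.
**Note**  
You can't use [IAM roles for service accounts](iam-roles-for-service-accounts.md) with TensorFlow Serving.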

   ```
   eksctl create cluster \
       --name inferentia \
       --region region-code \
       --nodegroup-name ng-inf1 \
       --node-type inf1.2xlarge \
       --nodes 2 \
       --nodes-min 1 \
       --nodes-max 4 \
       --ssh-access \
       --ssh-public-key your-key \
       --with-oidc
   ```
**Note**  
Note the value of the following line of the output. It's used in a later (optional) step.

   ```
   [9]  adding identity "arn:aws:iam::111122223333:role/eksctl-inferentia-nodegroup-ng-in-NodeInstanceRole-FI7HIYS3BS09" to auth ConfigMap
   ```

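   When launching a node group with `Inf1` instances, `eksctl` automatically installs the AWS Neuron Kubernetes device plugin. This plugin advertises Neuron devices as a system resource to the Kubernetes scheduler so that containers can request them. In addition to the default Amazon EKS node IAM policies, the Amazon S3 read-only access policy is added so that the sample application, covered in a later step, can load a trained model from Amazon S3.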

1. Make sure that all Pods have started correctly.

   ```
   kubectl get pods -n kube-system
   ```

   Abbreviated output:

   ```
   NAME                                   READY   STATUS    RESTARTS   AGE
   [...]
   neuron-device-plugin-daemonset-6djhp   1/1     Running   0          5m
   neuron-device-plugin-daemonset-hwjsj   1/1     Running   0          5m
   ```
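
The check in the last step can also be scripted. The following is a minimal sketch that parses the text printed by `kubectl get pods -n kube-system` and reports whether every Neuron device plugin Pod is running. The helper name `plugin_pods_ready` is hypothetical, and the sketch assumes the default column layout shown in the abbreviated output above.

```python
def plugin_pods_ready(kubectl_output, prefix="neuron-device-plugin-daemonset"):
    """Return True if at least one plugin Pod is listed and all are fully ready."""
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Columns are NAME, READY, STATUS, RESTARTS, AGE.
        if len(fields) >= 3 and fields[0].startswith(prefix):
            pods.append(fields)

    def ready(fields):
        have, want = fields[1].split("/")
        return have == want and fields[2] == "Running"

    return bool(pods) and all(ready(fields) for fields in pods)


sample = """\
NAME                                   READY   STATUS    RESTARTS   AGE
neuron-device-plugin-daemonset-6djhp   1/1     Running   0          5m
neuron-device-plugin-daemonset-hwjsj   1/1     Running   0          5m
"""
print(plugin_pods_ready(sample))  # True
```

To run it against a live cluster, you could capture the command output with `subprocess.run(["kubectl", "get", "pods", "-n", "kube-system"], capture_output=True, text=True).stdout` and pass it to the function.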

## (Optional) Deploy a TensorFlow Serving application image
<a name="deploy-tensorflow-serving-application"></a>

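A trained model must be compiled to an Inferentia target before it can be deployed on Inferentia instances. To continue, you need a [Neuron optimized TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html) model saved in Amazon S3. If you don't already have a SavedModel, follow the tutorial for [creating a Neuron compatible ResNet50 model](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-tf-neuron.html) and upload the resulting SavedModel to Amazon S3. ResNet-50 is a popular machine learning model used for image recognition tasks. For more information about compiling Neuron models, see [The AWS Inferentia Chip With DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia.html) in the *AWS Deep Learning AMI Developer Guide*.

The sample deployment manifest manages a pre-built inference serving container for TensorFlow provided by AWS Deep Learning Containers. Inside the container are the AWS Neuron runtime and the TensorFlow Serving application. A complete list of pre-built Deep Learning Containers optimized for Neuron is maintained on GitHub under [Available Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-inference-containers). At start-up, the DLC fetches your model from Amazon S3, launches Neuron TensorFlow Serving with the saved model, and waits for prediction requests.

You can adjust the number of Neuron devices allocated to your serving application by changing the `aws.amazon.com/neuron` resource in the deployment YAML. Note that communication between TensorFlow Serving and the Neuron runtime happens over gRPC, which requires passing the `IPC_LOCK` capability to the container.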
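
As an illustration of that adjustment, the sketch below updates the Neuron device limit on a container spec held as a plain Python dictionary. The helper name `with_neuron_devices` is hypothetical; loading and saving the YAML file itself is left out, and the dictionary mirrors only the `resources.limits` part of the sample manifest.

```python
import copy


def with_neuron_devices(container, count):
    """Return a copy of a container spec whose Neuron device limit is `count`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    updated = copy.deepcopy(container)
    limits = updated.setdefault("resources", {}).setdefault("limits", {})
    limits["aws.amazon.com/neuron"] = count
    return updated


container = {
    "name": "eks-neuron-test",
    "resources": {"limits": {"cpu": 4, "memory": "4Gi", "aws.amazon.com/neuron": 1}},
}
print(with_neuron_devices(container, 4)["resources"]["limits"])
```

The original dictionary is left unchanged, so the same base spec can be reused to produce variants for different instance sizes.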

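1. Add the `AmazonS3ReadOnlyAccess` IAM policy to the node instance role that was created in step 1 of [Create a cluster](#create-cluster-inferentia). This is necessary so that the sample application can load a trained model from Amazon S3.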

   ```
   aws iam attach-role-policy \
       --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
       --role-name eksctl-inferentia-nodegroup-ng-in-NodeInstanceRole-FI7HIYS3BS09
   ```

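1. Create a file named `rn50_deployment.yaml` with the following contents. Update the Region code and model path to match your settings. The model name identifies the model when a client makes a request to the TensorFlow server. This example uses a model name that matches the sample ResNet50 client script used in a later step to send prediction requests.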

   ```
   aws ecr list-images --repository-name neuron-rtd --registry-id 790709498068 --region us-west-2
   ```

   ```
   kind: Deployment
   apiVersion: apps/v1
   metadata:
     name: eks-neuron-test
     labels:
       app: eks-neuron-test
       role: master
   spec:
     replicas: 2
     selector:
       matchLabels:
         app: eks-neuron-test
         role: master
     template:
       metadata:
         labels:
           app: eks-neuron-test
           role: master
       spec:
         containers:
           - name: eks-neuron-test
             image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04
             command:
               - /usr/local/bin/entrypoint.sh
             args:
               - --port=8500
               - --rest_api_port=9000
               - --model_name=resnet50_neuron
               - --model_base_path=s3://${your-bucket-of-models}/resnet50_neuron/
             ports:
               - containerPort: 8500
               - containerPort: 9000
             imagePullPolicy: IfNotPresent
             env:
               - name: AWS_REGION
                 value: "us-east-1"
               - name: S3_USE_HTTPS
                 value: "1"
               - name: S3_VERIFY_SSL
                 value: "0"
               - name: S3_ENDPOINT
                 value: s3.us-east-1.amazonaws.com
               - name: AWS_LOG_LEVEL
                 value: "3"
             resources:
               limits:
                 cpu: 4
                 memory: 4Gi
                 aws.amazon.com/neuron: 1
               requests:
                 cpu: "1"
                 memory: 1Gi
             securityContext:
               capabilities:
                 add:
                   - IPC_LOCK
   ```

1. Deploy the model.

   ```
   kubectl apply -f rn50_deployment.yaml
   ```

1. Create a file named `rn50_service.yaml` with the following contents. The HTTP and gRPC ports are opened to accept prediction requests.

   ```
   kind: Service
   apiVersion: v1
   metadata:
     name: eks-neuron-test
     labels:
       app: eks-neuron-test
   spec:
     type: ClusterIP
     ports:
       - name: grpc-tf-serving
         port: 8500
         targetPort: 8500
       - name: http-tf-serving
         port: 9000
         targetPort: 9000
     selector:
       app: eks-neuron-test
       role: master
   ```

1. Create a Kubernetes service for your TensorFlow model Serving application.

   ```
   kubectl apply -f rn50_service.yaml
   ```
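
The service reaches the deployment's Pods through its `selector`: a Pod is selected only when it carries every label listed there. The following is a minimal sketch of that equality-based matching rule, using the labels from the two manifests above. The helper name `selector_matches` is hypothetical and is not part of any Kubernetes client library.

```python
def selector_matches(selector, pod_labels):
    """Return True if every key/value pair in the selector is present on the Pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "eks-neuron-test", "role": "master"}
pod_labels = {"app": "eks-neuron-test", "role": "master"}

print(selector_matches(service_selector, pod_labels))            # True
print(selector_matches(service_selector, {"app": "eks-neuron-test"}))  # False: role label missing
```

If you rename a label in `rn50_deployment.yaml`, the same change is needed in `rn50_service.yaml`, otherwise the service has no endpoints.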

## (Optional) Make predictions against your TensorFlow Serving service
<a name="make-predictions-against-tensorflow-service"></a>

1. To test locally, forward the gRPC port to the `eks-neuron-test` service.

   ```
   kubectl port-forward service/eks-neuron-test 8500:8500 &
   ```

1. Create a Python script named `tensorflow-model-server-infer.py` with the following content. The script runs inference through gRPC, which is a service framework.

   ```
   import numpy as np
   import grpc
   import tensorflow as tf
   from tensorflow.keras.preprocessing import image
   from tensorflow.keras.applications.resnet50 import preprocess_input
   from tensorflow_serving.apis import predict_pb2
   from tensorflow_serving.apis import prediction_service_pb2_grpc
   from tensorflow.keras.applications.resnet50 import decode_predictions

   if __name__ == '__main__':
       # Connect to the gRPC port forwarded in the previous step.
       channel = grpc.insecure_channel('localhost:8500')
       stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
       # Download a sample image and preprocess it for ResNet50.
       img_file = tf.keras.utils.get_file(
           "./kitten_small.jpg",
           "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
       img = image.load_img(img_file, target_size=(224, 224))
       img_array = preprocess_input(image.img_to_array(img)[None, ...])
       request = predict_pb2.PredictRequest()
       # Must match the --model_name argument in rn50_deployment.yaml.
       request.model_spec.name = 'resnet50_neuron'
       request.inputs['input'].CopyFrom(
           tf.make_tensor_proto(img_array, shape=img_array.shape))
       result = stub.Predict(request)
       prediction = tf.make_ndarray(result.outputs['output'])
       print(decode_predictions(prediction))
   ```

1. Run the script to submit predictions to your service.

   ```
   python3 tensorflow-model-server-infer.py
   ```

   An example output is as follows.

   ```
   [[(u'n02123045', u'tabby', 0.68817204), (u'n02127052', u'lynx', 0.12701613), (u'n02123159', u'tiger_cat', 0.08736559), (u'n02124075', u'Egyptian_cat', 0.063844085), (u'n02128757', u'snow_leopard', 0.009240591)]]
   ```
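
The deployment also starts the TensorFlow Serving REST API on port 9000 (`--rest_api_port=9000`). The following is a minimal sketch, not part of the original walkthrough, of how a request to that endpoint could be assembled with the Python standard library. It assumes that port 9000 has been forwarded locally in the same way as port 8500, and that the model accepts the row-oriented `instances` payload of the TensorFlow Serving REST API. The helper name `build_predict_request` is hypothetical, and the payload shown is a placeholder rather than a real image tensor.

```python
import json
import urllib.request


def build_predict_request(model_name, instances, host="localhost", port=9000):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder payload; a real request would carry a 224x224x3 image array.
request = build_predict_request("resnet50_neuron", [[0.0]])
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. The sketch stops before that step so that it can be run without a cluster.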