

# Custom Docker containers with SageMaker AI
<a name="docker-containers-adapt-your-own"></a>

You can adapt an existing Docker image to work with SageMaker AI. You may need to use an existing, external Docker image with SageMaker AI when you have a container that satisfies feature or safety requirements that are not currently supported by a pre-built SageMaker AI image. There are two toolkits that allow you to bring your own container and adapt it to work with SageMaker AI:
+ [SageMaker Training Toolkit](https://github.com/aws/sagemaker-training-toolkit) – Use this toolkit for training models with SageMaker AI.
+ [SageMaker AI Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) – Use this toolkit for deploying models with SageMaker AI.

The following topics show how to adapt your existing image using the SageMaker Training and Inference toolkits:

**Topics**
+ [Individual Framework Libraries](#docker-containers-adapt-your-own-frameworks)
+ [SageMaker Training and Inference Toolkits](amazon-sagemaker-toolkits.md)
+ [Adapting your own training container](adapt-training-container.md)
+ [Adapt your own inference container for Amazon SageMaker AI](adapt-inference-container.md)

## Individual Framework Libraries
<a name="docker-containers-adapt-your-own-frameworks"></a>

In addition to the SageMaker Training Toolkit and SageMaker AI Inference Toolkit, SageMaker AI also provides toolkits specialized for TensorFlow, MXNet, PyTorch, and Chainer. The following table provides links to the GitHub repositories that contain the source code for each framework and their respective serving toolkits. The instructions linked are for using the Python SDK to run training algorithms and host models on SageMaker AI. The functionality for these individual libraries is included in the SageMaker AI Training Toolkit and SageMaker AI Inference Toolkit.


| Framework | Toolkit Source Code | 
| --- | --- | 
| TensorFlow |  [SageMaker AI TensorFlow Training](https://github.com/aws/sagemaker-tensorflow-training-toolkit) [SageMaker AI TensorFlow Serving](https://github.com/aws/sagemaker-tensorflow-serving-container)  | 
| MXNet |  [SageMaker AI MXNet Training](https://github.com/aws/sagemaker-mxnet-training-toolkit) [SageMaker AI MXNet Inference](https://github.com/aws/sagemaker-mxnet-inference-toolkit)  | 
| PyTorch |  [SageMaker AI PyTorch Training](https://github.com/aws/sagemaker-pytorch-training-toolkit) [SageMaker AI PyTorch Inference](https://github.com/aws/sagemaker-pytorch-inference-toolkit)  | 
| Chainer |  [SageMaker AI Chainer Container](https://github.com/aws/sagemaker-chainer-container)  | 

# SageMaker Training and Inference Toolkits
<a name="amazon-sagemaker-toolkits"></a>

The [SageMaker Training](https://github.com/aws/sagemaker-training-toolkit) and [SageMaker AI Inference](https://github.com/aws/sagemaker-inference-toolkit) toolkits implement the functionality that you need to adapt your containers to run scripts, train algorithms, and deploy models on SageMaker AI. When installed, the toolkits define the following for users:
+ The locations for storing code and other resources. 
+ The entry point that contains the code to run when the container is started. Your Dockerfile must copy the code that needs to be run into the location expected by a container that is compatible with SageMaker AI. 
+ Other information that a container needs to manage deployments for training and inference. 

## SageMaker AI Toolkits Containers Structure
<a name="sagemaker-toolkits-structure"></a>

When SageMaker AI trains a model, it creates the following file folder structure in the container's `/opt/ml` directory.

```
/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   └── resourceConfig.json
│   └── data
│       └── <channel_name>
│           └── <input data>
├── model
│
├── code
│
├── output
│
└── failure
```

When you run a model *training* job, the SageMaker AI container uses the `/opt/ml/input/` directory, which contains the JSON files that configure the hyperparameters for the algorithm and the network layout used for distributed training. The `/opt/ml/input/` directory also contains files that specify the channels through which SageMaker AI accesses the data, which is stored in Amazon Simple Storage Service (Amazon S3). The SageMaker AI containers library places the scripts that the container will run in the `/opt/ml/code/` directory. Your script should write the model generated by your algorithm to the `/opt/ml/model/` directory. For more information, see [Containers with custom training algorithms](your-algorithms-training-algo.md).
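
As a minimal sketch of how a training script might read this configuration, the following example loads the two JSON files from the `input/config` directory. The `load_training_config` helper and its `base_dir` parameter are hypothetical names used only for illustration; they are not part of the SageMaker toolkits. Note that SageMaker AI writes hyperparameter values to `hyperparameters.json` as strings, so your script typically needs to convert them to the types it expects.

```python
import json
from pathlib import Path


def load_training_config(base_dir="/opt/ml"):
    """Read the JSON files that SageMaker AI writes under <base_dir>/input/config.

    Hypothetical helper for illustration; base_dir is a parameter so that the
    function can also be exercised outside of a training container.
    """
    config_dir = Path(base_dir) / "input" / "config"

    def read_json(name):
        path = config_dir / name
        # Return an empty dict when the file is absent, for example when the
        # script runs outside of a training container.
        return json.loads(path.read_text()) if path.exists() else {}

    return {
        "hyperparameters": read_json("hyperparameters.json"),
        "resource_config": read_json("resourceConfig.json"),
    }
```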

When you *host* a trained model on SageMaker AI to make inferences, you deploy the model to an HTTP endpoint. The model makes real-time predictions in response to inference requests. The container must contain a serving stack to process these requests.

In a hosting or batch transform container, the model files are located in the same folder to which they were written during training.

```
/opt/ml/model
│
└── <model files>
```

For more information, see [Containers with custom inference code](your-algorithms-inference-main.md).
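
To illustrate what a serving stack has to handle, the following sketch routes the two request types that SageMaker AI hosting sends to a container: `GET /ping` health checks and `POST /invocations` inference requests. The `route_request` function is a hypothetical, framework-independent stand-in; a real container would wire equivalent logic into a web server and call the model loaded from `/opt/ml/model` instead of the placeholder response shown here.

```python
import json


def route_request(method, path, body=b""):
    """Map a request to (status_code, response_body) for a minimal serving stack.

    Hypothetical helper for illustration only.
    """
    if method == "GET" and path == "/ping":
        # Health check: an empty 200 response signals that the container is ready.
        return 200, b""
    if method == "POST" and path == "/invocations":
        payload = json.loads(body)
        # Placeholder for a real prediction: report how many instances arrived.
        result = {"received_instances": len(payload.get("instances", []))}
        return 200, json.dumps(result).encode("utf-8")
    # Any other route is not part of the hosting contract.
    return 404, b""
```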

## Single Versus Multiple Containers
<a name="sagemaker-toolkits-separate-images"></a>

You can either provide separate Docker images for the training algorithm and inference code or you can use a single Docker image for both. When creating Docker images for use with SageMaker AI, consider the following:
+ Providing two Docker images can increase storage requirements and cost because common libraries might be duplicated.
+ In general, smaller containers start faster for both training and hosting. Models train faster and the hosting service can react to increases in traffic by automatically scaling more quickly.
+ You might be able to write an inference container that is significantly smaller than the training container. This is especially common when you use GPUs for training, but your inference code is optimized for CPUs.
+ SageMaker AI requires that Docker containers run without privileged access.
+ Both Docker containers that you build and those provided by SageMaker AI can send messages to the `Stdout` and `Stderr` files. SageMaker AI sends these messages to Amazon CloudWatch logs in your AWS account.

For more information about how to create SageMaker AI containers and how scripts are executed inside them, see the [SageMaker AI Training Toolkit](https://github.com/aws/sagemaker-training-toolkit) and [SageMaker AI Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) repositories on GitHub. These repositories also list the important environment variables, including those that SageMaker AI containers provide.
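
As a rough sketch, a training script can read a few of the environment variables that the SageMaker Training Toolkit sets, such as `SM_MODEL_DIR`, `SM_NUM_GPUS`, `SM_HPS`, and `SM_CHANNEL_TRAINING` (the last one is set only when the job defines a channel named `training`). The `read_sagemaker_env` helper and its fallback values are assumptions made for illustration so that the script can also run outside a container.

```python
import json
import os


def read_sagemaker_env(environ=os.environ):
    """Collect a few toolkit-provided settings, with fallbacks for local runs.

    Hypothetical helper; the default values are assumptions for illustration.
    """
    return {
        # Directory where the script should write the trained model.
        "model_dir": environ.get("SM_MODEL_DIR", "/opt/ml/model"),
        # Directory that holds the data of the channel named "training".
        "train_channel": environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
        # Number of GPUs available on the instance.
        "num_gpus": int(environ.get("SM_NUM_GPUS", "0")),
        # Hyperparameters, JSON-encoded in a single variable.
        "hyperparameters": json.loads(environ.get("SM_HPS", "{}")),
    }
```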

# Adapting your own training container
<a name="adapt-training-container"></a>

To run your own training model, build a Docker container using the [Amazon SageMaker Training Toolkit](https://github.com/aws/sagemaker-training-toolkit) through an Amazon SageMaker notebook instance.

## Step 1: Create a SageMaker notebook instance
<a name="byoc-training-step1"></a>

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left navigation pane, choose **Notebook**, choose **Notebook instances**, and then choose **Create notebook instance**. 

1. On the **Create notebook instance** page, provide the following information: 

   1. For **Notebook instance name**, enter **RunScriptNotebookInstance**.

   1. For **Notebook Instance type**, choose **ml.t2.medium**.

   1. In the **Permissions and encryption** section, do the following:

      1. For **IAM role**, choose **Create a new role**. This opens a new window.

      1. On the **Create an IAM role** page, choose **Specific S3 buckets**, specify an Amazon S3 bucket named **sagemaker-run-script**, and then choose **Create role**.

         SageMaker AI creates an IAM role named `AmazonSageMaker-ExecutionRole-YYYYMMDDTHHmmSS`. For example, `AmazonSageMaker-ExecutionRole-20190429T110788`. Note that the execution role naming convention uses the date and time at which the role was created, separated by a `T`.

   1. For **Root Access**, choose **Enable**.

   1. Choose **Create notebook instance**. 

1. On the **Notebook instances** page, the **Status** is **Pending**. It can take a few minutes for Amazon SageMaker AI to launch a machine learning compute instance (in this case, a notebook instance) and attach an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries. For more information, see [CreateNotebookInstance](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateNotebookInstance.html). 

   

1. Choose the **Name** of the notebook instance that you just created. This opens a new page.

1.  In the **Permissions and encryption** section, copy **the IAM role ARN number**, and paste it into a notepad file to save it temporarily. You use this IAM role ARN number later to configure a local training estimator in the notebook instance. **The IAM role ARN number** looks like the following: `'arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190429T110788'` 

1. After the status of the notebook instance changes to **InService**, choose **Open JupyterLab**.

## Step 2: Create and upload the Dockerfile and Python training scripts
<a name="byoc-training-step2"></a>

1. After JupyterLab opens, create a new folder in the home directory of your JupyterLab. In the upper-left corner, choose the **New Folder** icon, and then enter the folder name `docker_test_folder`. 

1. Create a `Dockerfile` text file in the `docker_test_folder` directory. 

   1. Choose the **New Launcher** icon (+) in the upper-left corner. 

   1. In the right pane under the **Other** section, choose **Text File**.

   1. Paste the following `Dockerfile` sample code into your text file. 

      ```
      #Download an open source TensorFlow Docker image
      FROM tensorflow/tensorflow:latest-gpu-jupyter
      
      # Install sagemaker-training toolkit that contains the common functionality necessary to create a container compatible with SageMaker AI and the Python SDK.
      RUN pip3 install sagemaker-training
      
      # Copies the training code inside the container
      COPY train.py /opt/ml/code/train.py
      
      # Defines train.py as script entrypoint
      ENV SAGEMAKER_PROGRAM train.py
      ```

      The Dockerfile script performs the following tasks:
      + `FROM tensorflow/tensorflow:latest-gpu-jupyter` – Downloads the latest TensorFlow Docker base image. You can replace this with any Docker base image you want to bring to build containers, as well as with AWS pre-built container base images.
      + `RUN pip3 install sagemaker-training` – Installs the [SageMaker AI Training Toolkit](https://github.com/aws/sagemaker-training-toolkit) that contains the common functionality necessary to create a container compatible with SageMaker AI. 
      + `COPY train.py /opt/ml/code/train.py` – Copies the script to the location inside the container that is expected by SageMaker AI. The script must be located in this folder.
      + `ENV SAGEMAKER_PROGRAM train.py` – Sets your training script `train.py`, copied into the `/opt/ml/code` folder of the container, as the entrypoint script. This is the only environment variable that you must specify when you build your own container.

   1.  In the left directory navigation pane, the text file might automatically be named `untitled.txt`. To rename the file, right-click the file, choose **Rename**, rename the file as `Dockerfile` without the `.txt` extension, and then press `Ctrl+s` or `Command+s` to save the file.

1. Upload a training script `train.py` to the `docker_test_folder`. For this exercise, you can use the following example script, which trains a model on the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database) to read handwritten digits.

   ```
   import tensorflow as tf
   import os
   
   mnist = tf.keras.datasets.mnist
   
   (x_train, y_train), (x_test, y_test) = mnist.load_data()
   x_train, x_test = x_train / 255.0, x_test / 255.0
   
   model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(128, activation='relu'),
   tf.keras.layers.Dropout(0.2),
   tf.keras.layers.Dense(10, activation='softmax')
   ])
   
   model.compile(optimizer='adam',
   loss='sparse_categorical_crossentropy',
   metrics=['accuracy'])
   
   model.fit(x_train, y_train, epochs=1)
   model_save_dir = f"{os.environ.get('SM_MODEL_DIR')}/1"
   
   model.evaluate(x_test, y_test)
   tf.saved_model.save(model, model_save_dir)
   ```
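
The example script hard-codes its settings. If you later pass hyperparameters to the estimator, the SageMaker Training Toolkit forwards them to the entry point script as command-line arguments. The following sketch shows one way a script could accept them with `argparse`. The `--epochs` and `--dropout` arguments and the `parse_args` helper are hypothetical names chosen for illustration; the fallback path for `--model-dir` is likewise an assumption for runs outside a container.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters passed as command-line arguments (illustrative sketch)."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters; the names must match the keys that you
    # pass to the estimator.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--dropout", type=float, default=0.2)
    # Fall back to /opt/ml/model when SM_MODEL_DIR is not set, for example
    # when the script runs outside of a container.
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    # parse_known_args ignores any extra arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    return args
```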

## Step 3: Build the container
<a name="byoc-training-step3"></a>

1. In the JupyterLab home directory, open a Jupyter notebook. To open a new notebook, choose the **New Launcher** icon and then choose the latest version of **conda_tensorflow2** in the **Notebook** section.

1. Run the following command in the first notebook cell to change to the `docker_test_folder` directory:

   ```
   cd ~/SageMaker/docker_test_folder
   ```

   To confirm that you are in the correct directory, run the following command:

   ```
   ! pwd
   ```

   `output: /home/ec2-user/SageMaker/docker_test_folder`

1. To build the Docker container, run the following Docker build command, including the space followed by a period at the end:

   ```
   ! docker build -t tf-custom-container-test .
   ```

   The Docker build command must be run from the Docker directory you created, in this case `docker_test_folder`.
**Note**  
If you get the following error message that Docker cannot find the Dockerfile, make sure the Dockerfile has the correct name and has been saved to the directory.  

   ```
   unable to prepare context: unable to evaluate symlinks in Dockerfile path: 
   lstat /home/ec2-user/SageMaker/docker/Dockerfile: no such file or directory
   ```
Remember that `docker` looks for a file specifically called `Dockerfile` without any extension within the current directory. If you named it something else, you can pass in the file name manually with the `-f` flag. For example, if you named your Dockerfile as `Dockerfile-text.txt`, run the following command:  

   ```
   ! docker build -t tf-custom-container-test -f Dockerfile-text.txt .
   ```
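
If you script the build from Python, the same naming rule can be checked before Docker runs. The following sketch uses a hypothetical `docker_build_command` helper that verifies the Dockerfile exists in the build directory and adds the `-f` flag only when the file has a non-default name. It only assembles the argument list; running it (for example, with `subprocess.run(command, check=True)`) is left to the caller.

```python
from pathlib import Path


def docker_build_command(tag, context_dir=".", dockerfile_name="Dockerfile"):
    """Return the `docker build` argument list for the given build directory.

    Hypothetical helper for illustration; raises FileNotFoundError when the
    Dockerfile is missing so that the problem surfaces before Docker runs.
    """
    dockerfile = Path(context_dir) / dockerfile_name
    if not dockerfile.is_file():
        raise FileNotFoundError(
            f"No file named {dockerfile_name!r} in {str(context_dir)!r}"
        )
    command = ["docker", "build", "-t", tag]
    if dockerfile_name != "Dockerfile":
        # A non-default file name must be passed explicitly with -f.
        command += ["-f", str(dockerfile)]
    # The build context (the trailing "." in the commands above).
    command.append(str(context_dir))
    return command
```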

## Step 4: Test the container
<a name="byoc-training-step4"></a>

1. To test the container locally in the notebook instance, open a Jupyter notebook. Choose **New Launcher** and choose the latest version of **conda_tensorflow2** in the **Notebook** section. 

1. Paste the following example script into the notebook code cell to configure a SageMaker AI Estimator.

   ```
   import sagemaker
   from sagemaker.estimator import Estimator
   
   estimator = Estimator(image_uri='tf-custom-container-test',
                         role=sagemaker.get_execution_role(),
                         instance_count=1,
                         instance_type='local')
   
   estimator.fit()
   ```

   In the preceding code example, `sagemaker.get_execution_role()` is specified to the `role` argument to automatically retrieve the role set up for the SageMaker AI session. You can also replace it with the string value of **the IAM role ARN number** you used when you configured the notebook instance. The ARN should look like the following: `'arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190429T110788'`. 

1. Run the code cell. This test outputs the training environment configuration, the values used for the environmental variables, the source of the data, and the loss and accuracy obtained during training.

## Step 5: Push the container to Amazon Elastic Container Registry (Amazon ECR)
<a name="byoc-training-step5"></a>

1. After you successfully run the local mode test, you can push the Docker container to [Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) and use it to run training jobs. If you want to use a private Docker registry instead of Amazon ECR, see [Push your training container to a private registry](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-adapt-your-own-private-registry.html).

   Run the following command lines in a notebook cell.

   ```
   %%sh
   
   # Specify an algorithm name
   algorithm_name=tf-custom-container-test
   
   account=$(aws sts get-caller-identity --query Account --output text)
   
   # Get the region defined in the current configuration (default to us-west-2 if none defined)
   region=$(aws configure get region)
   region=${region:-us-west-2}
   
   fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
   
   # If the repository doesn't exist in ECR, create it.
   
   aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
   if [ $? -ne 0 ]
   then
   aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
   fi
   
   # Get the login command from ECR and execute it directly
   
   aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}
   
   # Build the docker image locally with the image name and then push it to ECR
   # with the full name.
   
   docker build -t ${algorithm_name} .
   docker tag ${algorithm_name} ${fullname}
   
   docker push ${fullname}
   ```
**Note**  
This bash shell script may raise a permission issue similar to the following error message:  

   ```
   "denied: User: [ARN] is not authorized to perform: ecr:InitiateLayerUpload on resource:
   arn:aws:ecr:us-east-1:[id]:repository/tf-custom-container-test"
   ```
If this error occurs, you need to attach the **AmazonEC2ContainerRegistryFullAccess** policy to your IAM role. Go to the [IAM console](https://console.aws.amazon.com/iam/home), choose **Roles** from the left navigation pane, and look up the IAM role that you used for the notebook instance. Under the **Permissions** tab, choose **Attach policies**, and search for the **AmazonEC2ContainerRegistryFullAccess** policy. Select the check box of the policy, and choose **Add permissions** to finish.

1. Run the following code in a notebook cell to construct the Amazon ECR image URI of your training container.

   ```
   import boto3
   
   account_id = boto3.client('sts').get_caller_identity().get('Account')
   ecr_repository = 'tf-custom-container-test'
   tag = ':latest'
   
   region = boto3.session.Session().region_name
   
   uri_suffix = 'amazonaws.com'
   if region in ['cn-north-1', 'cn-northwest-1']:
       uri_suffix = 'amazonaws.com.cn'
   
   byoc_image_uri = '{}.dkr.ecr.{}.{}/{}'.format(account_id, region, uri_suffix, ecr_repository + tag)
   
   byoc_image_uri
   # This should return something like
   # 111122223333.dkr.ecr.us-east-2.amazonaws.com/tf-custom-container-test:latest
   ```

1. Use the `byoc_image_uri` constructed in the previous step to configure a SageMaker AI estimator object. The following code sample configures a SageMaker AI estimator with the `byoc_image_uri` and initiates a training job on an Amazon EC2 instance.

------
#### [ SageMaker Python SDK v1 ]

   ```
   import sagemaker
   from sagemaker import get_execution_role
   from sagemaker.estimator import Estimator
   
   # SDK v1 uses the image_name and train_instance_* parameter names
   estimator = Estimator(image_name=byoc_image_uri,
                         role=get_execution_role(),
                         base_job_name='tf-custom-container-test-job',
                         train_instance_count=1,
                         train_instance_type='ml.g4dn.xlarge')
   
   #train your model
   estimator.fit()
   ```

------
#### [ SageMaker Python SDK v2 ]

   ```
   import sagemaker
   from sagemaker import get_execution_role
   from sagemaker.estimator import Estimator
   
   estimator = Estimator(image_uri=byoc_image_uri,
                         role=get_execution_role(),
                         base_job_name='tf-custom-container-test-job',
                         instance_count=1,
                         instance_type='ml.g4dn.xlarge')
   
   #train your model
   estimator.fit()
   ```

------

1. If you want to deploy your model using your own container, refer to [Adapting Your Own Inference Container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html). You can also use an AWS framework container that can deploy a TensorFlow model. To deploy the example model that reads handwritten digits, enter the following example script into the same notebook that you used to train your model in the previous sub-step. The script obtains the image URI (uniform resource identifier) needed for deployment and deploys the model.

   ```
   import boto3
   import sagemaker
   
   #obtain image uris
   from sagemaker import image_uris
   container = image_uris.retrieve(framework='tensorflow',region='us-west-2',version='2.11.0',
                       image_scope='inference',instance_type='ml.g4dn.xlarge')
   
   #create the model entity, endpoint configuration and endpoint
   predictor = estimator.deploy(1,instance_type='ml.g4dn.xlarge',image_uri=container)
   ```

   Test your model using a sample handwritten digit from the MNIST dataset using the following code example.

   ```
   #Retrieve an example test dataset to test
   import numpy as np
   import matplotlib.pyplot as plt
   from keras.datasets import mnist
   
   # Load the MNIST dataset and split it into training and testing sets
   (x_train, y_train), (x_test, y_test) = mnist.load_data()
   # Select a random example from the training set
   example_index = np.random.randint(0, x_train.shape[0])
   example_image = x_train[example_index]
   example_label = y_train[example_index]
   
   # Print the label and show the image
   print(f"Label: {example_label}")
   plt.imshow(example_image, cmap='gray')
   plt.show()
   ```

   Convert the test handwritten digit into a form that TensorFlow can ingest and make a test prediction.

   ```
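   from sagemaker.serializers import JSONSerializer
   # Scale the pixels to [0, 1] as in train.py, and wrap the image in a list
   # so that the request contains a single 28x28 instance
   data = {"instances": [(example_image / 255.0).tolist()]}
   predictor.serializer=JSONSerializer() #update the predictor to use the JSONSerializer
   predictor.predict(data) #make the prediction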
   ```

For a full example that shows how to test a custom container locally and push it to an Amazon ECR image, see the [Building Your Own TensorFlow Container](https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.html) example notebook.

**Tip**  
To profile and debug training jobs to monitor system utilization issues (such as CPU bottlenecks and GPU underutilization) and identify training issues (such as overfitting, overtraining, exploding tensors, and vanishing gradients), use Amazon SageMaker Debugger. For more information, see [Use Debugger with custom training containers](debugger-bring-your-own-container.md).
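
The repository check that the shell script in this step performs with the AWS CLI can also be done from Python. The following sketch assumes a Boto3 Amazon ECR client is passed in; `ensure_ecr_repository` is a hypothetical helper name, not part of any SDK. It relies on the client's `describe_repositories` and `create_repository` operations and on the `RepositoryNotFoundException` error that Boto3 exposes on the client.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Create the repository if it does not exist; return True when it was created.

    Hypothetical helper for illustration. ecr_client is expected to be a
    Boto3 ECR client, for example boto3.client("ecr").
    """
    try:
        ecr_client.describe_repositories(repositoryNames=[repository_name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repository_name)
        return True


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   ensure_ecr_repository(boto3.client("ecr"), "tf-custom-container-test")
```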

## Step 6: Clean up resources
<a name="byoc-training-step6"></a>

**To clean up resources when you finish the example**

1. Open the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/), choose the notebook instance **RunScriptNotebookInstance**, choose **Actions**, and choose **Stop**. It can take a few minutes for the instance to stop. 

1. After the instance **Status** changes to **Stopped**, choose **Actions**, choose **Delete**, and then choose **Delete** in the dialog box. It can take a few minutes for the instance to be deleted. The notebook instance disappears from the table when it has been deleted. 

1. Open the [Amazon S3 console](https://console.aws.amazon.com/s3/) and delete the bucket that you created for storing model artifacts and the training dataset. 

1. Open the [IAM console](https://console.aws.amazon.com/iam/) and delete the IAM role. If you created permission policies, you can delete them, too. 
**Note**  
 The Docker container shuts down automatically after it has run. You don't need to delete it.

## Blogs and Case Studies
<a name="byoc-blogs-and-examples"></a>

The following blogs discuss case studies about using custom training containers in Amazon SageMaker AI.
+ [Why bring your own container to Amazon SageMaker AI and how to do it right](https://medium.com/@pandey.vikesh/why-bring-your-own-container-to-amazon-sagemaker-and-how-to-do-it-right-bc158fe41ed1), *Medium* (January 20th, 2023)

# Adapt your training job to access images in a private Docker registry
<a name="docker-containers-adapt-your-own-private-registry"></a>

You can use a private [Docker registry](https://docs.docker.com/registry/) instead of Amazon Elastic Container Registry (Amazon ECR) to host your images for SageMaker AI Training. The following instructions show you how to create a Docker registry, configure your virtual private cloud (VPC) and training job, store images, and give SageMaker AI access to the training image in the private Docker registry. These instructions also show you how to use a Docker registry that requires authentication for a SageMaker training job.

## Create and store your images in a private Docker registry
<a name="docker-containers-adapt-your-own-private-registry-prerequisites"></a>

Create a private Docker registry to store your images. Your registry must:
+ use the [Docker Registry HTTP API](https://docs.docker.com/registry/spec/api/) protocol
+ be accessible from the same VPC specified in the [VpcConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#API_CreateTrainingJob_RequestSyntax) parameter in the `CreateTrainingJob` API. Specify `VpcConfig` when you create your training job.
+ be secured with a [TLS certificate](https://aws.amazon.com/what-is/ssl-certificate/) from a known public certificate authority.

For more information about creating a Docker registry, see [Deploy a registry server](https://docs.docker.com/registry/deploying/).

## Configure your VPC and SageMaker training job
<a name="docker-containers-adapt-your-own-private-registry-configure"></a>

SageMaker AI uses a network connection within your VPC to access images in your Docker registry. To use the images in your Docker registry for training, the registry must be accessible from an Amazon VPC in your account. For more information, see [Use a Docker registry that requires authentication for training](docker-containers-adapt-your-own-private-registry-authentication.md).

You must also configure your training job to connect to the same VPC to which your Docker registry has access. For more information, see [Configure a Training Job for Amazon VPC Access](https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html#train-vpc-configure).

## Create a training job using an image from your private Docker registry
<a name="docker-containers-adapt-your-own-private-registry-create"></a>

To use an image from your private Docker registry for training, use the following steps to configure your image and then configure and create a training job. The code examples that follow use the AWS SDK for Python (Boto3) client.

1. Create a training image configuration object and set the `TrainingRepositoryAccessMode` field to `Vpc` as follows.

   ```
   training_image_config = {
       'TrainingRepositoryAccessMode': 'Vpc'
   }
   ```
**Note**  
If your private Docker registry requires authentication, you must add a `TrainingRepositoryAuthConfig` object to the training image configuration object. You must also specify the Amazon Resource Name (ARN) of an AWS Lambda function that provides access credentials to SageMaker AI using the `TrainingRepositoryCredentialsProviderArn` field of the `TrainingRepositoryAuthConfig` object. For more information, see the example code structure below.  

   ```
   training_image_config = {
      'TrainingRepositoryAccessMode': 'Vpc',
      'TrainingRepositoryAuthConfig': {
           'TrainingRepositoryCredentialsProviderArn': 'arn:aws:lambda:Region:Acct:function:FunctionName'
      }
   }
   ```

   For information about how to create the Lambda function to provide authentication, see [Use a Docker registry that requires authentication for training](docker-containers-adapt-your-own-private-registry-authentication.md).

1. Use a Boto3 client to create a training job and pass the correct configuration to the [create_training_job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API. The following instructions show you how to configure the components and create a training job.

   1. Create the `AlgorithmSpecification` object that you want to pass to `create_training_job`. Use the training image configuration object that you created in the previous step, as shown in the following code example.

      ```
      algorithm_specification = {
         'TrainingImage': 'myteam.myorg.com/docker-local/my-training-image:<IMAGE-TAG>',
         'TrainingImageConfig': training_image_config,
         'TrainingInputMode': 'File'
      }
      ```
**Note**  
To use a fixed version of an image rather than one that can be updated, refer to the image by its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier) instead of by name or tag.

   1. Specify the name of the training job and role that you want to pass to `create_training_job`, as shown in the following code example. 

      ```
      training_job_name = 'private-registry-job'
      execution_role_arn = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'
      ```

   1. Specify a security group and subnet for the VPC configuration for your training job. Your private Docker registry must allow inbound traffic from the security groups that you specify, as shown in the following code example.

      ```
      vpc_config = {
          'SecurityGroupIds': ['sg-0123456789abcdef0'],
          'Subnets': ['subnet-0123456789abcdef0','subnet-0123456789abcdef1']
      }
      ```
**Note**  
If your subnet is not in the same VPC as your private Docker registry, you must set up a networking connection between the two VPCs. See [Connect VPCs using VPC peering](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-peering.html) for more information.

   1. Specify the resource configuration, including machine learning compute instances and storage volumes to use for training, as shown in the following code example. 

      ```
      resource_config = {
          'InstanceType': 'ml.m4.xlarge',
          'InstanceCount': 1,
          'VolumeSizeInGB': 10,
      }
      ```

   1. Specify the input and output data configuration, where the training dataset is stored, and where you want to store model artifacts, as shown in the following code example.

      ```
      input_data_config = [
          {
              "ChannelName": "training",
              "DataSource":
              {
                  "S3DataSource":
                  {
                      "S3DataDistributionType": "FullyReplicated",
                      "S3DataType": "S3Prefix",
                      "S3Uri": "s3://your-training-data-bucket/training-data-folder"
                  }
              }
          }
      ]
      
      output_data_config = {
          'S3OutputPath': 's3://your-output-data-bucket/model-folder'
      }
      ```

   1. Specify the maximum number of seconds that a model training job can run as shown in the following code example.

      ```
      stopping_condition = {
          'MaxRuntimeInSeconds': 1800
      }
      ```

   1. Finally, create the training job using the parameters you specified in the previous steps as shown in the following code example.

      ```
      import boto3
      sm = boto3.client('sagemaker')
      try:
          resp = sm.create_training_job(
              TrainingJobName=training_job_name,
              AlgorithmSpecification=algorithm_specification,
              RoleArn=execution_role_arn,
              InputDataConfig=input_data_config,
              OutputDataConfig=output_data_config,
              ResourceConfig=resource_config,
              VpcConfig=vpc_config,
              StoppingCondition=stopping_condition
          )
      except Exception as e:
          print(f'error calling CreateTrainingJob operation: {e}')
      else:
          print(resp)
      ```
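
The `create_training_job` call returns as soon as the job is accepted, and the job then runs asynchronously. The following is a minimal sketch of polling a job until it reaches a terminal state. It assumes a client object that exposes `describe_training_job`, as the boto3 SageMaker client does. The helper name `wait_for_training_job`, the polling interval, and the poll limit are illustrative choices, not part of the SageMaker AI API.

```python
import time

# Terminal values of TrainingJobStatus reported by DescribeTrainingJob.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=30, max_polls=120):
    """Poll describe_training_job until the job reaches a terminal status.

    `client` is assumed to be a boto3 SageMaker client (or any object with a
    compatible describe_training_job method). Returns the final description.
    """
    for _ in range(max_polls):
        description = client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            if status == "Failed":
                # FailureReason is reported when the job has failed.
                print("Training job failed:", description.get("FailureReason"))
            return description
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")
```

With the variables from the previous steps, a call might look like `wait_for_training_job(sm, training_job_name)`. The boto3 waiter `training_job_completed_or_stopped` is an alternative if you don't need custom handling between polls.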

# Use a SageMaker AI estimator to run a training job
<a name="docker-containers-adapt-your-own-private-registry-estimator"></a>

You can also use an [estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) from the SageMaker Python SDK to handle the configuration and running of your SageMaker training job. The following code examples show how to configure and run an estimator using images from a private Docker registry.

1. Import the required libraries and dependencies, as shown in the following code example.

   ```
   import boto3
   import sagemaker
   from sagemaker.estimator import Estimator
   
   session = sagemaker.Session()
   
   role = sagemaker.get_execution_role()
   ```

1. Provide a Uniform Resource Identifier (URI) to your training image, security groups and subnets for the VPC configuration for your training job, as shown in the following code example.

   ```
   image_uri = "myteam.myorg.com/docker-local/my-training-image:<IMAGE-TAG>"
   
   security_groups = ["sg-0123456789abcdef0"]
   subnets = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
   ```

   For more information about `security_group_ids` and `subnets`, see the appropriate parameter description in the [Estimators](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) section of the SageMaker Python SDK.
**Note**  
SageMaker AI uses a network connection within your VPC to access images in your Docker registry. To use the images in your Docker registry for training, the registry must be accessible from an Amazon VPC in your account.

1. Optionally, if your Docker registry requires authentication, you must also specify the Amazon Resource Name (ARN) of an AWS Lambda function that provides access credentials to SageMaker AI. The following code example shows how to specify the ARN. 

   ```
   training_repository_credentials_provider_arn = "arn:aws:lambda:us-west-2:123456789012:function:test"
   ```

   For more information about using images in a Docker registry requiring authentication, see **Use a Docker registry that requires authentication for training** below.

1. Use the code examples from the previous steps to configure an estimator, as shown in the following code example.

   ```
   # The training repository access mode must be 'Vpc' for private docker registry jobs 
   training_repository_access_mode = "Vpc"
   
   # Specify the instance type, instance count you want to use
   instance_type="ml.m5.xlarge"
   instance_count=1
   
   # Specify the maximum number of seconds that a model training job can run
   max_run_time = 1800
   
   # Specify the output path for the model artifacts
   output_path = "s3://your-output-bucket/your-output-path"
   
   estimator = Estimator(
       image_uri=image_uri,
       role=role,
       subnets=subnets,
       security_group_ids=security_groups,
       training_repository_access_mode=training_repository_access_mode,
       training_repository_credentials_provider_arn=training_repository_credentials_provider_arn,  # remove this line if auth is not needed
       instance_type=instance_type,
       instance_count=instance_count,
       output_path=output_path,
       max_run=max_run_time
   )
   ```

1. Start your training job by calling `estimator.fit` with your job name and input path as parameters, as shown in the following code example.

   ```
   input_path = "s3://your-input-bucket/your-input-path"
   job_name = "your-job-name"
   
   estimator.fit(
       inputs=input_path,
       job_name=job_name
   )
   ```
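
The estimator configuration above includes the credentials provider ARN only when the registry requires authentication. One way to express that condition in code, rather than by deleting a line, is to collect the keyword arguments in a dictionary first. The following is a sketch under that assumption. The helper name `build_estimator_kwargs` is hypothetical, and the parameter names are the ones used in the `Estimator` call above.

```python
def build_estimator_kwargs(image_uri, role, subnets, security_group_ids,
                           output_path, instance_type="ml.m5.xlarge",
                           instance_count=1, max_run=1800,
                           credentials_provider_arn=None):
    """Collect keyword arguments for sagemaker.estimator.Estimator.

    The credentials provider ARN is added only when one is supplied, which
    is the case for registries that require authentication.
    """
    kwargs = {
        "image_uri": image_uri,
        "role": role,
        "subnets": subnets,
        "security_group_ids": security_group_ids,
        # Private Docker registry jobs must use the 'Vpc' access mode.
        "training_repository_access_mode": "Vpc",
        "instance_type": instance_type,
        "instance_count": instance_count,
        "output_path": output_path,
        "max_run": max_run,
    }
    if credentials_provider_arn:
        kwargs["training_repository_credentials_provider_arn"] = credentials_provider_arn
    return kwargs
```

You could then create the estimator with `Estimator(**build_estimator_kwargs(...))` and call `fit` as shown in the last step.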

# Use a Docker registry that requires authentication for training
<a name="docker-containers-adapt-your-own-private-registry-authentication"></a>

If your Docker registry requires authentication, you must create an AWS Lambda function that provides access credentials to SageMaker AI. Then, create a training job and provide the ARN of this Lambda function inside the [create_training_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job) API. Lastly, you can optionally create an interface VPC endpoint so that your VPC can communicate with your Lambda function without sending traffic over the internet. The following guide shows how to create a Lambda function, assign it the correct role, and create an interface VPC endpoint.

## Create the Lambda function
<a name="docker-containers-adapt-your-own-private-registry-authentication-create-lambda"></a>

Create an AWS Lambda function that passes access credentials to SageMaker AI and returns a response. The following code example shows the Lambda function handler.

```
def handler(event, context):
   response = {
      "Credentials": {"Username": "username", "Password": "password"}
   }
   return response
```

The type of authentication used to set up your private Docker registry determines the contents of the response returned by your Lambda function as follows.
+ If your private Docker registry uses basic authentication, the Lambda function will return the username and password needed in order to authenticate to the registry.
+ If your private Docker registry uses [bearer token authentication](https://docs.docker.com/registry/spec/auth/token/), the username and password are sent to your authorization server, which then returns a bearer token. This token is then used to authenticate to your private Docker registry.

**Note**  
If you have more than one Lambda function for your registries in the same account, and your training jobs use the same execution role, then training jobs for one registry have access to the Lambda functions for the other registries.
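
The handler shown earlier in this section hardcodes placeholder credentials. The following is a minimal sketch of a handler that reads the credentials from the function's environment instead. The variable names `REGISTRY_USERNAME` and `REGISTRY_PASSWORD` are hypothetical names chosen for this sketch. The shape of the returned dictionary follows the handler example above.

```python
import os


def handler(event, context):
    """Return registry credentials in the response shape shown above.

    REGISTRY_USERNAME and REGISTRY_PASSWORD are hypothetical environment
    variable names used only for this sketch.
    """
    username = os.environ.get("REGISTRY_USERNAME")
    password = os.environ.get("REGISTRY_PASSWORD")
    if not username or not password:
        # Fail loudly instead of returning an incomplete credentials object.
        raise RuntimeError("Registry credentials are not configured")
    return {"Credentials": {"Username": username, "Password": password}}
```

Whether environment variables are an acceptable place for these values depends on your security requirements. A dedicated secrets store is a common alternative.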

## Grant the correct role permission to your Lambda function
<a name="docker-containers-adapt-your-own-private-registry-authentication-lambda-role"></a>

The [IAM role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) that you use in the `create_training_job` API must have permission to call an AWS Lambda function. The following code example shows how to extend the permissions policy of an IAM role to call `myLambdaFunction`.

```
{
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction"
    ],
    "Resource": [
        "arn:aws:lambda:*:*:function:*myLambdaFunction*"
    ]
}
```

For information about editing a role permissions policy, see [Modifying a role permissions policy (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy) in the *AWS Identity and Access Management User Guide*.

**Note**  
An IAM role with an attached **AmazonSageMakerFullAccess** managed policy has permission to call any Lambda function with "SageMaker" in its name.
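
The statement shown earlier in this section is a fragment of a policy. The following sketch wraps the same statement in a complete policy document and scopes it to a single function ARN instead of a wildcard. The account ID and Region in the ARN are placeholders, and the helper name `build_invoke_policy` is hypothetical.

```python
import json


def build_invoke_policy(function_arn):
    """Wrap a lambda:InvokeFunction statement in a full IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
            }
        ],
    }


# Placeholder ARN: replace the Region, account ID, and function name.
policy = build_invoke_policy(
    "arn:aws:lambda:us-west-2:123456789012:function:myLambdaFunction"
)
print(json.dumps(policy, indent=4))
```

The serialized document can be pasted into the console policy editor, or passed as the `PolicyDocument` argument of the IAM client's `put_role_policy` method.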

## Create an interface VPC endpoint for Lambda
<a name="docker-containers-adapt-your-own-private-registry-authentication-lambda-endpoint"></a>

If you create an interface endpoint, your Amazon VPC can communicate with your Lambda function without sending traffic over the internet. For more information, see [Configuring interface VPC endpoints for Lambda](https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc-endpoints.html) in the *AWS Lambda Developer Guide*.

After your interface endpoint is created, SageMaker training will call your Lambda function by sending a request through your VPC to `lambda.region.amazonaws.com`. If you select **Enable DNS Name** when you create your interface endpoint, [Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html) routes the call to the Lambda interface endpoint. If you use a different DNS provider, you must map `lambda.region.amazonaws.com` to your Lambda interface endpoint.
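
To check the DNS mapping described above, you can resolve the Lambda hostname from a host inside the VPC and confirm that it maps to private addresses. The following is a sketch under the assumption that you are in the standard AWS partition, where the hostname has the form `lambda.<region>.amazonaws.com`. Both helper names are hypothetical.

```python
import ipaddress
import socket


def lambda_endpoint_hostname(region):
    """Regional Lambda hostname, assuming the standard AWS partition."""
    return f"lambda.{region}.amazonaws.com"


def resolves_to_private_ip(hostname):
    """Return True if every address the hostname resolves to is private."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return all(ipaddress.ip_address(addr).is_private for addr in addresses)
```

Run from inside the VPC, `resolves_to_private_ip(lambda_endpoint_hostname("us-west-2"))` returning `True` suggests that the name resolves to the interface endpoint rather than to the public Lambda endpoint.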

# Adapt your own inference container for Amazon SageMaker AI
<a name="adapt-inference-container"></a>

If you can't use any of the images listed in [Pre-built SageMaker AI Docker images](docker-containers-prebuilt.md) for your use case, you can build your own Docker container and use it inside SageMaker AI for training and inference. To be compatible with SageMaker AI, your container must have the following characteristics:
+ Your container must have a web server listening on port `8080`.
+ Your container must accept `POST` requests to the `/invocations` real-time endpoint and `GET` requests to the `/ping` real-time endpoint. Requests that you send to these endpoints must be answered within 60 seconds for regular responses and 8 minutes for streaming responses, and have a maximum size of 25 MB.
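
The two requirements above can be exercised against a container that is running locally before you deploy it. The following is a minimal smoke-test sketch that uses only the Python standard library. It assumes that the container is reachable at `http://localhost:8080`, for example after starting it with `docker run -p 8080:8080 <image> serve`. The function names and the default URL are illustrative.

```python
import json
import urllib.request


def ping(base_url="http://localhost:8080"):
    """Send the GET /ping health check and return the HTTP status code."""
    with urllib.request.urlopen(f"{base_url}/ping", timeout=5) as resp:
        return resp.status


def invoke(payload, base_url="http://localhost:8080"):
    """POST a JSON payload to /invocations and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # 60 seconds matches the limit for regular (non-streaming) responses.
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A healthy container should return `200` from `ping()`. The payload that `invoke` accepts depends on your inference code.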

For more information and an example of how to build your own Docker container for training and inference with SageMaker AI, see [Building your own algorithm container](https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb). 

The following guide shows you how to use a `JupyterLab` space with Amazon SageMaker Studio Classic to adapt an inference container to work with SageMaker AI hosting. The example uses an NGINX web server, Gunicorn as a Python web server gateway interface, and Flask as a web application framework. You can use different applications to adapt your container as long as it meets the previously listed requirements. For more information about using your own inference code, see [Custom Inference Code with Hosting Services](your-algorithms-inference-code.md).

**Adapt your inference container**

Use the following steps to adapt your own inference container to work with SageMaker AI hosting. The example shown in the following steps uses a pre-trained [Named Entity Recognition (NER) model](https://spacy.io/universe/project/video-spacys-ner-model-alt) that uses the [spaCy](https://spacy.io/) natural language processing (NLP) library for `Python` and the following:
+ A Dockerfile to build the container that contains the NER model.
+ Inference scripts to serve the NER model.

If you adapt this example for your use case, you must use a Dockerfile and inference scripts that are needed to deploy and serve your model.

1. Create a JupyterLab space with Amazon SageMaker Studio Classic (optional).

   You can use any notebook to run scripts to adapt your inference container with SageMaker AI hosting. This example shows you how to use a JupyterLab space within Amazon SageMaker Studio Classic to launch a JupyterLab application that comes with a SageMaker AI Distribution image. For more information, see [SageMaker JupyterLab](studio-updated-jl.md).

1. Upload a Dockerfile and inference scripts.

   1. Create a new folder in your home directory. If you’re using JupyterLab, in the upper-left corner, choose the **New Folder** icon, and enter a folder name to contain your Dockerfile. In this example, the folder is called `docker_test_folder`.

   1. Upload a Dockerfile text file into your new folder. The following is an example Dockerfile that creates a Docker container with a pre-trained [Named Entity Recognition (NER) model](https://spacy.io/universe/project/video-spacys-ner-model) from [spaCy](https://spacy.io/), the applications and environment variables needed to run the example:

      ```
      FROM python:3.8
      
      RUN apt-get -y update && apt-get install -y --no-install-recommends \
               wget \
               python3 \
               nginx \
               ca-certificates \
          && rm -rf /var/lib/apt/lists/*
      
      RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
          pip install flask gevent gunicorn && \
              rm -rf /root/.cache
      
      #pre-trained model package installation
      RUN pip install spacy
      RUN python -m spacy download en_core_web_sm
      
      
      # Set environment variables
      ENV PYTHONUNBUFFERED=TRUE
      ENV PYTHONDONTWRITEBYTECODE=TRUE
      ENV PATH="/opt/program:${PATH}"
      
      COPY NER /opt/program
      WORKDIR /opt/program
      ```

      In the previous code example, the environment variable `PYTHONUNBUFFERED` keeps Python from buffering the standard output stream, which allows for faster delivery of logs to the user. The environment variable `PYTHONDONTWRITEBYTECODE` keeps Python from writing compiled bytecode `.pyc` files, which are unnecessary for this use case. The environment variable `PATH` is used to identify the location of the `train` and `serve` programs when the container is invoked.

   1. Create a new directory inside your new folder to contain scripts to serve your model. This example uses a directory called `NER`, which contains the following scripts necessary to run this example:
      + `predictor.py` – A Python script that contains the logic to load and perform inference with your model.
      + `nginx.conf` – A script to configure a web server.
      + `serve` – A script that starts an inference server.
      + `wsgi.py` – A helper script to serve a model.
**Important**  
If you copy your inference scripts into a notebook ending in `.ipynb` and rename them, your scripts may contain formatting characters that prevent your endpoint from deploying. Instead, create a text file for each script and rename it.

   1. Upload a script to make your model available for inference. The following is an example script called `predictor.py` that uses Flask to provide the `/ping` and `/invocations` endpoints:

      ```
      from flask import Flask
      import flask
      import spacy
      import os
      import json
      import logging
      
      #Load in model
      nlp = spacy.load('en_core_web_sm') 
      #If you plan to use your own model artifacts, 
      #your model artifacts should be stored in /opt/ml/model/ 
      
      
      # The flask app for serving predictions
      app = Flask(__name__)
      @app.route('/ping', methods=['GET'])
      def ping():
          # Check if the classifier was loaded correctly
          health = nlp is not None
          status = 200 if health else 404
          return flask.Response(response= '\n', status=status, mimetype='application/json')
      
      
      @app.route('/invocations', methods=['POST'])
      def transformation():
          
          #Process input
          input_json = flask.request.get_json()
          resp = input_json['input']
          
          #NER
          doc = nlp(resp)
          entities = [(X.text, X.label_) for X in doc.ents]
      
          # Transform predictions to JSON
          result = {
              'output': entities
              }
      
          resultjson = json.dumps(result)
          return flask.Response(response=resultjson, status=200, mimetype='application/json')
      ```

      The `/ping` endpoint in the previous script example returns a status code of `200` if the model is loaded correctly, and `404` if the model is loaded incorrectly. The `/invocations` endpoint processes a request formatted in JSON, extracts the `input` field, and uses the NER model to identify entities and store them in the variable `entities`. The Flask application returns the response that contains these entities. For more information about these required health requests, see [How Your Container Should Respond to Health Check (Ping) Requests](your-algorithms-inference-code.md#your-algorithms-inference-algo-ping-requests).
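
      The `/invocations` handler above indexes `input_json['input']` directly, so a request without that field raises an unhandled exception. The following is a sketch of a small validation helper that you could call before running the model. The function name `extract_input_text` is hypothetical and is not part of the example scripts.

      ```python
      import json


      def extract_input_text(raw_body):
          """Return the 'input' string from a JSON request body.

          Raises ValueError with a readable message when the body is not valid
          JSON or does not contain a string 'input' field.
          """
          try:
              payload = json.loads(raw_body)
          except (TypeError, ValueError) as err:
              raise ValueError(f"request body is not valid JSON: {err}") from err
          if not isinstance(payload, dict) or not isinstance(payload.get("input"), str):
              raise ValueError("request body must be a JSON object with a string 'input' field")
          return payload["input"]
      ```

      Inside `transformation`, you could pass `flask.request.get_data()` to this helper and return a `400` response when it raises `ValueError`.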

   1. Upload a script to start an inference server. The following example script, called `serve`, starts Gunicorn as an application server and Nginx as a web server:

      ```
      #!/usr/bin/env python
      
      # This file implements the scoring service shell. You don't necessarily need to modify it for various
      # algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
      # gunicorn exits.
      #
      # The flask server is specified to be the app object in wsgi.py
      #
      # We set the following parameters:
      #
      # Parameter                Environment Variable              Default Value
      # ---------                --------------------              -------------
      # number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
      # timeout                  MODEL_SERVER_TIMEOUT              60 seconds
      
      import multiprocessing
      import os
      import signal
      import subprocess
      import sys
      
      cpu_count = multiprocessing.cpu_count()
      
      model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
      model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))
      
      def sigterm_handler(nginx_pid, gunicorn_pid):
          try:
              os.kill(nginx_pid, signal.SIGQUIT)
          except OSError:
              pass
          try:
              os.kill(gunicorn_pid, signal.SIGTERM)
          except OSError:
              pass
      
          sys.exit(0)
      
      def start_server():
          print('Starting the inference server with {} workers.'.format(model_server_workers))
      
      
          # link the log streams to stdout/err so they will be logged to the container logs
          subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
          subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])
      
          nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
          gunicorn = subprocess.Popen(['gunicorn',
                                       '--timeout', str(model_server_timeout),
                                       '-k', 'sync',
                                       '-b', 'unix:/tmp/gunicorn.sock',
                                       '-w', str(model_server_workers),
                                       'wsgi:app'])
      
          signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))
      
          # Exit the inference server upon exit of either subprocess
          pids = set([nginx.pid, gunicorn.pid])
          while True:
              pid, _ = os.wait()
              if pid in pids:
                  break
      
          sigterm_handler(nginx.pid, gunicorn.pid)
          print('Inference server exiting')
      
      # The main routine to invoke the start function.
      
      if __name__ == '__main__':
          start_server()
      ```

      The previous script example defines a signal handler function `sigterm_handler`, which shuts down the Nginx and Gunicorn sub-processes when it receives a `SIGTERM` signal. The `start_server` function links the Nginx log streams to the container logs, starts the Nginx and Gunicorn sub-processes, registers the signal handler, and then monitors the sub-processes until one of them exits.

   1. Upload a script to configure your web server. The following example script, called `nginx.conf`, configures an Nginx web server that uses Gunicorn as an application server to serve your model for inference:

      ```
      worker_processes 1;
      daemon off; # Prevent forking
      
      
      pid /tmp/nginx.pid;
      error_log /var/log/nginx/error.log;
      
      events {
        # defaults
      }
      
      http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        access_log /var/log/nginx/access.log combined;
        
        upstream gunicorn {
          server unix:/tmp/gunicorn.sock;
        }
      
        server {
          listen 8080 deferred;
          client_max_body_size 5m;
      
          keepalive_timeout 5;
          proxy_read_timeout 1200s;
      
          location ~ ^/(ping|invocations) {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://gunicorn;
          }
      
          location / {
            return 404 "{}";
          }
        }
      }
      ```

      The previous script example configures Nginx to run in the foreground, sets the location to capture the `error_log`, and defines `upstream` as the Gunicorn server's Unix socket. The server block listens on port `8080` and sets limits on the client request body size and timeout values. The server block forwards requests whose paths start with `/ping` or `/invocations` to the Gunicorn server at `http://gunicorn`, and returns a `404` error for other paths.

   1. Upload any other scripts needed to serve your model. This example needs the following example script called `wsgi.py` to help Gunicorn find your application:

      ```
      import predictor as myapp
      
      # This is just a simple wrapper for gunicorn to find your app.
      # If you want to change the algorithm file, simply change "predictor" above to the
      # new file.
      
      app = myapp.app
      ```

   From the folder `docker_test_folder`, your directory structure should contain a Dockerfile and the folder NER. The NER folder should contain the files `nginx.conf`, `predictor.py`, `serve`, and `wsgi.py` as follows:

    ![\[The Dockerfile structure has inference scripts under the NER directory next to the Dockerfile.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/docker-file-struct-adapt-ex.png) 

1. Build your own container.

   From the folder `docker_test_folder`, build your Docker container. The following example command will build the Docker container that is configured in your Dockerfile:

   ```
   ! docker build -t byo-container-test .
   ```

   The previous command builds a container image called `byo-container-test`, using the current working directory as the build context. For more information about the Docker build parameters, see [Build arguments](https://docs.docker.com/build/guide/build-args/).
**Note**  
If you get the following error message that Docker cannot find the Dockerfile, make sure the Dockerfile has the correct name and has been saved to the directory.  

   ```
   unable to prepare context: unable to evaluate symlinks in Dockerfile path:
   lstat /home/ec2-user/SageMaker/docker_test_folder/Dockerfile: no such file or directory
   ```
Docker looks for a file specifically called `Dockerfile`, without any extension, in the current directory. If you named it something else, you can pass in the file name manually with the `-f` flag. For example, if you named your Dockerfile `Dockerfile-text.txt`, build your Docker container using the `-f` flag followed by your file name as follows:  

   ```
   ! docker build -t byo-container-test -f Dockerfile-text.txt .
   ```

1. Push your Docker image to Amazon Elastic Container Registry (Amazon ECR).

   In a notebook cell, push your Docker image to an ECR. The following code example shows you how to build your container locally, log in, and push it to an ECR:

   ```
   %%sh
   # Name of algo -> ECR
   algorithm_name=sm-pretrained-spacy
   
   #make serve executable
   chmod +x NER/serve
   account=$(aws sts get-caller-identity --query Account --output text)
   # Region, defaults to us-east-1
   region=$(aws configure get region)
   region=${region:-us-east-1}
   fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
   # If the repository doesn't exist in ECR, create it.
   aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
   if [ $? -ne 0 ]
   then
       aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
   fi
   # Get the login command from ECR and execute it directly
   aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}
   # Build the docker image locally with the image name and then push it to ECR
   # with the full name.
   
   docker build  -t ${algorithm_name} .
   docker tag ${algorithm_name} ${fullname}
   
   docker push ${fullname}
   ```

   The previous example shows how to complete the following steps, which are necessary to push the example Docker container to an ECR:

   1. Define the algorithm name as `sm-pretrained-spacy`.

   1. Make the `serve` file inside the NER folder executable.

   1. Set the AWS Region.

   1. Create an ECR repository if it doesn’t already exist.

   1. Log in to the ECR.

   1. Build the Docker container locally.

   1. Push the Docker image to the ECR.
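
   The `fullname` variable in the previous shell script follows the Amazon ECR image naming scheme. If you need the same name in Python, for example when you create a model later, the following sketch composes it. The helper name `ecr_image_uri` is hypothetical, and the account ID is a placeholder.

   ```python
   def ecr_image_uri(account_id, region, repository, tag="latest"):
       """Compose the full image name for an Amazon ECR repository."""
       registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
       return f"{registry}/{repository}:{tag}"


   print(ecr_image_uri("123456789012", "us-east-1", "sm-pretrained-spacy"))
   # prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/sm-pretrained-spacy:latest
   ```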

1. Set up the SageMaker AI client

   If you want to use SageMaker AI hosting services for inference, you must [create a model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_model.html), create an [endpoint config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_endpoint_config.html#) and [create an endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_endpoint.html#). In order to get inferences from your endpoint, you can use the SageMaker AI boto3 Runtime client to invoke your endpoint. The following code shows you how to set up both the SageMaker AI client and the SageMaker Runtime client using the [SageMaker AI boto3 client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html):

   ```
   import boto3
   from sagemaker import get_execution_role
   
   sm_client = boto3.client(service_name='sagemaker')
   runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
   
   account_id = boto3.client('sts').get_caller_identity()['Account']
   region = boto3.Session().region_name
   
   #used to store model artifacts which SageMaker AI will extract to /opt/ml/model in the container, 
   #in this example case we will not be making use of S3 to store the model artifacts
   #s3_bucket = '<S3Bucket>'
   
   role = get_execution_role()
   ```

   In the previous code example, the Amazon S3 bucket is not used, but inserted as a comment to show how to store model artifacts.

   If you receive a permission error after you run the previous code example, you may need to add permissions to your IAM role. For more information about IAM roles, see [Amazon SageMaker Role Manager](role-manager.md). For more information about adding permissions to your current role, see [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md).

1. Create your model.

   If you want to use SageMaker AI hosting services for inference, you must create a model in SageMaker AI. The following code example shows you how to create the spaCy NER model inside of SageMaker AI:

   ```
   from time import gmtime, strftime
   
   model_name = 'spacy-nermodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   # MODEL S3 URL containing model artifacts as either model.tar.gz or extracted artifacts. 
   # Here we are not using model artifacts from S3.
   #model_url = 's3://{}/spacy/'.format(s3_bucket) 
   
   container = '{}.dkr.ecr.{}.amazonaws.com/sm-pretrained-spacy:latest'.format(account_id, region)
   instance_type = 'ml.c5d.18xlarge'
   
   print('Model name: ' + model_name)
   #print('Model data Url: ' + model_url)
   print('Container image: ' + container)
   
   container = {
       'Image': container
   }
   
   create_model_response = sm_client.create_model(
       ModelName = model_name,
       ExecutionRoleArn = role,
       Containers = [container])
   
   print("Model Arn: " + create_model_response['ModelArn'])
   ```

   The previous code example shows, in comments, how to define a `model_url` using the `s3_bucket` if you were to use the Amazon S3 bucket from the comments in Step 5, and defines the ECR URI for the container image. The previous code example defines `ml.c5d.18xlarge` as the instance type. You can also choose a different instance type. For more information about available instance types, see [Amazon EC2 instance types](https://aws.amazon.com/ec2/instance-types/).

   In the previous code example, the `Image` key points to the container image URI. The `create_model_response` definition uses the `create_model` method to create a model from the model name, the role, and a list containing the container information. The response contains the ARN of the new model.

   Example output from the previous script follows:

   ```
   Model name: spacy-nermodel-YYYY-MM-DD-HH-MM-SS
   Container image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/sm-pretrained-spacy:latest
   Model Arn: arn:aws:sagemaker:us-east-2:123456789012:model/spacy-nermodel-YYYY-MM-DD-HH-MM-SS
   ```
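
   The commented-out `model_url` lines above hint at how model artifacts in Amazon S3 would be attached. The following sketch builds the container definition with an optional `ModelDataUrl` key, which is the key that `create_model` uses for model artifacts. The helper name `build_container_definition` is hypothetical, and the sketch assumes that the artifacts are packaged as a `model.tar.gz` archive.

   ```python
   def build_container_definition(image_uri, model_data_url=None):
       """Build one entry for the Containers list passed to create_model."""
       definition = {"Image": image_uri}
       if model_data_url:
           # SageMaker AI extracts these artifacts to /opt/ml/model in the container.
           definition["ModelDataUrl"] = model_data_url
       return definition
   ```

   For the example in this guide, `build_container_definition(image_uri)` with no second argument produces the same dictionary as the `container` variable above.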

1. Configure and create an endpoint.

   1. **Create an endpoint configuration.**

      To use SageMaker AI hosting for inference, you must also configure and create an endpoint. SageMaker AI will use this endpoint for inference. The following configuration example shows how to generate and configure an endpoint with the instance type and model name that you defined previously:

      ```
      endpoint_config_name = 'spacy-ner-config' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
      print('Endpoint config name: ' + endpoint_config_name)
      
      create_endpoint_config_response = sm_client.create_endpoint_config(
          EndpointConfigName = endpoint_config_name,
          ProductionVariants=[{
              'InstanceType': instance_type,
              'InitialInstanceCount': 1,
              'InitialVariantWeight': 1,
              'ModelName': model_name,
              'VariantName': 'AllTraffic'}])
              
      print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])
      ```

      In the previous configuration example, `create_endpoint_config_response` associates the `model_name` with a unique endpoint configuration name `endpoint_config_name` that is created with a timestamp.

      Example output from the previous script follows:

      ```
      Endpoint config name: spacy-ner-configYYYY-MM-DD-HH-MM-SS
      Endpoint config Arn: arn:aws:sagemaker:us-east-2:123456789012:endpoint-config/spacy-ner-configYYYY-MM-DD-HH-MM-SS
      ```

      For more information about endpoint errors, see [Why does my Amazon SageMaker AI endpoint go into the failed state when I create or update an endpoint?](https://repost.aws/knowledge-center/sagemaker-endpoint-creation-fail)
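
      The names in this step are built by concatenating a prefix and a timestamp. The following sketch joins the two parts with a hyphen and validates the result before it is sent to the API. The naming rule in the sketch (alphanumeric characters and hyphens, at most 63 characters) is an assumption based on the pattern documented for SageMaker AI resource names. The helper name `timestamped_name` is hypothetical.

      ```python
      import re
      from time import gmtime, strftime

      # Assumed naming rule: alphanumerics separated by hyphens, at most 63 characters.
      NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
      MAX_NAME_LENGTH = 63


      def timestamped_name(prefix, now=None):
          """Join a prefix and a UTC timestamp with a hyphen and validate the result."""
          stamp = strftime("%Y-%m-%d-%H-%M-%S", now if now is not None else gmtime())
          name = f"{prefix}-{stamp}"
          if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.fullmatch(name):
              raise ValueError(f"invalid resource name: {name}")
          return name
      ```

      For example, `timestamped_name("spacy-ner-config")` returns a name such as `spacy-ner-config-YYYY-MM-DD-HH-MM-SS`.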

   1. **Create an endpoint and wait for the endpoint to be in service.**

      The following code example creates the endpoint using the configuration from the previous configuration example and deploys the model:

      ```
      %%time
      
      import time
      
      endpoint_name = 'spacy-ner-endpoint' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
      print('Endpoint name: ' + endpoint_name)
      
      create_endpoint_response = sm_client.create_endpoint(
          EndpointName=endpoint_name,
          EndpointConfigName=endpoint_config_name)
      print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])
      
      resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
      status = resp['EndpointStatus']
      print("Endpoint Status: " + status)
      
      print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
      waiter = sm_client.get_waiter('endpoint_in_service')
      waiter.wait(EndpointName=endpoint_name)
      ```

      In the previous code example, the `create_endpoint` method creates the endpoint with the generated endpoint name and the endpoint configuration from the previous step, and prints the Amazon Resource Name of the endpoint. The `describe_endpoint` method returns information about the endpoint and its status. A SageMaker AI waiter waits for the endpoint to be in service.

1. Test your endpoint.

   Once your endpoint is in service, send an [invocation request](https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint) to your endpoint. The following code example shows how to send a test request to your endpoint:

   ```
   import json
   content_type = "application/json"
   request_body = {"input": "This is a test with NER in America with \
       Amazon and Microsoft in Seattle, writing random stuff."}
   
   #Serialize data for endpoint
   #data = json.loads(json.dumps(request_body))
   payload = json.dumps(request_body)
   
   #Endpoint invocation
   response = runtime_sm_client.invoke_endpoint(
       EndpointName=endpoint_name,
       ContentType=content_type,
       Body=payload)
   
   #Parse results
   result = json.loads(response['Body'].read().decode())['output']
   result
   ```

   In the previous code example, the method `json.dumps` serializes the `request_body` into a string formatted in JSON and saves it in the variable `payload`. Then the SageMaker AI Runtime client uses the [invoke endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint.html) method to send the payload to your endpoint. The result contains the response from your endpoint after extracting the `output` field.

   The previous code example should return the following output:

   ```
   [['NER', 'ORG'],
    ['America', 'GPE'],
    ['Amazon', 'ORG'],
    ['Microsoft', 'ORG'],
    ['Seattle', 'GPE']]
   ```
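
   The endpoint returns entities as a flat list of text and label pairs. If you want to work with the entities by type, the following sketch groups them by label. The helper name `group_entities_by_label` is hypothetical, and the sample data is the example output shown above.

   ```python
   from collections import defaultdict


   def group_entities_by_label(entities):
       """Map each entity label to the list of entity texts that carry it."""
       grouped = defaultdict(list)
       for text, label in entities:
           grouped[label].append(text)
       return dict(grouped)


   sample = [["NER", "ORG"], ["America", "GPE"], ["Amazon", "ORG"],
             ["Microsoft", "ORG"], ["Seattle", "GPE"]]
   print(group_entities_by_label(sample))
   # prints {'ORG': ['NER', 'Amazon', 'Microsoft'], 'GPE': ['America', 'Seattle']}
   ```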

1. Delete your endpoint

   After you have completed your invocations, delete your endpoint to conserve resources. The following code example shows you how to delete your endpoint, along with its endpoint configuration and model:

   ```
   sm_client.delete_endpoint(EndpointName=endpoint_name)
   sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
   sm_client.delete_model(ModelName=model_name)
   ```

   For a complete notebook containing the code in this example, see [BYOC-Single-Model](https://github.com/aws-samples/sagemaker-hosting/tree/main/Bring-Your-Own-Container/BYOC-Single-Model).
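
The cleanup code above stops at the first call that fails, which can leave the remaining resources in place. The following is a sketch of a helper that attempts all three deletions and reports the ones that failed. It assumes a client object with the `delete_endpoint`, `delete_endpoint_config`, and `delete_model` methods, as the boto3 SageMaker client has. The helper name `delete_inference_resources` is hypothetical.

```python
def delete_inference_resources(client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, its configuration, and the model, in that order.

    `client` is assumed to be a boto3 SageMaker client. Each deletion is
    attempted even if an earlier one fails. Returns a list of
    (resource label, error message) tuples for the deletions that failed.
    """
    steps = [
        ("endpoint", client.delete_endpoint, {"EndpointName": endpoint_name}),
        ("endpoint config", client.delete_endpoint_config,
         {"EndpointConfigName": endpoint_config_name}),
        ("model", client.delete_model, {"ModelName": model_name}),
    ]
    failures = []
    for label, call, kwargs in steps:
        try:
            call(**kwargs)
        except Exception as err:  # boto3 reports API errors as exceptions
            failures.append((label, str(err)))
    return failures
```

With the variables from this guide, a call might look like `delete_inference_resources(sm_client, endpoint_name, endpoint_config_name, model_name)`. An empty list means that all three resources were deleted.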