

# Model performance optimization with SageMaker Neo
<a name="neo"></a>

Neo is a capability of Amazon SageMaker AI that enables machine learning models to train once and run anywhere in the cloud and at the edge. 

If you are a first-time user of SageMaker Neo, we recommend that you check out the [Getting Started with Edge Devices](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-getting-started-edge.html) section for step-by-step instructions on how to compile and deploy a model to an edge device. 

## What is SageMaker Neo?
<a name="neo-what-it-is"></a>

Generally, optimizing machine learning models for inference on multiple platforms is difficult because you need to hand-tune models for the specific hardware and software configuration of each platform. If you want to get optimal performance for a given workload, you need to know the hardware architecture, instruction set, memory access patterns, and input data shapes, among other factors. For traditional software development, tools such as compilers and profilers simplify the process. For machine learning, most tools are specific to the framework or to the hardware. This forces you into a manual trial-and-error process that is unreliable and unproductive.

Neo automatically optimizes Gluon, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, and ONNX models for inference on Android, Linux, and Windows machines based on processors from Ambarella, ARM, Intel, Nvidia, NXP, Qualcomm, Texas Instruments, and Xilinx. Neo is tested with computer vision models available in the model zoos across the frameworks. SageMaker Neo supports compilation and deployment for two main platforms: cloud instances (including Inferentia) and edge devices.

For more information about supported frameworks and cloud instance types you can deploy to, see [Supported Instance Types and Frameworks](neo-supported-cloud.md) for cloud instances.

For more information about supported frameworks, edge devices, operating systems, chip architectures, and common machine learning models tested by SageMaker AI Neo for edge devices, see [Supported Frameworks, Devices, Systems, and Architectures](neo-supported-devices-edge.md) for edge devices.

## How it Works
<a name="neo-how-it-works"></a>

Neo consists of a compiler and a runtime. First, the Neo compilation API reads models exported from various frameworks. It converts the framework-specific functions and operations into a framework-agnostic intermediate representation. Next, it performs a series of optimizations. Then it generates binary code for the optimized operations, writes them to a shared object library, and saves the model definition and parameters into separate files. Neo also provides a runtime for each target platform that loads and executes the compiled model.

![\[How Neo works in SageMaker AI.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/neo_how_it_works.png)


You can create a Neo compilation job from the SageMaker AI console, the AWS Command Line Interface (AWS CLI), a Python notebook, or the SageMaker AI SDK. For information about how to compile a model, see [Model Compilation with Neo](neo-job-compilation.md). With a few CLI commands, an API invocation, or a few clicks, you can convert a model for your chosen platform. You can then quickly deploy the model to a SageMaker AI endpoint or to an AWS IoT Greengrass device.

Neo can optimize models whose parameters are either in FP32 or quantized to INT8 or FP16 bit-widths.

**Topics**
+ [What is SageMaker Neo?](#neo-what-it-is)
+ [How it Works](#neo-how-it-works)
+ [Model Compilation with Neo](neo-job-compilation.md)
+ [Cloud Instances](neo-cloud-instances.md)
+ [Edge Devices](neo-edge-devices.md)
+ [Troubleshoot Errors](neo-troubleshooting.md)

# Model Compilation with Neo
<a name="neo-job-compilation"></a>

This section shows how to create, describe, stop, and list compilation jobs. Amazon SageMaker Neo offers the following options for managing compilation jobs for machine learning models: the AWS Command Line Interface (AWS CLI), the Amazon SageMaker AI console, and the Amazon SageMaker SDK. 

**Topics**
+ [Prepare Model for Compilation](neo-compilation-preparing-model.md)
+ [Compile a Model (AWS Command Line Interface)](neo-job-compilation-cli.md)
+ [Compile a Model (Amazon SageMaker AI Console)](neo-job-compilation-console.md)
+ [Compile a Model (Amazon SageMaker AI SDK)](neo-job-compilation-sagemaker-sdk.md)

# Prepare Model for Compilation
<a name="neo-compilation-preparing-model"></a>

SageMaker Neo requires machine learning models to satisfy specific input data shapes. The input shape required for compilation depends on the deep learning framework you use. Once your model input shape is correctly formatted, save your model according to the following requirements. After you have a saved model, compress the model artifacts.

**Topics**
+ [What input data shapes does SageMaker Neo expect?](#neo-job-compilation-expected-inputs)
+ [Saving Models for SageMaker Neo](#neo-job-compilation-how-to-save-model)

## What input data shapes does SageMaker Neo expect?
<a name="neo-job-compilation-expected-inputs"></a>

Before you compile your model, make sure it is formatted correctly. Neo expects the name and shape of the data inputs for your trained model in JSON format or list format. The expected inputs are framework specific. 

Below are the input shapes SageMaker Neo expects:

### Keras
<a name="collapsible-section-1"></a>

Specify the name and shape (NCHW format) of the expected data inputs using a dictionary format for your trained model. Note that while Keras model artifacts should be uploaded in NHWC (channel-last) format, DataInputConfig should be specified in NCHW (channel-first) format. The dictionary formats required are as follows: 
+ For one input: `{'input_1':[1,3,224,224]}`
+ For two inputs: `{'input_1': [1,3,224,224], 'input_2':[1,3,224,224]}`

### MXNet/ONNX
<a name="collapsible-section-2"></a>

Specify the name and shape (NCHW format) of the expected data inputs using a dictionary format for your trained model. The dictionary formats required are as follows:
+ For one input: `{'data':[1,3,1024,1024]}`
+ For two inputs: `{'var1': [1,1,28,28], 'var2':[1,1,28,28]}`

### PyTorch
<a name="collapsible-section-3"></a>

For a PyTorch model, you don't need to provide the name and shape of the expected data inputs if you meet both of the following conditions:
+ You created your model definition file by using PyTorch 2.0 or later. For more information about how to create the definition file, see the [PyTorch](#how-to-save-pytorch) section under *Saving Models for SageMaker Neo*.
+ You are compiling your model for a cloud instance. For more information about the instance types that SageMaker Neo supports, see [Supported Instance Types and Frameworks](neo-supported-cloud.md).

If you meet these conditions, SageMaker Neo gets the input configuration from the model definition file (.pt or .pth) that you create with PyTorch.

Otherwise, you must do the following:

Specify the name and shape (NCHW format) of the expected data inputs using a dictionary format for your trained model. Alternatively, you can specify the shape only using a list format. The dictionary formats required are as follows:
+ For one input in dictionary format: `{'input0':[1,3,224,224]}`
+ For one input in list format: `[[1,3,224,224]]`
+ For two inputs in dictionary format: `{'input0':[1,3,224,224], 'input1':[1,3,224,224]}`
+ For two inputs in list format: `[[1,3,224,224], [1,3,224,224]]`

### TensorFlow
<a name="collapsible-section-4"></a>

Specify the name and shape (NHWC format) of the expected data inputs using a dictionary format for your trained model. The dictionary formats required are as follows:
+ For one input: `{'input':[1,1024,1024,3]}`
+ For two inputs: `{'data1': [1,28,28,1], 'data2':[1,28,28,1]}`

### TFLite
<a name="collapsible-section-5"></a>

Specify the name and shape (NHWC format) of the expected data inputs using a dictionary format for your trained model. The dictionary formats required are as follows:
+ For one input: `{'input':[1,224,224,3]}`

**Note**  
SageMaker Neo only supports TensorFlow Lite for edge device targets. For a list of supported SageMaker Neo edge device targets, see the SageMaker Neo [Devices](neo-supported-devices-edge-devices.md#neo-supported-edge-devices) page. For a list of supported SageMaker Neo cloud instance targets, see the SageMaker Neo [Supported Instance Types and Frameworks](neo-supported-cloud.md) page.

### XGBoost
<a name="collapsible-section-6"></a>

An input data name and shape are not needed.
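Whichever framework you use, the shape dictionary is ultimately passed to the compilation APIs as a `DataInputConfig` string. The following is a minimal sketch of serializing a shape dictionary with the Python standard library, assuming the service accepts standard double-quoted JSON in addition to the single-quoted form shown in the CLI examples later in this guide:

```python
import json

# Input shape for a single-input MXNet/ONNX model (NCHW layout)
input_shapes = {'data': [1, 3, 1024, 1024]}

# Serialize the dict into the string form used for DataInputConfig
data_input_config = json.dumps(input_shapes)
print(data_input_config)  # {"data": [1, 3, 1024, 1024]}
```

For multi-input models, add one key per input to the dictionary before serializing.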

## Saving Models for SageMaker Neo
<a name="neo-job-compilation-how-to-save-model"></a>

The following code examples show how to save your model to make it compatible with Neo. Models must be packaged as compressed tar files (`*.tar.gz`).

### Keras
<a name="how-to-save-tf-keras"></a>

Keras models require one model definition file (`.h5`).

There are two options for saving your Keras model to make it compatible with SageMaker Neo:

1. Export to `.h5` format with `model.save("<model-name>", save_format="h5")`.

1. Freeze the `SavedModel` after exporting.

Below is an example of how to export a `tf.keras` model as a frozen graph (option two):

```
import os
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras import backend

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3), pooling='avg')
model.summary()

# Save as a SavedModel
export_dir = 'saved_model/'
model.save(export_dir, save_format='tf')

# Freeze the saved model (this example uses TensorFlow 1.x APIs such as tf.Session)
input_node_names = [inp.name.split(":")[0] for inp in model.inputs]
output_node_names = [output.name.split(":")[0] for output in model.outputs]
print("Input names: ", input_node_names)
with tf.Session() as sess:
    loaded = tf.saved_model.load(sess, export_dir=export_dir, tags=["serve"]) 
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess,
                                                                sess.graph.as_graph_def(),
                                                                output_node_names)
    tf.io.write_graph(graph_or_graph_def=frozen_graph, logdir=".", name="frozen_graph.pb", as_text=False)

import tarfile
tar = tarfile.open("frozen_graph.tar.gz", "w:gz")
tar.add("frozen_graph.pb")
tar.close()
```

**Warning**  
Do not export your model with the `SavedModel` class using `model.save(<path>, save_format='tf')`. This format is suitable for training, but it is not suitable for inference.

### MXNet
<a name="how-to-save-mxnet"></a>

MXNet models must be saved as a single symbol file (`*-symbol.json`) and a single parameter file (`*.params`).

------
#### [ Gluon Models ]

Define the neural network using the `HybridSequential` class. This runs the code in the style of symbolic programming (as opposed to imperative programming).

```
from mxnet import nd, sym
from mxnet.gluon import nn

def get_net():
    net = nn.HybridSequential()  # Here we use the class HybridSequential.
    net.add(nn.Dense(256, activation='relu'),
            nn.Dense(128, activation='relu'),
            nn.Dense(2))
    net.initialize()
    return net

# Define an input to compute a forward calculation. 
x = nd.random.normal(shape=(1, 512))
net = get_net()

# During the forward calculation, the neural network will automatically infer
# the shape of the weight parameters of all the layers based on the shape of
# the input.
net(x)
                        
# hybridize model
net.hybridize()
net(x)

# export model
net.export('<model_name>') # this will create model-symbol.json and model-0000.params files

import tarfile
tar = tarfile.open("<model_name>.tar.gz", "w:gz")
for name in ["<model_name>-0000.params", "<model_name>-symbol.json"]:
    tar.add(name)
tar.close()
```

For more information about hybridizing models, see the [MXNet hybridize documentation](https://mxnet.apache.org/versions/1.7.0/api/python/docs/tutorials/packages/gluon/blocks/hybridize.html).

------
#### [ Gluon Model Zoo (GluonCV) ]

GluonCV model zoo models come pre-hybridized, so you can export them directly.

```
import numpy as np
import mxnet as mx
import gluoncv as gcv
from gluoncv.utils import export_block
import tarfile

net = gcv.model_zoo.get_model('<model_name>', pretrained=True) # For example, choose <model_name> as resnet18_v1
export_block('<model_name>', net, preprocess=True, layout='HWC')

tar = tarfile.open("<model_name>.tar.gz", "w:gz")

for name in ["<model_name>-0000.params", "<model_name>-symbol.json"]:
    tar.add(name)
tar.close()
```

------
#### [ Non Gluon Models ]

When saved to disk, all non-Gluon models use `*-symbol.json` and `*.params` files, so they are already in the correct format for Neo.

```
import mxnet as mx

# Pass the following three parameters: sym, args, aux
mx.model.save_checkpoint('<model_name>', 0, sym, args, aux)
# This creates <model_name>-symbol.json and <model_name>-0000.params files

import tarfile
tar = tarfile.open("<model_name>.tar.gz", "w:gz")

for name in ["<model_name>-0000.params", "<model_name>-symbol.json"]:
    tar.add(name)
tar.close()
```

------

### PyTorch
<a name="how-to-save-pytorch"></a>

PyTorch models must be saved as a definition file (`.pt` or `.pth`) with input datatype of `float32`.

To save your model, use the `torch.jit.trace` method followed by the `torch.save` method. This process saves an object to a disk file and, by default, uses Python pickle (`pickle_module=pickle`) to save the object and some metadata. Then, convert the saved model to a compressed tar file.

```
import torchvision
import torch

model = torchvision.models.resnet18(pretrained=True)
model.eval()
inp = torch.rand(1, 3, 224, 224)
model_trace = torch.jit.trace(model, inp)

# Save your model. The following code saves it with the .pth file extension
model_trace.save('model.pth')

# Save as a compressed tar file
import tarfile
with tarfile.open('model.tar.gz', 'w:gz') as f:
    f.add('model.pth')
```

If you save your model with PyTorch 2.0 or later, SageMaker Neo derives the input configuration for the model (the name and shape for its input) from the definition file. In that case, you don't need to specify the data input configuration to SageMaker AI when you compile the model.

If you want to prevent SageMaker Neo from deriving the input configuration, you can set the `_store_inputs` parameter of `torch.jit.trace` to `False`. If you do this, you must specify the data input configuration to SageMaker AI when you compile the model.

For more information about the `torch.jit.trace` method, see [TORCH.JIT.TRACE](https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace) in the PyTorch documentation.

### TensorFlow
<a name="how-to-save-tf"></a>

TensorFlow requires one `.pb` or one `.pbtxt` file and a variables directory that contains variables. For frozen models, only one `.pb` or `.pbtxt` file is required.

The following code example shows how to use the `tar` Linux command to compress your model. Run the commands in your terminal or in a Jupyter notebook (if you use a Jupyter notebook, prefix each command with `!`, as shown):

```
# Download SSD_Mobilenet trained model
!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz

# unzip the compressed tar file
!tar xvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz

# Compress the tar file and save it in a directory called 'model.tar.gz'
!tar czvf model.tar.gz ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb
```

The command flags used in this example accomplish the following:
+ `c`: Create an archive
+ `z`: Compress the archive with gzip
+ `v`: Display archive progress
+ `f`: Specify the filename of the archive
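If you prefer to package the frozen graph from Python, the same archive can be produced with the standard-library `tarfile` module used elsewhere in this section. This is a sketch with a hypothetical helper name; substitute your own frozen graph file path:

```python
import tarfile

# Equivalent of `tar czvf model.tar.gz <graph_path>` in pure Python.
# package_model is an illustrative helper, not a SageMaker API.
def package_model(graph_path, archive_path='model.tar.gz'):
    # 'w:gz' creates a new archive and compresses it with gzip
    with tarfile.open(archive_path, 'w:gz') as tar:
        tar.add(graph_path)
    return archive_path
```

For example, `package_model('ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb')` produces a `model.tar.gz` ready for upload to Amazon S3.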

### Built-In Estimators
<a name="how-to-save-built-in"></a>

Built-in estimators are made by either framework-specific containers or algorithm-specific containers. Estimator objects for both built-in algorithms and framework-specific estimators save the model in the correct format for you when you train the model using the built-in `.fit` method.

For example, you can use the `sagemaker.TensorFlow` class to define a TensorFlow estimator:

```
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point='mnist.py',
                        role=role,  #param role can be arn of a sagemaker execution role
                        framework_version='1.15.3',
                        py_version='py3',
                        training_steps=1000, 
                        evaluation_steps=100,
                        instance_count=2,
                        instance_type='ml.c4.xlarge')
```

Then train the model with the built-in `.fit` method:

```
estimator.fit(inputs)
```

Finally, compile the model with the built-in `compile_model` method:

```
# Specify output path of the compiled model
output_path = '/'.join(estimator.output_path.split('/')[:-1])

# Compile model
optimized_estimator = estimator.compile_model(target_instance_family='ml_c5', 
                              input_shape={'data':[1, 784]},  # Batch size 1, 784 input features
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.15.3')
```

You can also use the `sagemaker.estimator.Estimator` class to initialize an estimator object for training and compile a built-in algorithm with the `compile_model` method from the SageMaker Python SDK:

```
import sagemaker
from sagemaker.image_uris import retrieve
sagemaker_session = sagemaker.Session()
aws_region = sagemaker_session.boto_region_name

# Specify built-in algorithm training image
training_image = retrieve(framework='image-classification', 
                          region=aws_region, image_scope='training')

# Create estimator object for training
estimator = sagemaker.estimator.Estimator(image_uri=training_image,
                                          role=role,  #param role can be arn of a sagemaker execution role
                                          instance_count=1,
                                          instance_type='ml.p3.8xlarge',
                                          volume_size = 50,
                                          max_run = 360000,
                                          input_mode= 'File',
                                          output_path=s3_training_output_location,
                                          base_job_name='image-classification-training'
                                          )
                                          
# Setup the input data_channels to be used later for training.                                          
train_data = sagemaker.inputs.TrainingInput(s3_training_data_location,
                                            content_type='application/x-recordio',
                                            s3_data_type='S3Prefix')
validation_data = sagemaker.inputs.TrainingInput(s3_validation_data_location,
                                                content_type='application/x-recordio',
                                                s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}


# Train model
estimator.fit(inputs=data_channels, logs=True)

# Compile model with Neo                                                                                  
optimized_estimator = estimator.compile_model(target_instance_family='ml_c5',
                                          input_shape={'data':[1, 3, 224, 224], 'softmax_label':[1]},
                                          output_path=s3_compilation_output_location,
                                          framework='mxnet',
                                          framework_version='1.7')
```

For more information about compiling models with the SageMaker Python SDK, see [Compile a Model (Amazon SageMaker AI SDK)](neo-job-compilation-sagemaker-sdk.md).

# Compile a Model (AWS Command Line Interface)
<a name="neo-job-compilation-cli"></a>

This section shows how to manage Amazon SageMaker Neo compilation jobs for machine learning models using the AWS Command Line Interface (AWS CLI). You can create, describe, stop, and list compilation jobs. 

1. Create a Compilation Job

   With the [CreateCompilationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCompilationJob.html) API operation, you can specify the data input format, the S3 bucket in which to store your model, the S3 bucket to which to write the compiled model, and the target hardware device or platform.

   The following table demonstrates how to configure `CreateCompilationJob` API based on whether your target is a device or a platform.

------
#### [ Device Example ]

   ```
   {
       "CompilationJobName": "neo-compilation-job-demo",
       "RoleArn": "arn:aws:iam::<your-account>:role/service-role/AmazonSageMaker-ExecutionRole-yyyymmddThhmmss",
       "InputConfig": {
           "S3Uri": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/train",
           "DataInputConfig":  "{'data': [1,3,1024,1024]}",
           "Framework": "MXNET"
       },
       "OutputConfig": {
           "S3OutputLocation": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/compile",
           # A target device specification example for a ml_c5 instance family
           "TargetDevice": "ml_c5"
       },
       "StoppingCondition": {
           "MaxRuntimeInSeconds": 300
       }
   }
   ```

   You can optionally specify the framework version with the [FrameworkVersion](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_InputConfig.html#sagemaker-Type-InputConfig-FrameworkVersion) field if you used the PyTorch framework to train your model and your target device is an `ml_*` target.

   ```
   {
       "CompilationJobName": "neo-compilation-job-demo",
       "RoleArn": "arn:aws:iam::<your-account>:role/service-role/AmazonSageMaker-ExecutionRole-yyyymmddThhmmss",
       "InputConfig": {
           "S3Uri": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/train",
           "DataInputConfig":  "{'data': [1,3,1024,1024]}",
           "Framework": "PYTORCH",
           "FrameworkVersion": "1.6"
       },
       "OutputConfig": {
           "S3OutputLocation": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/compile",
           # A target device specification example for a ml_c5 instance family
           "TargetDevice": "ml_c5",
           # When compiling for ml_* instances using PyTorch framework, use the "CompilerOptions" field in 
           # OutputConfig to provide the correct data type ("dtype") of the model’s input. Default assumed is "float32"
           "CompilerOptions": "{'dtype': 'long'}"
       },
       "StoppingCondition": {
           "MaxRuntimeInSeconds": 300
       }
   }
   ```

**Notes:**  
If you saved your model by using PyTorch version 2.0 or later, the `DataInputConfig` field is optional. SageMaker AI Neo gets the input configuration from the model definition file that you create with PyTorch. For more information about how to create the definition file, see the [PyTorch](neo-compilation-preparing-model.md#how-to-save-pytorch) section under *Saving Models for SageMaker AI Neo*.
This API field is only supported for PyTorch.

------
#### [ Platform Example ]

   ```
   {
       "CompilationJobName": "neo-test-compilation-job",
       "RoleArn": "arn:aws:iam::<your-account>:role/service-role/AmazonSageMaker-ExecutionRole-yyyymmddThhmmss",
       "InputConfig": {
           "S3Uri": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/train",
           "DataInputConfig":  "{'data': [1,3,1024,1024]}",
           "Framework": "MXNET"
       },
       "OutputConfig": {
           "S3OutputLocation": "s3://<your-bucket>/sagemaker/neo-compilation-job-demo-data/compile",
           # A target platform configuration example for a p3.2xlarge instance
           "TargetPlatform": {
               "Os": "LINUX",
               "Arch": "X86_64",
               "Accelerator": "NVIDIA"
           },
           "CompilerOptions": "{'cuda-ver': '10.0', 'trt-ver': '6.0.1', 'gpu-code': 'sm_70'}"
       },
       "StoppingCondition": {
           "MaxRuntimeInSeconds": 300
       }
   }
   ```

------
**Note**  
In the `OutputConfig`, the `TargetDevice` and `TargetPlatform` fields are mutually exclusive. You must choose one of the two options.

   To find JSON string examples of `DataInputConfig` for each framework, see [What input data shapes Neo expects](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html#neo-troubleshooting-errors-preventing).

   For more information about setting up the configurations, see the [InputConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_InputConfig.html), [OutputConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputConfig.html), and [TargetPlatform](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TargetPlatform.html) API operations in the SageMaker API reference.

1. After you configure the JSON file, run the following command to create the compilation job:

   ```
   aws sagemaker create-compilation-job \
   --cli-input-json file://job.json \
   --region us-west-2 
   
   # You should get CompilationJobArn
   ```

1. Describe the compilation job by running the following command:

   ```
   aws sagemaker describe-compilation-job \
   --compilation-job-name $JOB_NM \
   --region us-west-2
   ```

1. Stop the compilation job by running the following command:

   ```
   aws sagemaker stop-compilation-job \
   --compilation-job-name $JOB_NM \
   --region us-west-2
   
   # There is no output for compilation-job operation
   ```

1. List the compilation job by running the following command:

   ```
   aws sagemaker list-compilation-jobs \
   --region us-west-2
   ```
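The same request JSON can also be assembled programmatically before submission. The following is a minimal sketch: the bucket, role ARN, helper name, and job name are placeholders, and the commented-out boto3 call at the end assumes boto3 is installed and AWS credentials are configured. It also enforces the `TargetDevice`/`TargetPlatform` mutual exclusivity noted above:

```python
# build_compilation_job_request is an illustrative helper, not a SageMaker API.
def build_compilation_job_request(job_name, role_arn, s3_model_uri,
                                  data_input_config, framework,
                                  s3_output_uri, target_device=None,
                                  target_platform=None):
    # TargetDevice and TargetPlatform are mutually exclusive in OutputConfig
    if (target_device is None) == (target_platform is None):
        raise ValueError('Specify exactly one of target_device or target_platform')
    output_config = {'S3OutputLocation': s3_output_uri}
    if target_device is not None:
        output_config['TargetDevice'] = target_device
    else:
        output_config['TargetPlatform'] = target_platform
    return {
        'CompilationJobName': job_name,
        'RoleArn': role_arn,
        'InputConfig': {
            'S3Uri': s3_model_uri,
            'DataInputConfig': data_input_config,
            'Framework': framework,
        },
        'OutputConfig': output_config,
        'StoppingCondition': {'MaxRuntimeInSeconds': 300},
    }

request = build_compilation_job_request(
    job_name='neo-compilation-job-demo',
    role_arn='arn:aws:iam::123456789012:role/service-role/ExampleRole',
    s3_model_uri='s3://example-bucket/sagemaker/model/model.tar.gz',
    data_input_config='{"data": [1, 3, 1024, 1024]}',
    framework='MXNET',
    s3_output_uri='s3://example-bucket/sagemaker/compile/',
    target_device='ml_c5',
)

# To submit the job (requires boto3 and AWS credentials):
# import boto3
# sm = boto3.client('sagemaker')
# sm.create_compilation_job(**request)
```

The resulting dictionary mirrors the Device Example JSON above; passing `target_platform` instead of `target_device` yields the Platform Example shape.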

# Compile a Model (Amazon SageMaker AI Console)
<a name="neo-job-compilation-console"></a>

You can create an Amazon SageMaker Neo compilation job in the Amazon SageMaker AI console.

1. In the **Amazon SageMaker AI** console, choose **Compilation jobs**, and then choose **Create compilation job**.  
![\[Create a compilation job.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/8-create-compilation-job.png)

1. On the **Create compilation job** page, under **Job name**, enter a name. Then select an **IAM role**.  
![\[Create compilation job page.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/9-create-compilation-job-config.png)

1. If you don’t have an IAM role, choose **Create a new role**.  
![\[Create IAM role location.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/10a-create-iam-role.png)

1. On the **Create an IAM role** page, choose **Any S3 bucket**, and choose **Create role**.  
![\[Create IAM role page.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/10-create-iam-role.png)

1. 

------
#### [ Non PyTorch Frameworks ]

   Within the **Input configuration** section, enter the full path of the Amazon S3 bucket URI that contains your model artifacts in the **Location of model artifacts** input field. Your model artifacts must be in a compressed tarball file format (`.tar.gz`). 

   For the **Data input configuration** field, enter the JSON string that specifies the shape of the input data.

   For **Machine learning framework**, choose the framework of your choice.

![\[Input configuration page.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/neo-create-compilation-job-input-config.png)


   To find the JSON string examples of input data shapes depending on frameworks, see [What input data shapes Neo expects](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting.html#neo-troubleshooting-errors-preventing).

------
#### [ PyTorch Framework ]

   Similar instructions apply for compiling PyTorch models. However, if you trained with PyTorch and are compiling the model for an `ml_*` target (except `ml_inf`), you can optionally specify the version of PyTorch you used.

![\[Example Input configuration section showing where to choose the Framework version.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/compile_console_pytorch.png)


   To find the JSON string examples of input data shapes depending on frameworks, see [What input data shapes Neo expects](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting.html#neo-troubleshooting-errors-preventing).

**Notes**  
If you saved your model by using PyTorch version 2.0 or later, the **Data input configuration** field is optional. SageMaker Neo gets the input configuration from the model definition file that you create with PyTorch. For more information about how to create the definition file, see the [PyTorch](neo-compilation-preparing-model.md#how-to-save-pytorch) section under *Saving Models for SageMaker AI Neo*.
When compiling for `ml_*` instances using the PyTorch framework, use the **Compiler options** field in **Output Configuration** to provide the correct data type (`dtype`) of the model’s input. The default is set to `"float32"`. 

![\[Example Output Configuration section.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/neo_compilation_console_pytorch_compiler_options.png)


**Warning**  
 If you specify an Amazon S3 bucket URI path that leads to a `.pth` file, you will receive the following error after starting compilation: `ClientError: InputConfiguration: Unable to untar input model.Please confirm the model is a tar.gz file` 

------

1.  Go to the **Output configuration** section. Choose where you want to deploy your model. You can deploy your model to a **Target device** or a **Target platform**. Target devices include cloud and edge devices. Target platforms refer to the specific OS, architecture, and accelerator that you want your model to run on. 

    For **S3 Output location**, enter the path to the S3 bucket where you want to store the model. You can optionally add compiler options in JSON format under the **Compiler options** section.   
![\[Output configuration page.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/neo-console-output-config.png)

1. Check the status of the compilation job after it starts. The status appears at the top of the **Compilation Job** page, as shown in the following screenshot. You can also check it in the **Status** column.  
![\[Compilation job status.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/12-run-model-compilation.png)

1. Check the status of the compilation job after it completes. You can check the status in the **Status** column, as shown in the following screenshot.  
![\[Compilation job status.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo/12a-completed-model-compilation.png)

# Compile a Model (Amazon SageMaker AI SDK)
<a name="neo-job-compilation-sagemaker-sdk"></a>

 You can use the [compile_model](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator.compile_model) API in the [Amazon SageMaker AI SDK for Python](https://sagemaker.readthedocs.io/en/stable/) to compile a trained model and optimize it for specific target hardware. Invoke the API on the estimator object that you used during model training. 

**Note**  
You must set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500` when compiling the model with MXNet or PyTorch. The environment variable is not needed for TensorFlow. 

 The following is an example of how you can compile a model using the `trained_model_estimator` object: 

```
# Replace the value of expected_trained_model_input below and
# specify the name and shape of the expected inputs for your trained model
# as a JSON dictionary
expected_trained_model_input = {'data':[1, 784]}

# Replace the example target_instance_family below with your preferred target_instance_family
compiled_model = trained_model_estimator.compile_model(target_instance_family='ml_c5',
        input_shape=expected_trained_model_input,
        output_path='insert s3 output path',
        env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'})
```

The code compiles the model, saves the optimized model at `output_path`, and creates a SageMaker AI model that can be deployed to an endpoint. 

# Cloud Instances
<a name="neo-cloud-instances"></a>

Amazon SageMaker Neo provides compilation support for popular machine learning frameworks such as TensorFlow, PyTorch, MXNet, and more. You can deploy your compiled model to cloud instances and AWS Inferentia instances. For a full list of supported frameworks and instance types, see [Supported Instance Types and Frameworks](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-cloud.html). 

You can compile your model in one of three ways: through the AWS CLI, the SageMaker AI console, or the SageMaker AI SDK for Python. For more information, see [Use Neo to Compile a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html). Once compiled, your model artifacts are stored in the Amazon S3 bucket URI you specified during the compilation job. You can deploy your compiled model to cloud instances and AWS Inferentia instances using the SageMaker AI SDK for Python, the AWS SDK for Python (Boto3), the AWS CLI, or the AWS console. 

If you deploy your model using the AWS CLI, the console, or Boto3, you must select a Docker image Amazon ECR URI for your primary container. For a list of Amazon ECR URIs, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

**Topics**
+ [Supported Instance Types and Frameworks](neo-supported-cloud.md)
+ [Deploy a Model](neo-deployment-hosting-services.md)
+ [Inference Requests With a Deployed Service](neo-requests.md)
+ [Inference Container Images](neo-deployment-hosting-services-container-images.md)

# Supported Instance Types and Frameworks
<a name="neo-supported-cloud"></a>

Amazon SageMaker Neo supports popular deep learning frameworks for both compilation and deployment. You can deploy your model to cloud instances or AWS Inferentia instance types.

The following describes frameworks SageMaker Neo supports and the target cloud instances you can compile and deploy to. For information on how to deploy your compiled model to a cloud or Inferentia instance, see [Deploy a Model with Cloud Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services.html).

## Cloud Instances
<a name="neo-supported-cloud-instances"></a>

SageMaker Neo supports the following deep learning frameworks for CPU and GPU cloud instances: 


| Framework | Framework Version | Model Version | Models | Model Formats (packaged in \*.tar.gz) | Toolkits | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.8.0 | Supports 1.8.0 or earlier | Image Classification, Object Detection, Semantic Segmentation, Pose Estimation, Activity Recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| ONNX | 1.7.0 | Supports 1.7.0 or earlier | Image Classification, SVM | One model file (.onnx) |  | 
| Keras | 2.2.4 | Supports 2.2.4 or earlier | Image Classification | One model definition file (.h5) |  | 
| PyTorch | 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, or 2.0 | Supports 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, and 2.0 | Image Classification; versions 1.13 and 2.0 also support Object Detection, Vision Transformer, and HuggingFace | One model definition file (.pt or .pth) with input dtype of float32 |  | 
| TensorFlow | 1.15.3 or 2.9 | Supports 1.15.3 and 2.9 | Image Classification | For saved models, one .pb or .pbtxt file and a variables directory that contains variables; for frozen models, only one .pb or .pbtxt file |  | 
| XGBoost | 1.3.3 | Supports 1.3.3 or earlier | Decision Trees | One XGBoost model file (.model) where the number of nodes in a tree is less than 2^31 |  | 

**Note**  
“Model Version” is the version of the framework used to train and export the model. 

## Instance Types
<a name="neo-supported-cloud-instances-types"></a>

 You can deploy your SageMaker AI compiled model to one of the following cloud instances: 


| Instance | Compute Type | 
| --- | --- | 
| `ml_c4` | Standard | 
| `ml_c5` | Standard | 
| `ml_m4` | Standard | 
| `ml_m5` | Standard | 
| `ml_p2` | Accelerated computing | 
| `ml_p3` | Accelerated computing | 
| `ml_g4dn` | Accelerated computing | 

 For information on the available vCPU, memory, and price per hour for each instance type, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/). 

**Note**  
When compiling for `ml_*` instances using the PyTorch framework, use the **Compiler options** field in **Output Configuration** to provide the correct data type (`dtype`) of the model's input.  
The default is `"float32"`.
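
The **Compiler options** value is a JSON-formatted string. As a minimal sketch, the snippet below builds that string with Python's `json` module; the `dtype` key follows the note above, and the rest is illustrative:

```python
import json

# Sketch: build the JSON string for the Compiler options field when
# compiling a PyTorch model for an ml_* target. "float32" is the default
# dtype; override it only if your model expects a different input type.
compiler_options = json.dumps({"dtype": "float32"})
print(compiler_options)
```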

## AWS Inferentia
<a name="neo-supported-inferentia"></a>

 SageMaker Neo supports the following deep learning frameworks for Inf1: 


| Framework | Framework Version | Model Version | Models | Model Formats (packaged in \*.tar.gz) | Toolkits | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.5 or 1.8  | Supports 1.8, 1.5 and earlier | Image Classification, Object Detection, Semantic Segmentation, Pose Estimation, Activity Recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| PyTorch | 1.7, 1.8 or 1.9 | Supports 1.9 and earlier | Image Classification | One model definition file (.pt or .pth) with input dtype of float32 |  | 
| TensorFlow | 1.15 or 2.5 | Supports 2.5, 1.15 and earlier | Image Classification | For saved models, one .pb or .pbtxt file and a variables directory that contains variables; for frozen models, only one .pb or .pbtxt file |  | 

**Note**  
“Model Version” is the version of the framework used to train and export the model.

You can deploy your SageMaker Neo-compiled model to AWS Inferentia-based Amazon EC2 Inf1 instances. AWS Inferentia is Amazon's first custom silicon chip designed to accelerate deep learning. Currently, you can use the `ml_inf1` instance to deploy your compiled models.

### AWS Inferentia2 and AWS Trainium
<a name="neo-supported-inferentia-trainium"></a>

Currently, you can deploy your SageMaker Neo-compiled model to AWS Inferentia2-based Amazon EC2 Inf2 instances (in US East (Ohio) Region), and to AWS Trainium-based Amazon EC2 Trn1 instances (in US East (N. Virginia) Region). For more information about supported models on these instances, see [Model Architecture Fit Guidelines](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/model-architecture-fit.html) in the AWS Neuron documentation, and the examples in the [Neuron GitHub repository](https://github.com/aws-neuron/aws-neuron-sagemaker-samples).

# Deploy a Model
<a name="neo-deployment-hosting-services"></a>

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model using Amazon SageMaker AI hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models to ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances. 

For [Inferentia](https://aws.amazon.com/machine-learning/inferentia/) and [Trainium](https://aws.amazon.com/machine-learning/trainium/) instances, models need to be compiled specifically for those instances. Models compiled for other instance types are not guaranteed to work with Inferentia or Trainium instances.

When you deploy a compiled model, you must target the same instance type that you used for compilation. This creates a SageMaker AI endpoint that you can use to perform inferences. You can deploy a Neo-compiled model using any of the following: the [Amazon SageMaker AI SDK for Python](https://sagemaker.readthedocs.io/en/stable/), the [SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), the [AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/reference/), and the [SageMaker AI console](https://console.aws.amazon.com/sagemaker).

**Note**  
To deploy a model using the AWS CLI, the console, or Boto3, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html) to select the inference image URI for your primary container. 

**Topics**
+ [Prerequisites](neo-deployment-hosting-services-prerequisites.md)
+ [Deploy a Compiled Model Using SageMaker SDK](neo-deployment-hosting-services-sdk.md)
+ [Deploy a Compiled Model Using Boto3](neo-deployment-hosting-services-boto3.md)
+ [Deploy a Compiled Model Using the AWS CLI](neo-deployment-hosting-services-cli.md)
+ [Deploy a Compiled Model Using the Console](neo-deployment-hosting-services-console.md)

# Prerequisites
<a name="neo-deployment-hosting-services-prerequisites"></a>

**Note**  
Follow the instructions in this section if you compiled your model using AWS SDK for Python (Boto3), AWS CLI, or the SageMaker AI console. 

To create a SageMaker Neo-compiled model, you need the following:

1. A Docker image Amazon ECR URI. You can select one that meets your needs from [this list](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html). 

1. An entry point script file:

   1. **For PyTorch and MXNet models:**

      *If you trained your model using SageMaker AI*, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in [ MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/mxnet_mnist/mxnet_mnist_neo.html), the training script (`mnist.py`) implements the required functions.

      *If you did not train your model using SageMaker AI*, you need to provide an entry point script (`inference.py`) file that can be used at the time of inference. Depending on the framework (MXNet or PyTorch), the inference script location must conform to the SageMaker Python SDK [Model Directory Structure for MXNet](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/using_mxnet.html#model-directory-structure) or [Model Directory Structure for PyTorch](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#model-directory-structure). 

      When using Neo Inference Optimized Container images with **PyTorch** and **MXNet** on CPU and GPU instance types, the inference script must implement the following functions: 
      + `model_fn`: Loads the model. (Optional)
      + `input_fn`: Converts the incoming request payload into a numpy array.
      + `predict_fn`: Performs the prediction.
      + `output_fn`: Converts the prediction output into the response payload.
      + Alternatively, you can define `transform_fn` to combine `input_fn`, `predict_fn`, and `output_fn`.

      The following are examples of `inference.py` script within a directory named `code` (`code/inference.py`) for **PyTorch and MXNet (Gluon and Module).** The examples first load the model and then serve it on image data on a GPU: 

------
#### [ MXNet Module ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple
      
      Batch = namedtuple('Batch', ['data'])
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          mod.bind(for_training=False,
                   data_shapes=[('data', (1,3,224,224))],
                   label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod
      
      
      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
      
          # prediction/inference
          mod.forward(Batch([processed_input]))
      
          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ MXNet Gluon ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json',['data'],'compiled-0000.params', ctx=ctx)
          
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block
      
      
      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input
      
      
      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction
      
      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ PyTorch 1.4 and Older ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
              device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
              model = model.to(device)
      
          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                       response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[
                      0.485, 0.456, 0.406], std=[
                      0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
      
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------
#### [ PyTorch 1.5 and Newer ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                          response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
                                      transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize(
                                          mean=[
                                              0.485, 0.456, 0.406], std=[
                                              0.229, 0.224, 0.225]),
                                          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------

   1.  **For Inf1 instances or ONNX, XGBoost, and Keras container images** 

      For all other Neo inference-optimized container images, or Inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime: 
      + `neo_preprocess`: Converts the incoming request payload into a numpy array.
      + `neo_postprocess`: Converts the prediction output from Neo Deep Learning Runtime into the response body.
**Note**  
The preceding two functions do not use any functionality from MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see [Neo Model Compilation Sample Notebooks](https://docs.aws.amazon.com//sagemaker/latest/dg/neo.html#neo-sample-notebooks). 
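
The two runtime entry points above can be sketched as follows. This is a minimal, hedged example that assumes a JSON request payload; real entry points often decode image bytes instead:

```python
import json
import numpy as np

def neo_preprocess(payload, content_type):
    # Convert the incoming request payload into a numpy array.
    # Assumption: the client sends a JSON-encoded nested list.
    if content_type != 'application/json':
        raise RuntimeError('Unsupported content type: {}'.format(content_type))
    return np.asarray(json.loads(payload), dtype='float32')

def neo_postprocess(result):
    # Convert the prediction output into a response body and content type.
    body = json.dumps(np.asarray(result).tolist())
    return body, 'application/json'
```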

   1. **For TensorFlow models**

      If your model requires custom pre- and post-processing logic before data is sent to the model, then you must specify an entry point script `inference.py` file that can be used at the time of inference. The script must implement either a pair of `input_handler` and `output_handler` functions or a single `handler` function. 
**Note**  
If a `handler` function is implemented, `input_handler` and `output_handler` are ignored. 

      The following is a code example of an `inference.py` script that you can package with the compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker AI client sends the image file as an `application/x-image` content type to the `input_handler` function, where it is converted to JSON. The converted image file is then sent to [TensorFlow Serving](https://www.tensorflow.org/tfx/serving/api_rest) using the REST API. 

      ```
      import json
      import numpy as np
      import io
      from PIL import Image
      
      def input_handler(data, context):
          """ Pre-process request input before it is sent to TensorFlow Serving REST API
          
          Args:
          data (obj): the request data, in format of dict or string
          context (Context): an object containing request and configuration details
          
          Returns:
          (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body
      
      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.
          
          Args:
          data (obj): the TensorFlow serving response
          context (Context): an object containing request and configuration details
          
          Returns:
          (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
      
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type
      ```

      If there is no custom pre- or post-processing, the SageMaker AI client converts the file image to JSON in a similar way before sending it over to the SageMaker AI endpoint. 

      For more information, see the [Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#providing-python-scripts-for-pre-pos-processing). 

1. The Amazon S3 bucket URI that contains the compiled model artifacts. 

# Deploy a Compiled Model Using SageMaker SDK
<a name="neo-deployment-hosting-services-sdk"></a>

If your model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Based on how you compiled your model, follow one of the following use cases to deploy a model compiled with SageMaker Neo.

**Topics**
+ [If you compiled your model using the SageMaker SDK](#neo-deployment-hosting-services-sdk-deploy-sm-sdk)
+ [If you compiled your model using MXNet or PyTorch](#neo-deployment-hosting-services-sdk-deploy-sm-boto3)
+ [If you compiled your model using Boto3, SageMaker console, or the CLI for TensorFlow](#neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow)

## If you compiled your model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk-deploy-sm-sdk"></a>

The [sagemaker.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model) object handle for the compiled model supplies the [deploy()](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model#sagemaker.model.Model.deploy) function, which enables you to create an endpoint to serve inference requests. The function lets you set the number and type of instances that are used for the endpoint. You must choose an instance type for which your model was compiled. For example, for the job compiled in the [Compile a Model (Amazon SageMaker SDK)](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation-sagemaker-sdk.html) section, this is `ml_c5`. 

```
predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```
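
Once the endpoint is in service, you can invoke the predictor directly. As a hedged sketch, the payload below matches the `{'data': [1, 784]}` input shape from the compilation example; the `predict` call itself is commented out because it requires a live endpoint:

```python
import numpy as np

# Build a sample payload matching the input shape declared at compilation
# time ({'data': [1, 784]} in the compile_model example).
sample_input = np.random.rand(1, 784).astype('float32')

# Requires a live endpoint created by deploy(); shown for illustration.
# response = predictor.predict(sample_input)
# print(response)
```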

## If you compiled your model using MXNet or PyTorch
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3"></a>

Create the SageMaker AI model and deploy it using the `deploy()` API under the framework-specific Model APIs. For MXNet, it is [MXNetModel](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html?highlight=MXNetModel#mxnet-model), and for PyTorch, it is [PyTorchModel](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html?highlight=PyTorchModel#sagemaker.pytorch.model.PyTorchModel). When you create and deploy a SageMaker AI model, you must set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500`, specify the `entry_point` parameter as the inference script (`inference.py`), and specify the `source_dir` parameter as the directory location (`code`) of the inference script. To prepare the inference script (`inference.py`), follow the prerequisites step. 

The following example shows how to use these functions to deploy a compiled model using the SageMaker AI SDK for Python: 

------
#### [ MXNet ]

```
from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.4 and Older ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.5 and Newer ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role. 

## If you compiled your model using Boto3, SageMaker console, or the CLI for TensorFlow
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow"></a>

Construct a `TensorFlowModel` object, then call `deploy()`: 

```
role='AmazonSageMaker-ExecutionRole'
model_path='S3 path for model file'
framework_image='inference container arn'
tf_model = TensorFlowModel(model_data=model_path,
                framework_version='1.15.3',
                role=role, 
                image_uri=framework_image)
instance_type='ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                    initial_instance_count=1)
```

See [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#deploying-directly-from-model-artifacts) for more information. 

You can select a Docker image Amazon ECR URI that meets your needs from [this list](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html). 

For more information on how to construct a `TensorFlowModel` object, see the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model). 

**Note**  
Your first inference request might have high latency if you deploy your model on a GPU, because an optimized compute kernel is built during the first inference request. We recommend that you create a file of warm-up inference requests and store it alongside your model file before deploying to TensorFlow Serving. This is known as “warming up” the model. 

The following code snippet demonstrates how to produce the warm-up file for the image classification example in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section: 

```
import tensorflow as tf
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import inference_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2
from tensorflow_serving.apis import regression_pb2
import numpy as np

with tf.python_io.TFRecordWriter("tf_serving_warmup_requests") as writer:       
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))
    log = prediction_log_pb2.PredictionLog(
    predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```

For more information on how to “warm up” your model, see the [TensorFlow Serving SavedModel Warmup page](https://www.tensorflow.org/tfx/serving/saved_model_warmup).

# Deploy a Compiled Model Using Boto3
<a name="neo-deployment-hosting-services-boto3"></a>

You must satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section if the model was compiled using AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html). 

**Topics**
+ [Deploy the Model](#neo-deployment-hosting-services-boto3-steps)

## Deploy the Model
<a name="neo-deployment-hosting-services-boto3-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs. 

The following example shows how to use these APIs to deploy a model compiled with Neo: 

```
import boto3
client = boto3.client('sagemaker')

# create sagemaker model
create_model_api_response = client.create_model(
                                    ModelName='my-sagemaker-model',
                                    PrimaryContainer={
                                        'Image': 'insert the ECR Image URI',
                                        'ModelDataUrl': 's3://path/to/model/artifact/model.tar.gz',
                                        'Environment': {}
                                    },
                                    ExecutionRoleArn='ARN for AmazonSageMaker-ExecutionRole'
                            )

print("create_model API response", create_model_api_response)

# create sagemaker endpoint config
create_endpoint_config_api_response = client.create_endpoint_config(
                                            EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',
                                            ProductionVariants=[
                                                {
                                                    'VariantName': 'provide your variant name',
                                                    'ModelName': 'my-sagemaker-model',
                                                    'InitialInstanceCount': 1,
                                                    'InstanceType': 'provide your instance type here'
                                                },
                                            ]
                                       )

print("create_endpoint_config API response", create_endpoint_config_api_response)

# create sagemaker endpoint
create_endpoint_api_response = client.create_endpoint(
                                    EndpointName='provide your endpoint name',
                                    EndpointConfigName='insert your endpoint config name',
                                )

print("create_endpoint API response", create_endpoint_api_response)
```

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role. 

For full syntax of `create_model`, `create_endpoint_config`, and `create_endpoint` APIs, see [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model), [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config), and [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint), respectively. 

If you did not train your model using SageMaker AI, specify the following environment variables: 

------
#### [ MXNet and PyTorch ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region",
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
}
```

------
#### [ TensorFlow ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region"
}
```

------

 If you trained your model using SageMaker AI, specify the environment variable `SAGEMAKER_SUBMIT_DIRECTORY` as the full Amazon S3 bucket URI that contains the training script. 
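
For example, the `Environment` map passed to `create_model` would look like the following; the bucket and key below are placeholders for your own training-script archive:

```
"Environment": {
    "SAGEMAKER_SUBMIT_DIRECTORY": "s3://your-bucket/path/to/sourcedir.tar.gz"
}
```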

# Deploy a Compiled Model Using the AWS CLI
<a name="neo-deployment-hosting-services-cli"></a>

You must satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section if the model was compiled using AWS SDK for Python (Boto3), AWS CLI, or the Amazon SageMaker AI console. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/). 

**Topics**
+ [Deploy the Model](#neo-deploy-cli)

## Deploy the Model
<a name="neo-deploy-cli"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create-model`, `create-endpoint-config`, and `create-endpoint` AWS CLI commands. The following steps explain how to use these commands to deploy a model compiled with Neo: 



### Create a Model
<a name="neo-deployment-hosting-services-cli-create-model"></a>

From [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), select the inference image URI, and then use the `create-model` command to create a SageMaker AI model. You can do this in two steps: 

1. Create a `create_model.json` file. Within the file, specify the name of the model, the image URI, the path to the `model.tar.gz` file in your Amazon S3 bucket, and your SageMaker AI execution role: 

   ```
   {
       "ModelName": "insert model name",
       "PrimaryContainer": {
           "Image": "insert the ECR Image URI",
           "ModelDataUrl": "insert S3 archive URL",
           "Environment": {"See details below"}
       },
       "ExecutionRoleArn": "ARN for AmazonSageMaker-ExecutionRole"
   }
   ```

   If you trained your model using SageMaker AI, specify the following environment variable: 

   ```
   "Environment": {
       "SAGEMAKER_SUBMIT_DIRECTORY" : "[Full S3 path for *.tar.gz file containing the training script]"
   }
   ```

   If you did not train your model using SageMaker AI, specify the following environment variables: 

------
#### [ MXNet and PyTorch ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region",
       "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
   }
   ```

------
#### [ TensorFlow ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region"
   }
   ```

------
**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role. 

1. Run the following command:

   ```
   aws sagemaker create-model --cli-input-json file://create_model.json
   ```

   For the full syntax of the `create-model` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html). 

### Create an Endpoint Configuration
<a name="neo-deployment-hosting-services-cli-create-endpoint-config"></a>

After creating a SageMaker AI model, create the endpoint configuration using the `create-endpoint-config` API. To do this, create a JSON file with your endpoint configuration specifications. For example, you can use the following code template and save it as `create_config.json`: 

```
{
    "EndpointConfigName": "<provide your endpoint config name>",
    "ProductionVariants": [
        {
            "VariantName": "<provide your variant name>",
            "ModelName": "my-sagemaker-model",
            "InitialInstanceCount": 1,
            "InstanceType": "<provide your instance type here>",
            "InitialVariantWeight": 1.0
        }
    ]
}
```

Now run the following AWS CLI command to create your endpoint configuration: 

```
aws sagemaker create-endpoint-config --cli-input-json file://create_config.json
```

For the full syntax of the `create-endpoint-config` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html). 

### Create an Endpoint
<a name="neo-deployment-hosting-services-cli-create-endpoint"></a>

After you have created your endpoint configuration, create an endpoint using the `create-endpoint` API: 

```
aws sagemaker create-endpoint --endpoint-name '<provide your endpoint name>' --endpoint-config-name '<insert your endpoint config name>'
```

For the full syntax of the `create-endpoint` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html). 

# Deploy a Compiled Model Using the Console
<a name="neo-deployment-hosting-services-console"></a>

You must satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section if the model was compiled using AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console. Follow the steps below to create and deploy a SageMaker AI Neo-compiled model using the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/).

**Topics**
+ [Deploy the Model](#deploy-the-model-console-steps)

## Deploy the Model
<a name="deploy-the-model-console-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the following steps to deploy a model compiled with Neo: 

1. Choose **Models**, and then choose **Create models** from the **Inference** group. On the **Create model** page, complete the **Model name**, **IAM role**, and optionally the **VPC** fields, as needed.  
![\[Create Neo model for inference\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/create-pipeline-model.png)

1. To add information about the container used to deploy your model, choose **Add container**, and then choose **Next**. Complete the **Container input options**, **Location of inference code image**, and **Location of model artifacts** fields, and optionally the **Container host name** and **Environment variables** fields.  
![\[Create Neo model for inference\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo-deploy-console-container-definition.png)

1. To deploy Neo-compiled models, choose the following:
   + **Container input options**: Choose **Provide model artifacts and inference image**.
   + **Location of inference code image**: Choose the inference image URI from [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), depending on the AWS Region and kind of application. 
   + **Location of model artifact**: Enter the Amazon S3 bucket URI of the compiled model artifact generated by the Neo compilation API.
   + **Environment variables**:
     + Leave this field blank for **SageMaker XGBoost**.
     + If you trained your model using SageMaker AI, specify the environment variable `SAGEMAKER_SUBMIT_DIRECTORY` as the Amazon S3 bucket URI that contains the training script. 
     + If you did not train your model using SageMaker AI, specify the following environment variables:     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-console.html)

1. Confirm that the information for the containers is accurate, and then choose **Create model**. On the **Create model landing page**, choose **Create endpoint**.   
![\[Create Model landing page\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo-deploy-console-create-model-land-page.png)

1. On the **Create and configure endpoint** page, specify the **Endpoint name**. For **Attach endpoint configuration**, choose **Create a new endpoint configuration**.  
![\[Neo console create and configure endpoint UI.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo-deploy-console-config-endpoint.png)

1. On the **New endpoint configuration** page, specify the **Endpoint configuration name**.   
![\[Neo console new endpoint configuration UI.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo-deploy-console-new-endpoint-config.png)

1. Choose **Edit** next to the name of the model, and specify the correct **Instance type** on the **Edit Production Variant** page. The **Instance type** value must match the one specified in your compilation job.  
![\[Neo console new endpoint configuration UI.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/neo-deploy-console-edit-production-variant.png)

1. Choose **Save**.

1. On the **New endpoint configuration** page, choose **Create endpoint configuration**, and then choose **Create endpoint**. 

# Inference Requests With a Deployed Service
<a name="neo-requests"></a>

If you have followed instructions in [Deploy a Model](neo-deployment-hosting-services.md), you should have a SageMaker AI endpoint set up and running. Regardless of how you deployed your Neo-compiled model, there are three ways you can submit inference requests: 

**Topics**
+ [Request Inferences from a Deployed Service (Amazon SageMaker SDK)](neo-requests-sdk.md)
+ [Request Inferences from a Deployed Service (Boto3)](neo-requests-boto3.md)
+ [Request Inferences from a Deployed Service (AWS CLI)](neo-requests-cli.md)

# Request Inferences from a Deployed Service (Amazon SageMaker SDK)
<a name="neo-requests-sdk"></a>

Use the following code examples to request inferences from your deployed service based on the framework you used to train your model. The code examples for the different frameworks are similar. The main difference is that TensorFlow requires `application/json` as the content type. 

 

## PyTorch and MXNet
<a name="neo-requests-sdk-py-mxnet"></a>

 If you are using **PyTorch v1.4 or later** or **MXNet 1.7.0 or later** and you have an Amazon SageMaker AI endpoint `InService`, you can make inference requests using the `predictor` package of the SageMaker AI SDK for Python. 

**Note**  
The API varies based on the SageMaker AI SDK for Python version:  
For version 1.x, use the [https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor) and [https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor.predict](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor.predict) API.
For version 2.x, use the [https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor) and the [https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict) API.

The following code example shows how to use these APIs to send an image for inference: 

------
#### [ SageMaker Python SDK v1.x ]

```
from sagemaker.predictor import RealTimePredictor

endpoint = 'insert name of your endpoint here'

# Read image into memory
payload = None
with open("image.jpg", 'rb') as f:
    payload = f.read()

predictor = RealTimePredictor(endpoint=endpoint, content_type='application/x-image')
inference_response = predictor.predict(data=payload)
print(inference_response)
```

------
#### [ SageMaker Python SDK v2.x ]

```
from sagemaker.predictor import Predictor

endpoint = 'insert name of your endpoint here'

# Read image into memory
payload = None
with open("image.jpg", 'rb') as f:
    payload = f.read()
    
predictor = Predictor(endpoint)
inference_response = predictor.predict(data=payload)
print(inference_response)
```

------

## TensorFlow
<a name="neo-requests-sdk-py-tf"></a>

The following code example shows how to use the SageMaker Python SDK API to send an image for inference: 

```
from sagemaker.predictor import Predictor
from PIL import Image
import numpy as np
import json

endpoint = 'insert the name of your endpoint here'

# Read image into memory
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})
    
predictor = Predictor(endpoint)
inference_response = predictor.predict(data=body)
print(inference_response)
```

# Request Inferences from a Deployed Service (Boto3)
<a name="neo-requests-boto3"></a>

You can submit inference requests using the AWS SDK for Python (Boto3) `sagemaker-runtime` client and the [invoke_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint) API once you have a SageMaker AI endpoint `InService`. The following code example shows how to send an image for inference: 

------
#### [ PyTorch and MXNet ]

```
import boto3

import json
 
endpoint = 'insert name of your endpoint here'
 
runtime = boto3.Session().client('sagemaker-runtime')
 
# Read image into memory
with open('path/to/image', 'rb') as f:
    payload = f.read()
# Send image via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=payload)

# Unpack response
result = json.loads(response['Body'].read().decode())
```

------
#### [ TensorFlow ]

For TensorFlow, submit the input with `application/json` as the content type. 

```
from PIL import Image
import numpy as np
import json
import boto3

client = boto3.client('sagemaker-runtime') 
input_file = 'path/to/image'
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})
ioc_predictor_endpoint_name = 'insert name of your endpoint here'
content_type = 'application/json'   
ioc_response = client.invoke_endpoint(
    EndpointName=ioc_predictor_endpoint_name,
    Body=body,
    ContentType=content_type
 )
```

------
#### [ XGBoost ]

For an XGBoost application, submit CSV text instead: 

```
import boto3
import json
 
endpoint = 'insert your endpoint name here'
 
runtime = boto3.Session().client('sagemaker-runtime')
 
csv_text = '1,-1.0,1.0,1.5,2.6'
# Send CSV text via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='text/csv', Body=csv_text)
# Unpack response
result = json.loads(response['Body'].read().decode())
```

------
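
The CSV body in the XGBoost example above can be assembled from a plain list of numeric features. This is a minimal sketch, and the feature values are illustrative only:

```python
# Join numeric features into the CSV text expected by the XGBoost endpoint
# (values are illustrative, not from a real dataset)
features = [1, -1.0, 1.0, 1.5, 2.6]
csv_text = ','.join(str(f) for f in features)
print(csv_text)  # 1,-1.0,1.0,1.5,2.6
```

The resulting string is then passed as the `Body` of `invoke_endpoint` with `ContentType='text/csv'`.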

Note that bring your own model (BYOM) allows a custom content type. For more information, see [InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html). 

# Request Inferences from a Deployed Service (AWS CLI)
<a name="neo-requests-cli"></a>

Once you have an Amazon SageMaker AI endpoint `InService`, you can make inference requests with the AWS Command Line Interface (AWS CLI) using the [invoke-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker-runtime/invoke-endpoint.html) command. The following example shows how to send an image for inference: 

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://image.jpg --content-type=application/x-image output_file.txt
```

If the inference request succeeds, an `output_file.txt` file that contains the inference response is created. 

For TensorFlow, submit the input with `application/json` as the content type. 

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://input.json --content-type=application/json output_file.txt
```
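
One way to produce that `input.json` body in the TensorFlow Serving `instances` format is with the Python standard library. The following sketch uses a dummy all-zeros 224x224x3 input in place of a real preprocessed image:

```python
import json

# One dummy 224x224x3 image; replace with real preprocessed pixel values
image = [[[0.0] * 3 for _ in range(224)] for _ in range(224)]

# TensorFlow Serving expects a JSON object of the form {"instances": [image, ...]}
with open('input.json', 'w') as f:
    json.dump({"instances": [image]}, f)
```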

# Inference Container Images
<a name="neo-deployment-hosting-services-container-images"></a>

SageMaker Neo now provides inference image URI information for `ml_*` targets. For more information, see [DescribeCompilationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeCompilationJob.html#sagemaker-DescribeCompilationJob-response-InferenceImage).

Based on your use case, replace the italicized placeholders in the inference image URI templates below with the appropriate values. 

## Amazon SageMaker AI XGBoost
<a name="inference-container-collapse-xgboost"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/xgboost-neo:latest
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used.

## Keras
<a name="inference-container-collapse-keras"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-keras:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used.

Replace *fx_version* with `2.2.4`.

Replace *instance_type* with either `cpu` or `gpu`.

## MXNet
<a name="inference-container-collapse-mxnet"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-mxnet:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. 

Replace *fx_version* with `1.8.0`. 

Replace *instance_type* with either `cpu` or `gpu`. 

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-mxnet:fx_version-instance_type-py3
```

Replace *aws_region* with either `us-east-1` or `us-west-2`. 

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. 

Replace *fx_version* with `1.5.1`. 

Replace *instance_type* with `inf`.

------

## ONNX
<a name="inference-container-collapse-onnx"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-onnx:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used.

Replace *fx_version* with `1.5.0`.

Replace *instance_type* with either `cpu` or `gpu`.

## PyTorch
<a name="inference-container-collapse-pytorch"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-pytorch:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. 

Replace *fx_version* with `1.4`, `1.5`, `1.6`, `1.7`, `1.8`, `1.12`, `1.13`, or `2.0`.

Replace *instance_type* with either `cpu` or `gpu`. 

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-pytorch:fx_version-instance_type-py3
```

Replace *aws_region* with either `us-east-1` or `us-west-2`. 

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. 

Replace *fx_version* with `1.5.1`. 

Replace *instance_type* with `inf`.

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws_region* with `us-east-2` for Inferentia2, and `us-east-1` for Trainium1.

------

## TensorFlow
<a name="inference-container-collapse-tf"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-tensorflow:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. 

Replace *fx_version* with `1.15.3` or `2.9`. 

Replace *instance_type* with either `cpu` or `gpu`. 

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-tensorflow:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page, based on the *aws_region* you used. Note that for instance type `inf`, only `us-east-1` and `us-west-2` are supported.

Replace *fx_version* with `1.15.0`.

Replace *instance_type* with `inf`.

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws_region* with `us-east-2` for Inferentia2, and `us-east-1` for Trainium1.

------

The following table maps *aws_account_id* to *aws_region*. Use this table to find the correct inference image URI for your application. 


| aws_account_id | aws_region | 
| --- | --- | 
| 785573368785 | us-east-1 | 
| 007439368137 | us-east-2 | 
| 710691900526 | us-west-1 | 
| 301217895009 | us-west-2 | 
| 802834080501 | eu-west-1 | 
| 205493899709 | eu-west-2 | 
| 254080097072 | eu-west-3 | 
| 601324751636 | eu-north-1 | 
| 966458181534 | eu-south-1 | 
| 746233611703 | eu-central-1 | 
| 110948597952 | ap-east-1 | 
| 763008648453 | ap-south-1 | 
| 941853720454 | ap-northeast-1 | 
| 151534178276 | ap-northeast-2 | 
| 925152966179 | ap-northeast-3 | 
| 324986816169 | ap-southeast-1 | 
| 355873309152 | ap-southeast-2 | 
| 474822919863 | cn-northwest-1 | 
| 472730292857 | cn-north-1 | 
| 756306329178 | sa-east-1 | 
| 464438896020 | ca-central-1 | 
| 836785723513 | me-south-1 | 
| 774647643957 | af-south-1 | 
| 275950707576 | il-central-1 | 
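
Substituting values from this table can also be scripted. The following sketch covers only two Regions and uses an example repository and tag for illustration:

```python
# Subset of the account-ID table above; add more Regions as needed
NEO_ACCOUNT_IDS = {
    'us-east-1': '785573368785',
    'us-west-2': '301217895009',
}

def neo_image_uri(region, repository, tag):
    """Build an inference image URI from the Region-to-account table."""
    account_id = NEO_ACCOUNT_IDS[region]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

print(neo_image_uri('us-west-2', 'sagemaker-inference-mxnet', '1.8.0-cpu-py3'))
# 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-inference-mxnet:1.8.0-cpu-py3
```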

# Edge Devices
<a name="neo-edge-devices"></a>

Amazon SageMaker Neo provides compilation support for popular machine learning frameworks. You can deploy your Neo-compiled model to edge devices such as the Raspberry Pi 3, Texas Instruments' Sitara, Jetson TX1, and more. For a full list of supported frameworks and edge devices, see [Supported Frameworks, Devices, Systems, and Architectures](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-devices-edge.html). 

You must configure your edge device so that it can use AWS services. One way to do this is to install DLR and Boto3 to your device. To do this, you must set up the authentication credentials. See [Boto3 AWS Configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) for more information. Once your model is compiled and your edge device is configured, you can download the model from Amazon S3 to your edge device. From there, you can use the [Deep Learning Runtime (DLR)](https://neo-ai-dlr.readthedocs.io/en/latest/index.html) to read the compiled model and make inferences. 
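
The compiled artifact that you download from Amazon S3 is a `.tar.gz` archive, which you extract before pointing DLR at the resulting directory. A minimal sketch using only the Python standard library (the paths are placeholders):

```python
import os
import tarfile

def extract_model(archive_path, dest_dir):
    """Extract a Neo-compiled model archive so DLR can load it from dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, 'r:gz') as tar:
        tar.extractall(dest_dir)
    return dest_dir
```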

For first-time users, we recommend you check out the [Getting Started](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-getting-started-edge.html) guide. This guide walks you through how to set up your credentials, compile a model, deploy your model to a Raspberry Pi 3, and make inferences on images. 

**Topics**
+ [Supported Frameworks, Devices, Systems, and Architectures](neo-supported-devices-edge.md)
+ [Deploy Models](neo-deployment-edge.md)
+ [Set up Neo on Edge Devices](neo-getting-started-edge.md)

# Supported Frameworks, Devices, Systems, and Architectures
<a name="neo-supported-devices-edge"></a>

Amazon SageMaker Neo supports common machine learning frameworks, edge devices, operating systems, and chip architectures. Find out if Neo supports your framework, edge device, OS, and chip architecture by selecting one of the topics below.

You can find a list of models that have been tested by the Amazon SageMaker Neo Team in the [Tested Models](neo-supported-edge-tested-models.md) section.

**Note**  
Ambarella devices require additional files to be included within the compressed TAR file before it is sent for compilation. For more information, see [Troubleshoot Ambarella Errors](neo-troubleshooting-target-devices-ambarella.md).
TIM-VX (libtim-vx.so) is required for i.MX 8M Plus. For information on how to build TIM-VX, see the [TIM-VX GitHub repository](https://github.com/VeriSilicon/TIM-VX).
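As the note above says, Ambarella targets need extra files packaged alongside the model inside the compressed TAR file before compilation. The following is a minimal sketch of that packaging step using the Python standard library; the file names (`model.onnx`, `amba_config.json`) are placeholders, not the exact set your model requires — see the Ambarella troubleshooting page for the required files.

```python
import tarfile
from pathlib import Path

# Hypothetical file names for illustration only; see the Ambarella
# troubleshooting page for the exact files your model needs.
workdir = Path("ambarella_package_example")
workdir.mkdir(exist_ok=True)
for name in ["model.onnx", "amba_config.json"]:
    (workdir / name).write_text("placeholder contents")

# Package the model and the extra config file at the root of model.tar.gz
archive = workdir / "model.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    for name in ["model.onnx", "amba_config.json"]:
        tar.add(workdir / name, arcname=name)  # store at archive root

with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```

The key detail is `arcname=name`, which keeps the files at the root of the archive rather than nested under a directory.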

**Topics**
+ [Supported Frameworks](neo-supported-devices-edge-frameworks.md)
+ [Supported Devices, Chip Architectures, and Systems](neo-supported-devices-edge-devices.md)
+ [Tested Models](neo-supported-edge-tested-models.md)

# Supported Frameworks
<a name="neo-supported-devices-edge-frameworks"></a>

Amazon SageMaker Neo supports the following frameworks. 


| Framework | Framework Version | Model Version | Models | Model Formats (packaged in \*.tar.gz) | Toolkits | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.8 | Supports 1.8 or earlier | Image Classification, Object Detection, Semantic Segmentation, Pose Estimation, Activity Recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| ONNX | 1.7 | Supports 1.7 or earlier | Image Classification, SVM | One model file (.onnx) |  | 
| Keras | 2.2 | Supports 2.2 or earlier | Image Classification | One model definition file (.h5) |  | 
| PyTorch | 1.7, 1.8 | Supports 1.7, 1.8 or earlier | Image Classification, Object Detection | One model definition file (.pth) |  | 
| TensorFlow | 1.15, 2.4, 2.5 (2.5 only for ml.inf1 instances) | Supports 1.15, 2.4, 2.5 or earlier (2.5 only for ml.inf1 instances) | Image Classification, Object Detection | For saved models, one .pb or one .pbtxt file and a variables directory that contains variables; for frozen models, only one .pb or .pbtxt file |  | 
| TensorFlow-Lite | 1.15 | Supports 1.15 or earlier | Image Classification, Object Detection | One model definition flatbuffer file (.tflite) |  | 
| XGBoost | 1.3 | Supports 1.3 or earlier | Decision Trees | One XGBoost model file (.model) where the number of nodes in a tree is less than 2^31 |  | 
| DARKNET |  |  | Image Classification, Object Detection (Yolo model is not supported) | One config (.cfg) file and one weights (.weights) file |  | 

# Supported Devices, Chip Architectures, and Systems
<a name="neo-supported-devices-edge-devices"></a>

Amazon SageMaker Neo supports the following devices, chip architectures, and operating systems.

## Devices
<a name="neo-supported-edge-devices"></a>

You can select a device using the dropdown list in the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker) or by specifying the `TargetDevice` in the output configuration of the [CreateCompilationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCompilationJob.html) API.

You can choose from one of the following edge devices: 


| Device List | System on a Chip (SoC) | Operating System | Architecture | Accelerator | Compiler Options Example | 
| --- | --- | --- | --- | --- | --- | 
| aisage | None | Linux | ARM64 | Mali | None | 
| amba_cv2 | CV2 | Arch Linux | ARM64 | cvflow | None | 
| amba_cv22 | CV22 | Arch Linux | ARM64 | cvflow | None | 
| amba_cv25 | CV25 | Arch Linux | ARM64 | cvflow | None | 
| coreml | None | iOS, macOS | None | None | {"class_labels": "imagenet_labels_1000.txt"} | 
| imx8qm | NXP imx8 | Linux | ARM64 | None | None | 
| imx8mplus | i.MX 8M Plus | Linux | ARM64 | NPU | None | 
| jacinto_tda4vm | TDA4VM | Linux | ARM | TDA4VM | None | 
| jetson_nano | None | Linux | ARM64 | NVIDIA | {'gpu-code': 'sm_53', 'trt-ver': '5.0.6', 'cuda-ver': '10.0'} For `TensorFlow2`, `{'JETPACK_VERSION': '4.6', 'gpu_code': 'sm_72'}` | 
| jetson_tx1 | None | Linux | ARM64 | NVIDIA | {'gpu-code': 'sm_53', 'trt-ver': '6.0.1', 'cuda-ver': '10.0'} | 
| jetson_tx2 | None | Linux | ARM64 | NVIDIA | {'gpu-code': 'sm_62', 'trt-ver': '6.0.1', 'cuda-ver': '10.0'} | 
| jetson_xavier | None | Linux | ARM64 | NVIDIA | {'gpu-code': 'sm_72', 'trt-ver': '5.1.6', 'cuda-ver': '10.0'} | 
| qcs605 | None | Android | ARM64 | Mali | {'ANDROID_PLATFORM': 27} | 
| qcs603 | None | Android | ARM64 | Mali | {'ANDROID_PLATFORM': 27} | 
| rasp3b | ARM A53 | Linux | ARM_EABIHF | None | {'mattr': ['+neon']} | 
| rasp4b | ARM A72 | None | None | None | None | 
| rk3288 | None | Linux | ARM_EABIHF | Mali | None | 
| rk3399 | None | Linux | ARM64 | Mali | None | 
| sbe_c | None | Linux | x86_64 | None | {'mcpu': 'core-avx2'} | 
| sitara_am57x | AM57X | Linux | ARM64 | EVE and/or C66x DSP | None | 
| x86_win32 | None | Windows 10 | X86_32 | None | None | 
| x86_win64 | None | Windows 10 | X86_64 | None | None | 

For more information about JSON key-value compiler options for each target device, see the `CompilerOptions` field in the [`OutputConfig` API](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputConfig.html) data type.
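As a minimal sketch of how `TargetDevice` and `CompilerOptions` fit into the `OutputConfig` of a `CreateCompilationJob` request: the bucket name below is a placeholder, and `CompilerOptions` is passed as a JSON-encoded string, not a dict.

```python
import json

# Sketch of an OutputConfig for a Jetson Nano target. The S3 path is a
# placeholder; CompilerOptions must be a JSON-encoded string.
output_config = {
    "S3OutputLocation": "s3://amzn-s3-demo-bucket/output",
    "TargetDevice": "jetson_nano",
    "CompilerOptions": json.dumps(
        {"gpu-code": "sm_53", "trt-ver": "5.0.6", "cuda-ver": "10.0"}
    ),
}

# A Boto3 call would pass this dict as the OutputConfig argument of
# create_compilation_job; here we only check the options round-trip as JSON.
compiler_options = json.loads(output_config["CompilerOptions"])
```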

## Systems and Chip Architectures
<a name="neo-supported-edge-granular"></a>

The following look-up tables provide information regarding available operating systems and architectures for Neo model compilation jobs. 

------
#### [ Linux ]


| Accelerator | X86_64 | X86 | ARM64 | ARM_EABIHF | ARM_EABI | 
| --- | --- | --- | --- | --- | --- | 
| No accelerator (CPU) | Yes | No | Yes | Yes | Yes | 
| Nvidia GPU | Yes | No | Yes | No | No | 
| Intel Graphics | Yes | No | No | No | No | 
| ARM Mali | No | No | Yes | Yes | Yes | 

------
#### [ Android ]


| Accelerator | X86_64 | X86 | ARM64 | ARM_EABIHF | ARM_EABI | 
| --- | --- | --- | --- | --- | --- | 
| No accelerator (CPU) | Yes | Yes | Yes | No | Yes | 
| Nvidia GPU | No | No | No | No | No | 
| Intel Graphics | Yes | Yes | No | No | No | 
| ARM Mali | No | No | Yes | No | Yes | 

------
#### [ Windows ]


| Accelerator | X86_64 | X86 | ARM64 | ARM_EABIHF | ARM_EABI | 
| --- | --- | --- | --- | --- | --- | 
| No accelerator (CPU) | Yes | Yes | No | No | No | 

------

# Tested Models
<a name="neo-supported-edge-tested-models"></a>

The following collapsible sections provide information about machine learning models that were tested by the Amazon SageMaker Neo team. Expand the collapsible section based on your framework to check if a model was tested.

**Note**  
This is not a comprehensive list of models that can be compiled with Neo.

See [Supported Frameworks](neo-supported-devices-edge-frameworks.md) and [SageMaker AI Neo Supported Operators](https://aws.amazon.com/releasenotes/sagemaker-neo-supported-frameworks-and-operators/) to find out if you can compile your model with SageMaker Neo.

## DarkNet
<a name="collapsible-section-01"></a>


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| Alexnet |  |  |  |  |  |  |  |  |  | 
| Resnet50 | X | X |  | X | X | X |  | X | X | 
| YOLOv2 |  |  |  | X | X | X |  | X | X | 
| YOLOv2_tiny | X | X |  | X | X | X |  | X | X | 
| YOLOv3_416 |  |  |  | X | X | X |  | X | X | 
| YOLOv3_tiny | X | X |  | X | X | X |  | X | X | 

## MXNet
<a name="collapsible-section-02"></a>


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| Alexnet |  |  | X |  |  |  |  |  |  | 
| Densenet121 |  |  | X |  |  |  |  |  |  | 
| DenseNet201 | X | X | X | X | X | X |  | X | X | 
| GoogLeNet | X | X |  | X | X | X |  | X | X | 
| InceptionV3 |  |  |  | X | X | X |  | X | X | 
| MobileNet0.75 | X | X |  | X | X | X |  |  | X | 
| MobileNet1.0 | X | X | X | X | X | X |  |  | X | 
| MobileNetV2_0.5 | X | X |  | X | X | X |  |  | X | 
| MobileNetV2_1.0 | X | X | X | X | X | X | X | X | X | 
| MobileNetV3_Large | X | X | X | X | X | X | X | X | X | 
| MobileNetV3_Small | X | X | X | X | X | X | X | X | X | 
| ResNeSt50 |  |  |  | X | X |  |  | X | X | 
| ResNet18_v1 | X | X | X | X | X | X |  |  | X | 
| ResNet18_v2 | X | X |  | X | X | X |  |  | X | 
| ResNet50_v1 | X | X | X | X | X | X |  | X | X | 
| ResNet50_v2 | X | X | X | X | X | X |  | X | X | 
| ResNext101_32x4d |  |  |  |  |  |  |  |  |  | 
| ResNext50_32x4d | X |  | X | X | X |  |  | X | X | 
| SENet_154 |  |  |  | X | X | X |  | X | X | 
| SE_ResNext50_32x4d | X | X |  | X | X | X |  | X | X | 
| SqueezeNet1.0 | X | X | X | X | X | X |  |  | X | 
| SqueezeNet1.1 | X | X | X | X | X | X |  | X | X | 
| VGG11 | X | X | X | X | X |  |  | X | X | 
| Xception | X | X | X | X | X | X |  | X | X | 
| darknet53 | X | X |  | X | X | X |  | X | X | 
| resnet18_v1b_0.89 | X | X |  | X | X | X |  |  | X | 
| resnet50_v1d_0.11 | X | X |  | X | X | X |  |  | X | 
| resnet50_v1d_0.86 | X | X | X | X | X | X |  | X | X | 
| ssd_512_mobilenet1.0_coco | X |  | X | X | X | X |  | X | X | 
| ssd_512_mobilenet1.0_voc | X |  | X | X | X | X |  | X | X | 
| ssd_resnet50_v1 | X |  | X | X | X |  |  | X | X | 
| yolo3_darknet53_coco | X |  |  | X | X |  |  | X | X | 
| yolo3_mobilenet1.0_coco | X | X |  | X | X | X |  | X | X | 
| deeplab_resnet50 |  |  | X |  |  |  |  |  |  | 

## Keras
<a name="collapsible-section-03"></a>


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| densenet121 | X | X | X | X | X | X |  | X | X | 
| densenet201 | X | X | X | X | X | X |  |  | X | 
| inception_v3 | X | X |  | X | X | X |  | X | X | 
| mobilenet_v1 | X | X | X | X | X | X |  | X | X | 
| mobilenet_v2 | X | X | X | X | X | X |  | X | X | 
| resnet152_v1 |  |  |  | X | X |  |  |  | X | 
| resnet152_v2 |  |  |  | X | X |  |  |  | X | 
| resnet50_v1 | X | X | X | X | X |  |  | X | X | 
| resnet50_v2 | X | X | X | X | X | X |  | X | X | 
| vgg16 |  |  | X | X | X |  |  | X | X | 

## ONNX
<a name="collapsible-section-04"></a>


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| alexnet |  |  | X |  |  |  |  |  |  | 
| mobilenetv2-1.0 | X | X | X | X | X | X |  | X | X | 
| resnet18v1 | X |  |  | X | X |  |  |  | X | 
| resnet18v2 | X |  |  | X | X |  |  |  | X | 
| resnet50v1 | X |  | X | X | X |  |  | X | X | 
| resnet50v2 | X |  | X | X | X |  |  | X | X | 
| resnet152v1 |  |  |  | X | X | X |  |  | X | 
| resnet152v2 |  |  |  | X | X | X |  |  | X | 
| squeezenet1.1 | X |  | X | X | X | X |  | X | X | 
| vgg19 |  |  | X |  |  |  |  |  | X | 

## PyTorch (FP32)
<a name="collapsible-section-05"></a>


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Ambarella CV25 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| densenet121 | X | X | X | X | X | X | X |  | X | X | 
| inception_v3 |  | X |  |  | X | X | X |  | X | X | 
| resnet152 |  |  |  |  | X | X | X |  |  | X | 
| resnet18 | X | X |  |  | X | X | X |  |  | X | 
| resnet50 | X | X | X | X | X | X |  |  | X | X | 
| squeezenet1.0 | X | X |  |  | X | X | X |  |  | X | 
| squeezenet1.1 | X | X | X | X | X | X | X |  | X | X | 
| yolov4 |  |  |  |  | X | X |  |  |  |  | 
| yolov5 |  |  |  | X | X | X |  |  |  |  | 
| fasterrcnn_resnet50_fpn |  |  |  |  | X | X |  |  |  |  | 
| maskrcnn_resnet50_fpn |  |  |  |  | X | X |  |  |  |  | 

## TensorFlow
<a name="collapsible-section-06"></a>

------
#### [ TensorFlow ]


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Ambarella CV25 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| densenet201 | X | X | X | X | X | X | X |  | X | X | 
| inception_v3 | X | X | X |  | X | X | X |  | X | X | 
| mobilenet100_v1 | X | X | X |  | X | X | X |  |  | X | 
| mobilenet100_v2.0 | X | X | X |  | X | X | X |  | X | X | 
| mobilenet130_v2 | X | X |  |  | X | X | X |  |  | X | 
| mobilenet140_v2 | X | X | X |  | X | X | X |  | X | X | 
| resnet50_v1.5 | X | X |  |  | X | X | X |  | X | X | 
| resnet50_v2 | X | X | X | X | X | X | X |  | X | X | 
| squeezenet | X | X | X | X | X | X | X |  | X | X | 
| mask_rcnn_inception_resnet_v2 |  |  |  |  | X |  |  |  |  |  | 
| ssd_mobilenet_v2 |  |  |  |  | X | X |  |  |  |  | 
| faster_rcnn_resnet50_lowproposals |  |  |  |  | X |  |  |  |  |  | 
| rfcn_resnet101 |  |  |  |  | X |  |  |  |  |  | 

------
#### [ TensorFlow.Keras ]


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| DenseNet121  | X | X |  | X | X | X |  | X | X | 
| DenseNet201 | X | X |  | X | X | X |  |  | X | 
| InceptionV3 | X | X |  | X | X | X |  | X | X | 
| MobileNet | X | X |  | X | X | X |  | X | X | 
| MobileNetv2 | X | X |  | X | X | X |  | X | X | 
| NASNetLarge |  |  |  | X | X |  |  | X | X | 
| NASNetMobile | X | X |  | X | X | X |  | X | X | 
| ResNet101 |  |  |  | X | X | X |  |  | X | 
| ResNet101V2 |  |  |  | X | X | X |  |  | X | 
| ResNet152 |  |  |  | X | X |  |  |  | X | 
| ResNet152v2 |  |  |  | X | X |  |  |  | X | 
| ResNet50 | X | X |  | X | X |  |  | X | X | 
| ResNet50V2 | X | X |  | X | X | X |  | X | X | 
| VGG16 |  |  |  | X | X |  |  | X | X | 
| Xception | X | X |  | X | X | X |  | X | X | 

------

## TensorFlow-Lite
<a name="collapsible-section-07"></a>

------
#### [ TensorFlow-Lite (FP32) ]


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | i.MX 8M Plus | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| densenet_2018_04_27 | X |  |  | X | X | X |  |  | X |  | 
| inception_resnet_v2_2018_04_27 |  |  |  | X | X | X |  |  | X |  | 
| inception_v3_2018_04_27 |  |  |  | X | X | X |  |  | X | X | 
| inception_v4_2018_04_27 |  |  |  | X | X | X |  |  | X | X | 
| mnasnet_0.5_224_09_07_2018 | X |  |  | X | X | X |  |  | X |  | 
| mnasnet_1.0_224_09_07_2018 | X |  |  | X | X | X |  |  | X |  | 
| mnasnet_1.3_224_09_07_2018 | X |  |  | X | X | X |  |  | X |  | 
| mobilenet_v1_0.25_128 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_0.25_224 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_0.5_128 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_0.5_224 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_0.75_128 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_0.75_224 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_1.0_128 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v1_1.0_192 | X |  |  | X | X | X |  |  | X | X | 
| mobilenet_v2_1.0_224 | X |  |  | X | X | X |  |  | X | X | 
| resnet_v2_101 |  |  |  | X | X | X |  |  | X |  | 
| squeezenet_2018_04_27 | X |  |  | X | X | X |  |  | X |  | 

------
#### [ TensorFlow-Lite (INT8) ]


| Models | ARM V8 | ARM Mali | Ambarella CV22 | Nvidia | Panorama | TI TDA4VM | Qualcomm QCS603 | X86_Linux | X86_Windows | i.MX 8M Plus | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| inception_v1 |  |  |  |  |  |  | X |  |  | X | 
| inception_v2 |  |  |  |  |  |  | X |  |  | X | 
| inception_v3 | X |  |  |  |  | X | X |  | X | X | 
| inception_v4_299 | X |  |  |  |  | X | X |  | X | X | 
| mobilenet_v1_0.25_128 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_0.25_224 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_0.5_128 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_0.5_224 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_0.75_128 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_0.75_224 | X |  |  |  |  | X | X |  | X | X | 
| mobilenet_v1_1.0_128 | X |  |  |  |  | X |  |  | X | X | 
| mobilenet_v1_1.0_224 | X |  |  |  |  | X | X |  | X | X | 
| mobilenet_v2_1.0_224 | X |  |  |  |  | X | X |  | X | X | 
| deeplab-v3_513 |  |  |  |  |  |  | X |  |  |  | 

------

# Deploy Models
<a name="neo-deployment-edge"></a>

You can deploy a compiled model to resource-constrained edge devices either by downloading the model from Amazon S3 to your device and using [DLR](https://github.com/neo-ai/neo-ai-dlr), or by using [AWS IoT Greengrass](https://docs.aws.amazon.com/greengrass/latest/developerguide/what-is-gg.html).

Before moving on, make sure your edge device is supported by SageMaker Neo. See [Supported Frameworks, Devices, Systems, and Architectures](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-devices-edge.html) to find out which edge devices are supported. Also make sure that you specified your target edge device when you submitted the compilation job; see [Use Neo to Compile a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html).

## Deploy a Compiled Model (DLR)
<a name="neo-deployment-dlr"></a>

[DLR](https://github.com/neo-ai/neo-ai-dlr) is a compact, common runtime for deep learning models and decision tree models. DLR uses the [TVM](https://github.com/neo-ai/tvm) runtime, the [Treelite](https://treelite.readthedocs.io/en/latest/install.html) runtime, and NVIDIA TensorRT™, and can include other hardware-specific runtimes. DLR provides unified Python/C++ APIs for loading and running compiled models on various devices.

You can install the latest release of the DLR package using the following pip command:

```
pip install dlr
```

To install DLR on GPU targets or non-x86 edge devices, see [Releases](https://github.com/neo-ai/neo-ai-dlr/releases) for prebuilt binaries, or [Installing DLR](https://neo-ai-dlr.readthedocs.io/en/latest/install.html) to build DLR from source. For example, to install DLR for the Raspberry Pi 3, you can use: 

```
pip install https://neo-ai-dlr-release.s3-us-west-2.amazonaws.com/v1.3.0/pi-armv7l-raspbian4.14.71-glibc2_24-libstdcpp3_4/dlr-1.3.0-py3-none-any.whl
```

## Deploy a Model (AWS IoT Greengrass)
<a name="neo-deployment-greengrass"></a>

[AWS IoT Greengrass](https://docs.aws.amazon.com/greengrass/latest/developerguide/what-is-gg.html) extends cloud capabilities to local devices. It enables devices to collect and analyze data closer to the source of information, react autonomously to local events, and communicate securely with each other on local networks. With AWS IoT Greengrass, you can perform machine learning inference at the edge on locally generated data using cloud-trained models. Currently, you can deploy models on to all AWS IoT Greengrass devices based on ARM Cortex-A, Intel Atom, and Nvidia Jetson series processors. For more information on deploying a Lambda inference application to perform machine learning inferences with AWS IoT Greengrass, see [ How to configure optimized machine learning inference using the AWS Management Console](https://docs.aws.amazon.com/greengrass/latest/developerguide/ml-dlc-console.html).

# Set up Neo on Edge Devices
<a name="neo-getting-started-edge"></a>

This guide to getting started with Amazon SageMaker Neo shows you how to compile a model, set up your device, and make inferences on your device. Most of the code examples use Boto3. We provide commands using the AWS CLI where applicable, as well as instructions on how to satisfy the prerequisites for Neo. 

**Note**  
You can run the following code snippets on your local machine, within a SageMaker notebook, within Amazon SageMaker Studio, or (depending on your edge device) on your edge device. The setup is similar; however, there are two main exceptions if you run this guide within a SageMaker notebook instance or SageMaker Studio session:   
You do not need to install Boto3.
You do not need to attach the `AmazonSageMakerFullAccess` IAM policy.

 This guide assumes you are running the following instructions on your edge device. 

# Prerequisites
<a name="neo-getting-started-edge-step0"></a>

SageMaker Neo is a capability that allows you to train machine learning models once and run them anywhere in the cloud and at the edge. Before you can compile and optimize your models with Neo, there are a few prerequisites you need to set up. You must install the necessary Python libraries, configure your AWS credentials, create an IAM role with the required permissions, and set up an S3 bucket for storing model artifacts. You must also have a trained machine learning model ready. The following steps guide you through the setup:

1. **Install Boto3**

   If you are running these commands on your edge device, you must install the AWS SDK for Python (Boto3). Within a Python environment (preferably a virtual environment), run the following locally on your edge device's terminal or within a Jupyter notebook instance: 

------
#### [ Terminal ]

   ```
   pip install boto3
   ```

------
#### [ Jupyter Notebook ]

   ```
   !pip install boto3
   ```

------

1.  **Set Up AWS Credentials** 

   You need to set up Amazon Web Services credentials on your device in order to run SDK for Python (Boto3). By default, the AWS credentials should be stored in the file `~/.aws/credentials` on your edge device. Within the credentials file, you should see two environment variables: `aws_access_key_id` and `aws_secret_access_key`. 

   In your terminal, run: 

   ```
   $ more ~/.aws/credentials
   
   [default]
   aws_access_key_id = YOUR_ACCESS_KEY
   aws_secret_access_key = YOUR_SECRET_KEY
   ```

   The [AWS General Reference Guide](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) has instructions on how to get the necessary `aws_access_key_id` and `aws_secret_access_key`. For more information on how to set up credentials on your device, see the [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) documentation. 
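The credentials file uses INI format, which you can inspect programmatically. The following is an illustrative sketch that parses a sample file with the standard library's `configparser` (written locally so it does not touch your real `~/.aws/credentials`):

```python
import configparser

# Boto3 reads ~/.aws/credentials in INI format. For illustration, write a
# sample file locally instead of modifying your real credentials file.
with open("sample_credentials", "w") as f:
    f.write(
        "[default]\n"
        "aws_access_key_id = YOUR_ACCESS_KEY\n"
        "aws_secret_access_key = YOUR_SECRET_KEY\n"
    )

config = configparser.ConfigParser()
config.read("sample_credentials")

# Verify that a [default] profile with both keys is present
has_default_profile = (
    "default" in config
    and "aws_access_key_id" in config["default"]
    and "aws_secret_access_key" in config["default"]
)
```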

1.  **Set up an IAM Role and attach policies.** 

   Neo needs access to your S3 bucket URI. Create an IAM role that can run SageMaker AI and has permission to access the S3 URI. You can create an IAM role using the SDK for Python (Boto3), the console, or the AWS CLI. The following example illustrates how to create an IAM role using the SDK for Python (Boto3): 

   ```
   import boto3
   
   AWS_REGION = 'aws-region'
   
   # Create an IAM client to interact with IAM
   iam_client = boto3.client('iam', region_name=AWS_REGION)
   role_name = 'role-name'
   ```

   For more information on how to create an IAM role with the console, AWS CLI, or through the AWS API, see [Creating an IAM user in your AWS account](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_api).

    Create a dictionary describing the trust policy for the role. This policy is used to create the new IAM role and allows SageMaker AI to assume it. 

   ```
   policy = {
       'Statement': [
           {
               'Action': 'sts:AssumeRole',
               'Effect': 'Allow',
               'Principal': {'Service': 'sagemaker.amazonaws.com'},
           }],  
        'Version': '2012-10-17'
   }
   ```

   Create a new IAM role using the policy you defined above:

   ```
   import json 
   
   new_role = iam_client.create_role(
       AssumeRolePolicyDocument=json.dumps(policy),
       Path='/',
       RoleName=role_name
   )
   ```

   You need the role's Amazon Resource Name (ARN) when you create a compilation job in a later step, so store it in a variable as well. 

   ```
   role_arn = new_role['Role']['Arn']
   ```

    Now that you have created a new role, attach the permissions it needs to interact with Amazon SageMaker AI and Amazon S3: 

   ```
   iam_client.attach_role_policy(
       RoleName=role_name,
       PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
   )
   
   iam_client.attach_role_policy(
       RoleName=role_name,
       PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
   );
   ```

1. **Create an Amazon S3 bucket to store your model artifacts**

   SageMaker Neo accesses your model artifacts from Amazon S3.

------
#### [ Boto3 ]

   ```
   # Create an S3 client
   s3_client = boto3.client('s3', region_name=AWS_REGION)
   
   # Name buckets
   bucket='name-of-your-bucket'
   
   # Check if bucket exists
   if boto3.resource('s3').Bucket(bucket) not in boto3.resource('s3').buckets.all():
       s3_client.create_bucket(
           Bucket=bucket,
           CreateBucketConfiguration={
               'LocationConstraint': AWS_REGION
           }
       )
   else:
       print(f'Bucket {bucket} already exists. No action needed.')
   ```

------
#### [ CLI ]

   ```
   aws s3 mb s3://'name-of-your-bucket' --region specify-your-region 
   
   # Check your bucket exists
   aws s3 ls s3://'name-of-your-bucket'/
   ```

------

1. **Train a machine learning model**

   See [Train a Model with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html) for more information on how to train a machine learning model using Amazon SageMaker AI. You can optionally upload your locally trained model directly into an Amazon S3 URI bucket. 
**Note**  
 Make sure the model is correctly formatted depending on the framework you used. See [What input data shapes does SageMaker Neo expect?](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html#neo-job-compilation-expected-inputs) 

   If you do not have a model yet, use the `curl` command to get a local copy of the `coco_ssd_mobilenet` model from TensorFlow’s website. The model you just copied is an object detection model trained from the [COCO dataset](https://cocodataset.org/#home). Type the following into your Jupyter notebook:

   ```
   model_zip_filename = './coco_ssd_mobilenet_v1_1.0.zip'
   !curl http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip \
       --output {model_zip_filename}
   ```

   Note that this particular example was packaged in a .zip file. Unzip this file and repackage it as a compressed tarfile (`.tar.gz`) before using it in later steps. Type the following into your Jupyter notebook: 

   ```
   # Extract model from zip file
   !unzip -u {model_zip_filename}
   
   model_filename = 'detect.tflite'
   model_name = model_filename.split('.')[0]
   
   # Compress model into .tar.gz so SageMaker Neo can use it
   model_tar = model_name + '.tar.gz'
   !tar -czf {model_tar} {model_filename}
   ```
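   The unzip-and-repackage step above can also be sketched with the Python standard library instead of shell commands. For illustration, the snippet builds a small placeholder zip in place of the downloaded model archive:

```python
import tarfile
import zipfile
from pathlib import Path

# Stdlib equivalent of the unzip/tar shell commands above. A placeholder
# zip stands in for the downloaded coco_ssd_mobilenet archive.
model_filename = "detect.tflite"
Path(model_filename).write_bytes(b"placeholder model bytes")
with zipfile.ZipFile("coco_ssd_mobilenet_v1_1.0.zip", "w") as zf:
    zf.write(model_filename)

# Extract the model file from the zip archive
with zipfile.ZipFile("coco_ssd_mobilenet_v1_1.0.zip") as zf:
    zf.extract(model_filename)

# Repackage as the compressed tarfile (.tar.gz) SageMaker Neo expects
model_tar = Path(model_filename).stem + ".tar.gz"
with tarfile.open(model_tar, "w:gz") as tar:
    tar.add(model_filename)
```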

1. **Upload trained model to an S3 bucket**

   Once you have trained your machine learning model, store it in an S3 bucket. 

------
#### [ Boto3 ]

   ```
    # Upload the packaged model (.tar.gz)
    s3_client.upload_file(Filename=model_tar, Bucket=bucket, Key=model_tar)
   ```

------
#### [ CLI ]

   Replace `your-model-filename` with the name of your model file and `amzn-s3-demo-bucket` with the name of your S3 bucket. 

   ```
   aws s3 cp your-model-filename s3://amzn-s3-demo-bucket
   ```

------

# Compile the Model
<a name="neo-getting-started-edge-step1"></a>

Once you have satisfied the [Prerequisites](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-getting-started-edge.html#neo-getting-started-edge-step0), you can compile your model with Amazon SageMaker Neo. You can compile your model using the AWS CLI, the console, or the [Amazon Web Services SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html); see [Use Neo to Compile a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html). In this example, you compile your model with Boto3.

To compile a model, SageMaker Neo requires the following information:

1.  **The Amazon S3 bucket URI where you stored the trained model.** 

   If you followed the prerequisites, the name of your bucket is stored in a variable named `bucket`. The following code snippet shows how to list all of your buckets using the AWS CLI: 

   ```
   aws s3 ls
   ```

   For example: 

   ```
   $ aws s3 ls
   2020-11-02 17:08:50 bucket
   ```

1.  **The Amazon S3 bucket URI where you want to save the compiled model.** 

   The code snippet below concatenates your Amazon S3 bucket URI with the name of an output directory called `output`: 

   ```
   s3_output_location = f's3://{bucket}/output'
   ```

1.  **The machine learning framework you used to train your model.** 

   Define the framework you used to train your model.

   ```
   framework = 'framework-name'
   ```

   For example, if you trained your model using TensorFlow, you can use either `tflite` or `tensorflow`. Use `tflite` if you want the lighter version of TensorFlow that uses less storage memory. 

   ```
   framework = 'tflite'
   ```

   For a complete list of Neo-supported frameworks, see [Supported Frameworks, Devices, Systems, and Architectures](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-devices-edge.html). 

1.  **The shape of your model's input.** 

    Neo requires the name and shape of your input tensor. The name and shape are passed in as key-value pairs. `value` is a list of the integer dimensions of an input tensor and `key` is the exact name of an input tensor in the model. 

   ```
   data_shape = '{"name": [tensor-shape]}'
   ```

   For example:

   ```
   data_shape = '{"normalized_input_image_tensor":[1, 300, 300, 3]}'
   ```
**Note**  
Make sure the model is correctly formatted for the framework you used. See [What input data shapes does SageMaker Neo expect?](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html#neo-job-compilation-expected-inputs) If your model's input tensor has a different name, change the key in this dictionary to match it.

1.  **Either the name of the target device to compile for or the general details of the hardware platform.** 

   ```
   target_device = 'target-device-name'
   ```

   For example, if you want to deploy to a Raspberry Pi 3, use: 

   ```
   target_device = 'rasp3b'
   ```

   You can find the entire list of supported edge devices in [Supported Frameworks, Devices, Systems, and Architectures](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-devices-edge.html).
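Rather than hand-writing the `data_shape` JSON string, you can build it with `json.dumps`, which guarantees valid JSON quoting; the tensor name below is the one from the example above:

```python
import json

# Build the DataInputConfig string from a dict; json.dumps guarantees
# valid JSON quoting, which hand-written strings sometimes get wrong.
input_shapes = {'normalized_input_image_tensor': [1, 300, 300, 3]}
data_shape = json.dumps(input_shapes)
print(data_shape)
```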

 Now that you have completed the previous steps, you can submit a compilation job to Neo. 

```
# Create a SageMaker client so you can submit a compilation job
sagemaker_client = boto3.client('sagemaker', region_name=AWS_REGION)

# Amazon S3 URI of the model artifact you uploaded earlier
s3_input_location = f's3://{bucket}/{model_tar}'

# Give your compilation job a name
compilation_job_name = 'getting-started-demo'
print(f'Compilation job for {compilation_job_name} started')

response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role_arn,
    InputConfig={
        'S3Uri': s3_input_location,
        'DataInputConfig': data_shape,
        'Framework': framework.upper()
    },
    OutputConfig={
        'S3OutputLocation': s3_output_location,
        'TargetDevice': target_device 
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 900
    }
)

# Optional - Poll every 30 sec to check completion status
import time

while True:
    response = sagemaker_client.describe_compilation_job(CompilationJobName=compilation_job_name)
    if response['CompilationJobStatus'] == 'COMPLETED':
        break
    elif response['CompilationJobStatus'] == 'FAILED':
        raise RuntimeError('Compilation failed')
    print('Compiling ...')
    time.sleep(30)
print('Done!')
```
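The polling loop above runs until the job reports a terminal status. If you want a bound on waiting time, a reusable helper with a timeout is a minimal sketch; the function name `wait_for_compilation` is illustrative, and the `describe` argument stands in for a callable like `sagemaker_client.describe_compilation_job`:

```python
import time

def wait_for_compilation(describe, poll_seconds=30, timeout_seconds=900):
    """Poll a describe() callable until the compilation job finishes.

    `describe` should return a dict containing a 'CompilationJobStatus' key,
    as sagemaker_client.describe_compilation_job(...) does.
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = describe()['CompilationJobStatus']
        if status == 'COMPLETED':
            return status
        if status == 'FAILED':
            raise RuntimeError('Compilation failed')
        time.sleep(poll_seconds)
    raise TimeoutError('Compilation job did not finish in time')
```

For example: `wait_for_compilation(lambda: sagemaker_client.describe_compilation_job(CompilationJobName=compilation_job_name))`.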

If you want additional information for debugging, include the following print statement:

```
print(response)
```

If the compilation job is successful, the compiled model is stored in the output Amazon S3 bucket you specified earlier (`s3_output_location`). Download your compiled model locally: 

```
object_path = f'output/{model_name}-{target_device}.tar.gz'
neo_compiled_model = f'compiled-{model_name}.tar.gz'
s3_client.download_file(bucket, object_path, neo_compiled_model)
```

# Set Up Your Device
<a name="neo-getting-started-edge-step2"></a>

You will need to install packages on your edge device so that your device can make inferences. You will also need to either install [AWS IoT Greengrass](https://docs.aws.amazon.com/greengrass/latest/developerguide/what-is-gg.html) core or [Deep Learning Runtime (DLR)](https://github.com/neo-ai/neo-ai-dlr). In this example, you will install packages required to make inferences for the `coco_ssd_mobilenet` object detection algorithm and you will use DLR.

1. **Install additional packages**

   In addition to Boto3, you must install certain libraries on your edge device. What libraries you install depends on your use case. 

   For example, for the `coco_ssd_mobilenet` object detection algorithm you downloaded earlier, you need to install [NumPy](https://numpy.org/) for data manipulation and statistics, [PIL](https://pillow.readthedocs.io/en/stable/) to load images, and [Matplotlib](https://matplotlib.org/) to generate plots. You also need a copy of TensorFlow if you want to gauge the impact of compiling with Neo versus a baseline. 

   ```
   !pip3 install numpy pillow tensorflow matplotlib 
   ```

1. **Install inference engine on your device**

   To run your Neo-compiled model, install the [Deep Learning Runtime (DLR)](https://github.com/neo-ai/neo-ai-dlr) on your device. DLR is a compact, common runtime for deep learning models and decision tree models. On x86_64 CPU targets running Linux, you can install the latest release of the DLR package using the following `pip` command:

   ```
   !pip install dlr
   ```

   For installation of DLR on GPU targets or non-x86 edge devices, refer to [Releases](https://github.com/neo-ai/neo-ai-dlr/releases) for prebuilt binaries, or [Installing DLR](https://neo-ai-dlr.readthedocs.io/en/latest/install.html) for building DLR from source. For example, to install DLR for Raspberry Pi 3, you can use: 

   ```
   !pip install https://neo-ai-dlr-release.s3-us-west-2.amazonaws.com/v1.3.0/pi-armv7l-raspbian4.14.71-glibc2_24-libstdcpp3_4/dlr-1.3.0-py3-none-any.whl
   ```

# Make Inferences on Your Device
<a name="neo-getting-started-edge-step3"></a>

In this example, you will use Boto3 to download the output of your compilation job onto your edge device. You will then import DLR, download an example image from the dataset, resize the image to match the model’s original input, and make a prediction.

1. **Download your compiled model from Amazon S3 to your device and extract it from the compressed tarfile.** 

   ```
   # Download compiled model locally to edge device
   object_path = f'output/{model_name}-{target_device}.tar.gz'
   neo_compiled_model = f'compiled-{model_name}.tar.gz'
   s3_client.download_file(bucket, object_path, neo_compiled_model)
   
   # Extract model from .tar.gz so DLR can use it
   !mkdir ./dlr_model # make a directory to store your model (optional)
   !tar -xzvf ./{neo_compiled_model} --directory ./dlr_model
   ```

1. **Import DLR and initialize a `DLRModel` object.**

   ```
   import dlr
   
   device = 'cpu'
   model = dlr.DLRModel('./dlr_model', device)
   ```

1. **Download an image for inferencing and format it based on how your model was trained**.

   For the `coco_ssd_mobilenet` example, you can download an image from the [COCO dataset](https://cocodataset.org/#home) and then resize it to `300x300`: 

   ```
   import numpy as np
   from PIL import Image
   
   # Download an image for the model to make a prediction on
   input_image_filename = './input_image.jpg'
   !curl https://farm9.staticflickr.com/8325/8077197378_79efb4805e_z.jpg --output {input_image_filename}
   
   # Load and resize the image so the model can make predictions
   image = Image.open(input_image_filename)
   resized_image = image.resize((300, 300))
   
   # Model is quantized, so convert the image to uint8
   x = np.array(resized_image).astype('uint8')
   ```

1. **Use DLR to make inferences**.

   Finally, you can use DLR to make a prediction on the image you just downloaded: 

   ```
   out = model.run(x)
   ```
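For this TFLite SSD model, `out` is typically a list of arrays: bounding boxes, class indices, confidence scores, and the number of detections. The following is a minimal post-processing sketch assuming that layout; the function name `filter_detections` is illustrative:

```python
import numpy as np

def filter_detections(boxes, classes, scores, threshold=0.5):
    """Keep only detections whose confidence score exceeds the threshold.

    Assumes the typical TFLite SSD output layout: one row per detection.
    """
    keep = np.asarray(scores) > threshold
    return np.asarray(boxes)[keep], np.asarray(classes)[keep], np.asarray(scores)[keep]
```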

For more examples using DLR to make inferences from a Neo-compiled model on an edge device, see the [neo-ai-dlr Github repository](https://github.com/neo-ai/neo-ai-dlr). 

# Troubleshoot Errors
<a name="neo-troubleshooting"></a>

This section contains information about how to understand and prevent common errors, the error messages they generate, and guidance on how to resolve these errors. Before moving on, ask yourself the following questions:

 **Did you encounter an error before you deployed your model?** If yes, see [Troubleshoot Neo Compilation Errors](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html). 

 **Did you encounter an error after you deployed your model?** If yes, see [Troubleshoot Neo Inference Errors](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-inference.html). 

**Did you encounter an error trying to compile your model for Ambarella devices?** If yes, see [Troubleshoot Ambarella Errors](neo-troubleshooting-target-devices-ambarella.md).

## Error Classification Types
<a name="neo-error-messages"></a>

This list classifies the *user errors* you can receive from Neo. These include access and permission errors and load errors for each of the supported frameworks. All other errors are *system errors*.

### Client permission error
<a name="neo-error-client-permission"></a>

 Neo passes these errors straight through from the dependent service: 
+ *Access Denied* when calling sts:AssumeRole
+ *Any 400* error when calling Amazon S3 to download or upload a client model
+ *PassRole* error

### Load error
<a name="collapsible-section-2"></a>

Assuming that the Neo compiler successfully loaded the .tar.gz file from Amazon S3, check whether the tarball contains the necessary files for compilation. The checking criteria are framework-specific: 
+ **TensorFlow**: Expects only one protobuf file (`.pb` or `.pbtxt`). For saved models, expects one variables folder. 
+ **PyTorch**: Expects only one PyTorch file (`.pth`).
+ **MXNet**: Expects only one symbol file (`.json`) and one parameter file (`.params`).
+ **XGBoost**: Expects only one XGBoost model file (`.model`). The input model has a size limitation.

### Compilation error
<a name="neo-error-compilation"></a>

Assuming that the Neo compiler successfully loaded the .tar.gz file from Amazon S3 and that the tarball contains the necessary files, compilation can still fail with the following errors: 
+ **OperatorNotImplemented**: An operator has not been implemented.
+ **OperatorAttributeNotImplemented**: The attribute in the specified operator has not been implemented. 
+ **OperatorAttributeRequired**: An attribute is required for an internal symbol graph, but it is not listed in the user input model graph. 
+ **OperatorAttributeValueNotValid**: The value of the attribute in the specific operator is not valid. 

**Topics**
+ [Error Classification Types](#neo-error-messages)
+ [Troubleshoot Neo Compilation Errors](neo-troubleshooting-compilation.md)
+ [Troubleshoot Neo Inference Errors](neo-troubleshooting-inference.md)
+ [Troubleshoot Ambarella Errors](neo-troubleshooting-target-devices-ambarella.md)

# Troubleshoot Neo Compilation Errors
<a name="neo-troubleshooting-compilation"></a>

This section contains information about how to understand and prevent common compilation errors, the error messages they generate, and guidance on how to resolve these errors. 

**Topics**
+ [How to Use This Page](#neo-troubleshooting-compilation-how-to-use)
+ [Framework-Related Errors](#neo-troubleshooting-compilation-framework-related-errors)
+ [Infrastructure-Related Errors](#neo-troubleshooting-compilation-infrastructure-errors)
+ [Check your compilation log](#neo-troubleshooting-compilation-logs)

## How to Use This Page
<a name="neo-troubleshooting-compilation-how-to-use"></a>

Attempt to resolve your error by going through these sections in the following order:

1. Check that the input of your compilation job satisfies the input requirements. See [What input data shapes does SageMaker Neo expect?](neo-compilation-preparing-model.md#neo-job-compilation-expected-inputs)

1.  Check common [framework-specific errors](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html#neo-troubleshooting-compilation-framework-related-errors). 

1.  Check if your error is an [infrastructure error](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html#neo-troubleshooting-compilation-infrastructure-errors). 

1. Check your [compilation log](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html#neo-troubleshooting-compilation-logs).

## Framework-Related Errors
<a name="neo-troubleshooting-compilation-framework-related-errors"></a>

### Keras
<a name="neo-troubleshooting-compilation-framework-related-errors-keras"></a>


| Error | Solution | 
| --- | --- | 
|   `InputConfiguration: No h5 file provided in <model path>`   |   Check your h5 file is in the Amazon S3 URI you specified.  *Or* Check that the [h5 file is correctly formatted](https://www.tensorflow.org/guide/keras/save_and_serialize#keras_h5_format).   | 
|   `InputConfiguration: Multiple h5 files provided, <model path>, when only one is allowed`   |  Check you are only providing one `h5` file.  | 
|   `ClientError: InputConfiguration: Unable to load provided Keras model. Error: 'sample_weight_mode'`   |  Check that the Keras version you specified is supported. See supported frameworks for [cloud instances](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-cloud.html) and [edge devices](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-devices-edge.html).   | 
|   `ClientError: InputConfiguration: Input input has wrong shape in Input Shape dictionary. Input shapes should be provided in NCHW format. `   |   Check that your model input follows NCHW format. See [What input data shapes does SageMaker Neo expect?](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html#neo-job-compilation-expected-inputs)   | 

### MXNet
<a name="neo-troubleshooting-compilation-framework-related-errors-mxnet"></a>


| Error | Solution | 
| --- | --- | 
|   `ClientError: InputConfiguration: Only one parameter file is allowed for MXNet model. Please make sure the framework you select is correct.`   |   SageMaker Neo will select the first parameter file given for compilation.   | 

### TensorFlow
<a name="neo-troubleshooting-compilation-framework-related-errors-tensorflow"></a>


| Error | Solution | 
| --- | --- | 
|   `InputConfiguration: Exactly one .pb file is allowed for TensorFlow models.`   |  Make sure you only provide one .pb or .pbtxt file.  | 
|  `InputConfiguration: Exactly one .pb or .pbtxt file is allowed for TensorFlow models.`  |  Make sure you only provide one .pb or .pbtxt file.  | 
|   ` ClientError: InputConfiguration: TVM cannot convert <model zoo> model. Please make sure the framework you selected is correct. The following operators are not implemented: {<operator name>} `   |   Check the operator you chose is supported. See [SageMaker Neo Supported Frameworks and Operators](https://aws.amazon.com/releasenotes/sagemaker-neo-supported-frameworks-and-operators/).   | 

### PyTorch
<a name="neo-troubleshooting-compilation-framework-related-errors-pytorch"></a>


| Error | Solution | 
| --- | --- | 
|   `InputConfiguration: We are unable to extract DataInputConfig from the model due to input_config_derivation_error. Please override by providing a DataInputConfig during compilation job creation.`  |  Do either of the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html)  | 

## Infrastructure-Related Errors
<a name="neo-troubleshooting-compilation-infrastructure-errors"></a>


| Error | Solution | 
| --- | --- | 
|   `ClientError: InputConfiguration: S3 object does not exist. Bucket: <bucket>, Key: <bucket key>`   |  Check the Amazon S3 URI you provided.  | 
|   ` ClientError: InputConfiguration: Bucket <bucket name> is in region <region name> which is different from AWS Sagemaker service region <service region> `   |   Create an Amazon S3 bucket that is in the same region as the service.   | 
|   ` ClientError: InputConfiguration: Unable to untar input model. Please confirm the model is a tar.gz file `   |   Check that your model in Amazon S3 is compressed into a `tar.gz` file.   | 

## Check your compilation log
<a name="neo-troubleshooting-compilation-logs"></a>

1. Navigate to Amazon CloudWatch at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Select the Region where you created the compilation job from the **Region** dropdown list in the top right.

1. In the navigation pane of the Amazon CloudWatch console, choose **Logs**. Select **Log groups**.

1. Search for the log group called `/aws/sagemaker/CompilationJobs`. Select the log group.

1. Search for the log stream named after the compilation job name. Select the log stream.

# Troubleshoot Neo Inference Errors
<a name="neo-troubleshooting-inference"></a>

This section contains information about how to prevent and resolve some of the common errors you might encounter when deploying or invoking the endpoint. This section applies to **PyTorch 1.4.0 or later** and **MXNet v1.7.0 or later**. 
+ If you defined a `model_fn` in your inference script, make sure the first inference (warm-up inference) on valid input data is done in `model_fn()`. Otherwise, the following error message may be seen on the terminal when [`predict`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict) is called: 

  ```
  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."                
  ```
+ Make sure that the environment variables in the following table are set. If they are not set, the following error message might show up: 

  **On the terminal:**

  ```
  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from <users-sagemaker-endpoint> with message "{ "code": 503, "type": "InternalServerException", "message": "Prediction failed" } ".
  ```

  **In CloudWatch:**

  ```
  W-9001-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'NoneType' object has no attribute 'transform'
  ```    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-inference.html)
+ Make sure that the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable is set to 500 or a higher value while creating the Amazon SageMaker AI model; otherwise, the following error message may be seen on the terminal: 

  ```
  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
  ```
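The following is a sketch of how `MMS_DEFAULT_RESPONSE_TIMEOUT` can be supplied through the container's `Environment` when creating the SageMaker AI model. The model name, image URI, and model data URL below are placeholders:

```python
# Environment variables passed to the model's container; values must be strings
environment = {
    'MMS_DEFAULT_RESPONSE_TIMEOUT': '500',
}

# Hypothetical create_model call (image URI, model data URL, and names are placeholders):
# sagemaker_client.create_model(
#     ModelName='my-neo-model',
#     PrimaryContainer={
#         'Image': '<inference-image-uri>',
#         'ModelDataUrl': 's3://amzn-s3-demo-bucket/output/model.tar.gz',
#         'Environment': environment,
#     },
#     ExecutionRoleArn=role_arn,
# )
```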

# Troubleshoot Ambarella Errors
<a name="neo-troubleshooting-target-devices-ambarella"></a>

SageMaker Neo requires models to be packaged in a compressed TAR file (`*.tar.gz`). Ambarella devices require additional files to be included within the compressed TAR file before it is sent for compilation. Include the following files within your compressed TAR file if you want to compile a model for Ambarella targets with SageMaker Neo:
+ A trained model using a framework supported by SageMaker Neo 
+ A JSON configuration file
+ Calibration images

For example, the contents of your compressed TAR file should look similar to the following example:

```
├──amba_config.json
├──calib_data
|    ├── data1
|    ├── data2
|    ├── .
|    ├── .
|    ├── .
|    └── data500
└──mobilenet_v1_1.0_0224_frozen.pb
```

The directory is configured as follows:
+ `amba_config.json` : Configuration file
+ `calib_data` : Folder containing calibration images
+ `mobilenet_v1_1.0_0224_frozen.pb` : TensorFlow model saved as a frozen graph

For information about frameworks supported by SageMaker Neo, see [Supported Frameworks](neo-supported-devices-edge-frameworks.md).

## Setting up the Configuration File
<a name="neo-troubleshooting-target-devices-ambarella-config"></a>

The configuration file provides information required by the Ambarella toolchain to compile the model. The configuration file must be saved as a JSON file, and the file name must end with `*config.json`. The following table describes the contents of the configuration file.


| Key | Description | Example | 
| --- | --- | --- | 
| inputs | Dictionary mapping input layers to their attributes. | <pre>{"inputs":{"data":{...},"data1":{...}}}</pre> | 
| "data" | Input layer name. Note: "data" is an example of the name you can use to label the input layer. | "data" | 
| shape | Describes the shape of the input to the model. This follows the same conventions that SageMaker Neo uses. | "shape": "1,3,224,224" | 
| filepath | Relative path to the directory containing calibration images. These can be binary or image files like JPG or PNG. | "filepath": "calib_data/" | 
| colorformat | Color format that model expects. This will be used while converting images to binary. Supported values: [RGB, BGR]. Default is RGB. | "colorformat":"RGB" | 
| mean | Mean value to be subtracted from the input. Can be a single value or a list of values. When the mean is given as a list the number of entries must match the channel dimension of the input. | "mean":128.0 | 
| scale | Scale value to be used for normalizing the input. Can be a single value or a list of values. When the scale is given as a list, the number of entries must match the channel dimension of the input. | "scale": 255.0 | 

The following is a sample configuration file: 

```
{
    "inputs": {
        "data": {
                "shape": "1, 3, 224, 224",
                "filepath": "calib_data/",
                "colorformat": "RGB",
                "mean":[128,128,128],
                "scale":[128.0,128.0,128.0]
        }
    }
}
```
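Before packaging your TAR file, you can sanity-check the configuration against the table above. The following is a minimal standard-library sketch; the helper name and the set of required keys are illustrative assumptions, not part of the Ambarella toolchain:

```python
import json

def validate_amba_config(config):
    """Return a list of problems found in an Ambarella *config.json dict.

    The required keys checked here ('shape', 'filepath') are assumptions
    based on the configuration table above.
    """
    problems = []
    inputs = config.get('inputs')
    if not isinstance(inputs, dict) or not inputs:
        return ['missing or empty "inputs" dictionary']
    for name, attrs in inputs.items():
        for required in ('shape', 'filepath'):
            if required not in attrs:
                problems.append(f'input "{name}" is missing "{required}"')
        colorformat = attrs.get('colorformat', 'RGB')
        if colorformat not in ('RGB', 'BGR'):
            problems.append(f'input "{name}" has unsupported colorformat "{colorformat}"')
    return problems
```

For example, `validate_amba_config(json.load(open('amba_config.json')))` returns an empty list when the file matches the expected layout.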

## Calibration Images
<a name="neo-troubleshooting-target-devices-ambarella-calibration-images"></a>

Quantize your trained model by providing calibration images. Quantizing your model improves the performance of the CVFlow engine on an Ambarella System on a Chip (SoC). The Ambarella toolchain uses the calibration images to determine how each layer in the model should be quantized to achieve optimal performance and accuracy. Each layer is quantized independently to INT8 or INT16 formats. The final model has a mix of INT8 and INT16 layers after quantization.

**How many images should you use?**

It is recommended that you include between 100 and 200 images that are representative of the types of scenes the model is expected to handle. The model compilation time increases linearly with the number of calibration images in the input file.

**What are the recommended image formats?**

Calibration images can be in a raw binary format or image formats such as JPG and PNG.

Your calibration folder can contain a mixture of images and binary files. If the calibration folder contains both images and binary files, the toolchain first converts the images to binary files. Once the conversion is complete, it uses the newly generated binary files along with the binary files that were originally in the folder.

**Can I convert the images into binary format first?**

Yes. You can convert the images to binary format with open-source packages such as [OpenCV](https://opencv.org/) or [PIL](https://python-pillow.org/). Crop and resize the images so they match the input shape that your trained model expects.
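The following is a minimal sketch of such a conversion using PIL and NumPy. The function name, output layout (raw interleaved uint8 RGB), and target size are assumptions; check them against what your model and the toolchain expect:

```python
import numpy as np
from PIL import Image

def image_to_binary(image_path, output_path, size=(224, 224)):
    """Resize an image and dump its raw uint8 RGB pixel data to a binary file."""
    image = Image.open(image_path).convert('RGB').resize(size)
    np.array(image).astype('uint8').tofile(output_path)
```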



## Mean and Scale
<a name="neo-troubleshooting-target-devices-ambarella-mean-scale"></a>

You can specify mean and scaling pre-processing options to the Ambarella toolchain. These operations are embedded into the network and applied to each input during inference. If you specify the mean or scale, do not provide pre-processed data; that is, do not provide data from which you have already subtracted the mean or to which you have already applied scaling.
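To make the embedded pre-processing concrete, the following sketch applies the mean and scale from the sample configuration file to a synthetic input. The `(x - mean) / scale` form is an assumption; confirm the exact formula against the Ambarella toolchain documentation:

```python
import numpy as np

# Per-channel mean/scale from the sample configuration file above
mean = np.array([128.0, 128.0, 128.0])
scale = np.array([128.0, 128.0, 128.0])

# A synthetic all-white 224x224 RGB input in the uint8 range
x = np.full((224, 224, 3), 255.0)

# What the embedded pre-processing applies to each input at inference time;
# the (x - mean) / scale form is an assumption -- confirm it against the
# Ambarella toolchain documentation.
normalized = (x - mean) / scale
```

Because this step runs inside the network, supplying data that is already normalized would apply the operation twice.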

## Check your compilation log
<a name="neo-troubleshooting-target-devices-ambarella-compilation"></a>

For information on checking compilation log for Ambarella devices, see [Check your compilation log](neo-troubleshooting-compilation.md#neo-troubleshooting-compilation-logs).