# Custom models
<a name="canvas-custom-models"></a>

In Amazon SageMaker Canvas, you can train custom machine learning models tailored to your specific data and use case. By training a custom model on your data, you can capture characteristics and trends that are specific to your data. For example, you might create a custom time series forecasting model, trained on inventory data from your warehouse, to manage your logistics operations.

Canvas supports training a range of model types. After training a custom model, you can evaluate the model's performance and accuracy. Once you are satisfied with a model, you can make predictions on new data. You can also share the custom model with data scientists for further analysis or deploy it to a SageMaker AI hosted endpoint for real-time inference, all from within the Canvas application.

You can train a Canvas custom model on the following types of datasets:
+ Tabular (including numeric, categorical, time series, and text data)
+ Image

The following table shows the types of custom models that you can build in Canvas, along with their supported data types and data sources.


| Model type | Example use case | Supported data types | Supported data sources | 
| --- | --- | --- | --- | 
| Numeric prediction | Predicting house prices based on features like square footage | Numeric | Local upload, Amazon S3, SaaS connectors | 
| 2 category prediction | Predicting whether or not a customer is likely to churn | Binary or categorical | Local upload, Amazon S3, SaaS connectors | 
| 3+ category prediction | Predicting patient outcomes after being discharged from the hospital | Categorical | Local upload, Amazon S3, SaaS connectors | 
| Time series forecasting | Predicting your inventory for the next quarter | Time series | Local upload, Amazon S3, SaaS connectors | 
| Single-label image prediction | Predicting types of manufacturing defects in images | Image (JPG, PNG) | Local upload, Amazon S3 | 
| Multi-category text prediction | Predicting categories of products, such as clothing, electronics, or household goods, based on product descriptions | Source column: text; target column: binary or categorical | Local upload, Amazon S3 | 

**Get started**

To get started with building and generating predictions from a custom model, do the following:
+ Determine your use case and the type of model that you want to build. For more information about the custom model types, see [How custom models work](canvas-build-model.md). For more information about the data types and sources supported for custom models, see [Data import](canvas-importing-data.md).
+ [Import your data](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html) into Canvas. You can build a custom model with any tabular or image dataset that meets the input requirements. For more information about the input requirements, see [Create a dataset](canvas-import-dataset.md).

  To learn more about sample datasets provided by SageMaker AI with which you can experiment, see [Sample datasets in Canvas](canvas-sample-datasets.md).
+ [Build](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-build-model.html) your custom model. You can do a **Quick build** to get your model and start making predictions more quickly, or you can do a **Standard build** for greater accuracy.

  For numeric, categorical, and time series forecasting model types, you can clean and prepare your data with the [Data Wrangler feature](canvas-data-prep.md). In Data Wrangler, you can create a data flow and use various data preparation techniques, such as applying advanced transforms or joining datasets. For image prediction models, you can [Edit an image dataset](canvas-edit-image.md) to update your labels or add and delete images. Note that you can't use these features for multi-category text prediction models.
+ [Evaluate your model's performance](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-evaluate-model.html) and determine how well it might perform on real-world data.
+ [Make single or batch predictions](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-make-predictions.html) with your model.

# How custom models work
<a name="canvas-build-model"></a>

Use Amazon SageMaker Canvas to build a custom model on the dataset that you've imported. Use the model that you've built to make predictions on new data. SageMaker Canvas uses the information in the dataset to build up to 250 models and choose the one that performs the best.

When you begin building a model, Canvas automatically recommends one or more *model types*. Model types fall into one of the following categories:
+ **Numeric prediction** – This is known as *regression* in machine learning. Use the numeric prediction model type when you want to make predictions for numeric data. For example, you might want to predict the price of houses based on features such as the house’s square footage.
+ **Categorical prediction** – This is known as *classification* in machine learning. When you want to categorize data into groups, use the categorical prediction model types:
  + **2 category prediction** – Use the 2 category prediction model type (also known as *binary classification* in machine learning) when you have two categories that you want to predict for your data. For example, you might want to determine whether a customer is likely to churn.
  + **3+ category prediction** – Use the 3+ category prediction model type (also known as *multi-class classification* in machine learning) when you have three or more categories that you want to predict for your data. For example, you might want to predict a customer's loan status based on features such as previous payments.
+ **Time series forecasting** – Use time series forecasts when you want to make predictions over a period of time. For example, you might want to predict the number of items you’ll sell in the next quarter. For information about time series forecasts, see [Time Series Forecasts in Amazon SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-time-series.html).
+ **Image prediction** – Use the single-label image prediction model type (also known as *single-label image classification* in machine learning) when you want to assign labels to images. For example, you might want to classify different types of manufacturing defects in images of your product.
+ **Text prediction** – Use the multi-category text prediction model type (also known as *multi-class text classification* in machine learning) when you want to assign labels to passages of text. For example, you might have a dataset of customer reviews for a product, and you want to determine whether customers liked or disliked the product. You might have your model predict whether a given passage of text is `Positive`, `Negative`, or `Neutral`.

For a table of the supported input data types for each model type, see [Custom models](canvas-custom-models.md).

For each tabular data model that you build (which includes numeric, categorical, time series forecasting, and text prediction models), you choose the **Target column**. The **Target column** is the column that contains the information that you want to predict. For example, if you're building a model to predict whether people have cancelled their subscriptions, the **Target column** contains data points that are either a `yes` or a `no` about someone's cancellation status.

For image prediction models, you build the model with a dataset of images that have been assigned labels. For the unlabeled images that you provide, the model predicts a label. For example, if you’re building a model to predict whether an image is a cat or a dog, you provide images labeled as cats or dogs when building the model. Then, the model can accept unlabeled images and predict them as either cats or dogs.

**What happens when you build a model**

To build your model, you can choose either a **Quick build** or a **Standard build**. The **Quick build** has a shorter build time, but the **Standard build** generally has a higher accuracy.

For tabular and time series forecasting models, Canvas uses *downsampling* to reduce the size of datasets larger than 5 GB or 30 GB, respectively. Canvas downsamples with the stratified sampling method. The table below lists the size of the downsample by model type. To control the sampling process, you can use Data Wrangler in Canvas to sample using your preferred sampling technique. For time series data, you can resample to aggregate data points. For more information about sampling, see [Sampling](canvas-transform.md#canvas-transform-sampling). For more information about resampling time series data, see [Resample Time Series Data](canvas-transform.md#canvas-resample-time-series).

If you choose to do a **Quick build** on a dataset with more than 50,000 rows, then Canvas samples your data down to 50,000 rows for a shorter model training time.
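
As a rough illustration of stratified sampling, the following Python sketch reduces a dataset to about 50,000 rows while keeping each label's share of the data roughly unchanged. It is not Canvas's implementation. The function name `stratified_downsample`, the `label_key` parameter, and the rule of keeping at least one row per label are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_downsample(rows, label_key, max_rows, seed=0):
    """Return roughly max_rows rows, preserving each label's proportion.

    Hypothetical helper for illustration; not part of Canvas.
    """
    total = len(rows)
    if total <= max_rows:
        return list(rows)

    # Group rows by the value in the label (target) column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)

    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        # Integer arithmetic gives each group its proportional share of
        # max_rows; at least one row per label is kept (an assumption).
        k = max(1, len(members) * max_rows // total)
        sample.extend(rng.sample(members, k))
    return sample


# Example: 100,000 rows with a 90/10 label split, sampled down to 50,000.
rows = [{"churn": "no"}] * 90_000 + [{"churn": "yes"}] * 10_000
sample = stratified_downsample(rows, "churn", max_rows=50_000)
```

In this example the sample keeps the original 90/10 split between the two labels. If you need a different sampling technique, the Data Wrangler sampling transforms mentioned above are the supported way to control the process.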

The following table summarizes key characteristics of the model building process, including average build times for each model and build type, the size of the downsample when building models with large datasets, and the minimum and maximum number of data points you should have for each build type.


| Limit | Numeric and categorical prediction | Time series forecasting | Image prediction | Text prediction | 
| --- | --- | --- | --- | --- | 
| **Quick build** time | 2‐20 minutes | 2‐20 minutes | 15‐30 minutes | 15‐30 minutes | 
| **Standard build** time | 2‐4 hours | 2‐4 hours | 2‐5 hours | 2‐5 hours | 
| Downsample size (the reduced size of a large dataset after Canvas downsamples) | 5 GB | 30 GB | N/A | N/A | 
| Minimum number of entries (rows) for **Quick builds** | 2 category: 500 rows; 3+ category, numeric, time series: N/A | N/A | N/A | N/A | 
| Minimum number of entries (rows, images, or documents) for **Standard builds** | 250 | 50 | 50 | N/A | 
| Maximum number of entries (rows, images, or documents) for **Quick builds** | N/A | N/A | 5,000 | 7,500 | 
| Maximum number of entries (rows, images, or documents) for **Standard builds** | N/A | 150,000 | 180,000 | N/A | 
| Maximum number of columns | 1,000 | 1,000 | N/A | N/A | 

Canvas predicts values by using the information in the rest of the dataset, depending on the model type:
+ For categorical prediction, Canvas puts each row into one of the categories listed in the **Target column**.
+ For numeric prediction, Canvas uses the information in the dataset to predict the numeric values in the **Target column**.
+ For time series forecasting, Canvas uses historical data to predict values for the **Target column** in the future.
+ For image prediction, Canvas uses images that have been assigned labels to predict labels for unlabeled images.
+ For text prediction, Canvas analyzes text data that has been assigned labels to predict labels for passages of unlabeled text.

**Additional features to help you build your model**

Before building your model, you can use Data Wrangler in Canvas to prepare your data using 300+ built-in transforms and operators. Data Wrangler supports transforms for both tabular and image datasets. Additionally, you can connect to data sources outside of Canvas, create jobs to apply transforms to your entire dataset, and export your fully prepared and cleaned data for use in ML workflows outside of Canvas. For more information, see [Data preparation](canvas-data-prep.md).

To see visualizations and analytics to explore your data and determine which features to include in your model, you can use Data Wrangler’s built-in analyses. You can also access a **Data Quality and Insights Report** that highlights potential issues with your dataset and provides recommendations for how to fix them. For more information, see [Perform exploratory data analysis (EDA)](canvas-analyses.md).

In addition to the more advanced data preparation and exploration functionality provided through Data Wrangler, Canvas provides some basic features that you can use:
+ To filter your data and access a set of basic data transforms, see [Prepare data for model building](canvas-prepare-data.md).
+ To access simple visualizations and analytics for feature exploration, see [Data exploration and analysis](canvas-explore-data.md).
+ To learn more about additional features such as previewing your model, validating your dataset, and changing the size of the random sample used to build your model, see [Preview your model](canvas-preview-model.md).

For tabular datasets with multiple columns (such as datasets for building categorical, numeric, or time series forecasting model types), you might have rows with missing data points. While Canvas builds the model, it automatically fills in missing values. Canvas uses the values in your dataset to perform a mathematical approximation for the missing values. For the highest model accuracy, we recommend adding in the missing data if you can find it. Note that the missing data feature is not supported for text prediction or image prediction models.
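
To find where your data has gaps before you build, you could count missing values per column with a short script such as the following Python sketch. The `fill_numeric_mean` function shows one simple kind of approximation (replacing gaps with the column mean). This documentation does not specify the method that Canvas uses, so treat the mean as an assumption for illustration only. Both function names are hypothetical.

```python
from statistics import mean


def is_missing(value):
    """Treat None and empty strings as missing (an assumption)."""
    return value is None or value == ""


def missing_counts(rows, columns):
    """Count missing values in each column of a list-of-dicts dataset."""
    return {c: sum(1 for r in rows if is_missing(r.get(c))) for c in columns}


def fill_numeric_mean(rows, column):
    """Return new rows with gaps in a numeric column set to the column mean.

    Illustrative only; not necessarily how Canvas approximates values.
    """
    present = [r[column] for r in rows if not is_missing(r.get(column))]
    fill = mean(present)
    return [
        {**r, column: fill if is_missing(r.get(column)) else r[column]}
        for r in rows
    ]


houses = [
    {"sqft": 1000, "price": 200},
    {"sqft": None, "price": 300},
    {"sqft": 2000, "price": 400},
]
counts = missing_counts(houses, ["sqft", "price"])
filled = fill_numeric_mean(houses, "sqft")
```

Here `counts` reports one gap in `sqft` and none in `price`, and the gap is replaced by the mean of the two known values. Supplying the real values, when you can find them, remains the better option.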

**Get started**

To get started with building a custom model, see [Build a model](canvas-build-model-how-to.md) and follow the procedure for the type of model that you want to build.

# Preview your model
<a name="canvas-preview-model"></a>

**Note**  
The following functionality is available only for custom models built with tabular datasets. It is not available for multi-category text prediction models.

SageMaker Canvas provides a tool to preview your model before you begin building. The preview gives you an estimated accuracy score and a preliminary idea of how each column might impact the model.

To preview the model score, when you're on the **Build** tab of your model, choose **Preview model**.

The model preview generates an **Estimated accuracy** prediction of how well the model might analyze your data. The accuracy of a **Quick build** or a **Standard build** represents how well the model can perform on real data and is generally higher than the **Estimated accuracy**.

The model preview also provides you with the **Column Impact** scores, which can indicate the importance of each column to the model's predictions.

The following screenshot shows a model preview in the Canvas application.

![\[Screenshot of the Build tab for a model in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-build/canvas-build-preview-model.png)


Amazon SageMaker Canvas automatically handles missing values in your dataset while it builds the model. It infers the missing values by using adjacent values that are present in the dataset.

If you're satisfied with your model preview and want to proceed with building a model, then see [Build a model](canvas-build-model-how-to.md).

# Data validation
<a name="canvas-dataset-validation"></a>

Before you build your model, SageMaker Canvas checks your dataset for issues that might cause your build to fail. If SageMaker Canvas finds any issues, then it warns you on the **Build** page before you attempt to build a model.

You can choose **Validate data** to see a list of the issues with your dataset. You can then use the SageMaker Canvas [Data Wrangler data preparation features](canvas-data-prep.md), or your own tools, to fix your dataset before starting a build. If you don’t fix the issues with your dataset, then your build fails.

If you make changes to your dataset to fix the issues, you have the option to re-validate your dataset before attempting a build. We recommend that you re-validate your dataset before building.

The following table shows the issues that SageMaker Canvas checks for in your dataset and how to resolve them.


| Issue | Resolution | 
| --- | --- | 
|  Wrong model type for your data  |  Try another model type or use a different dataset.  | 
|  Missing values in your target column  |  Replace the missing values, drop rows with missing values, or use a different dataset.  | 
|  Too many unique labels in your target column  |  Verify that you've used the correct column for your target column, or use a different dataset.  | 
|  Too many non-numeric values in your target column  |  Choose a different target column, select another model type, or use a different dataset.  | 
|  One or more column names contain double underscores  |  Rename the columns to remove any double underscores, and try again.  | 
|  None of the rows in your dataset are complete  |  Replace the missing values, or use a different dataset.  | 
|  Too many unique labels for the number of rows in your data  |  Check that you're using the right target column, increase the number of rows in your dataset, consolidate similar labels, or use a different dataset.  | 
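
If you prepare data with your own tools, you might run a few of these checks yourself before importing. The following Python sketch covers three of the issues in the table: double underscores in column names, missing target values, and the absence of any complete row. The `precheck` function is hypothetical and is not the validation that Canvas runs. It assumes the dataset is a list of dictionaries and that `None` or an empty string marks a missing value.

```python
def precheck(rows, target):
    """Return a list of problems found in a list-of-dicts dataset.

    Hypothetical helper; it mirrors only some of the checks in the table.
    """
    issues = []
    columns = list(rows[0]) if rows else []

    def is_missing(value):
        return value is None or value == ""

    # Column names must not contain double underscores.
    for name in columns:
        if "__" in name:
            issues.append(f"column name contains double underscores: {name}")

    # The target column must not have missing values.
    missing_targets = sum(1 for r in rows if is_missing(r.get(target)))
    if missing_targets:
        issues.append(f"{missing_targets} missing value(s) in target column")

    # At least one row must have a value in every column.
    has_complete_row = any(
        all(not is_missing(r.get(c)) for c in columns) for r in rows
    )
    if rows and not has_complete_row:
        issues.append("no complete rows")

    return issues


customers = [
    {"user__id": 1, "plan": "basic", "churn": "yes"},
    {"user__id": 2, "plan": "", "churn": None},
]
issues = precheck(customers, target="churn")
```

For this example, `issues` contains two entries: one for the `user__id` column name and one for the missing target value. The first row is complete, so the third check passes.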

# Random sample
<a name="canvas-random-sample"></a>

SageMaker Canvas uses the random sampling method to sample your dataset. With random sampling, each row has an equal chance of being picked for the sample. You can choose a column in the preview to get summary statistics for the random sample, such as the mean and the mode.

By default, SageMaker Canvas uses a random sample size of 20,000 rows from your dataset for datasets with more than 20,000 rows. For datasets smaller than 20,000 rows, the default sample size is the number of rows in your dataset. You can increase or decrease the sample size by choosing **Random sample** in the **Build** tab of the SageMaker Canvas application. You can use the slider to select your desired sample size, and then choose **Update** to change the sample size. The maximum sample size you can choose for a dataset is 40,000 rows, and the minimum sample size is 500 rows. If you choose a large sample size, the dataset preview and summary statistics might take a few moments to reload.

The **Build** page shows a preview of 100 rows from your dataset. If the sample size is the same size as your dataset, then the preview uses the first 100 rows of your dataset. Otherwise, the preview uses the first 100 rows of the random sample.
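
The sample-size rules above can be summarized in a short Python sketch. The constants come from this section, but the function names are hypothetical, and capping the sample at the number of rows in the dataset is an assumption. `random.Random.sample` draws without replacement, so every row has the same chance of being selected.

```python
import random

DEFAULT_SAMPLE = 20_000  # default sample size for large datasets
MIN_SAMPLE = 500         # smallest size selectable with the slider
MAX_SAMPLE = 40_000      # largest size selectable with the slider


def default_sample_size(n_rows):
    """Default size: 20,000 rows, or the whole dataset if it is smaller."""
    return min(n_rows, DEFAULT_SAMPLE)


def clamp_sample_size(requested, n_rows):
    """Keep a requested size within the slider limits.

    Assumption: a sample can never be larger than the dataset itself.
    """
    bounded = max(MIN_SAMPLE, min(requested, MAX_SAMPLE))
    return min(bounded, n_rows)


def random_sample(rows, size, seed=None):
    """Pick `size` rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, size)


rows = list(range(100_000))
sample = random_sample(rows, default_sample_size(len(rows)), seed=1)
preview = sample[:100]  # the preview shows the first 100 sampled rows
```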

# Build a model
<a name="canvas-build-model-how-to"></a>

The following sections show you how to build a model for each of the main types of custom models.
+ To build numeric prediction, 2 category prediction, or 3+ category prediction models, see [Build a custom numeric or categorical prediction model](#canvas-build-model-numeric-categorical).
+ To build single-label image prediction models, see [Build a custom image prediction model](#canvas-build-model-image).
+ To build multi-category text prediction models, see [Build a custom text prediction model](#canvas-build-model-text).
+ To build time series forecasting models, see [Build a time series forecasting model](#canvas-build-model-forecasting).

**Note**  
If you encounter an error during post-building analysis that tells you to increase your quota for `ml.m5.2xlarge` instances, see [Request a Quota Increase](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html).

## Build a custom numeric or categorical prediction model
<a name="canvas-build-model-numeric-categorical"></a>

Numeric and categorical prediction models support both **Quick builds** and **Standard builds**.

To build a numeric or categorical prediction model, use the following procedure:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose **New model**.

1. In the **Create new model** dialog box, do the following:

   1. Enter a name in the **Model name** field.

   1. Select the **Predictive analysis** problem type.

   1. Choose **Create**.

1. For **Select dataset**, select your dataset from the list of datasets. If you haven’t already imported your data, choose **Import** to be directed through the import data workflow.

1. When you’re ready to begin building your model, choose **Select dataset**.

1. On the **Build** tab, for the **Target column** dropdown list, select the column that you want your model to predict.

1. For **Model type**, Canvas automatically detects the problem type for you. If you want to change the type or configure advanced model settings, choose **Configure model**.

   When the **Configure model** dialog box opens, do the following:

   1. For **Model type**, choose the model type that you want to build.

   1. After you choose the model type, there are additional **Advanced settings**. For more information about each of the advanced settings, see [Advanced model building configurations](canvas-advanced-settings.md). To configure the advanced settings, do the following:

      1. (Optional) For the **Objective metric** dropdown menu, select the metric that you want Canvas to optimize while building your model. If you don’t select a metric, Canvas chooses one for you by default. For descriptions of the available metrics, see [Metrics reference](canvas-metrics.md).

      1. For **Training method**, choose **Auto**, **Ensemble**, or **Hyperparameter optimization (HPO) mode**.

      1. For **Algorithms**, select the algorithms that you want to include for building model candidates.

      1. For **Data split**, specify in percentages how you want to split your data between the **Training set** and the **Validation set**. The training set is used for building the model, while the validation set is used for testing accuracy of model candidates.

      1. For **Max candidates and runtime**, do the following:

         1. Set the **Max candidates** value, which is the maximum number of model candidates that Canvas can generate. Note that **Max candidates** is only available in HPO mode.

         1. Set the hour and minute values for **Max job runtime**, which is the maximum amount of time that Canvas can spend building your model. After the maximum time, Canvas stops building and selects the best model candidate.

   1. After configuring the advanced settings, choose **Save**.

1. Select or deselect columns in your data to include or drop them from your build.

   **Note**  
   If you make batch predictions with your model after building, Canvas adds dropped columns to your prediction results. However, Canvas does not add the dropped columns to your batch predictions for time series models.

1. (Optional) Use the visualization and analytics tools that Canvas provides to visualize your data and determine which features you might want to include in your model. For more information, see [Explore and analyze your data](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-explore-data.html).

1. (Optional) Use data transformations to clean, transform, and prepare your data for model building. For more information, see [Prepare your data with advanced transformations](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-prepare-data.html). You can view and remove your transforms by choosing **Model recipe** to open the **Model recipe** side panel.

1. (Optional) For additional features such as previewing the accuracy of your model, validating your dataset, and changing the size of the random sample that Canvas takes from your dataset, see [Preview your model](canvas-preview-model.md).

1. After reviewing your data and making any changes to your dataset, choose **Quick build** or **Standard build** to begin a build for your model. The following screenshot shows the **Build** page and the **Quick build** and **Standard build** options.  
![\[The Build page for a 2 category model showing the Quick build and Standard build options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/build-page-tabular-quick-standard-options.png)

After your model begins building, you can leave the page. When the model shows as **Ready** on the **My models** page, it’s ready for analysis and predictions.
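
To make the **Data split** setting in the preceding procedure more concrete, the following Python sketch divides rows into a training set and a validation set by percentage. The `split_rows` function is hypothetical. Whether Canvas shuffles rows before splitting is not stated here, so the shuffle is an assumption.

```python
import random


def split_rows(rows, training_percent, seed=0):
    """Split rows into (training, validation) lists by percentage.

    Hypothetical helper; shuffling before the split is an assumption.
    """
    if not 0 < training_percent < 100:
        raise ValueError("training_percent must be between 0 and 100")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    # Integer arithmetic avoids rounding surprises at the boundary.
    cut = len(shuffled) * training_percent // 100
    return shuffled[:cut], shuffled[cut:]


# An 80/20 split of 1,000 rows yields 800 training and 200 validation rows.
training, validation = split_rows(list(range(1000)), training_percent=80)
```

A larger training share gives the model more data to learn from, while a larger validation share gives a more stable estimate of how model candidates perform.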

## Build a custom image prediction model
<a name="canvas-build-model-image"></a>

Single-label image prediction models support both **Quick builds** and **Standard builds**.

To build a single-label image prediction model, use the following procedure:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose **New model**.

1. In the **Create new model** dialog box, do the following:

   1. Enter a name in the **Model name** field.

   1. Select the **Image analysis** problem type.

   1. Choose **Create**.

1. For **Select dataset**, select your dataset from the list of datasets. If you haven’t already imported your data, choose **Import** to be directed through the import data workflow.

1. When you’re ready to begin building your model, choose **Select dataset**.

1. On the **Build** tab, you see the **Label distribution** for the images in your dataset. The **Model type** is set to **Single-label image prediction**.

1. On this page, you can preview your images and edit the dataset. If you have any unlabeled images, choose **Edit dataset** and [Assign labels to unlabeled images](canvas-edit-image.md#canvas-edit-image-assign). You can also perform other tasks when you [Edit an image dataset](canvas-edit-image.md), such as renaming labels and adding images to the dataset.

1. After reviewing your data and making any changes to your dataset, choose **Quick build** or **Standard build** to begin a build for your model. The following screenshot shows the **Build** page of an image prediction model that is ready to be built.  
![\[The Build page for a single-label image prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/build-page-image-model.png)

After your model begins building, you can leave the page. When the model shows as **Ready** on the **My models** page, it’s ready for analysis and predictions.

## Build a custom text prediction model
<a name="canvas-build-model-text"></a>

Multi-category text prediction models support both **Quick builds** and **Standard builds**.

To build a text prediction model, use the following procedure:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose **New model**.

1. In the **Create new model** dialog box, do the following:

   1. Enter a name in the **Model name** field.

   1. Select the **Text analysis** problem type.

   1. Choose **Create**.

1. For **Select dataset**, select your dataset from the list of datasets. If you haven’t already imported your data, choose **Import** to be directed through the import data workflow.

1. When you’re ready to begin building your model, choose **Select dataset**.

1. On the **Build** tab, for the **Target column** dropdown list, select the column that you want your model to predict. The target column must have a binary or categorical data type, and there must be at least 25 entries (or rows of data) for each unique label in the target column.

1. For **Model type**, confirm that the model type is automatically set to **Multi-category text prediction**.

1. For the training column, select your source column of text data. This should be the column containing the text that you want to analyze.

1. Choose **Quick build** or **Standard build** to begin building your model. The following screenshot shows the **Build** page of a text prediction model that is ready to be built.  
![\[The Build page for a multi-category text prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/build-page-text-model.png)

After your model begins building, you can leave the page. When the model shows as **Ready** on the **My models** page, it’s ready for analysis and predictions.
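
The target column requirement in the preceding procedure (at least 25 rows for each unique label) is easy to check before you import your data. The following Python sketch counts rows per label and reports any label that falls short. The `labels_below_minimum` function is hypothetical, and the example labels are illustrative.

```python
from collections import Counter

MIN_ROWS_PER_LABEL = 25  # minimum stated in the procedure above


def labels_below_minimum(target_values, minimum=MIN_ROWS_PER_LABEL):
    """Return {label: count} for labels with fewer rows than the minimum."""
    counts = Counter(target_values)
    return {label: n for label, n in counts.items() if n < minimum}


targets = ["Positive"] * 40 + ["Negative"] * 30 + ["Neutral"] * 10
too_few = labels_below_minimum(targets)
```

In this example only `Neutral` is reported, with 10 rows. You could add more examples for that label or consolidate it with a similar label before building.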

## Build a time series forecasting model
<a name="canvas-build-model-forecasting"></a>

Time series forecasting models support both **Quick builds** and **Standard builds**.

To build a time series forecasting model, use the following procedure:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose **New model**.

1. In the **Create new model** dialog box, do the following:

   1. Enter a name in the **Model name** field.

   1. Select the **Time series forecasting** problem type.

   1. Choose **Create**.

1. For **Select dataset**, select your dataset from the list of datasets. If you haven’t already imported your data, choose **Import** to be directed through the import data workflow.

1. When you’re ready to begin building your model, choose **Select dataset**.

1. On the **Build** tab, for the **Target column** dropdown list, select the column that you want your model to predict.

1. In the **Model type** section, choose **Configure model**.

1. The **Configure model** box opens. For the **Time series configuration** section, fill out the following fields:

   1. For **Item ID column**, choose a column in your dataset that uniquely identifies each row. The column should have a data type of `Text`.

   1. (Optional) For **Group column**, choose one or more categorical columns (with a data type of `Text`) that you want to use for grouping your forecasting values.

   1. For **Time stamp column**, select the column with timestamps (in datetime format). For more information about the accepted datetime formats, see [Time Series Forecasts in Amazon SageMaker Canvas](canvas-time-series.md).

   1. For the **Forecast length** field, enter the period of time for which you want to forecast values. Canvas automatically detects the units of time in your data.

   1. (Optional) Turn on the **Use holiday schedule** toggle to select a holiday schedule from various countries, which can make your forecasts more accurate for periods affected by holidays.

1. In the **Configure model** box, there are additional settings in the **Advanced** section. For more information about each of the advanced settings, see [Advanced model building configurations](canvas-advanced-settings.md). To configure the **Advanced** settings, do the following:

   1. For the **Objective metric** dropdown menu, select the metric that you want Canvas to optimize while building your model. If you don’t select a metric, Canvas chooses one for you by default. For descriptions of the available metrics, see [Metrics reference](canvas-metrics.md).

   1. If you’re running a standard build, you’ll see the **Algorithms** section. This section is for selecting the time series forecasting algorithms that you’d like to use for building your model. You can select a subset of the available algorithms, or you can select all of them if you aren’t sure which ones to try.

      When you run your standard build, Canvas builds an ensemble model that combines all of the algorithms together to optimize prediction accuracy.

      **Note**  
      If you’re running a quick build, Canvas uses a single tree-based learning algorithm to train your model, and you don’t have to select any algorithms.

   1. For **Forecast quantiles**, enter up to 5 comma-separated quantile values to specify the upper and lower bounds of your forecast.

   1. After configuring the **Advanced** settings, choose **Save**.

1. Select or deselect columns in your data to include or drop them from your build.

   **Note**  
   If you make batch predictions with your model after building, Canvas adds dropped columns to your prediction results. However, Canvas does not add the dropped columns to your batch predictions for time series models.

1. (Optional) Use the visualization and analytics tools that Canvas provides to visualize your data and determine which features you might want to include in your model. For more information, see [Explore and analyze your data](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-explore-data.html).

1. (Optional) Use data transformations to clean, transform, and prepare your data for model building. For more information, see [Prepare your data with advanced transformations](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-prepare-data.html). You can view and remove your transforms by choosing **Model recipe** to open the **Model recipe** side panel.

1. (Optional) For additional features such as previewing the accuracy of your model, validating your dataset, and changing the size of the random sample that Canvas takes from your dataset, see [Preview your model](canvas-preview-model.md).

1. After reviewing your data and making any changes to your dataset, choose **Quick build** or **Standard build** to begin a build for your model.

After your model begins building, you can leave the page. When the model shows as **Ready** on the **My models** page, it’s ready for analysis and predictions.

# Advanced model building configurations
<a name="canvas-advanced-settings"></a>

Amazon SageMaker Canvas supports various advanced settings that you can configure when building a model. This page lists all of the advanced settings along with additional information about their options and configurations.

**Note**  
The following advanced settings are currently only supported for numeric, categorical, and time series forecasting model types.

## Advanced numeric and categorical prediction model settings
<a name="canvas-advanced-settings-predictive"></a>

Canvas supports the following advanced settings for numeric and categorical prediction model types.

### Objective metric
<a name="canvas-advanced-settings-predictive-obj-metric"></a>

The objective metric is the metric that you want Canvas to optimize while building your model. If you don’t select a metric, Canvas chooses one for you by default. For descriptions of the available metrics, see the [Metrics reference](canvas-metrics.md).

### Training method
<a name="canvas-advanced-settings-predictive-method"></a>

Canvas can automatically select the training method based on the dataset size, or you can select it manually. The following training methods are available for you to choose from:
+ **Ensembling** – SageMaker AI leverages the AutoGluon library to train several base models. To find the best combination for your dataset, ensemble mode runs 5–10 trials with different model and meta parameter settings. Then, these models are combined using a stacking ensemble method to create an optimal predictive model. For a list of algorithms supported by ensemble mode for tabular data, see the following [Algorithms](#canvas-advanced-settings-predictive-algos) section.
+ **Hyperparameter optimization (HPO)** – SageMaker AI finds the best version of a model by tuning hyperparameters using Bayesian optimization or multi-fidelity optimization while running training jobs on your dataset. HPO mode selects the algorithms that are most relevant to your dataset and selects the best range of hyperparameters to tune your models. To tune your models, HPO mode runs up to 100 trials (default) to find the optimal hyperparameter settings within the selected range. If your dataset size is less than 100 MB, SageMaker AI uses Bayesian optimization. SageMaker AI chooses multi-fidelity optimization if your dataset is larger than 100 MB.

  For a list of algorithms supported by HPO mode for tabular data, see the following [Algorithms](#canvas-advanced-settings-predictive-algos) section.
+ **Auto** – SageMaker AI automatically chooses either ensembling mode or HPO mode based on your dataset size. If your dataset is larger than 100 MB, SageMaker AI chooses HPO mode. Otherwise, it chooses ensembling mode.
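
The size-based rules in the preceding list can be summarized as a small decision function. The following Python snippet is only an illustrative sketch of those documented rules, not Canvas code. The function names are hypothetical, and the behavior at exactly 100 MB is an assumption because the text doesn't specify it.

```python
# Threshold taken from the text above; the function names are illustrative only.
SIZE_THRESHOLD_MB = 100


def choose_training_method(dataset_size_mb: float) -> str:
    """Mimic the documented Auto rule: HPO above 100 MB, ensembling otherwise."""
    return "HPO" if dataset_size_mb > SIZE_THRESHOLD_MB else "Ensembling"


def choose_hpo_strategy(dataset_size_mb: float) -> str:
    """Mimic the documented HPO rule: multi-fidelity above 100 MB, Bayesian below."""
    # Assumption: the text does not say what happens at exactly 100 MB,
    # so this sketch arbitrarily treats that case as Bayesian.
    return "multi-fidelity" if dataset_size_mb > SIZE_THRESHOLD_MB else "Bayesian"


for size_mb in (25, 750):
    print(size_mb, choose_training_method(size_mb), choose_hpo_strategy(size_mb))
```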

### Algorithms
<a name="canvas-advanced-settings-predictive-algos"></a>

In **Ensembling** mode, Canvas supports the following machine learning algorithms:
+ [LightGBM](https://docs.aws.amazon.com/sagemaker/latest/dg/lightgbm.html) – An optimized framework that uses tree-based algorithms with gradient boosting. This algorithm uses trees that grow in breadth, rather than depth, and is highly optimized for speed.
+ [CatBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html) – A framework that uses tree-based algorithms with gradient boosting. Optimized for handling categorical variables.
+ [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) – A framework that uses tree-based algorithms with gradient boosting that grows in depth, rather than breadth.
+ [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) – A tree-based algorithm that uses several decision trees on random sub-samples of the data with replacement. The trees are split into optimal nodes at each level. The decisions of each tree are averaged together to prevent overfitting and improve predictions.
+ [Extra Trees](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier) – A tree-based algorithm that uses several decision trees on the entire dataset. The trees are split randomly at each level. The decisions of each tree are averaged to prevent overfitting and to improve predictions. Extra trees add a degree of randomization in comparison to the random forest algorithm.
+ [Linear Models](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) – A framework that uses a linear equation to model the relationship between two variables in observed data.
+ Neural network torch – A neural network model that's implemented using [PyTorch](https://pytorch.org/).
+ Neural network fast.ai – A neural network model that's implemented using [fast.ai](https://www.fast.ai/).

In **HPO mode**, Canvas supports the following machine learning algorithms:
+ [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) – A supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models.
+ Deep learning algorithm – A multilayer perceptron (MLP) and feedforward artificial neural network. This algorithm can handle data that is not linearly separable.

### Data split
<a name="canvas-advanced-settings-predictive-split"></a>

You have the option to specify how you want to split your dataset between the training set (the portion of your dataset used for building the model) and the validation set (the portion of your dataset used for verifying the model’s accuracy). For example, a common split ratio is 80% training and 20% validation, where 80% of your data is used to build the model while 20% is saved for measuring model performance. If you don’t specify a custom ratio, then Canvas splits your dataset automatically.
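
To make the 80/20 example concrete, the following Python sketch shuffles a list of rows and splits it into training and validation portions. It only illustrates the concept. The `split_dataset` function and its `seed` parameter are hypothetical, and Canvas might split data differently internally.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (training, validation) lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the illustration reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))  # stand-in for 1,000 dataset rows
train, validation = split_dataset(rows)
print(len(train), len(validation))  # 800 200
```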

### Max candidates
<a name="canvas-advanced-settings-predictive-candidates"></a>

**Note**  
This feature is only available in the HPO training mode.

You can specify the maximum number of model candidates that Canvas generates while building your model. We recommend that you use the default number of candidates, which is 100, to build the most accurate models. The maximum number you can specify is 250. Decreasing the number of model candidates may impact your model’s accuracy.

### Max job runtime
<a name="canvas-advanced-settings-predictive-runtime"></a>

You can specify the maximum job runtime, or the maximum amount of time that Canvas spends building your model. After the time limit, Canvas stops building and selects the best model candidate.

The maximum time that you can specify is 720 hours. We highly recommend that you keep the maximum job runtime greater than 30 minutes to ensure that Canvas has enough time to generate model candidates and finish building your model.
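
If you track your build settings outside of Canvas, you could check them against the documented limits before you start a build. The following Python sketch is a hypothetical helper, not part of any Canvas API. The limits of 250 candidates and 720 hours come from this page, while the lower bound of one candidate is an assumption.

```python
MAX_CANDIDATES_LIMIT = 250            # documented upper bound (HPO mode only)
MAX_RUNTIME_HOURS_LIMIT = 720         # documented upper bound
RECOMMENDED_MIN_RUNTIME_MINUTES = 30  # documented recommendation


def check_build_limits(max_candidates: int, max_runtime_minutes: float) -> list[str]:
    """Return a list of problems; an empty list means that no issue was found."""
    problems = []
    # Assumption: at least one candidate is required.
    if not 1 <= max_candidates <= MAX_CANDIDATES_LIMIT:
        problems.append(f"max candidates must be between 1 and {MAX_CANDIDATES_LIMIT}")
    if max_runtime_minutes > MAX_RUNTIME_HOURS_LIMIT * 60:
        problems.append(f"max job runtime can't exceed {MAX_RUNTIME_HOURS_LIMIT} hours")
    elif max_runtime_minutes <= RECOMMENDED_MIN_RUNTIME_MINUTES:
        problems.append("max job runtime should be greater than 30 minutes")
    return problems


print(check_build_limits(max_candidates=100, max_runtime_minutes=120))  # []
print(check_build_limits(max_candidates=300, max_runtime_minutes=20))
```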

## Advanced time series forecasting model settings
<a name="canvas-advanced-settings-time-series"></a>

For time series forecasting models, Canvas supports the Objective metric, which is listed in the previous section.

Time series forecasting models also support the following advanced settings:

### Algorithm selection
<a name="canvas-advanced-settings-time-series-algos"></a>

When you build a time series forecasting model, Canvas uses an *ensemble* (or a combination) of statistical and machine learning algorithms to deliver highly accurate time series forecasts. By default, Canvas selects the optimal combination of all the available algorithms based on the time series in your dataset. However, you have the option to specify one or more algorithms to use for your forecasting model. In this case, Canvas determines the best blend using only your selected algorithms. If you're uncertain about which algorithm to select for training your model, we recommend that you choose all of the available algorithms.

**Note**  
Algorithm selection is only supported for standard builds. If you don’t select any algorithms in the advanced settings, then by default SageMaker AI runs a quick build and trains model candidates using a single tree-based learning algorithm. For more information about the difference between quick builds and standard builds, see [How custom models work](canvas-build-model.md).

Canvas supports the following time series forecasting algorithms:
+ [Autoregressive Integrated Moving Average (ARIMA)](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) – A simple stochastic time series model that uses statistical analysis to interpret the data and make future predictions. This algorithm is useful for simple datasets with fewer than 100 time series.
+ [Convolutional Neural Network - Quantile Regression (CNN-QR)](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-algo-cnnqr.html) – A proprietary, supervised learning algorithm that trains one global model from a large collection of time series and uses a quantile decoder to make predictions. CNN-QR works best with large datasets containing hundreds of time series.
+ [DeepAR+](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-deeparplus.html) – A proprietary, supervised learning algorithm for forecasting scalar time series using recurrent neural networks (RNNs) to train a single model jointly over all of the time series. DeepAR+ works best with large datasets containing hundreds of feature time series.
+ [Non-Parametric Time Series (NPTS)](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-npts.html) – A scalable, probabilistic baseline forecaster that predicts the future value distribution of a given time series by sampling from past observations. NPTS is useful when working with sparse or intermittent time series (for example, forecasting demand for individual items where the time series has many 0s or low counts).
+ [Exponential Smoothing (ETS)](https://en.wikipedia.org/wiki/Exponential_smoothing) – A forecasting method that produces forecasts which are weighted averages of past observations where the weights of older observations exponentially decrease. The algorithm is useful for simple datasets with fewer than 100 time series and datasets with seasonality patterns.
+ [Prophet](https://facebook.github.io/prophet/) – An additive regression model that works best with time series that have strong seasonal effects and several seasons of historical data. The algorithm is useful for datasets with non-linear growth trends that approach a limit.
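
To illustrate the idea behind exponential smoothing from the preceding list (weighted averages in which older observations count exponentially less), the following Python sketch implements simple exponential smoothing. This is a textbook simplification for illustration only. It isn't the ETS implementation that Canvas uses, and the function name and the `alpha` smoothing parameter are assumptions.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the one-step-ahead forecast after the last observation."""
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Each update gives weight alpha to the newest observation and
        # shrinks the weight of every older observation by (1 - alpha).
        level = alpha * value + (1 - alpha) * level
    return level


print(simple_exponential_smoothing([10, 12, 11, 13], alpha=0.5))  # 12.0
```

With `alpha=0.5`, the newest observation has a weight of 0.5, the one before it 0.25, and so on, which is the exponential decay that the ETS description refers to.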

### Forecast quantiles
<a name="canvas-advanced-settings-time-series-quantiles"></a>

For time series forecasting, SageMaker AI trains 6 model candidates with your target time series. Then, SageMaker AI combines these models using a stacking ensemble method to create an optimal forecasting model for a given objective metric. Each forecasting model generates a probabilistic forecast by producing forecasts at quantiles between P1 and P99. These quantiles are used to account for forecast uncertainty. By default, forecasts are generated for 0.1 (`p10`), 0.5 (`p50`), and 0.9 (`p90`). You can choose to specify up to five of your own quantiles from 0.01 (`p1`) to 0.99 (`p99`), by increments of 0.01 or higher.
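
The constraints on custom quantiles can be checked before you enter them in Canvas. The following Python sketch parses a comma-separated string and validates it against the rules described above (up to five values, each from 0.01 to 0.99, in increments of 0.01). The function is hypothetical and not part of Canvas. Requiring at least one value and returning the values sorted are assumptions of this sketch.

```python
def parse_forecast_quantiles(text: str) -> list[float]:
    """Parse a string such as '0.1, 0.5, 0.9' and validate each quantile."""
    values = [float(part) for part in text.split(",") if part.strip()]
    # Assumption: at least one quantile must be given.
    if not 1 <= len(values) <= 5:
        raise ValueError("specify between one and five quantiles")
    for q in values:
        if not 0.01 <= q <= 0.99:
            raise ValueError(f"{q} is outside the range 0.01 to 0.99")
        # Increments of 0.01 mean at most two decimal places.
        if abs(q * 100 - round(q * 100)) > 1e-9:
            raise ValueError(f"{q} is not a multiple of 0.01")
    return sorted(values)


print(parse_forecast_quantiles("0.1, 0.5, 0.9"))  # [0.1, 0.5, 0.9]
```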

# Edit an image dataset
<a name="canvas-edit-image"></a>

In Amazon SageMaker Canvas, you can edit your image datasets and review your labels before building a model. You might want to perform tasks such as assigning labels to unlabeled images or adding more images to the dataset. These tasks can all be done in the Canvas application, providing you with one place to modify your dataset and build a model.

**Note**  
Before building a model, you must assign labels to all images in your dataset. Also, you must have at least 25 images per label and a minimum of two labels. For more information about assigning labels, see [Assign labels to unlabeled images](#canvas-edit-image-assign) on this page. If you can’t determine a label for an image, you should delete it from your dataset. For more information about deleting images, see [Add or delete images from the dataset](#canvas-edit-image-add-delete) on this page.
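
If you keep a list of labels for your images outside of Canvas, you can check these requirements before importing. The following Python sketch is a hypothetical helper that assumes one list entry per image, with `None` marking an unlabeled image. The label names in the example are made up.

```python
from collections import Counter

MIN_IMAGES_PER_LABEL = 25  # documented minimum
MIN_LABELS = 2             # documented minimum


def check_image_labels(labels):
    """Return a list of problems; an empty list means the requirements are met."""
    problems = []
    unlabeled = sum(1 for label in labels if label is None)
    if unlabeled:
        problems.append(f"{unlabeled} image(s) are unlabeled")
    counts = Counter(label for label in labels if label is not None)
    if len(counts) < MIN_LABELS:
        problems.append(f"found {len(counts)} label(s); at least {MIN_LABELS} are required")
    for label, count in sorted(counts.items()):
        if count < MIN_IMAGES_PER_LABEL:
            problems.append(
                f"label {label!r} has {count} image(s); "
                f"at least {MIN_IMAGES_PER_LABEL} are required"
            )
    return problems


labels = ["scratch"] * 30 + ["dent"] * 10 + [None]
for problem in check_image_labels(labels):
    print(problem)
```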

To begin editing your image dataset, you should be on the **Build** tab while building your single-label image prediction model.

A new page opens that shows the images in your dataset along with their labels. This page categorizes your image dataset into **Total images**, **Labeled images**, and **Unlabeled images**. You can also review the **Dataset preparation guide** for best practices on building a more accurate image prediction model.

The following screenshot shows the page for editing your image dataset.

![\[Screenshot of the image dataset management page in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/dataset-management-page.png)


From this page, you can do the following actions.

## View the properties for each image (label, size, dimensions)
<a name="canvas-edit-image-view"></a>

To view an individual image, you can search for it by file name in the search bar. Then, choose the image to open the full view. You can view the image properties and reassign the image’s label. Choose **Save** when you’re done viewing the image.

## Add, rename, or delete labels in the dataset
<a name="canvas-edit-image-labels"></a>

Canvas lists the labels for your dataset in the left navigation pane. You can add new labels to the dataset by entering a label in the **Add label** text field.

To rename or delete a label from your dataset, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) next to the label and select either **Rename** or **Delete**. If you rename the label, you can enter the new label name and choose **Confirm**. If you delete the label, the label is removed from all images in your dataset that have that label. Any images with that label are left unlabeled.

## Assign labels to unlabeled images
<a name="canvas-edit-image-assign"></a>

To view the unlabeled images in your dataset, choose **Unlabeled** in the left navigation pane. Select an image, open the dropdown titled **Unlabeled**, and select a label to assign to the image. You can also select more than one image and perform this action, and all selected images are assigned the label you chose.

## Reassign labels to images
<a name="canvas-edit-image-reassign"></a>

You can reassign labels to images by selecting the image (or multiple images at a time) and opening the dropdown titled with the current label. Select your desired label, and the image or images are updated with the new label.

## Sort your images by label
<a name="canvas-edit-image-sort"></a>

You can view all the images for a given label by choosing the label in the left navigation pane.

## Add or delete images from the dataset
<a name="canvas-edit-image-add-delete"></a>

You can add more images to your dataset by choosing **Add images** in the top navigation pane. You’ll be taken through the workflow to import more images. The images you import are added to your existing dataset.

You can delete images from your dataset by selecting them and then choosing **Delete** in the top navigation pane.

**Note**  
After making any changes to your dataset, choose **Save dataset** to make sure that you don’t lose your changes.

# Data exploration and analysis
<a name="canvas-explore-data"></a>

**Note**  
You can only use SageMaker Canvas visualizations and analytics for models built on tabular datasets. Multi-category text prediction models are also excluded.

In Amazon SageMaker Canvas, you can explore the variables in your dataset using in-application visualizations and analytics. You can use these explorations to uncover relationships between your variables before building your model.

For more information about visualization techniques in Canvas, see [Explore your data using visualization techniques](canvas-explore-data-visualization.md).

For more information about analytics in Canvas, see [Explore your data using analytics](canvas-explore-data-analytics.md).

# Explore your data using visualization techniques
<a name="canvas-explore-data-visualization"></a>

**Note**  
You can only use SageMaker Canvas visualizations for models built on tabular datasets. Multi-category text prediction models are also excluded.

With Amazon SageMaker Canvas, you can explore and visualize your data to gain insights before building your ML models. You can visualize using scatter plots, bar charts, and box plots, which can help you understand your data and discover the relationships between features that could affect the model accuracy.

In the **Build** tab of the SageMaker Canvas application, choose **Data visualizer** to begin creating your visualizations.

You can change the visualization sample size to adjust the size of the random sample taken from your dataset. A sample size that is too large might affect the performance of your data visualizations, so we recommend that you choose an appropriate sample size. To change the sample size, use the following procedure.

1. Choose **Visualization sample**.

1. Use the slider to select your desired sample size.

1. Choose **Update** to confirm the change to your sample size.

**Note**  
Certain visualization techniques require columns of a specific data type. For example, you can only use numeric columns for the x and y-axes of scatter plots.

## Scatter plot
<a name="canvas-explore-data-scatterplot"></a>

To create a scatter plot with your dataset, choose **Scatter plot** in the **Visualization** panel. Choose the features you want to plot on the x- and y-axes from the **Columns** section. You can drag and drop the columns onto the axes or, once a column has been dropped onto an axis, you can choose a column from the list of supported columns.

You can use **Color by** to color the data points on the plot with a third feature. You can also use **Group by** to group the data into separate plots based on a fourth feature.

The following image shows a scatter plot that uses **Color by** and **Group by**. In this example, each data point is colored by the `MaritalStatus` feature, and grouping by the `Department` feature results in a scatter plot for the data points of each department.

![\[Screenshot of a scatter plot in the Data visualizer view of the Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-eda-scatter-plot.png)


## Bar chart
<a name="canvas-explore-data-barchart"></a>

To create a bar chart with your dataset, choose **Bar chart** in the **Visualization** panel. Choose the features you want to plot on the x- and y-axes from the **Columns** section. You can drag and drop the columns onto the axes or, once a column has been dropped onto an axis, you can choose a column from the list of supported columns.

You can use **Group by** to group the bar chart by a third feature. You can use **Stack by** to vertically shade each bar based on the unique values of a fourth feature.

The following image shows a bar chart that uses **Group by** and **Stack by**. In this example, the bar chart is grouped by the `MaritalStatus` feature and stacked by the `JobLevel` feature. For each `JobRole` on the x-axis, there is a separate bar for each unique category in the `MaritalStatus` feature, and every bar is vertically stacked by the `JobLevel` feature.

![\[Screenshot of a bar chart in the Data visualizer view of the Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-eda-bar-chart.png)


## Box plot
<a name="canvas-explore-data-boxplot"></a>

To create a box plot with your dataset, choose **Box plot** in the **Visualization** panel. Choose the features you want to plot on the x- and y-axes from the **Columns** section. You can drag and drop the columns onto the axes or, once a column has been dropped onto an axis, you can choose a column from the list of supported columns.

You can use **Group by** to group the box plots by a third feature.

The following image shows a box plot that uses **Group by**. In this example, the x and y-axes show `JobLevel` and `JobSatisfaction`, respectively, and the colored box plots are grouped by the `Department` feature.

![\[Screenshot of a box plot in the Data visualizer view of the Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-eda-box-plot.png)


# Explore your data using analytics
<a name="canvas-explore-data-analytics"></a>

**Note**  
You can only use SageMaker Canvas analytics for models built on tabular datasets. Multi-category text prediction models are also excluded.

With analytics in Amazon SageMaker Canvas, you can explore your dataset and gain insight on all of your variables before building a model. You can determine the relationships between features in your dataset using correlation matrices. You can use this technique to summarize your dataset into a matrix that shows the correlations between two or more values. This helps you identify and visualize patterns in a given dataset for advanced data analysis.

The matrix shows the correlation between each feature as positive, negative, or neutral. You might want to include features that have a high correlation with each other when building your model. Features that have little to no correlation might be irrelevant to your model, and you can drop those features when building your model.

To get started with correlation matrices in SageMaker Canvas, see the following section.

## Create a correlation matrix
<a name="canvas-explore-data-analytics-correlation-matrix"></a>

You can create a correlation matrix when you are preparing to build a model in the **Build** tab of the SageMaker Canvas application.

For instructions on how to begin creating a model, see [Build a model](canvas-build-model-how-to.md).

After you’ve started preparing a model in the SageMaker Canvas application, do the following:

1. In the **Build** tab, choose **Data visualizer**.

1. Choose **Analytics**.

1. Choose **Correlation matrix**.

You should see a visualization similar to the following screenshot, which shows up to 15 columns of the dataset organized into a correlation matrix.

![\[Screenshot of a correlation matrix in the Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-correlation-matrix-2.png)


After you’ve created the correlation matrix, you can customize it by doing the following:

### 1. Choose your columns
<a name="canvas-explore-data-analytics-correlation-matrix-columns"></a>

For **Columns**, you can select the columns that you want to include in the matrix. You can compare up to 15 columns from your dataset.

**Note**  
You can use numeric, categorical, or binary column types for a correlation matrix. The correlation matrix doesn’t support datetime or text data column types.

To add or remove columns from the correlation matrix, select and deselect columns from the **Columns** panel. You can also drag and drop columns from the panel directly onto the matrix. If your dataset has a lot of columns, you can search for the columns you want in the **Search columns** bar.

To filter the columns by data type, choose the dropdown list and select **All**, **Numeric**, or **Categorical**. Selecting **All** shows you all of the columns from your dataset, whereas the **Numeric** and **Categorical** filters only show you the numeric or categorical columns in your dataset. Note that binary column types are included in the numeric or categorical filters.

For the best data insights, include your target column in the correlation matrix. When you include your target column in the correlation matrix, it appears as the last feature on the matrix with a target symbol.

### 2. Choose your correlation type
<a name="canvas-explore-data-analytics-correlation-matrix-cor-type"></a>

SageMaker Canvas supports different *correlation types*, or methods for calculating the correlation between your columns.

To change the correlation type, use the **Columns** filter mentioned in the preceding section to filter for your desired column type and columns. You should see the **Correlation type** in the side panel. For numeric comparisons, you have the option to select either **Pearson** or **Spearman**. For categorical comparisons, the correlation type is set as **MI**. For mixed numeric and categorical comparisons, the correlation type is set as **Spearman & MI**.

For matrices that only compare numeric columns, the correlation type is either Pearson or Spearman. The Pearson measure evaluates the linear relationship between two continuous variables. The Spearman measure evaluates the monotonic relationship between two variables. For both Pearson and Spearman, the scale of correlation ranges from -1 to 1, with either end of the scale indicating a perfect correlation (a direct 1:1 relationship) and 0 indicating no correlation. You might want to select Pearson if your data has more linear relationships (as revealed by a [scatter plot visualization](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-explore-data.html#canvas-explore-data-scatterplot)). If your data is not linear, or contains a mixture of linear and monotonic relationships, then you might want to select Spearman.
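
The difference between the two measures is easiest to see on data that is monotonic but not linear. The following Python sketch computes both coefficients from their standard definitions using only the standard library. It is an illustration, not the Canvas implementation, and the example data (`y = x ** 3`) is made up.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic, but not linear
print("Pearson: ", round(pearson(xs, ys), 3))
print("Spearman:", round(spearman(xs, ys), 3))  # 1.0, because the ranks match exactly
```

Because `ys` always increases with `xs`, Spearman reports a perfect correlation, while Pearson is below 1 because the relationship is curved.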

For matrices that only compare categorical columns, the correlation type is set to Mutual Information Classification (MI). The MI value is a measure of the mutual dependence between two random variables. The MI measure is on a scale of 0 to 1, with 0 indicating no correlation and 1 indicating a perfect correlation.
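
As a rough illustration of how mutual information behaves for categorical columns, the following Python sketch computes MI from observed frequencies. This page doesn't describe how Canvas scales MI to the 0 to 1 range, so the normalization used here (dividing by the smaller of the two column entropies) is an assumption made for this sketch only, and the example values are made up.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information in bits between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs, ys):
    """Scale MI to 0-1 (assumed normalization, not necessarily what Canvas uses)."""
    denominator = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denominator if denominator > 0 else 0.0


department = ["Sales", "Sales", "R&D", "R&D"]
building = ["North", "North", "South", "South"]  # fully determined by department
shift = ["Day", "Night", "Day", "Night"]         # independent of department

print(normalized_mi(department, building))  # 1.0
print(normalized_mi(department, shift))     # 0.0
```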

For matrices that compare a mix of numeric and categorical columns, the correlation type **Spearman & MI** is a combination of the Spearman and MI correlation types. For correlations between two numeric columns, the matrix shows the Spearman value. For correlations between a numeric and categorical column or two categorical columns, the matrix shows the MI value.

Lastly, remember that correlation does not necessarily indicate causation. A strong correlation value only indicates that there is a relationship between two variables, but the variables might not have a causal relationship. Carefully review your columns of interest to avoid bias when building your model.

### 3. Filter your correlations
<a name="canvas-explore-data-analytics-correlation-matrix-filter"></a>

In the side panel, you can use the **Filter correlations** feature to filter for the range of correlation values that you want to include in the matrix. For example, if you want to filter for features that only have positive or neutral correlation, you can set the **Min** to 0 and the **Max** to 1 (valid values are -1 to 1).

For Spearman and Pearson comparisons, you can set the **Filter correlations** range anywhere from -1 to 1, with 0 meaning that there is no correlation. Values of -1 and 1 mean that the variables have a perfect negative or positive correlation, respectively.

For MI comparisons, the correlation range only goes from 0 to 1, with 0 meaning that there is no correlation and 1 meaning that the variables have a perfect correlation, either positive or negative.

Each feature has a perfect correlation (1) with itself. Therefore, you might notice that the top row of the correlation matrix is always 1. If you want to exclude these values, you can use the filter to set the **Max** less than 1.

Keep in mind that if your matrix compares a mix of numeric and categorical columns and uses the **Spearman & MI** correlation type, then the *categorical x numeric* and *categorical x categorical* correlations (which use the MI measure) are on a scale of 0 to 1, whereas the *numeric x numeric* correlations (which use the Spearman measure) are on a scale of -1 to 1. Review your correlations of interest carefully to ensure that you know the correlation type being used to calculate each value.
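
The effect of the **Min** and **Max** filter can be pictured as keeping only the cells whose value falls within the range. The following Python sketch shows that idea on a small dictionary of correlation values. The function, the inclusive bounds, and the sample values are all assumptions made for illustration.

```python
def filter_correlations(correlations, min_value=-1.0, max_value=1.0):
    """Keep only the pairs whose correlation is within [min_value, max_value]."""
    if not -1.0 <= min_value <= max_value <= 1.0:
        raise ValueError("expected -1 <= min_value <= max_value <= 1")
    return {
        pair: value
        for pair, value in correlations.items()
        if min_value <= value <= max_value
    }


# Made-up values for illustration only.
correlations = {
    ("JobLevel", "JobLevel"): 1.0,          # self-correlation
    ("JobLevel", "MonthlyIncome"): 0.95,
    ("JobLevel", "JobSatisfaction"): -0.02,
    ("Age", "YearsSinceHire"): -0.4,
}

# Positive or neutral correlations only, excluding perfect self-correlations.
print(filter_correlations(correlations, min_value=0.0, max_value=0.99))
```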

### 4. Choose the visualization method
<a name="canvas-explore-data-analytics-correlation-matrix-viz-method"></a>

In the side panel, you can use **Visualize by** to change the visualization method of the matrix. Choose the **Numeric** visualization method to show the correlation (Pearson, Spearman, or MI) value, or choose the **Size** visualization method to visualize the correlation with differently sized and colored dots. If you choose **Size**, you can hover over a specific dot on the matrix to see the actual correlation value.

### 5. Choose a color palette
<a name="canvas-explore-data-analytics-correlation-matrix-color"></a>

In the side panel, you can use **Color selection** to change the color palette used for the scale of negative to positive correlation in the matrix. Select one of the alternative color palettes to change the colors used in the matrix.

# Prepare data for model building
<a name="canvas-prepare-data"></a>

**Note**  
You can now do advanced data preparation in SageMaker Canvas with Data Wrangler, which provides you with a natural language interface and over 300 built-in transformations. For more information, see [Data preparation](canvas-data-prep.md).

Your machine learning dataset might require data preparation before you build your model. You might want to clean your data due to various issues, which might include missing values or outliers, and perform feature engineering to improve the accuracy of your model. Amazon SageMaker Canvas provides ML data transforms with which you can clean, transform, and prepare your data for model building. You can use these transforms on your datasets without any code. SageMaker Canvas adds the transforms you use to the **Model recipe**, which is a record of the data preparation done on your data before building the model. Any data transforms you use only modify the input data for model building and do not modify your original data source.

The preview of your dataset shows the first 100 rows of the dataset. If your dataset has more than 20,000 rows, Canvas takes a random sample of 20,000 rows and previews the first 100 rows from that sample. You can only search for and specify values from the previewed rows, and the filter functionality only filters the previewed rows and not the entire dataset.
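
The preview behavior described above can be sketched in a few lines of Python. This is only an illustration of the documented rule (sample 20,000 rows from larger datasets, then show the first 100). The function name and the `seed` parameter are hypothetical, and Canvas might sample differently internally.

```python
import random

SAMPLE_LIMIT = 20_000  # documented sample size for large datasets
PREVIEW_ROWS = 100     # documented number of previewed rows


def preview_rows(rows, seed=0):
    """Return the rows that a preview would show under the documented rule."""
    rows = list(rows)
    if len(rows) > SAMPLE_LIMIT:
        # Large dataset: take a random sample first.
        rows = random.Random(seed).sample(rows, SAMPLE_LIMIT)
    return rows[:PREVIEW_ROWS]


small = preview_rows(range(150))     # first 100 rows, in original order
large = preview_rows(range(50_000))  # 100 rows from a 20,000-row random sample
print(len(small), len(large))  # 100 100
```

Because searching and filtering operate only on these previewed rows, a value that exists elsewhere in the dataset might not appear in the preview.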

The following transforms are available in SageMaker Canvas for you to prepare your data for building.

**Note**  
You can only use advanced transformations for models built on tabular datasets. Multi-category text prediction models are also excluded.

## Drop columns
<a name="canvas-prepare-data-drop"></a>

You can exclude a column from your model build by dropping it in the **Build** tab of the SageMaker Canvas application. Deselect the column you want to drop, and it isn't included when building the model.

**Note**  
If you drop columns and then make [batch predictions](canvas-make-predictions.md) with your model, SageMaker Canvas adds the dropped columns back to the output dataset available for you to download. However, SageMaker Canvas does not add the dropped columns back for time series models.

## Filter rows
<a name="canvas-prepare-data-filter"></a>

The filter functionality filters the previewed rows (the first 100 rows of your dataset) according to conditions that you specify. Filtering rows creates a temporary preview of the data and does not impact the model building. You can filter to preview rows that have missing values, contain outliers, or meet custom conditions in a column you choose.

### Filter rows by missing values
<a name="canvas-prepare-data-filter-missing"></a>

Missing values are a common occurrence in machine learning datasets. If you have rows with null or empty values in certain columns, you might want to filter for and preview those rows.

To filter missing values from your previewed data, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Filter by rows** (![\[Filter icon in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/filter-icon.png)).

1. Choose the **Column** you want to check for missing values.

1. For the **Operation**, choose **Is missing**.

SageMaker Canvas filters for rows that contain missing values in the **Column** you selected and provides a preview of the filtered rows.

![\[Screenshot of the filter by missing values operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-filter-missing.png)


### Filter rows by outliers
<a name="canvas-prepare-data-filter-outliers"></a>

Outliers, or rare values in the distribution and range of your data, can negatively impact model accuracy and lead to longer building times. SageMaker Canvas enables you to detect and filter rows that contain outliers in numeric columns. You can choose to define outliers with either standard deviations or a custom range.

To filter for outliers in your data, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Filter by rows** (![\[Filter icon in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/filter-icon.png)).

1. Choose the **Column** you want to check for outliers.

1. For the **Operation**, choose **Is outlier**.

1. Set the **Outlier range** to either **Standard deviation** or **Custom range**.

1. If you choose **Standard deviation**, specify a **SD** (standard deviation) value from 1–3. If you choose **Custom range**, select either **Percentile** or **Number**, and then specify the **Min** and **Max** values.

The **Standard deviation** option detects and filters for outliers in numeric columns using the mean and standard deviation. You specify the number of standard deviations a value must vary from the mean to be considered an outlier. For example, if you specify `3` for **SD**, a value must fall more than 3 standard deviations from the mean to be considered an outlier.

The **Custom range** option detects and filters for outliers in numeric columns using minimum and maximum values. Use this method if you know your threshold values that delimit outliers. You can set the **Type** of the range to either **Percentile** or **Number**. If you choose **Percentile**, the **Min** and **Max** values should be the minimum and maximum of the percentile range (0-100) that you want to allow. If you choose **Number**, the **Min** and **Max** values should be the minimum and maximum numeric values that you want to filter in the data.

![\[Screenshot of the filter by outliers operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-filter-outlier.png)
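
As a rough illustration of the two outlier definitions described above, the following Python sketch flags outliers in a list of numbers. It is not Canvas code: the function names, the use of the sample standard deviation, and the nearest-rank percentile method are assumptions made for this example, and Canvas might compute these values differently.

```python
import math
import statistics


def sd_outliers(values, sd=3):
    """Return values more than `sd` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation (assumption)
    return [v for v in values if abs(v - mean) > sd * spread]


def percentile_value(values, pct):
    """Nearest-rank percentile (assumption: Canvas might interpolate instead)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def range_outliers(values, low, high, kind="Number"):
    """Return values outside [low, high]. For "Percentile", the bounds are percentiles."""
    if kind == "Percentile":
        low, high = percentile_value(values, low), percentile_value(values, high)
    return [v for v in values if v < low or v > high]


# Hypothetical price column with one extreme value.
prices = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 500]

print(sd_outliers(prices, sd=3))
print(range_outliers(prices, 0, 100))
print(range_outliers(prices, 10, 90, kind="Percentile"))
```

In this sample, all three calls flag only the value `500`. Lowering `sd` or narrowing the range flags more rows.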


### Filter rows by custom values
<a name="canvas-prepare-data-filter-custom"></a>

You can filter for rows with values that meet custom conditions. For example, you might want to preview rows that have a price value greater than 100 before removing them. With this functionality, you can filter rows that exceed the threshold you set and preview the filtered data.

To use the custom filter functionality, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Filter by rows** (![\[Filter icon in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/filter-icon.png)).

1. Choose the **Column** you want to check.

1. Select the type of **Operation** you want to use, and then specify the values for the selected condition.

For the **Operation**, you can choose one of the following options. Note that the available operations depend on the data type of the column you choose. For example, you cannot create an `is greater than` operation for a column containing text values.


| Operation | Supported data type | Supported feature type | Function | 
| --- | --- | --- | --- | 
|  Is equal to  |  Numeric, Text  | Binary, Categorical |  Filters rows where the value in **Column** equals the values you specify.  | 
|  Is not equal to  |  Numeric, Text  | Binary, Categorical |  Filters rows where the value in **Column** doesn't equal the values you specify.  | 
|  Is less than  |  Numeric  | N/A |  Filters rows where the value in **Column** is less than the value you specify.  | 
|  Is less than or equal to  |  Numeric  | N/A |  Filters rows where the value in **Column** is less than or equal to the value you specify.  | 
|  Is greater than  |  Numeric  | N/A |  Filters rows where the value in **Column** is greater than the value you specify.  | 
|  Is greater than or equal to  |  Numeric  | N/A |  Filters rows where the value in **Column** is greater than or equal to the value you specify.  | 
|  Is between  |  Numeric  | N/A |  Filters rows where the value in **Column** is between or equal to two values you specify.  | 
|  Contains  |  Text  | Categorical |  Filters rows where the value in **Column** contains a value you specify.  | 
|  Starts with  |  Text  | Categorical |  Filters rows where the value in **Column** begins with a value you specify.  | 
|  Ends with  |  Text  | Categorical |  Filters rows where the value in **Column** ends with a value you specify.  | 

After you set the filter operation, SageMaker Canvas updates the preview of the dataset to show you the filtered data.

![\[Screenshot of the filter by custom values operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-filter-custom.png)


## Functions and operators
<a name="canvas-prepare-data-custom-formula"></a>

You can use mathematical functions and operators to explore and distribute your data. You can use the SageMaker Canvas supported functions or create your own formula with your existing data and create a new column with the result of the formula. For example, you can add the corresponding values of two columns and save the result to a new column.

You can nest statements to create more complex functions. The following are some examples of nested functions that you might use.
+ To calculate BMI, you could use the function `weight / (height ^ 2)`.
+ To classify ages, you could use the function `Case(age < 18, 'child', age < 65, 'adult', 'senior')`.
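
To make the semantics of these two formulas concrete, the following Python sketch applies them to a few hypothetical rows. The `case` helper, the sample rows, and the units (kilograms and meters) are assumptions for illustration only. The sketch mimics how a `Case` formula picks the value for the first condition that is true and falls back to the final value otherwise.

```python
def case(*args):
    """Mimic Case(cond1, v1, cond2, v2, ..., default): the first true condition wins."""
    *pairs, default = args
    for cond, value in zip(pairs[0::2], pairs[1::2]):
        if cond:
            return value
    return default


# Hypothetical rows; weight in kilograms and height in meters.
rows = [
    {"weight": 70.0, "height": 1.75, "age": 12},
    {"weight": 82.0, "height": 1.80, "age": 40},
    {"weight": 60.0, "height": 1.60, "age": 71},
]

for row in rows:
    # weight / (height ^ 2); note that Python spells the power operator as **.
    row["bmi"] = row["weight"] / (row["height"] ** 2)
    # Case(age < 18, 'child', age < 65, 'adult', 'senior')
    row["age_group"] = case(row["age"] < 18, "child", row["age"] < 65, "adult", "senior")

print([(round(r["bmi"], 1), r["age_group"]) for r in rows])
```

Each formula is evaluated once per row, and the result is stored in a new column (`bmi` and `age_group` in this sketch).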

You can specify functions in the data preparation stage before you build your model. To use a function, do the following.
+ In the **Build** tab of the SageMaker Canvas application, choose **View all** and then choose **Custom formula** to open the **Custom formula** panel.
+ In the **Custom formula** panel, you can choose a **Formula** to add to your **Model Recipe**. Each formula is applied to all of the values in the columns you specify. For formulas that accept two or more columns as arguments, use columns with matching data types; otherwise, you get an error or `null` values in the new column. 
+ After you’ve specified a **Formula**, add a column name in the **New Column Name** field. SageMaker Canvas uses this name for the new column that is created.
+ (Optional) Choose **Preview** to preview your transform.
+ To add the function to your **Model Recipe**, choose **Add**.

SageMaker Canvas saves the result of your function to a new column using the name you specified in **New Column Name**. You can view or remove functions from the **Model Recipe** panel.

SageMaker Canvas supports the following operators for functions. You can use either the text format or the in-line format to specify your function.


| Operator | Description | Supported data types | Text format | In-line format | 
| --- | --- | --- | --- | --- | 
|  Add  |  Returns the sum of the values  |  Numeric  | Add(sales1, sales2) | sales1 \+ sales2 | 
|  Subtract  |  Returns the difference between the values  |  Numeric  | Subtract(sales1, sales2) | sales1 - sales2 | 
|  Multiply  |  Returns the product of the values  |  Numeric  | Multiply(sales1, sales2) | sales1 \* sales2 | 
|  Divide  |  Returns the quotient of the values  |  Numeric  | Divide(sales1, sales2) | sales1 / sales2 | 
|  Mod  |  Returns the result of the modulo operator (the remainder after dividing the two values)  |  Numeric  | Mod(sales1, sales2) | sales1 % sales2 | 
|  Abs  | Returns the absolute value of the value |  Numeric  | Abs(sales1) | N/A | 
|  Negate  | Returns the negative of the value |  Numeric  | Negate(c1) | -c1 | 
|  Exp  |  Returns e (Euler's number) raised to the power of the value  |  Numeric  | Exp(sales1) | N/A | 
|  Log  |  Returns the logarithm (base 10) of the value  |  Numeric  | Log(sales1) | N/A | 
|  Ln  |  Returns the natural logarithm (base e) of the value  |  Numeric  | Ln(sales1) | N/A | 
|  Pow  |  Returns the value raised to a power  |  Numeric  | Pow(sales1, 2) | sales1 ^ 2 | 
|  If  |  Returns a true or false label based on a condition you specify  |  Boolean, Numeric, Text  | If(sales1>7000, 'truelabel', 'falselabel') | N/A | 
|  Or  |  Returns a Boolean value of whether one of the specified values or conditions is true or not  |  Boolean  | Or(fullprice, discount) | fullprice \|\| discount | 
|  And  |  Returns a Boolean value of whether two of the specified values or conditions are true or not  |  Boolean  | And(sales1,sales2) | sales1 && sales2 | 
|  Not  |  Returns a Boolean value that is the opposite of the specified value or conditions  |  Boolean  | Not(sales1) | \!sales1 | 
|  Case  |  Returns a value based on conditional statements (returns c1 if cond1 is true, returns c2 if cond2 is true, else returns c3)  |  Boolean, Numeric, Text  | Case(cond1, c1, cond2, c2, c3) | N/A | 
|  Equal  |  Returns a Boolean value of whether two values are equal  |  Boolean, Numeric, Text  | N/A | c1 = c2, c1 == c2 | 
|  Not equal  |  Returns a Boolean value of whether two values are not equal  |  Boolean, Numeric, Text  | N/A | c1 \!= c2 | 
|  Less than  |  Returns a Boolean value of whether c1 is less than c2  |  Boolean, Numeric, Text  | N/A | c1 < c2 | 
|  Greater than  |  Returns a Boolean value of whether c1 is greater than c2  |  Boolean, Numeric, Text  | N/A | c1 > c2 | 
|  Less than or equal  |  Returns a Boolean value of whether c1 is less than or equal to c2  |  Boolean, Numeric, Text  | N/A | c1 <= c2 | 
|  Greater than or equal  |  Returns a Boolean value of whether c1 is greater than or equal to c2  |  Boolean, Numeric, Text  | N/A | c1 >= c2 | 

SageMaker Canvas also supports aggregate operators, which can perform operations such as calculating the sum of all the values or finding the minimum value in a column. You can use aggregate operators in combination with standard operators in your functions. For example, to calculate the difference of values from the mean, you could use the function `Abs(height - avg(height))`. SageMaker Canvas supports the following aggregate operators.


| Aggregate operator | Description | Format | Example | 
| --- | --- | --- | --- | 
|  sum  |  Returns the sum of all the values in a column  | sum | sum(c1) | 
|  minimum  |  Returns the minimum value of a column  | min | min(c2) | 
|  maximum  |  Returns the maximum value of a column  | max | max(c3) | 
|  average  |  Returns the average value of a column  | avg | avg(c4) | 
|  std  | Returns the sample standard deviation of a column | std | std(c1) | 
|  stddev  | Returns the standard deviation of the values in a column | stddev | stddev(c1) | 
|  variance  | Returns the unbiased variance of the values in a column | variance | variance(c1) | 
|  approx_count_distinct  | Returns the approximate number of distinct items in a column | approx_count_distinct | approx_count_distinct(c1) | 
|  count  | Returns the number of items in a column | count | count(c1) | 
|  first  |  Returns the first value of a column  | first | first(c1) | 
|  last  |  Returns the last value of a column  | last | last(c1) | 
|  stddev_pop  | Returns the population standard deviation of a column | stddev_pop | stddev_pop(c1) | 
|  variance_pop  |  Returns the population variance of the values in a column  | variance_pop | variance_pop(c1) | 
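
The following Python sketch shows what combining an aggregate operator with a standard operator computes, and how the sample and population variants of standard deviation and variance differ. It uses Python's standard `statistics` module as a stand-in; the sample `height` values are hypothetical, and the mapping of each Python function to a Canvas operator follows the descriptions in the preceding table.

```python
import statistics

# Hypothetical height column, in centimeters.
height = [150.0, 160.0, 170.0, 180.0, 190.0]

# Abs(height - avg(height)): the aggregate is computed once for the column,
# and then the standard operators are applied to every row.
avg = statistics.mean(height)
deviation_from_mean = [abs(h - avg) for h in height]

sample_std = statistics.stdev(height)          # like std: divides by n - 1
population_std = statistics.pstdev(height)     # like stddev_pop: divides by n
sample_var = statistics.variance(height)       # like variance: unbiased, n - 1
population_var = statistics.pvariance(height)  # like variance_pop: n

print(deviation_from_mean)
print(sample_var, population_var)
```

For the same column, the population variants are always smaller than or equal to the sample variants, and the gap shrinks as the number of rows grows.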

## Manage rows
<a name="canvas-prepare-data-manage"></a>

With the Manage rows transform, you can sort, randomly shuffle, and remove rows of data from the dataset.

### Sort rows
<a name="canvas-prepare-data-manage-sort"></a>

To sort the rows in a dataset by a given column, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows** and then choose **Sort rows**.

1. For **Sort Column**, choose the column you want to sort by.

1. For **Sort Order**, choose either **Ascending** or **Descending**.

1. Choose **Add** to add the transform to the **Model recipe**.

### Shuffle rows
<a name="canvas-prepare-data-manage-shuffle"></a>

To randomly shuffle the rows in a dataset, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows** and then choose **Shuffle rows**.

1. Choose **Add** to add the transform to the **Model recipe**.

### Drop duplicate rows
<a name="canvas-prepare-data-manage-drop-duplicate"></a>

To remove duplicate rows in a dataset, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows** and then choose **Drop duplicate rows**.

1. Choose **Add** to add the transform to the **Model recipe**.

### Remove rows by missing values
<a name="canvas-prepare-data-remove-missing"></a>

Missing values are a common occurrence in machine learning datasets and can impact model accuracy. Use this transform if you want to drop rows with null or empty values in certain columns.

To remove rows that contain missing values in a specified column, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows**.

1. Choose **Drop rows by missing values**.

1. Choose the **Column** you want to check for missing values.

1. Choose **Add** to add the transform to the **Model recipe**.

SageMaker Canvas drops rows that contain missing values in the **Column** you selected. After removing the rows from the dataset, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the rows return to your dataset.

![\[Screenshot of the remove rows by missing values operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-remove-missing.png)


### Remove rows by outliers
<a name="canvas-prepare-data-remove-outliers"></a>

Outliers, or rare values in the distribution and range of your data, can negatively impact model accuracy and lead to longer building times. With SageMaker Canvas, you can detect and remove rows that contain outliers in numeric columns. You can choose to define outliers with either standard deviations or a custom range.

To remove outliers from your data, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows**.

1. Choose **Drop rows by outlier values**.

1. Choose the **Column** you want to check for outliers.

1. Set the **Operator** to **Standard deviation**, **Custom numeric range**, or **Custom quantile range**.

1. If you choose **Standard deviation**, specify a **Standard deviations** value from 1–3. If you choose **Custom numeric range** or **Custom quantile range**, specify the **Min** and **Max** values (numbers for numeric ranges, or percentiles between 0–100% for quantile ranges).

1. Choose **Add** to add the transform to the **Model recipe**.

The **Standard deviation** option detects and removes outliers in numeric columns using the mean and standard deviation. You specify the number of standard deviations a value must vary from the mean to be considered an outlier. For example, if you specify `3` for **Standard deviations**, a value must fall more than 3 standard deviations from the mean to be considered an outlier.

The **Custom numeric range** and **Custom quantile range** options detect and remove outliers in numeric columns using minimum and maximum values. Use this method if you know your threshold values that delimit outliers. If you choose a numeric range, the **Min** and **Max** values should be the minimum and maximum numeric values that you want to allow in the data. If you choose a quantile range, the **Min** and **Max** values should be the minimum and maximum of the percentile range (0–100) that you want to allow.

After removing the rows from the dataset, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the rows return to your dataset.

![\[Screenshot of the remove rows by outliers operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-remove-outlier.png)


### Remove rows by custom values
<a name="canvas-prepare-data-remove-custom"></a>

You can remove rows with values that meet custom conditions. For example, you might want to exclude all of the rows with a price value greater than 100 when building your model. With this transform, you can create a rule that removes all rows that exceed the threshold you set.

To use the custom remove transform, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage rows**.

1. Choose **Drop rows by formula**.

1. Choose the **Column** you want to check.

1. Select the type of **Operation** you want to use, and then specify the values for the selected condition.

1. Choose **Add** to add the transform to the **Model recipe**.

For the **Operation**, you can choose one of the following options. Note that the available operations depend on the data type of the column you choose. For example, you cannot create an `is greater than` operation for a column containing text values.


| Operation | Supported data type | Supported feature type | Function | 
| --- | --- | --- | --- | 
|  Is equal to  |  Numeric, Text  |  Binary, Categorical  |  Removes rows where the value in **Column** equals the values you specify.  | 
|  Is not equal to  |  Numeric, Text  |  Binary, Categorical  |  Removes rows where the value in **Column** doesn't equal the values you specify.  | 
|  Is less than  |  Numeric  | N/A |  Removes rows where the value in **Column** is less than the value you specify.  | 
|  Is less than or equal to  |  Numeric  | N/A |  Removes rows where the value in **Column** is less than or equal to the value you specify.  | 
|  Is greater than  |  Numeric  | N/A |  Removes rows where the value in **Column** is greater than the value you specify.  | 
|  Is greater than or equal to  | Numeric | N/A |  Removes rows where the value in **Column** is greater than or equal to the value you specify.  | 
|  Is between  | Numeric | N/A |  Removes rows where the value in **Column** is between or equal to two values you specify.  | 
|  Contains  |  Text  | Categorical |  Removes rows where the value in **Column** contains a value you specify.  | 
|  Starts with  |  Text  | Categorical |  Removes rows where the value in **Column** begins with a value you specify.  | 
|  Ends with  |  Text  | Categorical |  Removes rows where the value in **Column** ends with a value you specify.  | 

After removing the rows from the dataset, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the rows return to your dataset.

![\[Screenshot of the remove rows by custom values operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-remove-custom.png)
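
The operations in the preceding table can be thought of as predicates that are tested against one column of each row. The following Python sketch is a hypothetical model of that behavior, not Canvas code: the `OPERATIONS` mapping, the `drop_rows` helper, and the sample rows are all assumptions, and each operation takes a single operand for simplicity.

```python
import operator

# Each operation maps to a predicate over (column value, operand).
OPERATIONS = {
    "Is equal to": operator.eq,
    "Is not equal to": operator.ne,
    "Is less than": operator.lt,
    "Is less than or equal to": operator.le,
    "Is greater than": operator.gt,
    "Is greater than or equal to": operator.ge,
    "Is between": lambda value, bounds: bounds[0] <= value <= bounds[1],
    "Contains": lambda value, text: text in value,
    "Starts with": lambda value, text: value.startswith(text),
    "Ends with": lambda value, text: value.endswith(text),
}


def drop_rows(rows, column, operation, operand):
    """Return only the rows whose value in `column` does NOT match the condition."""
    matches = OPERATIONS[operation]
    return [row for row in rows if not matches(row[column], operand)]


# Hypothetical dataset.
listings = [
    {"sku": "A-100", "price": 80},
    {"sku": "B-200", "price": 100},
    {"sku": "A-300", "price": 150},
]

kept = drop_rows(listings, "price", "Is greater than", 100)
print([row["sku"] for row in kept])
```

Because the function returns a new list, the original `listings` stay intact, which loosely mirrors how removing the transform from the **Model recipe** returns the rows to your dataset.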


## Rename columns
<a name="canvas-prepare-data-rename"></a>

With the rename columns transform, you can rename columns in your data. When you rename a column, SageMaker Canvas changes the column name in the model input.

You can rename a column in your dataset by double-clicking on the column name in the **Build** tab of the SageMaker Canvas application and entering a new name. Pressing the **Enter** key submits the change, and clicking anywhere outside the input cancels the change. You can also rename a column by clicking the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), located at the end of the row in list view or at the end of the header cell in grid view, and choosing **Rename**.

Your column name can’t be longer than 32 characters or have double underscores (`__`), and you can’t rename a column to the same name as another column. You also can’t rename a dropped column.

The following screenshot shows how to rename a column by double-clicking the column name.

![\[Screenshot of renaming a column with the double-click method in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-rename-column.png)


When you rename a column, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the column reverts to its original name.

## Manage columns
<a name="canvas-prepare-data-manage-cols"></a>

With the following transforms, you can change the data type of columns and replace missing values or outliers for specific columns. SageMaker Canvas uses the updated data types or values when building your model but doesn’t change your original dataset. Note that if you've dropped a column from your dataset using the [Drop columns](#canvas-prepare-data-drop) transform, you can't replace values in that column.

### Replace missing values
<a name="canvas-prepare-data-replace-missing"></a>

Missing values are a common occurrence in machine learning datasets and can impact model accuracy. You can choose to drop rows that have missing values, but your model might be more accurate if you replace the missing values instead, because the rest of the data in those rows is kept. With this transform, you can replace missing values in numeric columns with the mean or median of the data in a column, or you can specify a custom value with which to replace missing values. For non-numeric columns, you can replace missing values with the mode (most common value) of the column or a custom value.

Use this transform if you want to replace the null or empty values in certain columns. To replace missing values in a specified column, do the following. 

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage columns**.

1. Choose **Replace missing values**.

1. Choose the **Column** in which you want to replace missing values.

1. Set **Mode** to **Manual** to replace missing values with values that you specify. With the **Automatic (default)** setting, SageMaker Canvas replaces missing values with imputed values that best fit your data. This imputation method is done automatically for each model build, unless you specify the **Manual** mode.

1. Set the **Replace with** value:
   + If your column is numeric, then select **Mean**, **Median**, or **Custom**. **Mean** replaces missing values with the mean for the column, and **Median** replaces missing values with the median for the column. If you choose **Custom**, then you must specify a custom value that you want to use to replace missing values.
   + If your column is non-numeric, then select **Mode** or **Custom**. **Mode** replaces missing values with the mode, or the most common value, for the column. If you choose **Custom**, then you must specify a custom value that you want to use to replace missing values.

1. Choose **Add** to add the transform to the **Model recipe**.

After replacing the missing values in the dataset, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the missing values return to the dataset.

![\[Screenshot of the replace missing values operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-replace-missing.png)
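
The **Manual** replacement strategies described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Canvas implementation: the `replace_missing` helper is hypothetical, missing values are represented as `None` or empty strings, and the **Automatic** imputation mode is not reproduced here.

```python
import statistics


def replace_missing(values, strategy, custom=None):
    """Fill missing entries (None or "") using Mean, Median, Mode, or Custom."""
    present = [v for v in values if v is not None and v != ""]
    if strategy == "Mean":
        fill = statistics.mean(present)
    elif strategy == "Median":
        fill = statistics.median(present)
    elif strategy == "Mode":
        fill = statistics.mode(present)
    elif strategy == "Custom":
        fill = custom
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if v is None or v == "" else v for v in values]


# Hypothetical numeric and non-numeric columns.
print(replace_missing([1, None, 3], "Mean"))
print(replace_missing([1, None, 3, 100], "Median"))
print(replace_missing(["red", "", "blue", "red"], "Mode"))
print(replace_missing(["red", None], "Custom", custom="unknown"))
```

The median is less sensitive than the mean to extreme values in the column, which can matter if you replace missing values before you handle outliers.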


### Replace outliers
<a name="canvas-prepare-data-replace-outliers"></a>

Outliers, or rare values in the distribution and range of your data, can negatively impact model accuracy and lead to longer building times. SageMaker Canvas enables you to detect outliers in numeric columns and replace the outliers with values that lie within an accepted range in your data. You can choose to define outliers with either standard deviations or a custom range, and you can replace outliers with the minimum and maximum values in the accepted range.

To replace outliers in your data, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Manage columns**.

1. Choose **Replace outlier values**.

1. Choose the **Column** in which you want to replace outliers.

1. For **Define outliers**, choose **Standard deviation**, **Custom numeric range**, or **Custom quantile range**.

1. If you choose **Standard deviation**, specify a **Standard deviations** value from 1–3. If you choose **Custom numeric range** or **Custom quantile range**, specify the **Min** and **Max** values (numbers for numeric ranges, or percentiles between 0–100% for quantile ranges).

1. For **Replace with**, select **Min/max range**.

1. Choose **Add** to add the transform to the **Model recipe**.

The **Standard deviation** option detects outliers in numeric columns using the mean and standard deviation. You specify the number of standard deviations a value must vary from the mean to be considered an outlier. For example, if you specify 3 for **Standard deviations**, a value must fall more than 3 standard deviations from the mean to be considered an outlier. SageMaker Canvas replaces outliers with the minimum value or maximum value in the accepted range. For example, if you configure the standard deviations to only include values from 200–300, then SageMaker Canvas changes a value of 198 to 200 (the minimum).

The **Custom numeric range** and **Custom quantile range** options detect outliers in numeric columns using minimum and maximum values. Use this method if you know your threshold values that delimit outliers. If you choose a numeric range, the **Min** and **Max** values should be the minimum and maximum numeric values that you want to allow. SageMaker Canvas replaces any value below the minimum with the minimum and any value above the maximum with the maximum. For example, if your range only allows values from 1–100, then SageMaker Canvas changes a value of 102 to 100 (the maximum). If you choose a quantile range, the **Min** and **Max** values should be the minimum and maximum of the percentile range (0–100) that you want to allow.

After replacing the values in the dataset, SageMaker Canvas adds the transform in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the original values return to the dataset.

![\[Screenshot of the replace outliers operation in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-replace-outlier.png)
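
Replacing outliers with the **Min/max range** amounts to clamping each value into the accepted range. The following Python sketch reproduces the two examples from the text above. The helper names are hypothetical, the use of the sample standard deviation is an assumption, and the quantile range option is not covered.

```python
import statistics


def sd_bounds(values, sd):
    """Accepted range of mean +/- `sd` standard deviations (sample std is an assumption)."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return mean - sd * spread, mean + sd * spread


def replace_outliers(values, low, high):
    """Clamp each value into [low, high]; values inside the range are unchanged."""
    return [min(max(v, low), high) for v in values]


# Accepted range of 200-300: 198 becomes 200 (the minimum).
print(replace_outliers([198, 250, 305], 200, 300))

# Accepted range of 1-100: 102 becomes 100 (the maximum).
print(replace_outliers([102, 50], 1, 100))

# Standard deviation option on a hypothetical column with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 500]
low, high = sd_bounds(readings, sd=3)
clipped = replace_outliers(readings, low, high)
print(clipped[-1] == high)
```

Unlike removing rows, clamping keeps every row in the dataset, so the other columns in those rows remain available for the model build.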


### Change data type
<a name="canvas-prepare-data-change-type"></a>

SageMaker Canvas provides you with the ability to change the *data type* of your columns between numeric, text, and datetime, while also displaying the associated *feature type* for that data type. A *data type* refers to the format of the data and how it is stored, while the *feature type* refers to the characteristic of the data used in machine learning algorithms, such as binary or categorical. This gives you the flexibility to manually change the type of data in your columns based on the features. The ability to choose the right data type ensures data integrity and accuracy prior to building models. These data types are used when building models.

**Note**  
Currently, changing the feature type (for example, from binary to categorical) is not supported.

The following table lists all of the supported data types in Canvas.


| Data type | Description | Example | 
| --- | --- | --- | 
| Numeric | Numeric data represents numerical values | 1, 2, 3; 1.1, 1.2, 1.3 | 
| Text | Text data represents sequences of characters, like names or descriptions | A, B, C, Dapple, banana, orange1A\$1, 2A\$1, 3A\$1 | 
| Datetime | Datetime data represents dates and times in timestamp format | 2019-07-01 01:00:00, 2019-07-01 02:00:00, 2019-07-01 03:00:00 | 

The following table lists all of the supported feature types in Canvas.


| Feature type | Description | Example | 
| --- | --- | --- | 
| Binary | Binary features represent two possible values | 0, 1, 0, 1, 0 (2 distinct values); true, false, true (2 distinct values) | 
| Categorical | Categorical features represent distinct categories or groups | apple, banana, orange, apple (3 distinct values); A, B, C, D, E, A, D, C (5 distinct values) | 

To modify the data type of a column in a dataset, do the following.

1. In the **Build** tab of the SageMaker Canvas application, go to the **Column view** or **Grid view** and select the **Data type** dropdown for the specific column.

1. In the **Data type** dropdown, choose the data type to convert to. The following screenshot shows the dropdown menu.  
![\[The data type conversion dropdown menu for a column, shown in the Build tab.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-prepare-data-change.png)

1. For **Column**, choose or verify the column you want to change the data type for.

1. For **New data type**, choose or verify the new data type you want to convert to.

1. If the **New data type** is `Datetime` or `Numeric`, choose one of the following options under **Handle invalid values**:

   1. **Replace with empty value** – Invalid values are substituted with an empty value.

   1. **Delete rows** – Rows with an invalid value are removed from the dataset.

   1. **Replace with custom value** – Invalid values are substituted with the **Custom Value** that you specify.

1. Choose **Add** to add the transform to the **Model recipe**.

The data type for your column should now be updated.
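
The three **Handle invalid values** options can be illustrated with a short Python sketch that converts a text column to numeric values. The `to_numeric` helper and its `on_invalid` argument are hypothetical names for this example, and treating any string that Python's `float` cannot parse as invalid is an assumption; Canvas might apply different parsing rules.

```python
def to_numeric(values, on_invalid="empty", custom=None):
    """Convert text values to floats, handling invalid entries in one of three ways."""
    converted = []
    for raw in values:
        try:
            converted.append(float(raw))
        except (TypeError, ValueError):
            if on_invalid == "empty":      # Replace with empty value
                converted.append(None)
            elif on_invalid == "custom":   # Replace with custom value
                converted.append(custom)
            elif on_invalid == "delete":   # Delete rows
                continue
            else:
                raise ValueError(f"Unknown option: {on_invalid}")
    return converted


# Hypothetical text column with one value that is not a number.
column = ["1", "2.5", "n/a"]

print(to_numeric(column))
print(to_numeric(column, on_invalid="delete"))
print(to_numeric(column, on_invalid="custom", custom=0.0))
```

This sketch operates on a single column. In a full dataset, the **Delete rows** option removes the entire row, not only the invalid cell.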

## Prepare time series data
<a name="canvas-prepare-data-timeseries"></a>

Use the following functionalities to prepare your time series data for building time series forecasting models.

### Resample time series data
<a name="canvas-prepare-data-resample"></a>

By resampling time series data, you can establish regular intervals for the observations in your time series dataset. This is particularly useful when working with time series data containing irregularly spaced observations. For instance, you can use resampling to transform a dataset with observations recorded at irregular one-hour, two-hour, and three-hour intervals into a dataset with a regular one-hour interval between observations. Forecasting algorithms require the observations to be taken at regular intervals.

To resample time series data, do the following.

1. In the **Build** tab of the SageMaker Canvas application, choose **Time series**.

1. Choose **Resample**.

1. For **Timestamp column**, choose the column you want to apply the transform to. You can only select columns of the **Datetime** type.

1. In the **Frequency settings** section, choose a **Frequency** and **Rate**. **Frequency** is the unit of frequency, and **Rate** is the number of those units between observations. For example, choosing `Calendar Day` for **Frequency** and `1` for **Rate** sets the interval to 1 calendar day, such as `2023-03-26 00:00:00`, `2023-03-27 00:00:00`, `2023-03-28 00:00:00`. See the table after this procedure for a complete list of **Frequency** values.

1. Choose **Add** to add the transform to the **Model recipe**.

The following table lists all of the **Frequency** types you can select when resampling time series data.


| Frequency | Description | Example values (assuming Rate is 1) | 
| --- | --- | --- | 
|  Business Day  |  Resample observations in the datetime column to 5 business days of the week (Monday, Tuesday, Wednesday, Thursday, Friday)  |  2023-03-24 00:00:00 2023-03-27 00:00:00 2023-03-28 00:00:00 2023-03-29 00:00:00 2023-03-30 00:00:00 2023-03-31 00:00:00 2023-04-03 00:00:00  | 
|  Calendar Day  |  Resample observations in the datetime column to all 7 days of the week (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)  |  2023-03-26 00:00:00 2023-03-27 00:00:00 2023-03-28 00:00:00 2023-03-29 00:00:00 2023-03-30 00:00:00 2023-03-31 00:00:00 2023-04-01 00:00:00  | 
|  Week  |  Resample observations in the datetime column to the first day of each week  |  2023-03-13 00:00:00 2023-03-20 00:00:00 2023-03-27 00:00:00 2023-04-03 00:00:00  | 
|  Month  |  Resample observations in the datetime column to the first day of each month  |  2023-03-01 00:00:00 2023-04-01 00:00:00 2023-05-01 00:00:00 2023-06-01 00:00:00  | 
|  Annual Quarter  |  Resample observations in the datetime column to the last day of each quarter  |  2023-03-31 00:00:00 2023-06-30 00:00:00 2023-09-30 00:00:00 2023-12-31 00:00:00  | 
|  Year  |  Resample observations in the datetime column to the last day of each year  |  2022-12-31 00:00:00 2023-12-31 00:00:00 2024-12-31 00:00:00  | 
|  Hour  |  Resample observations in the datetime column to each hour of each day  |  2023-03-24 00:00:00 2023-03-24 01:00:00 2023-03-24 02:00:00 2023-03-24 03:00:00  | 
|  Minute  |  Resample observations in the datetime column to each minute of each hour  |  2023-03-24 00:00:00 2023-03-24 00:01:00 2023-03-24 00:02:00 2023-03-24 00:03:00  | 
|  Second  |  Resample observations in the datetime column to each second of each minute  |  2023-03-24 00:00:00 2023-03-24 00:00:01 2023-03-24 00:00:02 2023-03-24 00:00:03  | 
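
As a rough illustration of how these frequencies produce timestamps, the following sketch maps each **Frequency** name to a pandas offset and regenerates some of the example values from the table. The mapping `FREQUENCY_OFFSETS` and the helper `example_timestamps` are hypothetical names chosen for this example, and the choice of offsets is an assumption based on the example values; the Canvas implementation might differ.

```python
import pandas as pd

# Hypothetical mapping from the Frequency names in the table to pandas offsets.
FREQUENCY_OFFSETS = {
    "Business Day": pd.offsets.BDay(),
    "Calendar Day": pd.offsets.Day(),
    "Week": pd.offsets.Week(weekday=0),      # weeks anchored on Monday
    "Month": pd.offsets.MonthBegin(),        # first day of each month
    "Annual Quarter": pd.offsets.QuarterEnd(),  # last day of each quarter
    "Year": pd.offsets.YearEnd(),            # last day of each year
    "Hour": pd.offsets.Hour(),
    "Minute": pd.offsets.Minute(),
    "Second": pd.offsets.Second(),
}


def example_timestamps(frequency, start, periods, rate=1):
    """Return `periods` timestamps spaced `rate` units of `frequency` apart."""
    offset = FREQUENCY_OFFSETS[frequency] * rate
    return pd.date_range(start=start, periods=periods, freq=offset)


business_days = example_timestamps("Business Day", "2023-03-24", 7)
weeks = example_timestamps("Week", "2023-03-13", 4)
quarters = example_timestamps("Annual Quarter", "2023-03-31", 4)

print(business_days.strftime("%Y-%m-%d").tolist())
print(weeks.strftime("%Y-%m-%d").tolist())
print(quarters.strftime("%Y-%m-%d").tolist())
```

The business day sequence skips the weekend of March 25 and 26, which matches the example values in the table.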

When applying the resampling transform, you can use the **Advanced** option to specify how the resulting values of the rest of the columns (other than the timestamp column) in your dataset are modified. This can be achieved by specifying the resampling methodology, which can either be downsampling or upsampling for both numeric and non-numeric columns.

*Downsampling* increases the interval between observations in the dataset. For example, if you downsample observations that are taken either every hour or every two hours to a two-hour interval, each observation in the resulting dataset is two hours apart. The values of the other columns of the hourly observations are aggregated into a single value using a combination method. The following tables show an example of downsampling time series data by using mean as the combination method. The data is downsampled from every hour to every two hours.

The following table shows the hourly temperature readings over a day before downsampling.


| Timestamp | Temperature (Celsius) | 
| --- | --- | 
| 12:00 am | 30 | 
| 1:00 am | 32 | 
| 2:00 am | 35 | 
| 3:00 am | 32 | 
| 4:00 am | 30 | 

The following table shows the temperature readings after downsampling to every two hours. In this example, each value is the mean of the readings in the two-hour window that ends at the timestamp.


| Timestamp | Temperature (Celsius) | 
| --- | --- | 
| 12:00 am | 30 | 
| 2:00 am | 33.5 | 
| 4:00 am | 31 | 
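
The arithmetic behind mean downsampling can be sketched with pandas. This is only an illustration, not the Canvas implementation: the date `2023-03-24` is made up, and the choice to close and label each two-hour window on its right edge (`closed="right"`, `label="right"`) is an assumption that affects which readings are averaged together.

```python
import pandas as pd

# Five hourly temperature readings starting at midnight (illustrative date).
hourly = pd.Series(
    [30, 32, 35, 32, 30],
    index=pd.date_range("2023-03-24 00:00", periods=5, freq=pd.Timedelta(hours=1)),
    name="temperature_celsius",
)

# Downsample to two-hour intervals; each value is the mean of the readings
# in the window that ends at (and includes) the labeled timestamp.
two_hourly = hourly.resample(
    pd.Timedelta(hours=2), closed="right", label="right"
).mean()

print(two_hourly)
```

With these settings, the 2:00 am value is the mean of the 1:00 am and 2:00 am readings. Using the pandas defaults (`closed="left"`, `label="left"`) would instead average each reading with the one that follows it.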

To downsample time series data, do the following:

1. Expand the **Advanced** section under the **Resample** transform.

1. Choose **Non-numeric combination** to specify the combination method for non-numeric columns. See the table below for a complete list of combination methods.

1. Choose **Numeric combination** to specify the combination method for numeric columns. See the table below for a complete list of combination methods.

If you don’t specify combination methods, the default values are `Most Common` for **Non-numeric combination** and `Mean` for **Numeric combination**. The following table lists the methods for numeric and non-numeric combination.


| Downsampling methodology | Combination method | Description | 
| --- | --- | --- | 
| Non-numeric combination | Most Common | Aggregate values in the non-numeric column by the most commonly occurring value | 
| Non-numeric combination | Last | Aggregate values in the non-numeric column by the last value in the column | 
| Non-numeric combination | First | Aggregate values in the non-numeric column by the first value in the column | 
| Numeric combination | Mean | Aggregate values in the numeric column by taking the mean of all the values in the column | 
| Numeric combination | Median | Aggregate values in the numeric column by taking the median of all the values in the column | 
| Numeric combination | Min | Aggregate values in the numeric column by taking the minimum of all the values in the column | 
| Numeric combination | Max | Aggregate values in the numeric column by taking the maximum of all the values in the column | 
| Numeric combination | Sum | Aggregate values in the numeric column by adding all the values in the column | 
| Numeric combination | Quantile | Aggregate values in the numeric column by taking the quantile of all the values in the column | 
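
To show how separate combination methods apply to numeric and non-numeric columns, the following pandas sketch downsamples hourly data to three-hour intervals by using `Mean` for a numeric column and `Most Common` for a non-numeric column. The column names, the sample values, and the `most_common` helper are hypothetical, and Canvas might handle details such as ties or empty intervals differently.

```python
import pandas as pd

# Six hourly observations with one numeric and one non-numeric column.
readings = pd.DataFrame(
    {
        "temperature": [30, 32, 34, 32, 30, 28],
        "status": ["ok", "ok", "warn", "warn", "warn", "ok"],
    },
    index=pd.date_range("2023-03-24 00:00", periods=6, freq=pd.Timedelta(hours=1)),
)


def most_common(values):
    """Return the most frequently occurring value in a group."""
    return values.mode().iloc[0]


# Combine each three-hour group: mean for numeric, most common for non-numeric.
downsampled = readings.resample(pd.Timedelta(hours=3)).agg(
    {"temperature": "mean", "status": most_common}
)

print(downsampled)
```

Swapping `"mean"` for `"median"`, `"min"`, `"max"`, or `"sum"`, or `most_common` for `"first"` or `"last"`, gives pandas counterparts of the other methods in the table.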

*Upsampling* reduces the interval between observations in the dataset. For example, if you upsample observations that are taken every two hours into hourly observations, the values of other columns of the hourly observations are interpolated from the ones that have been taken every two hours.

To upsample time series data, do the following:

1. Expand the **Advanced** section under the **Resample** transform.

1. Choose **Non-numeric estimation** to specify the estimation method for non-numeric columns. See the table after this procedure for a complete list of methods.

1. Choose **Numeric estimation** to specify the estimation method for numeric columns. See the table below for a complete list of methods.

1. (Optional) Choose **ID Column** to specify the column that has the IDs of the observations of the time series. Specify this option if your dataset has two or more time series. If your dataset represents only one time series, don't specify a value for this field. For example, you can have a dataset that has the columns `id` and `purchase`. The `id` column has the following values: `[1, 2, 2, 1]`. The `purchase` column has the following values: `[$2, $3, $4, $1]`. Therefore, the dataset has two time series: one time series is `1: [$2, $1]`, and the other time series is `2: [$3, $4]`.
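
The ID column example can be reproduced with a small pandas sketch that groups the `purchase` values by `id`. The DataFrame below is built from the values in the example (with the dollar signs dropped), and `series_by_id` is a name chosen for illustration.

```python
import pandas as pd

# The example dataset: four observations that belong to two time series.
purchases = pd.DataFrame({"id": [1, 2, 2, 1], "purchase": [2, 3, 4, 1]})

# Group the observations by ID; row order within each group is preserved.
series_by_id = {
    series_id: group["purchase"].tolist()
    for series_id, group in purchases.groupby("id")
}

print(series_by_id)
```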

If you don’t specify estimation methods, the default values are `Forward Fill` for **Non-numeric estimation** and `Linear` for **Numeric estimation**. The following table lists the methods for estimation.


| Upsampling methodology | Estimation method | Description | 
| --- | --- | --- | 
| Non-numeric estimation | Forward Fill | Fill the new values in the non-numeric column by carrying the most recent preceding value forward | 
| Non-numeric estimation | Backward Fill | Fill the new values in the non-numeric column by carrying the next following value backward | 
| Non-numeric estimation | Keep Missing | Leave the new values in the non-numeric column empty | 
| Numeric estimation | Linear, Time, Index, Zero, S-Linear, Nearest, Quadratic, Cubic, Barycentric, Polynomial, Krogh, Piecewise Polynomial, Spline, P-chip, Akima, Cubic Spline, From Derivatives | Interpolate values in the numeric column by using the specified interpolator. For information on interpolation methods, see [pandas.DataFrame.interpolate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html) in the pandas documentation. | 
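
The default estimation methods can be approximated in pandas as follows. This sketch upsamples two-hourly data to hourly data, then applies linear interpolation to a numeric column and a forward fill to a non-numeric column. The column names and values are hypothetical, and the sketch only illustrates the idea; it is not the code that Canvas runs.

```python
import pandas as pd

# Three observations taken every two hours.
two_hourly = pd.DataFrame(
    {"temperature": [30.0, 34.0, 32.0], "status": ["ok", "warn", "ok"]},
    index=pd.date_range("2023-03-24 00:00", periods=3, freq=pd.Timedelta(hours=2)),
)

# Insert the missing hourly timestamps; the new rows start out empty.
hourly = two_hourly.resample(pd.Timedelta(hours=1)).asfreq()

# Numeric estimation (Linear): draw a straight line between known values.
hourly["temperature"] = hourly["temperature"].interpolate(method="linear")

# Non-numeric estimation (Forward Fill): repeat the last known value.
hourly["status"] = hourly["status"].ffill()

print(hourly)
```

Replacing `ffill()` with `bfill()` corresponds to a backward fill, and skipping that step leaves the new non-numeric values empty.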

The following screenshot shows the **Advanced** settings with the fields for downsampling and upsampling filled out.

![\[The Canvas application, with the time series resampling side panel showing the advanced options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-prepare-data-resampling.png)


### Use datetime extraction
<a name="canvas-prepare-data-datetime"></a>

With the datetime extraction transform, you can extract values from a datetime column to a separate column. For example, if you have a column containing dates of purchases, you can extract the month value to a separate column and use the new column when building your model. You can also extract multiple values to separate columns with a single transform.

Your datetime column must use a supported timestamp format. For a list of the formats that SageMaker Canvas supports, see [Time Series Forecasts in Amazon SageMaker Canvas](canvas-time-series.md). If your dataset does not use one of the supported formats, update your dataset to use a supported timestamp format and re-import it to Amazon SageMaker Canvas before building your model.

To perform a datetime extraction, do the following.

1. In the **Build** tab of the SageMaker Canvas application, on the transforms bar, choose **View all**.

1. Choose **Extract features**.

1. Choose the **Timestamp column** from which you want to extract values.

1. For **Values**, select one or more values to extract from the column. The values you can extract from a timestamp column are **Year**, **Month**, **Day**, **Hour**, **Week of year**, **Day of year**, and **Quarter**.

1. (Optional) Choose **Preview** to preview the transform results.

1. Choose **Add** to add the transform to the **Model recipe**.

SageMaker Canvas creates a new column in the dataset for each of the values you extract. Except for **Year** values, SageMaker Canvas uses a 0-based encoding for the extracted values. For example, if you extract the **Month** value, January is extracted as 0, and February is extracted as 1.
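
The following pandas sketch approximates the extraction and the 0-based encoding described above. The column names are hypothetical, and subtracting 1 from every value except the year is one reading of the description; verify the values against the **Preview** in Canvas. Hours are already 0-based in pandas, and week of year is omitted because week-numbering conventions vary.

```python
import pandas as pd

purchases = pd.DataFrame(
    {"purchase_time": pd.to_datetime(["2023-01-15 13:45:00", "2023-02-01 00:00:00"])}
)
ts = purchases["purchase_time"].dt

purchases["year"] = ts.year                   # kept as the calendar year
purchases["month"] = ts.month - 1             # January -> 0, February -> 1
purchases["day"] = ts.day - 1                 # first day of the month -> 0
purchases["hour"] = ts.hour                   # already 0 through 23
purchases["quarter"] = ts.quarter - 1         # first quarter -> 0
purchases["day_of_year"] = ts.dayofyear - 1   # January 1 -> 0

print(purchases)
```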

![\[Screenshot of the datetime extraction box in the SageMaker Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-datetime-extract.png)


You can see the transform listed in the **Model recipe** section. If you remove the transform from the **Model recipe** section, the new columns are removed from the dataset.

# Model evaluation
<a name="canvas-evaluate-model"></a>

After you’ve built your model, you can evaluate how well your model performed on your data before using it to make predictions. You can use information, such as the model’s accuracy when predicting labels and advanced metrics, to determine whether your model can make sufficiently accurate predictions for your data.

The section [Evaluate your model's performance](canvas-scoring.md) describes how to view and interpret the information on your model's **Analyze** page. The section [Use advanced metrics in your analyses](canvas-advanced-metrics.md) contains more detailed information about the **Advanced metrics** used to quantify your model’s accuracy.

You can also view more advanced information for specific *model candidates*, which are all of the model iterations that Canvas runs through while building your model. Based on the advanced metrics for a given model candidate, you can select a different candidate to be the default, or the version that is used for making predictions and deploying. For each model candidate, you can view the **Advanced metrics** information to help you decide which model candidate you’d like to select as the default. You can view this information by selecting the model candidate from the **Model leaderboard**. For more information, see [View model candidates in the model leaderboard](canvas-evaluate-model-candidates.md).

Canvas also provides the option to download a Jupyter notebook so that you can view and run the code used to build your model. This is useful if you’d like to make adjustments to the code or learn more about how your model was built. For more information, see [Download a model notebook](canvas-notebook.md).

# Evaluate your model's performance
<a name="canvas-scoring"></a>

Amazon SageMaker Canvas provides overview and scoring information for each model type. Your model’s score can help you determine how accurate your model is when it makes predictions. The additional scoring insights can help you quantify the differences between the actual and predicted values.

To view the analysis of your model, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, you can view the overview and scoring information for your model.

The following sections describe how to interpret the scoring for each model type.

## Evaluate categorical prediction models
<a name="canvas-scoring-categorical"></a>

The **Overview** tab shows you the column impact for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the **Accuracy** score for the model, along with the **Optimization metric**, which is the metric that you choose to optimize when building the model. In this case, the **Optimization metric** is **Accuracy**. You can specify a different optimization metric if you build a new version of your model.

![\[Screenshot of the accuracy score and optimization metric on the Analyze tab in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-category.png)


The **Scoring** tab for a categorical prediction model gives you the ability to visualize all the predictions. Line segments extend from the left of the page, indicating all the predictions the model has made. In the middle of the page, the line segments converge on a perpendicular segment to indicate the proportion of each prediction to a single category. From the predicted category, the segments branch out to the actual category. You can get a visual sense of how accurate the predictions were by following each line segment from the predicted category to the actual category.

The following image gives you an example **Scoring** section for a **3+ category prediction** model.

![\[Screenshot of the Scoring tab for a 3+ category prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-analyze/canvas-multiclass-classification.png)


You can also view the **Advanced metrics** tab for more detailed information about your model’s performance, such as the advanced metrics and confusion matrices. To learn more about the **Advanced metrics** tab, see [Use advanced metrics in your analyses](canvas-advanced-metrics.md).

## Evaluate numeric prediction models
<a name="canvas-scoring-numeric"></a>

The **Overview** tab shows you the column impact for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the **RMSE** score for the model on the **Overview** tab, which in this case is the **Optimization metric**. The **Optimization metric** is the metric that you choose to optimize when building the model. You can specify a different optimization metric if you build a new version of your model.

![\[Screenshot of the RMSE optimization metric on the Analyze tab in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-numeric.png)


The **Scoring** tab for numeric prediction shows a line to indicate the model's predicted value in relation to the data used to make predictions. Predicted values are often within +/- the RMSE (root mean squared error) of the actual values. The width of the purple band around the line indicates this RMSE range.

The following image shows the **Scoring** section for numeric prediction.

![\[Screenshot of the Scoring tab for a numeric prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-analyze/canvas-analyze-regression-scoring.png)


You can also view the **Advanced metrics** tab for more detailed information about your model’s performance, such as the advanced metrics, residuals, and error density plots. To learn more about the **Advanced metrics** tab, see [Use advanced metrics in your analyses](canvas-advanced-metrics.md).

## Evaluate time series forecasting models
<a name="canvas-scoring-time-series"></a>

On the **Analyze** page for time series forecasting models, you can see an overview of the model’s metrics. You can hover over each metric for more information, or you can see [Use advanced metrics in your analyses](canvas-advanced-metrics.md) for more information about each metric.

In the **Column impact** section, you can see the score for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the time series metrics scores for the model, along with the **Optimization metric**, which is the metric that you choose to optimize when building the model. In this case, the **Optimization metric** is **RMSE**. You can specify a different optimization metric if you build a new version of your model. These metrics scores are taken from your backtest results, which are available for download in the **Artifacts** tab.

![\[Screenshot of the RMSE optimization metric on the Analyze tab in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-time-series.png)


The **Artifacts** tab provides access to several key resources that you can use to dive deeper into your model’s performance and continue iterating upon it:
+ **Shuffled training and validation splits** – This section includes links to the artifacts generated when your dataset was split into training and validation sets, enabling you to review the data distribution and potential biases.
+ **Backtest results** – This section includes a link to the forecasted values for your validation dataset, which is used to generate accuracy metrics and evaluation data for your model.
+ **Accuracy metrics** – This section lists the advanced metrics that evaluate your model's performance, such as Root Mean Squared Error (RMSE). For more information about each metric, see [Metrics for time series forecasts](canvas-metrics.md#canvas-time-series-forecast-metrics).
+ **Explainability report** – This section provides a link to download the explainability report, which offers insights into the model's decision-making process and the relative importance of input columns. This report can help you identify potential areas for improvement.

On the **Analyze** page, you can also choose the **Download** button to directly download the backtest results, accuracy metrics, and explainability report artifacts to your local machine.

## Evaluate image prediction models
<a name="canvas-scoring-image"></a>

The **Overview** tab shows you the **Per label performance**, which gives you an overall accuracy score for the images predicted for each label. You can choose a label to see more specific details, such as the **Correctly predicted** and **Incorrectly predicted** images for the label.

You can turn on the **Heatmap** toggle to see a heatmap for each image. The heatmap shows you the areas of interest that have the most impact when your model is making predictions. For more information about heatmaps and how to use them to improve your model, choose the **More info** icon next to the **Heatmap** toggle.

The **Scoring** tab for single-label image prediction models shows you a comparison of what the model predicted as the label versus what the actual label was. You can select up to 10 labels at a time. You can change the labels in the visualization by choosing the labels dropdown menu and selecting or deselecting labels.

You can also view insights for individual labels or groups of labels, such as the three labels with the highest or lowest accuracy, by choosing the **View scores for** dropdown menu in the **Model accuracy insights** section.

The following screenshot shows the **Scoring** information for a single-label image prediction model.

![\[The actual versus predicted labels on the Scoring page for a single-label image prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/analyze-image-scoring.png)


## Evaluate text prediction models
<a name="canvas-scoring-text"></a>

The **Overview** tab shows you the **Per label performance**, which gives you an overall accuracy score for the passages of text predicted for each label. You can choose a label to see more specific details, such as the **Correctly predicted** and **Incorrectly predicted** passages for the label.

The **Scoring** tab for multi-category text prediction models shows you a comparison of what the model predicted as the label versus what the actual label was.

In the **Model accuracy insights** section, you can see the **Most frequent category**, which tells you the category that the model predicted most frequently and how accurate those predictions were. If your model predicts a label of **Positive** correctly 99% of the time, then you can be fairly confident that your model is good at predicting positive sentiment in text.

The following screenshot shows the **Scoring** information for a multi-category text prediction model.

![\[The actual versus predicted labels on the Scoring page for a multi-category text prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/analyze-text-scoring.png)


# Use advanced metrics in your analyses
<a name="canvas-advanced-metrics"></a>

The following section describes how to find and interpret the advanced metrics for your model in Amazon SageMaker Canvas.

**Note**  
Advanced metrics are currently available only for numeric and categorical prediction models.

To find the **Advanced metrics** tab, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, choose the **Advanced metrics** tab.

In the **Advanced metrics** tab, you can find the **Performance** tab. The page looks like the following screenshot.

![\[Screenshot of the advanced metrics tab for a categorical prediction model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-analyze-performance.png)


At the top, you can see an overview of the metrics scores, including the **Optimization metric**, which is the metric that you selected (or that Canvas selected by default) to optimize when building the model.

The following sections describe more detailed information for the **Performance** tab within the **Advanced metrics**.

## Performance
<a name="canvas-advanced-metrics-performance"></a>

In the **Performance** tab, you’ll see a **Metrics table**, along with visualizations that Canvas creates based on your model type. For categorical prediction models, Canvas provides a *confusion matrix*, whereas for numeric prediction models, Canvas provides you with *residuals* and *error density* charts.

In the **Metrics table**, you are provided with a full list of your model’s scores for each advanced metric, which is more comprehensive than the scores overview at the top of the page. The metrics shown here depend on your model type. For a reference to help you understand and interpret each metric, see [Metrics reference](canvas-metrics.md).

To understand the visualizations that might appear based on your model type, see the following options:
+ **Confusion matrix** – Canvas uses confusion matrices to help you visualize when a model makes predictions correctly. In a confusion matrix, your results are arranged to compare the predicted values against the actual values. The following example explains how a confusion matrix works for a 2 category prediction model that predicts positive and negative labels:
  + True positive – The model correctly predicted positive when the true label was positive.
  + True negative – The model correctly predicted negative when the true label was negative.
  + False positive – The model incorrectly predicted positive when the true label was negative.
  + False negative – The model incorrectly predicted negative when the true label was positive.
+ **Precision recall curve** – The precision recall curve is a visualization of the model’s precision score plotted against the model’s recall score. Generally, a model that can make perfect predictions would have precision and recall scores that are both 1. The precision recall curve for a decently accurate model is fairly high in both precision and recall.
+ **Residuals** – Residuals are the difference between the actual values and the values predicted by the model. A residuals chart plots the residuals against the corresponding values to visualize their distribution and any patterns or outliers. A normal distribution of residuals around zero indicates that the model is a good fit for the data. However, if the residuals are significantly skewed or have outliers, it may indicate that the model is overfitting the data or that there are other issues that need to be addressed.
+ **Error density** – An error density plot is a representation of the distribution of errors made by a model. It shows the probability density of the errors at each point, helping you to identify any areas where the model may be overfitting or making systematic errors.
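
The confusion matrix cells described above, and the precision and recall scores that are derived from them, can be sketched in plain Python. The labels and the `confusion_counts` helper are hypothetical and only illustrate the definitions; they are not part of Canvas.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count the four confusion matrix cells for a 2 category model."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual_label, predicted_label in zip(actual, predicted):
        if predicted_label == positive:
            # Predicted positive: correct only if the true label is positive.
            counts["tp" if actual_label == positive else "fp"] += 1
        else:
            # Predicted negative: correct only if the true label is negative.
            counts["fn" if actual_label == positive else "tn"] += 1
    return counts


actual = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]

counts = confusion_counts(actual, predicted)

# Precision: share of positive predictions that were correct.
precision = counts["tp"] / (counts["tp"] + counts["fp"])
# Recall: share of actual positives that the model found.
recall = counts["tp"] / (counts["tp"] + counts["fn"])

print(counts, precision, recall)
```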

# View model candidates in the model leaderboard
<a name="canvas-evaluate-model-candidates"></a>

When you do a [Standard build](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-build-model.html) for tabular and time series forecasting models in Amazon SageMaker Canvas, SageMaker AI trains multiple *model candidates* (different iterations of the model) and by default selects the one with the best value for the optimization metric. For tabular models, Canvas builds up to 250 different model candidates using various algorithms and hyperparameter settings. For time series forecasting models, Canvas builds 7 different models: one for each of the [supported forecasting algorithms](canvas-advanced-settings.md#canvas-advanced-settings-time-series) and one ensemble model that averages the predictions of the other models to try to optimize accuracy.

The default model candidate is the only version that you can use in Canvas for actions like making predictions, registering to the model registry, or deploying to an endpoint. However, you might want to review all of the model candidates and select a different candidate to be the default model. You can view all of the model candidates and more details about each candidate on the **Model leaderboard** in Canvas.

To view the **Model leaderboard**, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, choose **Model leaderboard**.

The **Model leaderboard** page opens, which for tabular models looks like the following screenshot.

![\[The model leaderboard, which lists all of the model candidates that Canvas trained.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-model-leaderboard.png)


For time series forecasting models, you see 7 models, which include one for each of the time series forecasting algorithms supported by Canvas and one ensemble model. For more information about the algorithms, see [Advanced time series forecasting model settings](canvas-advanced-settings.md#canvas-advanced-settings-time-series).

In the preceding screenshot, you can see that the first model candidate listed is marked as the **Default model**. This is the model candidate that you can use to make predictions or deploy to endpoints.

To view more detailed metrics information about the model candidates to compare them, you can choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and choose **View model details**.

**Important**  
Loading the model details for non-default model candidates may take a few minutes (typically less than 10 minutes), and SageMaker AI Hosting charges apply. For more information, see [SageMaker AI Pricing](https://aws.amazon.com/sagemaker/pricing/).

The model candidate opens in the **Analyze** tab, and the metrics shown are specific to that model candidate. When you’re done reviewing the model candidate’s metrics, you can go back or exit the view to return to the **Model leaderboard**.

If you’d like to set the **Default model** to a different candidate, you can choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and choose **Change to default model**. Changing the default model for a model trained using HPO mode might take several minutes.

**Note**  
If your model is already deployed in production, [registered to the model registry](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-register-model.html), or has [automations](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-manage-automations.html) set up, you must delete your deployment, model registration, or automations before changing the default model.

# Metrics reference
<a name="canvas-metrics"></a>

The following sections describe the metrics that are available in Amazon SageMaker Canvas for each model type.

## Metrics for numeric prediction
<a name="canvas-numeric-metrics"></a>

The following list defines the metrics for numeric prediction in SageMaker Canvas and gives you information about how you can use them.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ MAE – Mean absolute error. On average, the prediction for the target column is +/- {MAE} from the actual value.

  Measures how different the predicted and actual values are when they're averaged over all values. MAE is commonly used in numeric prediction to understand model prediction error. If the predictions are linear, MAE represents the average distance from a predicted line to the actual value. MAE is defined as the sum of absolute errors divided by the number of observations. Values range from 0 to infinity, with smaller numbers indicating a better model fit to the data.
+ MAPE – Mean absolute percent error. On average, the prediction for the target column is +/- {MAPE} % from the actual value.

  MAPE is the mean of the absolute differences between the actual values and the predicted or estimated values, divided by the actual values and expressed as a percentage. A lower MAPE indicates better performance, as it means that the predicted or estimated values are closer to the actual values.
+ MSE – Mean squared error, or the average of the squared differences between the predicted and actual values.

  MSE values are always positive. The better a model is at predicting the actual values, the smaller the MSE value is.
+ R2 – The percentage of the difference in the target column that can be explained by the input column.

  Quantifies how much a model can explain the variance of a dependent variable. The maximum value is one (1), and values can be negative. Higher numbers indicate a higher fraction of explained variability. Values close to zero (0) indicate that very little of the dependent variable can be explained by the model. Negative values indicate a poor fit and that the model is outperformed by a constant function (or a horizontal line).
+ RMSE – Root mean squared error, or the standard deviation of the errors.

  RMSE is the square root of the average of the squared differences between predicted and actual values. It is used to understand model prediction error, and it's an important metric to indicate the presence of large model errors and outliers. Values range from zero (0) to infinity, with smaller numbers indicating a better model fit to the data. RMSE is dependent on scale, and should not be used to compare datasets of different types.
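
To make the definitions above concrete, the following is a minimal Python sketch that computes the same five error metrics from paired lists of actual and predicted values. It is an illustration of the textbook formulas, not the code that Canvas runs. The function name `regression_metrics` and the sample values are hypothetical, and the sketch assumes that no actual value is zero (MAPE is undefined in that case).

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MAPE, MSE, RMSE, and R2 for paired numeric lists.

    Assumes the lists have equal length, that no actual value is 0
    (needed for MAPE), and that the actual values are not all identical
    (needed for R2).
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R2 compares the model's squared error with that of a constant
    # prediction equal to the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical house prices (in thousands) and model predictions.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 380.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this sketch, a model that always predicts the mean of the actual values scores an R2 of 0, and a model that does worse than that constant prediction scores below 0.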

## Metrics for categorical prediction
<a name="canvas-categorical-metrics"></a>

This section defines the metrics for categorical prediction in SageMaker Canvas and gives you information about how you can use them.

The following is a list of available metrics for 2-category prediction:
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ AUC – A value between 0 and 1 that indicates how well your model is able to separate the categories in your dataset. A value of 1 indicates that it was able to separate the categories perfectly.
+ BalancedAccuracy – Measures the ratio of accurate predictions to all predictions.

  This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is defined as follows: `0.5*((TP/P)+(TN/N))`, with values ranging from 0 to 1. The balanced accuracy metric gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
+ F1 – A balanced measure of accuracy that takes class balance into account.

  It is the harmonic mean of the precision and recall scores, defined as follows: `F1 = 2 * (precision * recall) / (precision + recall)`. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ LogLoss – Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
+ Precision – Of all the times that {category x} was predicted, the prediction was correct {precision}% of the time.

  Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: `Precision = TP/(TP+FP)`, with values ranging from zero (0) to one (1). Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
+ Recall – The model correctly predicted {recall}% to be {category x} when {target\_column} was actually {category x}.

  Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: `Recall = TP/(TP+FN)`, with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. Note that it is often insufficient to measure only recall, because predicting every output as positive yields a perfect recall score.
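
As an illustration of the formulas in the preceding list, the following Python sketch derives the confusion-matrix counts (TP, TN, FP, FN) from labels and applies the definitions of accuracy, balanced accuracy, precision, recall, F1, and log loss. It is a sketch of the standard formulas, not the Canvas implementation. The function names, the `churn`/`stay` labels, and the sample data are hypothetical, and the sketch assumes that the positive category is predicted correctly at least once so that no denominator is zero.

```python
import math


def binary_metrics(actual, predicted, positive):
    """Confusion-matrix metrics for a 2-category problem.

    Assumes both categories appear in `actual` and that at least one
    positive prediction is correct, so no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Accuracy": (tp + tn) / len(pairs),
        # P = TP + FN (actual positives), N = TN + FP (actual negatives).
        "BalancedAccuracy": 0.5 * (tp / (tp + fn) + tn / (tn + fp)),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * (precision * recall) / (precision + recall),
    }


def log_loss(actual_is_positive, positive_probability, eps=1e-15):
    """Average cross-entropy; `actual_is_positive` holds 1 or 0 per row."""
    total = 0.0
    for y, p in zip(actual_is_positive, positive_probability):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(actual_is_positive)


actual = ["churn", "churn", "churn", "stay", "stay", "stay", "stay", "stay"]
predicted = ["churn", "churn", "stay", "churn", "stay", "stay", "stay", "stay"]
scores = binary_metrics(actual, predicted, positive="churn")
print(scores)
```

The `log_loss` helper shows why the metric penalizes confident mistakes: for a row that is actually positive, a predicted probability of 0.01 contributes far more loss than a predicted probability of 0.4.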

The following is a list of available metrics for 3+ category prediction:
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ BalancedAccuracy – Measures the ratio of accurate predictions to all predictions.

  This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is defined as follows: `0.5*((TP/P)+(TN/N))`, with values ranging from 0 to 1. The balanced accuracy metric gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
+ F1macro – The F1macro score applies F1 scoring by calculating the precision and recall, and then taking their harmonic mean to calculate the F1 score for each class. Then, the F1macro averages the individual scores to obtain the F1macro score. F1macro scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ LogLoss – Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
+ PrecisionMacro – Measures precision by calculating precision for each class and averaging scores to obtain precision for several classes. Scores range from zero (0) to one (1). Higher scores reflect the model's ability to predict true positives (TP) out of all of the positives that it identifies, averaged across multiple classes.
+ RecallMacro – Measures recall by calculating recall for each class and averaging scores to obtain recall for several classes. Scores range from 0 to 1. Higher scores reflect the model's ability to predict true positives (TP) in a dataset, where a true positive is a positive prediction that is also an actual positive value in the data. It is often insufficient to measure only recall, because predicting every output as positive yields a perfect recall score.

Note that for 3+ category prediction, you also receive the average F1, Accuracy, Precision, and Recall metrics. The scores for these metrics are the per-category metric scores averaged across all categories.
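
The macro-averaged metrics described above can be sketched in Python as follows: each category is treated in turn as the positive class, precision, recall, and F1 are computed for it, and the per-category scores are averaged with equal weight. This is a sketch of the general technique rather than the Canvas implementation. The function name `macro_metrics` and the labels are hypothetical, and scoring a category as 0 when its denominator is zero is an assumption of this sketch.

```python
def macro_metrics(actual, predicted):
    """Macro-averaged precision, recall, and F1 over all categories."""
    classes = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    precisions, recalls, f1_scores = [], [], []

    for c in classes:
        # One-vs-rest counts with category c as the positive class.
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)

        # Assumption: a category with an empty denominator scores 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * (precision * recall) / (precision + recall)
        else:
            f1 = 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(classes)
    return {
        "PrecisionMacro": sum(precisions) / n,
        "RecallMacro": sum(recalls) / n,
        "F1macro": sum(f1_scores) / n,
    }


actual = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]
macro = macro_metrics(actual, predicted)
print(macro)
```

Because every category contributes equally to the average, a rare category that the model handles poorly lowers the macro score as much as a common one would.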

## Metrics for image and text prediction
<a name="canvas-cv-nlp-metrics"></a>

The following is a list of available metrics for image prediction and text prediction.
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ F1 – A balanced measure of accuracy that takes class balance into account.

  It is the harmonic mean of the precision and recall scores, defined as follows: `F1 = 2 * (precision * recall) / (precision + recall)`. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ Precision – Of all the times that {category x} was predicted, the prediction was correct {precision}% of the time.

  Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: `Precision = TP/(TP+FP)`, with values ranging from zero (0) to one (1). Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
+ Recall – The model correctly predicted {recall}% to be {category x} when {target\_column} was actually {category x}.

  Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: `Recall = TP/(TP+FN)`, with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. Note that it is often insufficient to measure only recall, because predicting every output as positive yields a perfect recall score.

Note that for image and text prediction models where you are predicting 3 or more categories, you also receive the *average* F1, Accuracy, Precision, and Recall metrics. The scores for these metrics are the per-category metric scores averaged across all categories.

## Metrics for time series forecasts
<a name="canvas-time-series-forecast-metrics"></a>

The following defines the advanced metrics for time series forecasts in Amazon SageMaker Canvas and gives you information about how you can use them.
+ Average Weighted Quantile Loss (wQL) – Evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles. A lower value indicates a more accurate model.
+ Weighted Absolute Percent Error (WAPE) – The sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model, where WAPE = 0 is a model with no errors.
+ Root Mean Square Error (RMSE) – The square root of the average squared errors. A lower RMSE indicates a more accurate model, where RMSE = 0 is a model with no errors.
+ Mean Absolute Percent Error (MAPE) – The percentage error (percent difference of the mean forecasted value versus the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.
+ Mean Absolute Scaled Error (MASE) – The mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.
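
The following Python sketch illustrates three of the forecast metrics above. It is not the Canvas implementation, and several details are assumptions made for illustration: the weighted quantile loss uses one common definition (a quantile-weighted absolute error, doubled and normalized by the sum of absolute actual values), and the MASE baseline is a naive forecast that repeats the previous value of the historical series. All function names and sample values are hypothetical.

```python
def wape(actual, forecast):
    """Sum of absolute errors normalized by the sum of absolute actuals."""
    abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return abs_error / sum(abs(a) for a in actual)


def weighted_quantile_loss(actual, quantile_forecast, tau):
    """Loss for one quantile tau (for example, 0.1 for P10).

    Under-forecasts are weighted by tau and over-forecasts by (1 - tau).
    Assumed definition for this sketch.
    """
    loss = sum(
        tau * max(a - q, 0.0) + (1 - tau) * max(q - a, 0.0)
        for a, q in zip(actual, quantile_forecast)
    )
    return 2 * loss / sum(abs(a) for a in actual)


def average_wql(actual, forecasts_by_quantile):
    """Average the weighted quantile loss over the supplied quantiles."""
    losses = [
        weighted_quantile_loss(actual, forecast, tau)
        for tau, forecast in forecasts_by_quantile.items()
    ]
    return sum(losses) / len(losses)


def mase(actual, forecast, history):
    """Forecast MAE divided by the MAE of a naive baseline on `history`.

    Assumption: the baseline predicts each historical value with the
    value immediately before it.
    """
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(history[i] - history[i - 1]) for i in range(1, len(history))]
    return forecast_mae / (sum(naive_errors) / len(naive_errors))


actual = [10.0, 20.0, 30.0]
p10, p50, p90 = [8.0, 15.0, 25.0], [12.0, 18.0, 33.0], [14.0, 24.0, 36.0]

print("WAPE:", wape(actual, p50))
print("wQL:", average_wql(actual, {0.1: p10, 0.5: p50, 0.9: p90}))
print("MASE:", mase(actual, p50, history=[5.0, 8.0, 6.0, 9.0]))
```

Under the definition assumed here, the weighted quantile loss at the 0.5 quantile equals the WAPE of the median forecast, which is one way to see how the two metrics relate.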

# Predictions with custom models
<a name="canvas-make-predictions"></a>

Use the custom model that you've built in SageMaker Canvas to make predictions for your data. The following sections show you how to make predictions for numeric and categorical prediction models, time series forecasts, image prediction models, and text prediction models.

Numeric and categorical prediction, image prediction, and text prediction custom models support making the following types of predictions for your data:
+ **Single predictions** – A **Single prediction** is when you only need to make one prediction. For example, you have one image or passage of text that you want to classify.
+ **Batch predictions** – A **Batch prediction** is when you’d like to make predictions for an entire dataset. You can make batch predictions for datasets that are 1 TB+. For example, you have a CSV file of customer reviews for which you’d like to predict the customer sentiment, or you have a folder of image files that you'd like to classify. You should make predictions with a dataset that matches your input dataset. Canvas provides you with the ability to do manual batch predictions, or you can configure automatic batch predictions that run whenever you update a dataset.

For each prediction or set of predictions, SageMaker Canvas returns the following:
+ The predicted values
+ The probability of the predicted value being correct

**Get started**

Choose one of the following workflows to make predictions with your custom model:
+ [Batch predictions in SageMaker Canvas](canvas-make-predictions-batch.md)
+ [Make single predictions](canvas-make-predictions-single.md)

After generating predictions with your model, you can also do the following:
+ [Update your model by adding versions.](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-update-model.html) If you want to try to improve the prediction accuracy of your model, you can build new versions of your model. You can choose to clone your original model building configuration and dataset, or you can change your configuration and select a different dataset. After adding a new version, you can review and compare versions to choose the best one.
+ [Register a model version in the SageMaker AI model registry](canvas-register-model.md). You can register versions of your model to the SageMaker Model Registry, which is a feature for tracking and managing the status of model versions and machine learning pipelines. A data scientist or MLOps team user with access to the SageMaker Model Registry can review your model versions and approve or reject them before deploying them to production.
+ [Send your batch predictions to Quick.](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-send-predictions.html) In Quick, you can build and publish dashboards with your batch prediction datasets. This can help you analyze and share results generated by your custom model.

# Make single predictions
<a name="canvas-make-predictions-single"></a>

**Note**  
This section describes how to get single predictions from your model inside the Canvas application. For information about making real-time invocations in a production environment by deploying your model to an endpoint, see [Deploy your models to an endpoint](canvas-deploy-model.md).

Make single predictions if you want to get a prediction for a single data point. You can use this feature to get real-time predictions or to experiment with changing individual values to see how they impact the prediction outcome. Note that single predictions rely on an Asynchronous Inference endpoint, which shuts down after being idle (or not receiving any prediction requests) for two hours.

Choose one of the following procedures based on your model type.

## Make single predictions with numeric and categorical prediction models
<a name="canvas-make-predictions-numeric-categorical"></a>

To make a single prediction for a numeric or categorical prediction model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Single prediction**.

1. For each **Column** field, which represents the columns of your input data, you can change the **Value**. Select the dropdown menu for the **Value** you want to change. For numeric fields, you can enter a new number. For fields with labels, you can select a different label.

1. When you’re ready to generate the prediction, in the right **Prediction** pane, choose **Update**.

In the right **Prediction** pane, you’ll see the prediction result. You can **Copy** the prediction result chart, or you can also choose **Download** to either download the prediction result chart as an image or to download the values and prediction as a CSV file.

## Make single predictions with time series forecasting models
<a name="canvas-make-predictions-forecast"></a>

To make a single prediction for a time series forecasting model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. Choose **Single prediction**.

1. For **Item**, select the item for which you want to forecast values.

1. If you used a group by column to train the model, then select the group by category for the item.

The prediction result loads in the pane below, showing you a chart with the forecast for each quantile. Choose **Schema view** to see the numeric predicted values. You can also choose **Download** to download the prediction results as either an image or a CSV file.

## Make single predictions with image prediction models
<a name="canvas-make-predictions-image"></a>

To make a single prediction for a single-label image prediction model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Single prediction**.

1. Choose **Import image**.

1. You’ll be prompted to upload an image. You can upload an image from your local computer or from an Amazon S3 bucket.

1. Choose **Import** to import your image and generate the prediction.

In the right **Prediction results** pane, the model lists the possible labels for the image along with a **Confidence** score for each label. For example, the model might predict the label **Sea** for an image with a confidence score of 96%, and the label **Glacier** with a confidence score of only 4%. From these scores, you can determine that your model is fairly confident that the image shows the sea.

## Make single predictions with text prediction models
<a name="canvas-make-predictions-text"></a>

To make a single prediction for a multi-category text prediction model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Single prediction**.

1. For the **Text field**, enter the text for which you’d like to get a prediction.

1. Choose **Generate prediction results** to get your prediction.

In the right **Prediction results** pane, you receive an analysis of your text in addition to a **Confidence** score for each possible label. For example, if you entered a good review for a product, you might get **Positive** with a confidence score of 85%, while the confidence score for **Neutral** might be 10% and the confidence score for **Negative** only 5%.

# Batch predictions in SageMaker Canvas
<a name="canvas-make-predictions-batch"></a>

Make batch predictions when you have an entire dataset for which you’d like to generate predictions. Amazon SageMaker Canvas supports batch predictions for datasets up to petabytes in size.

There are two types of batch predictions you can make:
+ [Manual batch predictions](canvas-make-predictions-batch-manual.md) are when you have a dataset for which you want to make one-time predictions.
+ [Automatic batch predictions](canvas-make-predictions-batch-auto.md) are when you set up a configuration that runs whenever a specific dataset is updated. For example, if you’ve configured weekly updates to a SageMaker Canvas dataset of inventory data, you can set up automatic batch predictions that run whenever you update the dataset. After setting up an automated batch predictions workflow, see [How to manage automations](canvas-manage-automations.md) for more information about viewing and editing the details of your configuration. For more information about setting up automatic dataset updates, see [Configure automatic updates for a dataset](canvas-update-dataset-auto.md).

**Note**  
Time series forecasting models don't support automatic batch predictions.  
You can only set up automatic batch predictions for datasets imported through local upload or Amazon S3. Additionally, automatic batch predictions can only run while you’re logged in to the Canvas application. If you log out of Canvas, the automatic batch prediction job resumes when you log back in.

To get started, review the [Batch prediction dataset requirements](canvas-make-predictions-batch-preqreqs.md), and then choose one of the following manual or automatic batch prediction workflows.

**Topics**
+ [Batch prediction dataset requirements](canvas-make-predictions-batch-preqreqs.md)
+ [Make manual batch predictions](canvas-make-predictions-batch-manual.md)
+ [Make automatic batch predictions](canvas-make-predictions-batch-auto.md)
+ [Edit your automatic batch prediction configuration](canvas-make-predictions-batch-auto-edit.md)
+ [Delete your automatic batch prediction configuration](canvas-make-predictions-batch-auto-delete.md)
+ [View your batch prediction jobs](canvas-make-predictions-batch-auto-view.md)

# Batch prediction dataset requirements
<a name="canvas-make-predictions-batch-preqreqs"></a>

For batch predictions, make sure that your datasets meet the requirements outlined in [Create a dataset](canvas-import-dataset.md). If your dataset is larger than 5 GB, then Canvas uses Amazon EMR Serverless to process your data and split it into smaller batches. After your data has been split, Canvas uses SageMaker AI Batch Transform to make predictions. You may see charges from both of these services after running batch predictions. For more information, see [Canvas pricing](https://aws.amazon.com/sagemaker/canvas/pricing/).

You might not be able to make predictions on some datasets if they have incompatible *schemas*. A *schema* is an organizational structure. For a tabular dataset, the schema is the names of the columns and the data type of the data in the columns. An incompatible schema might happen for one of the following reasons:
+ The dataset that you're using to make predictions has fewer columns than the dataset that you're using to build the model.
+ The data types in the columns of the dataset that you used to build the model might be different from the data types in the dataset that you're using to make predictions.
+ The dataset that you're using to make predictions and the dataset that you've used to build the model have column names that don't match. The column names are case sensitive. `Column1` is not the same as `column1`.

To ensure that you can successfully generate batch predictions, match the schema of your batch predictions dataset to the dataset you used to train the model.
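
If you want to check for these mismatches before you start a batch prediction job, the three conditions above can be sketched as a simple comparison in Python. This is a hypothetical helper, not a Canvas API: the function name `schema_problems`, the column names, and the data type labels are all illustrative. It assumes that each schema is a dictionary that maps a column name to a data type label, and that the training schema lists only the columns that the prediction dataset is expected to contain.

```python
def schema_problems(training_schema, prediction_schema):
    """List the ways a prediction dataset's schema differs from training.

    Both arguments map column name -> data type label. Column names are
    compared exactly, so "Column1" and "column1" are different columns.
    """
    problems = []
    for column, dtype in training_schema.items():
        if column not in prediction_schema:
            problems.append(f"missing column: {column}")
        elif prediction_schema[column] != dtype:
            found = prediction_schema[column]
            problems.append(
                f"type mismatch for {column}: expected {dtype}, found {found}"
            )
    return problems


# Hypothetical schemas; the type labels are illustrative only.
training = {"Column1": "numeric", "City": "categorical", "Notes": "text"}
prediction = {"column1": "numeric", "City": "text", "Notes": "text"}

for problem in schema_problems(training, prediction):
    print(problem)
```

An empty result means that every training column appears in the prediction dataset with the same name and data type label. Extra columns in the prediction dataset are not reported by this sketch.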

**Note**  
For batch predictions, if you dropped any columns when building your model, Canvas adds the dropped columns back to the prediction results. However, Canvas does not add the dropped columns to your batch predictions for time series models.

# Make manual batch predictions
<a name="canvas-make-predictions-batch-manual"></a>

Choose one of the following procedures to make manual batch predictions based on your model type.

## Make manual batch predictions with numeric, categorical, and time series forecasting models
<a name="canvas-make-predictions-batch-numeric-categorical"></a>

To make manual batch predictions for numeric, categorical, and time series forecasting model types, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Batch prediction**.

1. Choose **Select dataset** to pick a dataset for generating predictions.

1. From the list of available datasets, select your dataset, and then choose **Start Predictions** to get your predictions.

After the prediction job finishes running, there is an output dataset listed on the same page in the **Predictions** section. This dataset contains your results, and if you select the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **Preview** to preview the output data. You can see the input data matched to the prediction and the probability that the prediction is correct. Then, you can choose **Download prediction** to download the results as a file.

## Make manual batch predictions with image prediction models
<a name="canvas-make-predictions-batch-image"></a>

To make manual batch predictions for a single-label image prediction model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Batch prediction**.

1. Choose **Select dataset** if you’ve already imported your dataset. If not, choose **Import new dataset**, and then you’ll be directed through the import data workflow.

1. From the list of available datasets, select your dataset and choose **Generate predictions** to get your predictions.

After the prediction job finishes running, on the **Run predictions** page, you see an output dataset listed under **Predictions**. This dataset contains your results, and if you select the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **View prediction results** to see the output data. You can see the images along with their predicted labels and confidence scores. Then, you can choose **Download prediction** to download the results as a CSV or a ZIP file.

## Make manual batch predictions with text prediction models
<a name="canvas-make-predictions-batch-text"></a>

To make manual batch predictions for a multi-category text prediction model, do the following:

1. In the left navigation pane of the Canvas application, choose **My models**.

1. On the **My models** page, choose your model.

1. After opening your model, choose the **Predict** tab.

1. On the **Run predictions** page, choose **Batch prediction**.

1. Choose **Select dataset** if you’ve already imported your dataset. If not, choose **Import new dataset**, and then you’ll be directed through the import data workflow. The dataset you choose must have the same source column as the dataset with which you built the model.

1. From the list of available datasets, select your dataset and choose **Generate predictions** to get your predictions.

After the prediction job finishes running, on the **Run predictions** page, you see an output dataset listed under **Predictions**. This dataset contains your results, and if you select the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **Preview** to see the output data. You can see the text along with its predicted labels and confidence scores. Then, you can choose **Download prediction** to download the results.

# Make automatic batch predictions
<a name="canvas-make-predictions-batch-auto"></a>

**Note**  
Time series forecasting models don't support automatic batch predictions.

To set up a schedule for automatic batch predictions, do the following:

1. In the left navigation pane of Canvas, choose **My models**.

1. Choose your model.

1. Choose the **Predict** tab.

1. Choose **Batch prediction**.

1. For **Generate predictions**, choose **Automatic**.

1. The **Automate batch predictions** dialog box pops up. Choose **Select dataset** and choose the dataset for which you want to automate predictions. Note that you can only select a dataset that was imported through local upload or Amazon S3.

1. After selecting a dataset, choose **Set up**.

Canvas runs a batch predictions job for the dataset after you set up the configuration. Then, every time you [Update a dataset](canvas-update-dataset.md), either manually or automatically, another batch predictions job runs.

After the prediction job finishes running, on the **Run predictions** page, you see an output dataset listed under **Predictions**. This dataset contains your results, and if you select the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **Preview** to preview the output data. You can see the input data matched to the prediction and the probability that the prediction is correct. Then, you can choose **Download** to download the results.

The following sections describe how to view, update, and delete your automatic batch prediction configuration through the **Datasets** page in the Canvas application. You can only set up a maximum of 20 automatic configurations in Canvas. For more information about viewing your automated batch predictions job history or making changes to your automatic configuration through the **Automations** page, see [How to manage automations](canvas-manage-automations.md).

# Edit your automatic batch prediction configuration
<a name="canvas-make-predictions-batch-auto-edit"></a>

You might want to make changes to your automatic batch prediction configuration, such as changing the target dataset. You might also want to turn off your automatic configuration to pause batch predictions.

When you edit a batch prediction configuration, you can change the target dataset but not the frequency (since automatic batch predictions occur whenever the dataset is updated).

To edit your automatic batch prediction configuration, do the following:

1. Go to the **Predict** tab of your model.

1. Under **Predictions**, choose the **Configuration** tab.

1. Find your configuration and choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. From the dropdown menu, choose **Update configuration**.

1. The **Automate batch prediction** dialog box opens. You can select another dataset and choose **Set up** to save your changes.

Your automatic batch predictions configuration is now updated.

To pause your automatic batch predictions, turn off your automatic configuration by doing the following:

1. Go to the **Predict** tab of your model.

1. Under **Predictions**, choose the **Configuration** tab.

1. Find your configuration from the list and turn off the **Auto update** toggle.

Automatic batch predictions are now paused. You can turn the toggle back on at any time to resume automatic batch predictions.

# Delete your automatic batch prediction configuration
<a name="canvas-make-predictions-batch-auto-delete"></a>

To learn how to delete your automatic batch prediction configuration, see [Delete an automatic configuration](canvas-manage-automations-delete.md).

You can also delete your configuration by doing the following:

1. Go to the **Predict** tab of your model.

1. Under **Predictions**, choose the **Configuration** tab.

1. Find your configuration from the list and choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. From the dropdown menu, choose **Delete configuration**.

Your configuration should now be deleted.

# View your batch prediction jobs
<a name="canvas-make-predictions-batch-auto-view"></a>

To view the statuses and history of your batch prediction jobs, go to the **Predict** tab of your model.

Each batch prediction job shows up in the **Predict** tab of your model. Under **Predictions**, you can see the **All jobs** and **Configuration** tabs:
+ **All jobs** – In this tab, you can see all of the manual and automatic batch prediction jobs for this model. You can filter the jobs by configuration name. For each job, you can see the following fields:
  + **Status** – The current status of your batch prediction job. If the status is **Failed** or **Partially failed**, you can hover over the status to view a more detailed error message to help you troubleshoot.
  + **Input dataset** – The name of your Canvas input dataset, including the dataset version.
  + **Prediction type** – Whether the prediction job was automatic or manual.
  + **Rows** – The number of rows predicted.
  + **Configuration name** – The name of the batch prediction job configuration.
  + **QuickSight** – Describes whether you've sent the batch predictions to Quick.
  + **Created** – The creation time of the batch prediction job.

  If you choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **View details**, **Preview prediction**, **Download prediction**, or **Send to Quick**. If you choose **View details**, a page opens that shows you the full details of the batch prediction job, including the status, the input and output data configurations, information about the instances used to complete the job and access to the Amazon CloudWatch logs. The page looks like the following screenshot.  
![\[Batch prediction job details page showing all of the additional details about a job.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-view-batch-prediction-job-details.png)
+ **Configuration** – In this tab, you can see all of the automatic batch prediction configurations you’ve created for this model. For each configuration, you can see fields such as the timestamp for when it was **Created**, the **Input dataset** it tracks for updates, and the **Next job scheduled**, which is the time when the next automatic prediction job is scheduled to start. If you choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), you can choose **View all jobs** to see the job history and in progress jobs for the configuration.


# Send predictions to Quick
<a name="canvas-send-predictions"></a>

**Note**  
You can send batch predictions to Quick for numeric and categorical prediction and time series forecasting models. Single-label image prediction and multi-category text prediction models are excluded.

Once you generate batch predictions with custom tabular models in SageMaker Canvas, you can send those predictions as CSV files to Quick, which is a business intelligence (BI) service to build and publish predictive dashboards.

For example, if you built a 2 category prediction model to determine whether a customer will churn, you can create a visual, predictive dashboard in Quick to show the percentage of customers that are expected to churn. To learn more about Quick, see the [Quick User Guide](https://docs.aws.amazon.com/quicksight/latest/user/welcome.html).

The following sections show you how to send your batch predictions to Quick for analysis.

## Before you begin
<a name="canvas-send-predictions-prereqs"></a>

Your user must have the necessary AWS Identity and Access Management (IAM) permissions to send your predictions to Quick. Your administrator can set up the IAM permissions for your user. For more information, see [Grant Your Users Permissions to Send Predictions to Quick](canvas-quicksight-permissions.md).

Your Quick account must contain the `default` namespace, which is set up when you first create your Quick account. Contact your administrator to help you get access to Quick. For more information, see [Setting up for Quick](https://docs.aws.amazon.com/quicksight/latest/user/setting-up.html) in the *Quick User Guide*.

Your Quick account must be created in the same Region as your Canvas application. If your Quick account’s home Region differs from your Canvas application’s Region, you must either [close](https://docs.aws.amazon.com/quicksight/latest/user/closing-account.html) and recreate your Quick account, or [set up a Canvas application](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-getting-started.html#canvas-prerequisites) in the same Region as your Quick account. You can check your Quick home Region by doing the following (assuming you already have a Quick account):

1. Open your [Quick console](https://quicksight.aws.amazon.com/).

1. When the page loads, your Quick home Region is appended to the URL in the following format: `https://<your-home-region>.quicksight.aws.amazon.com/`.
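If you want to compare Regions in a script rather than by eye, the URL format above can be parsed directly. The following is a minimal sketch that assumes the console URL follows the `https://<your-home-region>.quicksight.aws.amazon.com/` pattern described in this procedure. The function names `quick_home_region` and `regions_match` are hypothetical helpers, not part of any AWS SDK.

```python
from urllib.parse import urlparse

# Host suffix taken from the URL format shown in the procedure above.
_QUICK_HOST_SUFFIX = ".quicksight.aws.amazon.com"


def quick_home_region(console_url: str) -> str:
    """Return the Region prefix of a Quick console URL (hypothetical helper)."""
    host = urlparse(console_url).hostname or ""
    if not host.endswith(_QUICK_HOST_SUFFIX):
        raise ValueError(f"No home Region prefix found in host: {host!r}")
    return host[: -len(_QUICK_HOST_SUFFIX)]


def regions_match(console_url: str, canvas_region: str) -> bool:
    """Check whether the Quick home Region equals the Canvas application's Region."""
    return quick_home_region(console_url) == canvas_region


print(quick_home_region("https://us-east-1.quicksight.aws.amazon.com/sn/start"))
```

If `regions_match` returns `False`, the Region requirement described earlier in this section isn't met.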

You must know the usernames of the Quick users to whom you want to send your predictions. You can send predictions to yourself or other users who have the right permissions. Any users to whom you send predictions must be in the `default` [namespace](https://docs.aws.amazon.com/quicksight/latest/user/namespaces.html) of your Quick account and have the `Author` or `Admin` role in Quick.

Additionally, Quick must have access to the SageMaker AI default Amazon S3 bucket for your domain, which is named with the following format: `sagemaker-{REGION}-{ACCOUNT_ID}`. The Region should be the same as your Quick account's home Region and your Canvas application’s Region. To learn how to give Quick access to the batch predictions stored in your Amazon S3 bucket, see the topic [I can’t connect to Amazon S3](https://docs.aws.amazon.com/quicksight/latest/user/troubleshoot-connect-S3.html) in the *Quick User Guide*.
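To find the bucket that Quick needs access to, you can build its name from the `sagemaker-{REGION}-{ACCOUNT_ID}` format. The following sketch only formats the name; it doesn't check that the bucket exists or grant any access. The helper name `default_sagemaker_bucket` and the example account ID are illustrative assumptions.

```python
def default_sagemaker_bucket(region: str, account_id: str) -> str:
    """Build the default bucket name from the sagemaker-{REGION}-{ACCOUNT_ID} format."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    return f"sagemaker-{region}-{account_id}"


# Example with a placeholder account ID.
print(default_sagemaker_bucket("us-east-1", "123456789012"))
```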

## Supported data formats
<a name="canvas-send-predictions-formatting"></a>

Before sending your predictions, check that the data format of your batch predictions is compatible with Quick.
+ To learn more about the accepted data formats for timeseries data, see [Supported date formats](https://docs.aws.amazon.com/quicksight/latest/user/supported-date-formats.html) in the *Quick User Guide*.
+ To learn more about data values that might prevent you from sending to Quick, see [Unsupported values in data](https://docs.aws.amazon.com/quicksight/latest/user/unsupported-data-values.html) in the *Quick User Guide*.

Also note that Quick uses the character `"` as a text qualifier, so if your Canvas data contains any `"` characters, make sure that you close all matching quotes. Any mismatching quotes can cause issues with sending your dataset to Quick.
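One way to spot mismatched quotes before sending is to scan your exported predictions for lines that contain an odd number of `"` characters. The following is a rough heuristic sketch, not a check that Canvas or Quick performs. It assumes each record sits on a single line, so a quoted value that legitimately spans several lines would be flagged as well. The function name is hypothetical.

```python
def lines_with_unbalanced_quotes(lines):
    """Return 1-based numbers of lines that contain an odd count of double quotes."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count('"') % 2 == 1
    ]


sample = [
    "id,comment",
    '1,"looks fine"',
    '2,"missing closing quote',
    "3,plain text",
]
print(lines_with_unbalanced_quotes(sample))
```

In this sample, only the third line is reported because its opening quote is never closed.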

## Send your batch predictions to Quick
<a name="canvas-send-predictions-send"></a>

Use the following procedure to send your predictions to Quick:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. On the **My models** page, choose your model.

1. Choose the **Predict** tab.

1. Under **Predictions**, select the dataset (or datasets) of batch predictions that you’d like to share. You can share up to 5 datasets of batch predictions at a time.

1. After you select your dataset, choose **Send to Quick**.
**Note**  
The **Send to Quick** button doesn’t activate unless you select one or more datasets.

   Alternatively, you can preview your predictions by choosing the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and then **View prediction results**. From the dataset preview, you can choose **Send to Quick**. The following screenshot shows you the **Send to Quick** button in a dataset preview.  
![\[Screenshot of a dataset preview with the Send to Quick button at the bottom.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/send-to-quicksight-preview.png)

1. In the **Send to Quick** dialog box, do the following:

   1. For **QuickSight users**, enter the names of the Quick users to whom you want to send your predictions. If you want to send them to yourself, enter your own username. You can only send predictions to users in the `default` namespace of the Quick account, and the user must have the `Author` or `Admin` role in Quick.

   1. Choose **Send**.

   The following screenshot shows the **Send to Quick** dialog box:  
![\[The Send to Quick dialog box.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/send-to-quicksight.png)

After you send your batch predictions, the **QuickSight** field for the datasets you sent shows as `Sent`. In the confirmation box that confirms your predictions were sent, you can choose **Open Quick** to open your Quick application. If you’re done using Canvas, you should [log out](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-log-out.html) of the Canvas application.

The Quick users that you’ve sent datasets to can open their Quick application and view the Canvas datasets that have been shared with them. Then, they can create predictive dashboards with the data. For more information, see [Getting started with Quick data analysis](https://docs.aws.amazon.com/quicksight/latest/user/getting-started.html) in the *Quick User Guide*.

By default, all of the users to whom you send predictions have owner permissions for the dataset in Quick. Owners are able to create analyses, refresh, edit, delete, and re-share datasets. The changes that owners make to a dataset change the dataset for all users with access. To change the permissions, go to the dataset in Quick and manage its permissions. For more information, see [Viewing and editing the permissions users that a dataset is shared with](https://docs.aws.amazon.com/quicksight/latest/user/sharing-data-sets.html#view-users-data-set) in the *Quick User Guide*.

# Download a model notebook
<a name="canvas-notebook"></a>

**Note**  
The model notebook feature is available for quick build and standard build tabular models, and fine-tuned foundation models. Model notebooks aren't supported for image prediction, text prediction, or time series forecasting models.  
If you'd like to generate a model notebook for a tabular model built before this feature was launched, you must rebuild the model to generate a notebook.

For eligible models that you successfully build in Amazon SageMaker Canvas, a Jupyter notebook containing a report of all the model building steps is generated. This Jupyter notebook contains Python code that you can run locally or run in an environment like Amazon SageMaker Studio Classic to replicate the steps necessary to build your model. The notebook can be useful if you’d like to experiment with the code or see the backend details of how Canvas builds models.

To access the model notebook, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model and version that you built.

1. On the model version’s page, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) in the header.

1. From the dropdown menu, choose **View Notebook**.

1. A popup appears with the notebook content. You can choose **Download** and then do one of the following:

   1. Choose **Download** to save the notebook content to your local device.

   1. Choose **Copy S3 URI** to copy the Amazon S3 location where the notebook is stored. The notebook is stored in the Amazon S3 bucket specified in your **Canvas storage configuration**, which is configured in the [Prerequisites for setting up Amazon SageMaker Canvas](canvas-getting-started.md#canvas-prerequisites) section.

You should now be able to view the notebook either locally or as an object in Amazon S3. You can upload the notebook to an IDE to edit and run the code, or you can share the notebook with others in your organization to review.

# Send your model to Quick
<a name="canvas-send-model-to-quicksight"></a>

If you use Quick and want to leverage SageMaker Canvas in your Quick visualizations, you can build an Amazon SageMaker Canvas model and use it as a *predictive field* in your Quick dataset. A *predictive field* is a field in your Quick dataset that can make predictions for a given column in your dataset, similar to how Canvas users make single or batch predictions with a model. To learn more about how to integrate Canvas predictive abilities into your Quick datasets, see [SageMaker Canvas integration](https://docs.aws.amazon.com/quicksight/latest/user/sagemaker-canvas-integration.html) in the [Quick User Guide](https://docs.aws.amazon.com/quicksight/latest/user/welcome.html).

The following steps explain how you can add a predictive field to your Quick dataset using a Canvas model:

1. Open the Canvas application and build a model with your dataset.

1. After building the model in Canvas, send the model to Quick. A schema file automatically downloads to your local machine when you send the model to Quick. You upload this schema file to Quick in the next step.

1. Open Quick and choose a dataset with the same schema as the dataset you used to build your model. Add a predictive field to the dataset and do the following:

   1. Specify the model sent from Canvas.

   1. Upload the schema file that was downloaded in Step 2.

1. Save and publish your changes, and then generate predictions for the new dataset. Quick uses the model to fill in the target column with predictions.

In order to send a model from Canvas to Quick, you must meet the following prerequisites:
+ You must have both Canvas and Quick set up. Your Quick account must be created in the same AWS Region as your Canvas application. If your Quick account’s home Region differs from your Canvas application’s Region, you must either [close](https://docs.aws.amazon.com/quicksight/latest/user/closing-account.html) and recreate your Quick account, or [set up a Canvas application](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-getting-started.html#canvas-prerequisites) in the same Region as your Quick account. Your Quick account must also contain the default namespace, which you set up when you first create your Quick account. Contact your administrator to help you get access to Quick. For more information, see [Setting up for Quick](https://docs.aws.amazon.com/quicksight/latest/user/setting-up.html) in the *Quick User Guide*.
+ Your user must have the necessary AWS Identity and Access Management (IAM) permissions to send your predictions to Quick. Your administrator can set up the IAM permissions for your user. For more information, see [Grant Your Users Permissions to Send Predictions to Quick](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-quicksight-permissions.html).
+ Quick must have access to the Amazon S3 bucket that you’ve specified for Canvas application storage. For more information, see [Configure your Amazon S3 storage](canvas-storage-configuration.md).

# Time Series Forecasts in Amazon SageMaker Canvas
<a name="canvas-time-series"></a>

**Note**  
Time series forecasting models are only supported for tabular datasets.

Amazon SageMaker Canvas lets you build machine learning models for time series forecasting. A time series forecast predicts how a value changes over time.

For example, you can use a time series forecast to predict the following:
+ Your inventory in the coming months.
+ The number of items sold in the next four months.
+ The effect of reducing the price on sales during the holiday season.
+ Item inventory in the next 12 months.
+ The number of customers entering a store in the next several hours.
+ How a 10% reduction in the price of a product affects sales over a time period.

To make a time series forecast, your dataset must have the following:
+ A timestamp column with all values having the `datetime` type.
+ A target column that has the values that you're using to forecast future values.
+ An item ID column that contains unique identifiers for each item in your dataset, such as SKU numbers.

The `datetime` values in the timestamp column must use one of the following formats:
+ `YYYY-MM-DD HH:MM:SS`
+ `YYYY-MM-DDTHH:MM:SSZ`
+ `YYYY-MM-DD`
+ `MM/DD/YY`
+ `MM/DD/YY HH:MM`
+ `MM/DD/YYYY`
+ `YYYY/MM/DD HH:MM:SS`
+ `YYYY/MM/DD`
+ `DD/MM/YYYY`
+ `DD/MM/YY`
+ `DD-MM-YY`
+ `DD-MM-YYYY`
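Before importing a dataset, you might want to check that your timestamp values match one of the listed formats. The following sketch maps each format to a Python `strptime` pattern. The mapping is an assumption about how the placeholders translate, and Canvas might parse values differently. Note that `strptime` is slightly more lenient than the patterns suggest, because it also accepts months and days without zero padding.

```python
from datetime import datetime

# Assumed strptime equivalents of the formats listed above.
SUPPORTED_FORMATS = {
    "YYYY-MM-DD HH:MM:SS": "%Y-%m-%d %H:%M:%S",
    "YYYY-MM-DDTHH:MM:SSZ": "%Y-%m-%dT%H:%M:%SZ",
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YY": "%m/%d/%y",
    "MM/DD/YY HH:MM": "%m/%d/%y %H:%M",
    "MM/DD/YYYY": "%m/%d/%Y",
    "YYYY/MM/DD HH:MM:SS": "%Y/%m/%d %H:%M:%S",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD/MM/YYYY": "%d/%m/%Y",
    "DD/MM/YY": "%d/%m/%y",
    "DD-MM-YY": "%d-%m-%y",
    "DD-MM-YYYY": "%d-%m-%Y",
}


def matching_formats(value: str) -> list:
    """Return the labels of every listed format that parses the value."""
    matches = []
    for label, pattern in SUPPORTED_FORMATS.items():
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            continue
        matches.append(label)
    return matches


print(matching_formats("2024-01-31 08:30:00"))
print(matching_formats("01/02/24"))
```

A value such as `01/02/24` matches both `MM/DD/YY` and `DD/MM/YY`, so a check like this can confirm that a value is parseable but can't tell you which interpretation was intended.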

You can make forecasts for the following intervals:
+ 1 min
+ 5 min
+ 15 min
+ 30 min
+ 1 hour
+ 1 day
+ 1 week
+ 1 month
+ 1 year

## Future values in your input dataset
<a name="canvas-time-series-future"></a>

Canvas automatically detects columns in your dataset that might contain future values. If present, these values can improve the accuracy of predictions. Canvas marks these columns with a `Future values` label. Canvas infers the relationship between the data in these columns and the target column that you are trying to predict, and uses that relationship to generate more accurate forecasts.

For example, you can forecast the amount of ice cream sold by a grocery store. To make a forecast, you must have a timestamp column and a column that indicates how much ice cream the grocery store sold. For a more accurate forecast, your dataset can also include the price, the ambient temperature, the flavor of the ice cream, or a unique identifier for the ice cream.

Ice cream sales might increase when the weather is warmer. A decrease in the price of the ice cream might result in more units sold. Having a column with ambient temperature data and a column with pricing data can improve your ability to forecast the number of units of ice cream the grocery store sells.

While providing future values is optional, it helps you to perform what-if analyses directly in the Canvas application, showing you how changes in future values could alter your predictions.

## Handling missing values
<a name="canvas-time-series-missing"></a>

You might have missing data for different reasons. The reason for your missing data might inform how you want Canvas to impute it. For example, your organization might use an automatic system that only tracks when a sale happens. If you're using a dataset that comes from this type of automatic system, you have missing values in the target column.

**Important**  
If you have missing values in the target column, we recommend using a dataset that doesn't have them. SageMaker Canvas uses the target column to forecast future values. Missing values in the target column can greatly reduce the accuracy of the forecast.

For missing values in the dataset, Canvas automatically imputes the missing values for you by filling the target column with `0` and other numeric columns with the median value of the column.

However, you can select your own filling logic for the target column and other numeric columns in your datasets. Target columns have different filling guidelines and restrictions than the rest of the numeric columns. Target columns are filled up to the end of the historical period, whereas numeric columns are filled across both historical and future periods all the way to the end of the forecast horizon. Canvas only fills future values in a numeric column if your data has at least one record with a future timestamp and a value for that specific column.

You can choose one of the following filling logic options to impute missing values in your data:
+ `zero` – Fill with `0`.
+ `NaN` – Fill with NaN, or not a number. This is only supported for the target column.
+ `mean` – Fill with the mean value from the data series.
+ `median` – Fill with the median value from the data series.
+ `min` – Fill with the minimum value from the data series.
+ `max` – Fill with the maximum value from the data series.

When choosing a filling logic, you should consider how your model interprets the logic. For example, in a retail scenario, recording zero sales of an available item is different from recording zero sales of an unavailable item, as the latter scenario doesn’t necessarily imply a lack of customer interest in the unavailable item. In this case, filling with `0` in the target column of the dataset might bias the model toward predictions that are too low, because it infers a lack of customer interest in unavailable items. Conversely, filling with `NaN` might cause the model to ignore true occurrences of zero sales of available items.
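To see how the filling logic options differ, you can try them on a small series outside of Canvas. The following sketch illustrates the options on a plain Python list in which `None` marks a missing value. It isn't the Canvas implementation, and it doesn't model the difference between historical and future periods. The `fill_missing` function and its `is_target` parameter are hypothetical.

```python
import math
import statistics

# Summary functions for the filling options that depend on observed values.
_SUMMARIES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def fill_missing(series, logic, is_target=False):
    """Replace None entries in a series according to the chosen filling logic."""
    if logic == "zero":
        fill = 0
    elif logic == "NaN":
        # NaN filling is described above as supported only for the target column.
        if not is_target:
            raise ValueError("NaN filling is only supported for the target column")
        fill = math.nan
    elif logic in _SUMMARIES:
        observed = [value for value in series if value is not None]
        if not observed:
            raise ValueError("series has no observed values to summarize")
        fill = _SUMMARIES[logic](observed)
    else:
        raise ValueError(f"unknown filling logic: {logic!r}")
    return [fill if value is None else value for value in series]


units_sold = [1, None, 10, 4]
for option in ("zero", "mean", "median", "min", "max"):
    print(option, fill_missing(units_sold, option))
```

Comparing the outputs side by side shows how strongly each option can shift the series that the model learns from, especially when many values are missing.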

## Types of forecasts
<a name="canvas-time-series-types"></a>

You can make one of the following types of forecasts:
+ **Single item**
+ **All items**

For a forecast on all the items in your dataset, SageMaker Canvas returns a forecast for the future values for each item in your dataset.

For a single item forecast, you specify the item and SageMaker Canvas returns a forecast for the future values. The forecast includes a line graph that plots the predicted values over time.

**Topics**
+ [Future values in your input dataset](#canvas-time-series-future)
+ [Handling missing values](#canvas-time-series-missing)
+ [Types of forecasts](#canvas-time-series-types)
+ [Additional options for forecasting insights](canvas-additional-insights.md)

# Additional options for forecasting insights
<a name="canvas-additional-insights"></a>

In Amazon SageMaker Canvas, you can use the following optional methods to get more insights from your forecast:
+ Group column
+ Holiday schedule
+ What-if scenario

You can specify a column in your dataset as a **Group column**. Amazon SageMaker Canvas groups the forecast by each value in the column. For example, you can group the forecast on columns containing price data or unique item identifiers. Grouping a forecast by a column lets you make more specific forecasts. For example, if you group a forecast on a column containing item identifiers, you can see the forecast for each item.

Overall sales of items might be impacted by the presence of holidays. For example, in the United States, the number of items sold in both November and December might differ greatly from the number of items sold in January. If you use the data from November and December to forecast the sales in January, your results might be inaccurate. Using a holiday schedule helps you avoid these inaccurate results. You can use a holiday schedule for 251 countries.

For a forecast on a single item in your dataset, you can use what-if scenarios. A what-if scenario lets you change values in your data and see how the forecast changes. For example, you can use a what-if scenario to answer the following questions: "What if I lowered prices? How would that affect the number of items sold?"

# Adding model versions in Amazon SageMaker Canvas
<a name="canvas-update-model"></a>

In Amazon SageMaker Canvas, you can update the models that you’ve built by adding *versions*. Each model that you build has a version number. The first model is version 1 or `V1`. You can use model versions to see changes in prediction accuracy when you update your data or use [advanced transformations](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-prepare-data.html).

When viewing your model, SageMaker Canvas shows you the model history so that you can compare all of the model versions that you built. You can also delete versions that are no longer useful to you. By creating multiple model versions and evaluating their accuracy, you can iteratively improve your model performance.

**Note**  
Text prediction and image prediction models only support one model version.

To add a model version, you can either clone an existing version or create a new version. 

Cloning an existing version copies over the current model configuration, including the model recipe and the input dataset. Alternatively, you can create a new version if you want to configure a new model recipe or choose a different dataset. 

If you create a new version and select a different dataset, you must choose a dataset with the same target column and schema as the dataset from version 1.
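If you prepare datasets outside of Canvas, a quick header comparison can catch mismatches before you start a build. The following sketch assumes that "same schema" means the same column names in the same order, which is a simplification; Canvas might also compare data types. The function names and the sample data are hypothetical.

```python
import csv
import io


def csv_header(csv_text: str) -> list:
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def compatible_for_new_version(v1_csv: str, new_csv: str, target: str) -> bool:
    """Check that both datasets share the same columns and contain the target."""
    v1_columns = csv_header(v1_csv)
    new_columns = csv_header(new_csv)
    return target in v1_columns and v1_columns == new_columns


version_1 = "sqft,bedrooms,price\n1200,2,250000\n"
candidate = "sqft,bedrooms,price\n1500,3,310000\n"
print(compatible_for_new_version(version_1, candidate, target="price"))
```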

Before you can add a new version, you must successfully build at least one model version. Then, you can [register a model version in the SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-register-model.html). Use the registry for tracking model versions and for collaborating with Studio Classic users on production model approvals.

If you did a quick build for your first model version, you have the option to run a standard build when you add a version. Standard builds generally have higher accuracy. Therefore, if you feel confident in your quick build configuration, you can run a standard build to create a final version of your model. To learn more about the differences between quick builds and standard builds, see [How custom models work](canvas-build-model.md).

The following procedures show you how to add model versions; the procedure is different depending on whether you are adding a version of the same build type or a different build type (quick versus standard). Use the procedure **To add a new model version** to add versions of the same build type. To add a standard build model version after running a quick build, follow the procedure **To run a standard build**.

**To add a new model version**

1. Open your SageMaker Canvas application. For more information, see [Getting started with using Amazon SageMaker Canvas](canvas-getting-started.md).

1. In the left navigation pane, choose **My models**.

1. On the **My models** page, choose your model. To find your model, you can choose **Filter by problem type**.

1. After your model opens, choose the **Add version** button in the top panel.

1. From the dropdown menu, select one of the following options:

   1. **Add a new version from scratch** – When you select this option, the **Build** tab opens with the draft for a new model version. You can select a different dataset (as long as the schema matches the schema of the first model version’s dataset) and configure a new model recipe. For more information about building a model version, see [Build a model](canvas-build-model-how-to.md).

   1. **Clone an existing version with configurations** – A dialog box prompts you to select the version that you want to clone. After you've selected your desired version, choose **Clone**. The **Build** tab opens with the draft for a new model version. Any model recipe configurations are copied over from the cloned version. For more information about building a model version, see [Build a model](canvas-build-model-how-to.md).

**To run a standard build**

1. Open your SageMaker Canvas application. For more information, see [Getting started with using Amazon SageMaker Canvas](canvas-getting-started.md).

1. In the left navigation pane, choose **My models**.

1. On the **My models** page, choose your model. You can choose **Filter by problem type** to find your model more easily.

1. After your model opens, choose the **Analyze** tab.

1. Choose **Standard build**.  
![\[The Analyze tab of a Canvas model showing the standard build button.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-add-version-quick-to-standard.png)

   On the model draft page that opens to the **Build** tab, you can modify your model configuration and start a build. For more information about building a model version, see [Build a model](canvas-build-model-how-to.md).

You should now have a new model version build in progress. For more information about building a model, see [How custom models work](canvas-build-model.md).

After building a model version, you can return to your model details page at any time to view all of the versions or add more versions. The following image shows the **Versions** page for a model.

![\[The model versions page for a model in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/model-versions.png)


On the **Versions** page, you can view the following information for each of your model versions:
+ **Status** – This field tells you whether your model is currently building (`In building`), done building (`Ready`), failed to build (`Failed`), or still being edited (`In draft`).
+ **Model score**, **F1**, **Precision**, **Recall**, and **AUC** – If you turn on the **Show advanced metrics** toggle on this page, you can see these model metrics. These metrics indicate the accuracy and performance of your model. For more information, see [Evaluate your model](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-evaluate-model.html).
+ **Shared** – This field states whether you shared the model version with SageMaker Studio Classic users.
+ **Model registry** – This field states whether you registered the version to a model registry. For more information, see [Register a model version in the SageMaker AI model registry](canvas-register-model.md).
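To help interpret the advanced metrics when you compare versions, the following sketch computes precision, recall, and F1 from confusion-matrix counts using their standard definitions for a binary problem. It is an illustration only. How Canvas aggregates these metrics, for example across multiple categories, isn't covered here, and the counts are made up.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Compute precision, recall, and F1 from binary confusion-matrix counts."""
    # Precision: of the rows predicted positive, the share that are truly positive.
    precision = true_pos / (true_pos + false_pos)
    # Recall: of the truly positive rows, the share that the model found.
    recall = true_pos / (true_pos + false_neg)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for a churn model: 60 correct churn predictions,
# 20 false alarms, and 40 missed churners.
precision, recall, f1 = precision_recall_f1(true_pos=60, false_pos=20, false_neg=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A version with higher precision but lower recall isn't automatically better or worse. Which metric matters more depends on whether false alarms or missed cases are more costly for your use case.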

# MLOps
<a name="canvas-mlops"></a>

After building a model in SageMaker Canvas that you feel confident about, you might want to integrate your model with the machine learning operations (MLOps) processes in your organization. MLOps includes common tasks such as deploying a model for use in production or setting up continuous integration and continuous deployment (CI/CD) pipelines.

The following topics describe how you can use features within Canvas to use a Canvas-built model in production.

**Topics**
+ [Register a model version in the SageMaker AI model registry](canvas-register-model.md)
+ [Deploy your models to an endpoint](canvas-deploy-model.md)
+ [View your deployments](canvas-deploy-model-view.md)
+ [Update a deployment configuration](canvas-deploy-model-update.md)
+ [Test your deployment](canvas-deploy-model-test.md)
+ [Invoke your endpoint](canvas-deploy-model-invoke.md)
+ [Delete a model deployment](canvas-deploy-model-delete.md)

# Register a model version in the SageMaker AI model registry
<a name="canvas-register-model"></a>

With SageMaker Canvas, you can build multiple iterations, or versions, of your model to improve it over time. You might want to build a new version of your model if you acquire better training data or if you want to attempt to improve the model’s accuracy. For more information about adding versions to your model, see [Update a model](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-update-model.html).

After you’ve [built a model](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-build-model.html) that you feel confident about, you might want to evaluate its performance and have it reviewed by a data scientist or MLOps engineer in your organization before using it in production. To do this, you can register your model versions to the [SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html). The SageMaker Model Registry is a repository that data scientists or engineers can use to catalog machine learning (ML) models and manage model versions and their associated metadata, such as training metrics. They can also manage and log the approval status of a model.

After you register your model versions to the SageMaker Model Registry, a data scientist or your MLOps team can access the SageMaker Model Registry through [SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html), which is a web-based integrated development environment (IDE) for working with machine learning models. In the SageMaker Model Registry interface in Studio Classic, the data scientist or MLOps team can evaluate your model and update its approval status. If the model doesn’t perform to their requirements, the data scientist or MLOps team can update the status to `Rejected`. If the model does perform to their requirements, then the data scientist or MLOps team can update the status to `Approved`. Then, they can [deploy your model to an endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html#deploy-model-prereqs) or [automate model deployment](https://aws.amazon.com/blogs/machine-learning/building-automating-managing-and-scaling-ml-workflows-using-amazon-sagemaker-pipelines/) with CI/CD pipelines. You can use the SageMaker AI model registry feature to seamlessly integrate models built in Canvas with the MLOps processes in your organization.
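A data scientist or MLOps engineer can also update the approval status programmatically instead of through the Studio Classic interface. The following sketch uses the SageMaker `UpdateModelPackage` API through a boto3-style client. The helper `set_approval_status` is hypothetical, the ARN in the usage comment is a placeholder, and the caller is assumed to have permission to update the model package.

```python
def set_approval_status(sagemaker_client, model_package_arn, approved, note=""):
    """Mark a registered model version as Approved or Rejected.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    status = "Approved" if approved else "Rejected"
    request = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }
    if note:
        # Optional free-text reason that is stored with the approval status.
        request["ApprovalDescription"] = note
    sagemaker_client.update_model_package(**request)
    return status


# Example usage (requires AWS credentials and a real model package ARN):
#   import boto3
#   client = boto3.client("sagemaker")
#   set_approval_status(client, "<model-package-arn>", approved=True,
#                       note="Meets accuracy requirements")
```

Passing the client in as an argument keeps the helper easy to test with a stub object before you run it against a real registry.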

The following diagram summarizes an example of registering a model version built in Canvas to the SageMaker Model Registry for integration into an MLOps workflow.

![\[The steps registering a model version built in Canvas for integration into an MLOps workflow.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-model-registration-diagram.jpg)


You can register tabular, image, and text model versions to the SageMaker Model Registry. This includes time series forecasting models and JumpStart based [fine-tuned foundation models](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-fm-chat-fine-tune.html).

**Note**  
Currently, you can't register Amazon Bedrock based fine-tuned foundation models built in Canvas to the SageMaker Model Registry.

The following sections show you how to register a model version to the SageMaker Model Registry from Canvas.

## Permissions management
<a name="canvas-register-model-prereqs"></a>

By default, you have permissions to register model versions to the SageMaker Model Registry. SageMaker AI grants these permissions for all new and existing Canvas user profiles through the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy, which is attached to the AWS IAM execution role for the SageMaker AI domain that hosts your Canvas application.
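To confirm that the managed policy is attached to a particular execution role, an administrator could query IAM. The following is a minimal sketch rather than an official procedure. The helper name `has_canvas_full_access` and the role name in the usage comment are hypothetical, and the helper takes the IAM client as a parameter so that it can be tried with a stub before running it against a real account.

```python
CANVAS_POLICY_NAME = "AmazonSageMakerCanvasFullAccess"


def has_canvas_full_access(iam_client, role_name):
    """Return True if the Canvas managed policy is attached to the role."""
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        for policy in page.get("AttachedPolicies", []):
            if policy["PolicyName"] == CANVAS_POLICY_NAME:
                return True
        # Results can span several pages; follow the marker until none remain.
        if not page.get("IsTruncated"):
            return False
        kwargs["Marker"] = page["Marker"]


# Example usage (requires AWS credentials; the role name is a placeholder):
#   import boto3
#   print(has_canvas_full_access(boto3.client("iam"), "my-domain-execution-role"))
```

This check only reports whether the policy is attached. It doesn't reflect the per-profile toggle described in the procedure that follows.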

When your Canvas administrator sets up a new domain or user profile by following the prerequisite instructions in the [Getting started guide](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-getting-started.html#canvas-prerequisites), SageMaker AI turns on the model registration permissions through the **ML Ops permissions configuration** option, which is enabled by default.

The Canvas administrator can manage model registration permissions at the user profile level as well. For example, if the administrator wants to grant model registration permissions to some user profiles but remove permissions for others, they can edit the permissions for a specific user. The following procedure shows how to turn off model registration permissions for a specific user profile:

1. Open the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. On the left navigation pane, choose **Admin configurations**.

1. Under **Admin configurations**, choose **Domains**.

1. From the list of domains, select the user profile’s domain.

1. On the **Domain details** page, choose the **User profile** whose permissions you want to edit.

1. On the **User Details** page, choose **Edit**.

1. In the left navigation pane, choose **Canvas settings**.

1. In the **ML Ops permissions configuration** section, turn off the **Enable Model Registry registration permissions** toggle.

1. Choose **Submit** to save the changes to your domain settings.

The user profile should no longer have model registration permissions.

## Register a model version to the SageMaker AI model registry
<a name="canvas-register-model-register"></a>

SageMaker Model Registry tracks all of the model versions that you build to solve a particular problem in a *model group*. When you build a SageMaker Canvas model and register it to SageMaker Model Registry, it gets added to a model group as a new model version. For example, if you build and register four versions of your model, then a data scientist or MLOps team working in the SageMaker Model Registry interface can view the model group and review all four versions of the model in one place.

When you register a Canvas model to the SageMaker Model Registry, a model group is automatically created and named after your Canvas model. Optionally, you can give the model group a name of your choice, or use an existing model group in the SageMaker Model Registry. For more information about creating a model group, see [Create a Model Group](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-model-group.html).

**Note**  
Currently, you can only register models built in Canvas to the SageMaker Model Registry in the same account.

To register a model version to the SageMaker Model Registry from the Canvas application, use the following procedure:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. On the **My models** page, choose your model. You can **Filter by problem type** to find your model more easily.

1. After choosing your model, the **Versions** page opens, listing all of the versions of your model. You can turn on the **Show advanced metrics** toggle to view the advanced metrics, such as **Recall** and **Precision**, to compare your model versions and determine which one you’d like to register.

1. From the list of model versions, for the version that you want to register, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)). Alternatively, you can double-click the version that you want to register, and then on the version details page, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. In the dropdown list, choose **Add to Model Registry**. The **Add to Model Registry** dialog box opens.

1. In the **Add to Model Registry** dialog box, do the following:

   1. (Optional) In the **SageMaker Studio Classic model group** section, for the **Model group name** field, enter the name of the model group to which you want to register your version. You can specify the name for a new model group that SageMaker AI creates for you, or you can specify an existing model group. If you don’t specify this field, Canvas registers your version to a default model group with the same name as your model.

   1. Choose **Add**.

Your model version should now be registered to the model group in the SageMaker Model Registry. After you register a model version to a model group, all subsequent versions of the Canvas model that you choose to register are added to the same model group. If you want to register your versions to a different model group, you must first go to the SageMaker Model Registry and [delete the model group](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-delete-model-group.html). Then, you can re-register your model versions to the new model group.

To view the status of your models, you can return to the **Versions** page for your model in the Canvas application. This page shows you the **Model Registry** status of each version. If the status is `Registered`, then the model has been successfully registered.

To view the details of your registered model version, hover over the **Registered** field for the **Model Registry** status to see the **Model registry details** pop-up box. These details include information such as the following:
+ The **Model package group name**, which is the model group that your version is registered to in the SageMaker Model Registry.
+ The **Approval status**, which can be `Pending Approval`, `Approved`, or `Rejected`. If a Studio Classic user approves or rejects your version in the SageMaker Model Registry, then this status is updated on your model versions page when you refresh the page.

The following screenshot shows the **Model registry details** box, along with an **Approval status** of `Approved` for this particular model version.

![\[Screenshot of the SageMaker Model Registry details box in the Canvas application.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/approved-mr.png)
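A data scientist or MLOps engineer who prefers code to the Studio Classic interface could review and update the approval status with the SageMaker API. The following is a hedged sketch, not the workflow that Canvas itself uses. The helper names and the model group name in the usage comment are hypothetical. In the API, the pending status is spelled `PendingManualApproval`. The helpers take the SageMaker client as a parameter so that they can be tried with a stub first.

```python
def latest_pending_version(sm_client, group_name):
    """Return the ARN of the newest version awaiting approval, or None."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None


def set_approval(sm_client, model_package_arn, approved, note=None):
    """Mark a registered model version as Approved or Rejected."""
    request = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved" if approved else "Rejected",
    }
    if note:
        request["ApprovalDescription"] = note
    sm_client.update_model_package(**request)
    return request["ModelApprovalStatus"]


# Example usage (requires AWS credentials; the group name is a placeholder):
#   import boto3
#   sm = boto3.client("sagemaker")
#   arn = latest_pending_version(sm, "my-canvas-model")
#   if arn:
#       set_approval(sm, arn, approved=True, note="Reviewed metrics")
```

After the status changes, refreshing the **Versions** page in Canvas should show the updated **Approval status**, as described above.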


# Deploy your models to an endpoint
<a name="canvas-deploy-model"></a>

In Amazon SageMaker Canvas, you can deploy your models to an endpoint to make predictions. SageMaker AI provides the ML infrastructure for you to host your model on an endpoint with the compute instances that you choose. Then, you can *invoke* the endpoint (send a prediction request) and get a real-time prediction from your model. With this functionality, you can use your model in production to respond to incoming requests, and you can integrate your model with existing applications and workflows.

To get started, you should have a model that you'd like to deploy. You can deploy custom model versions that you've built, Amazon SageMaker JumpStart foundation models, and fine-tuned JumpStart foundation models. For more information about building a model in Canvas, see [How custom models work](canvas-build-model.md). For more information about JumpStart foundation models in Canvas, see [Generative AI foundation models in SageMaker Canvas](canvas-fm-chat.md).

Review the following **Permissions management** section, and then begin creating new deployments in the **Deploy a model** section.

## Permissions management
<a name="canvas-deploy-model-prereqs"></a>

By default, you have permissions to deploy models to SageMaker AI Hosting endpoints. SageMaker AI grants these permissions for all new and existing Canvas user profiles through the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy, which is attached to the AWS IAM execution role for the SageMaker AI domain that hosts your Canvas application.

When your Canvas administrator sets up a new domain or user profile by following the instructions in [Prerequisites for setting up Amazon SageMaker Canvas](canvas-getting-started.md#canvas-prerequisites), SageMaker AI turns on the model deployment permissions through the **Enable direct deployment of Canvas models** option, which is enabled by default.

The Canvas administrator can manage model deployment permissions at the user profile level as well. For example, if the administrator doesn't want to grant model deployment permissions to all user profiles when setting up a domain, they can grant permissions to specific users after creating the domain.

The following procedure shows how to modify the model deployment permissions for a specific user profile:

1. Open the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. On the left navigation pane, choose **Admin configurations**.

1. Under **Admin configurations**, choose **Domains**.

1. From the list of domains, select the user profile’s domain.

1. On the **Domain details** page, select the **User profiles** tab.

1. Choose your **User profile**.

1. On the user profile's page, select the **App Configurations** tab.

1. In the **Canvas** section, choose **Edit**.

1. In the **ML Ops configuration** section, turn on the **Enable direct deployment of Canvas models** toggle to enable deployment permissions.

1. Choose **Submit** to save the changes to your domain settings.

The user profile should now have model deployment permissions.

After granting permissions to the domain or user profile, make sure that the user logs out of their Canvas application and logs back in to apply the permission changes.

## Deploy a model
<a name="canvas-deploy-model-deploy"></a>

To get started with deploying your model, you create a new deployment in Canvas and specify the model version that you want to deploy along with the ML infrastructure, such as the type and number of compute instances that you would like to use for hosting the model.

Canvas suggests a default type and number of instances based on your model type, or you can learn more about the various SageMaker AI instance types on the [Amazon SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/). You are charged based on the SageMaker AI instance pricing while your endpoint is active.

When deploying JumpStart foundation models, you also have the option to specify the length of the deployment time. You can deploy the model to an endpoint indefinitely (meaning the endpoint is active until you delete the deployment). Or, if you only need the endpoint for a short period of time and would like to reduce costs, you can deploy the model to an endpoint for a specified amount of time, after which SageMaker AI shuts down the endpoint for you.

**Note**  
If you deploy a model for a specified amount of time, stay logged in to the Canvas application for as long as the endpoint is active. If you log out of or delete the application, then Canvas is unable to shut down the endpoint at the specified time.
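Because charges accrue for as long as the endpoint is active, a rough estimate can help when you choose an instance count and a deployment length. The following sketch multiplies an hourly rate by the number of instances and the hours active. The rate used here is a made-up placeholder, not a real price, and the estimate ignores any other charges that might apply; check the pricing page for actual figures.

```python
def estimate_endpoint_cost(hourly_rate_usd, instance_count, hours_active):
    """Rough instance cost: rate per instance-hour x instances x hours."""
    if min(hourly_rate_usd, instance_count, hours_active) < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_rate_usd * instance_count * hours_active


# Hypothetical rate of 0.25 USD per instance-hour, 2 instances, 8 hours.
print(estimate_endpoint_cost(0.25, 2, 8))  # 4.0
```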

After your model is deployed to a SageMaker AI Hosting [real-time inference endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html), you can begin making predictions by *invoking* the endpoint.

There are several different ways for you to deploy a model from the Canvas application. You can access the model deployment option through any of the following methods:
+ On the **My models** page of the Canvas application, choose the model that you want to deploy. Then, from the model’s **Versions** page, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) next to a model version and select **Deploy**.
+ When on the details page for a model version, on the **Analyze** tab, choose the **Deploy** option.
+ When on the details page for a model version, on the **Predict** tab, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) at the top of the page and select **Deploy**.
+ On the **ML Ops** page of the Canvas application, choose the **Deployments** tab and then choose **Create deployment**.
+ For JumpStart foundation models and fine-tuned foundation models, go to the **Ready-to-use models** page of the Canvas application. Choose **Generate, extract and summarize content**. Then, find the JumpStart foundation model or fine-tuned foundation model that you want to deploy. Choose the model, and on the model's chat page, choose the **Deploy** button.

All of these methods open the **Deploy model** side panel, where you specify the deployment configuration for your model. To deploy the model from this panel, do the following:

1. (Optional) If you’re creating a deployment from the **ML Ops** page, you’ll have the option to **Select model and version**. Use the dropdown menus to select the model and model version that you want to deploy.

1. Enter a name in the **Deployment name** field.

1. (For JumpStart foundation models and fine-tuned foundation models only) Choose a **Deployment length**. Select **Indefinite** to leave the endpoint active until you shut it down, or select **Specify length** and then enter the period of time for which you want the endpoint to remain active.

1. For **Instance type**, SageMaker AI detects a default instance type and number that is suitable for your model. However, you can change the instance type that you would like to use for hosting your model.
**Note**  
If you run out of the instance quota for the chosen instance type on your AWS account, you can request a quota increase. For more information about the default quotas and how to request an increase, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html) in the *AWS General Reference guide*.

1. For **Instance count**, you can set the number of active instances that are used for your endpoint. SageMaker AI detects a default number that is suitable for your model, but you can change this number.

1. When you’re ready to deploy your model, choose **Deploy**.

Your model should now be deployed to an endpoint.
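Creating an endpoint can take several minutes, so a script that plans to invoke it might first wait until it is ready. The following is a minimal polling sketch, assuming that you know the name of the SageMaker AI endpoint behind your deployment. The helper name, polling interval, and endpoint name are placeholders. boto3 also provides a built-in `endpoint_in_service` waiter that you could use instead.

```python
import time

# Statuses after which polling stops in this sketch.
TERMINAL_STATUSES = {"InService", "Failed", "OutOfService"}


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      max_polls=60, sleep=time.sleep):
    """Poll describe_endpoint until a terminal status or until polls run out."""
    status = None
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(
            EndpointName=endpoint_name
        )["EndpointStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return status  # Last status seen; the endpoint may still be creating.


# Example usage (requires AWS credentials; the name is a placeholder):
#   import boto3
#   print(wait_for_endpoint(boto3.client("sagemaker"), "endpoint_name"))
```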

# View your deployments
<a name="canvas-deploy-model-view"></a>

You might want to check the status or details of a model deployment in Amazon SageMaker Canvas. For example, if your deployment failed, you might want to check the details to troubleshoot.

You can view your Canvas model deployments from the Canvas application or from the Amazon SageMaker AI console.

To view deployment details from Canvas, choose one of the following procedures:

To view your deployment details from the **ML Ops** page, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **ML Ops**.

1. Choose the **Deployments** tab.

1. Choose your deployment by name from the list.

To view your deployment details from a model version’s page, do the following:

1. In the SageMaker Canvas application, go to your model version’s details page.

1. Choose the **Deploy** tab.

1. In the **Deployments** section, which lists all of the deployment configurations associated with that model version, find your deployment.

1. Choose the **More options** icon (![\[More options icon for the output CSV file.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)), and then select **View details** to open the details page.

The details page for your deployment opens, and you can view information such as the time of the most recent prediction, the endpoint’s status and configuration, and the model version that is currently deployed to the endpoint.

You can also view your currently active Canvas workspace instances and active endpoints from the **SageMaker AI dashboard** in the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/). Your Canvas endpoints are listed alongside any other SageMaker AI Hosting endpoints that you’ve created, and you can filter them by searching for endpoints with the Canvas tag.

The following screenshot shows the SageMaker AI dashboard. In the **Canvas** section, you can see that one workspace instance is in service and four endpoints are active.

![\[Screenshot of the SageMaker AI dashboard showing the active Canvas workspace instances and endpoints.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-sagemaker-dashboard.png)
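The same tag-based filtering can be approximated in code. The following sketch lists endpoints and keeps those that have a tag whose key contains a given fragment. This page doesn't state the exact tag key that Canvas applies, so the `key_fragment` argument and the `"canvas"` value in the usage comment are assumptions to adjust after you inspect the tags on one of your own endpoints. For brevity, only the first page of tags is read for each endpoint.

```python
def endpoints_with_tag(sm_client, key_fragment):
    """Return (name, status) pairs for endpoints with a matching tag key."""
    matches = []
    kwargs = {}
    fragment = key_fragment.lower()
    while True:
        page = sm_client.list_endpoints(**kwargs)
        for endpoint in page.get("Endpoints", []):
            tags = sm_client.list_tags(
                ResourceArn=endpoint["EndpointArn"]
            ).get("Tags", [])
            if any(fragment in tag["Key"].lower() for tag in tags):
                matches.append(
                    (endpoint["EndpointName"], endpoint["EndpointStatus"])
                )
        # Follow pagination until the service stops returning a token.
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs["NextToken"] = token


# Example usage (requires AWS credentials; "canvas" is an assumed fragment):
#   import boto3
#   for name, status in endpoints_with_tag(boto3.client("sagemaker"), "canvas"):
#       print(name, status)
```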


# Update a deployment configuration
<a name="canvas-deploy-model-update"></a>

You can update the deployment configuration for models that you've deployed to endpoints in Amazon SageMaker Canvas. For example, you can deploy an updated model version to the endpoint, or you can update the instance type or number of instances behind the endpoint based on your capacity needs.

There are several different ways for you to update your deployment from the Canvas application. You can use any of the following methods:
+ On the **ML Ops** page of the Canvas application, you can choose the **Deployments** tab and select the deployment that you want to update. Then, choose **Update configuration**.
+ When on the details page for a model version, on the **Deploy** tab, you can view the deployments for that version. Next to the deployment, choose the **More options** icon (![\[More options icon for the output CSV file.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and then choose **Update configuration**.

Both of the preceding methods open the **Update configuration** side panel, where you can make changes to your deployment configuration. To update the configuration, do the following:

1. For the **Select version** dropdown menu, you can select a different model version to deploy to the endpoint.
**Note**  
When updating a deployment configuration, you can only choose a different model version to deploy. To deploy a different model, create a new deployment.

1. For **Instance type**, you can select a different instance type for hosting your model.

1. For **Instance count**, you can change the number of active instances that are used for your endpoint.

1. Choose **Save**.

Your deployment configuration should now be updated.

# Test your deployment
<a name="canvas-deploy-model-test"></a>

You can test a model deployment by invoking the endpoint, or making single prediction requests, through the Amazon SageMaker Canvas application. You can use this functionality to confirm that your endpoint responds to requests before invoking your endpoint programmatically in a production environment.

## Test a custom model deployment
<a name="canvas-deploy-model-test-custom"></a>

You can test a custom model deployment by accessing it through the **ML Ops** page and making a single invocation, which returns a prediction along with the probability that the prediction is correct.

**Note**  
Execution length is an estimate of the time taken to invoke and get a response from the endpoint in Canvas. For detailed latency metrics, see [SageMaker AI Endpoint Invocation Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html#cloudwatch-metrics-endpoint-invocation).

To test your endpoint through the Canvas application, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation panel, choose **ML Ops**.

1. Choose the **Deployments** tab.

1. From the list of deployments, choose the one with the endpoint that you want to invoke.

1. On the deployment’s details page, choose the **Test deployment** tab.

1. On the deployment testing page, you can modify the **Value** fields to specify a new data point. For time series forecasting models, you specify the **Item ID** for which you want to make a forecast.

1. After modifying the values, choose **Update** to get the prediction result.

The prediction loads, along with the **Invocation result** fields which indicate whether or not the invocation was successful and how long the request took to process.

The following screenshot shows a prediction performed in the Canvas application on the **Test deployment** tab.

![\[The Canvas application showing a test prediction for a deployed model.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-test-deployments.png)


For all model types except numeric prediction and time series forecasting, the prediction returns the following fields:
+  **predicted_label** – the predicted output
+  **probability** – the probability that the predicted label is correct
+  **labels** – the list of all the possible labels
+  **probabilities** – the probabilities corresponding to each label (the order of this list matches the order of the labels)

For numeric prediction models, the prediction only contains the **score** field, which is the predicted output of the model, such as the predicted price of a house.

For time series forecasting models, the prediction is a graph showing the forecasts by quantile. You can choose **Schema view** to see the forecasted numeric values for each quantile.
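If you later handle these fields in code, the parallel `labels` and `probabilities` lists are easiest to work with once they are paired up. The following sketch assumes that a single prediction is available as a flat dictionary with the field names listed above. The helper name and the sample values are illustrative only, and the actual response envelope for your endpoint might differ.

```python
def summarize_prediction(prediction):
    """Summarize one prediction dict that uses the fields described above."""
    if "score" in prediction:
        # Numeric prediction models return only the predicted value.
        return {"prediction": prediction["score"]}
    return {
        "prediction": prediction["predicted_label"],
        "confidence": prediction["probability"],
        # labels and probabilities share the same order, so zip pairs them.
        "by_label": dict(zip(prediction["labels"], prediction["probabilities"])),
    }


# Made-up sample values for a 2 category prediction model.
sample = {
    "predicted_label": "churn",
    "probability": 0.8,
    "labels": ["churn", "stay"],
    "probabilities": [0.8, 0.2],
}
print(summarize_prediction(sample))
```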

You can continue making single predictions through the deployment testing page, or you can see the following section [Invoke your endpoint](canvas-deploy-model-invoke.md) to learn how to invoke your endpoint programmatically from applications.

## Test a JumpStart foundation model deployment
<a name="canvas-deploy-model-test-js"></a>

You can chat with a deployed JumpStart foundation model through the Canvas application to test its functionality before invoking it through code.

To chat with a deployed JumpStart foundation model, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation panel, choose **ML Ops**.

1. Choose the **Deployments** tab.

1. From the list of deployments, find the one that you want to invoke and choose its **More options** icon (![\[More options icon for a model deployment.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. From the context menu, choose **Test deployment**.

1. A new **Generate, extract and summarize content** chat opens with the JumpStart foundation model, and you can begin typing prompts. Note that prompts from this chat are sent as requests to your SageMaker AI Hosting endpoint.

# Invoke your endpoint
<a name="canvas-deploy-model-invoke"></a>

**Note**  
We recommend that you [test your model deployment in Amazon SageMaker Canvas](canvas-deploy-model-test.md) before invoking a SageMaker AI endpoint programmatically.

You can use the Amazon SageMaker Canvas models that you've deployed to a SageMaker AI endpoint in production with your applications. Invoke the endpoint programmatically in the same way that you invoke any other [SageMaker AI real-time endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html). The invocation returns a response object that contains the same fields described in [Test your deployment](canvas-deploy-model-test.md).

For more detailed information about how to programmatically invoke endpoints, see [Invoke models for real-time inference](realtime-endpoints-test-endpoints.md).

The following Python examples show you how to invoke your endpoint based on the model type.

## JumpStart foundation models
<a name="canvas-invoke-js-example"></a>

The following example shows you how to invoke a JumpStart foundation model that you've deployed to an endpoint.

```
import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")
body = pd.DataFrame(
    [['feature_column1', 'feature_column2'], 
    ['feature_column1', 'feature_column2']]
).to_csv(header=False, index=False).encode("utf-8")
    
response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)
```

## Numeric and categorical prediction models
<a name="canvas-invoke-tabular-example"></a>

The following example shows you how to invoke numeric or categorical prediction models.

```
import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")
# Each inner list is one row of feature values to send for prediction.
body = pd.DataFrame(
    [['feature_column1', 'feature_column2'],
    ['feature_column1', 'feature_column2']]
).to_csv(header=False, index=False).encode("utf-8")
    
response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)
```
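The examples in this section stop at the `invoke_endpoint` call. To use the result, you read the streamed `Body` of the response and parse it. The following sketch assumes that the request used `Accept="application/json"`, as in the examples here. The helper name `decode_response` is illustrative.

```python
import json


def decode_response(response):
    """Read and parse the JSON body returned by invoke_endpoint."""
    # Body is a stream, so it can be read only once.
    raw = response["Body"].read()
    return json.loads(raw.decode("utf-8"))


# Example usage, continuing from an invoke_endpoint call:
#   result = decode_response(response)
#   print(result)
```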

## Time series forecasting models
<a name="canvas-invoke-forecast-example"></a>

The following example shows you how to invoke time series forecasting models. For a complete example of how to invoke a time series forecasting model, see [Time-Series Forecasting with Amazon SageMaker Autopilot](https://github.com/aws/amazon-sagemaker-examples/blob/eef13dae197a6e588a8bc111aba3244f99ee0fbb/autopilot/autopilot_time_series.ipynb).

```
import boto3
import pandas as pd

csv_path = './real-time-payload.csv'
data = pd.read_csv(csv_path)

client = boto3.client("runtime.sagemaker")

body = data.to_csv(index=False).encode("utf-8")
    
response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)
```

## Image prediction models
<a name="canvas-invoke-cv-example"></a>

The following example shows you how to invoke image prediction models.

```
import boto3

client = boto3.client("runtime.sagemaker")

# Read the image as raw bytes; the file closes when the with block exits.
with open("example_image.jpg", "rb") as file:
    body = file.read()

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="application/x-image",
    Body=body,
    Accept="application/json"
)
```

## Text prediction models
<a name="canvas-invoke-nlp-example"></a>

The following example shows you how to invoke text prediction models.

```
import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")
body = pd.DataFrame([["Example text 1"], ["Example text 2"]]).to_csv(header=False, index=False).encode("utf-8")
    
response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)
```
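Free-form text often contains commas or quotation marks, which must be quoted so that each text stays in a single CSV field. `DataFrame.to_csv` handles this for you. If you prefer not to depend on pandas, the following sketch builds an equivalent request body with the standard library. The helper name `rows_to_csv_body` and the sample texts are illustrative.

```python
import csv
import io


def rows_to_csv_body(rows):
    """Encode rows as UTF-8 CSV bytes, quoting fields where needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


body = rows_to_csv_body([
    ["Great product, would buy again"],  # contains a comma
    ['She said "wow"'],                  # contains double quotes
])
print(body.decode("utf-8"))
```

You can pass the resulting `body` to `invoke_endpoint` with `ContentType="text/csv"`, as in the preceding example.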

# Delete a model deployment
<a name="canvas-deploy-model-delete"></a>

You can delete your model deployments from the Amazon SageMaker Canvas application. This action also deletes the endpoint from the SageMaker AI console and shuts down any endpoint-related resources.

**Note**  
Optionally, you can delete your endpoint through the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/) or using the SageMaker AI `DeleteEndpoint` API. For more information, see [Delete Endpoints and Resources](realtime-endpoints-delete-resources.md). However, when you delete the endpoint through the SageMaker AI console or APIs instead of the Canvas application, the list of deployments in Canvas isn’t automatically updated. You must also delete the deployment from the Canvas application to remove it from the list.

To delete a deployment in Canvas, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation panel, choose **ML Ops**.

1. Choose the **Deployments** tab.

1. From the list of deployments, choose the one that you want to delete.

1. At the top of the deployment details page, choose the **More options** icon (![\[More options icon for the output CSV file.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. Choose **Delete deployment**.

1. In the **Delete deployment** dialog box, choose **Delete**.

Your deployment and SageMaker AI Hosting endpoint should now be deleted from both Canvas and the SageMaker AI console.
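For the API route that the preceding note mentions, the following sketch deletes an endpoint and the endpoint configuration behind it. The helper name and the endpoint name are placeholders, and removing the configuration is an optional cleanup step that this sketch assumes you want. As the note explains, the deployment still appears in the Canvas list until you delete it there as well.

```python
def delete_endpoint_and_config(sm_client, endpoint_name):
    """Delete an endpoint, then the endpoint configuration it was using."""
    # Look up the configuration name first; after the endpoint is deleted,
    # it can no longer be resolved from the endpoint.
    config_name = sm_client.describe_endpoint(
        EndpointName=endpoint_name
    )["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Example usage (requires AWS credentials; the name is a placeholder):
#   import boto3
#   delete_endpoint_and_config(boto3.client("sagemaker"), "endpoint_name")
```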

# How to manage automations
<a name="canvas-manage-automations"></a>

In SageMaker Canvas, you can create automations that update your dataset or generate predictions from your model on a schedule. For example, you might receive new shipping data on a daily basis. You can set up an automatic update for your dataset and automatic batch predictions that run whenever the dataset is updated. Using these features, you can set up an automated workflow and reduce the amount of time you spend manually updating datasets and making predictions.

**Note**  
You can set up a maximum of 20 automatic configurations in your Canvas application. Automations are only active while you’re logged in to the Canvas application. If you log out of Canvas, your automatic jobs pause until you log back in.

The following sections describe how to view, edit, and delete configurations for existing automations. To learn how to set up automations, see the following topics:
+ To set up automatic dataset updates, see [Update a dataset](canvas-update-dataset.md).
+ To set up automatic batch predictions, see [Batch predictions in SageMaker Canvas](canvas-make-predictions-batch.md).

**Topics**
+ [View your automations](canvas-manage-automations-view.md)
+ [Edit your automatic configurations](canvas-manage-automations-edit.md)
+ [Delete an automatic configuration](canvas-manage-automations-delete.md)

# View your automations
<a name="canvas-manage-automations-view"></a>

You can view all of your automation jobs by going to the left navigation pane of Canvas and choosing **ML Ops**. The **ML Operations** page combines automations for both automatic dataset updates and automatic batch predictions. On the **Automations** tab, you can see the following sub-tabs:
+ **All jobs** – You can see every instance of a **Dataset update** or **Batch prediction** job that Canvas has run. For each job, you can see fields such as the associated **Input dataset**, the **Configuration name** of the associated auto update configuration, and the **Status** showing whether the job was successful or not. You can filter the jobs by configuration name. Depending on the job type, you can also do the following:
  + For dataset update jobs, you can choose the latest version of the dataset, or the most recent job, to preview the dataset.
  + For batch prediction jobs, you can choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) to preview or download the predictions for that job. You can also choose **View details** to see more details about your prediction job. For more information about batch prediction job details, see [View your batch prediction jobs](canvas-make-predictions-batch-auto-view.md).
+ **Configuration** – You can see all of the **Dataset update** and **Batch prediction** configurations you’ve created. For each configuration, you can see fields such as the associated **Input dataset** and the **Frequency** of the jobs. You can also turn off or turn on the **Auto update** toggle to pause or resume automatic updates. If you choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) for a specific configuration, you can choose to **View all jobs** for the configuration, **Update configuration**, or **Delete configuration**.
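The filter on the **All jobs** tab can be thought of as a simple match on the configuration name. The following Python sketch illustrates that idea on made-up records. It does not call Canvas, and the dictionary keys, status values, and the `filter_jobs` function are all hypothetical names that mirror the columns described above.

```python
from typing import Dict, List

# Hypothetical job records; the keys mirror the fields named in this topic.
jobs: List[Dict[str, str]] = [
    {"job_type": "Dataset update", "input_dataset": "inventory",
     "configuration_name": "nightly-inventory", "status": "Successful"},
    {"job_type": "Batch prediction", "input_dataset": "inventory",
     "configuration_name": "inventory-forecast", "status": "Successful"},
    {"job_type": "Dataset update", "input_dataset": "inventory",
     "configuration_name": "nightly-inventory", "status": "Failed"},
]


def filter_jobs(records: List[Dict[str, str]], configuration_name: str) -> List[Dict[str, str]]:
    """Return only the jobs that belong to the given configuration."""
    return [job for job in records if job["configuration_name"] == configuration_name]


# Keep only the jobs created by one configuration.
nightly_jobs = filter_jobs(jobs, "nightly-inventory")
```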

# Edit your automatic configurations
<a name="canvas-manage-automations-edit"></a>

After setting up a configuration, you might want to make changes to it. For automatic dataset updates, you can update the Amazon S3 location for Canvas to import data, the frequency of the updates, and the starting time. For automatic batch predictions, you can change the dataset that the configuration tracks for updates. You can also turn off the automation to temporarily pause updates until you choose to resume them.

The following sections show you how to update each type of configuration.

**Note**  
You can’t change the frequency for automatic batch predictions because automatic batch predictions run every time the target dataset is updated.
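The editing rules above can be summarized as a lookup of which settings each configuration type lets you change. The following Python sketch is a hedged illustration only: the field names, the `EDITABLE_FIELDS` mapping, and the `validate_edit` function are hypothetical and do not correspond to a Canvas API.

```python
from typing import Dict, Set

# Settings that this topic describes as editable for each configuration type.
# "auto_update_enabled" stands in for the Auto update toggle.
EDITABLE_FIELDS: Dict[str, Set[str]] = {
    "dataset_update": {"s3_location", "frequency", "start_time", "auto_update_enabled"},
    "batch_prediction": {"dataset", "auto_update_enabled"},
}


def validate_edit(config_type: str, changes: Dict[str, object]) -> None:
    """Raise ValueError if a change targets a setting that can't be edited."""
    allowed = EDITABLE_FIELDS[config_type]
    rejected = sorted(set(changes) - allowed)
    if rejected:
        raise ValueError(
            f"{config_type} configurations can't change: {', '.join(rejected)}"
        )
```

For example, `validate_edit("dataset_update", {"frequency": "daily"})` passes, while `validate_edit("batch_prediction", {"frequency": "daily"})` raises an error, because batch predictions run whenever the target dataset is updated.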

**Topics**
+ [Edit your automatic dataset update configuration](canvas-manage-automations-edit-dataset.md)
+ [Edit your automatic batch prediction configuration](canvas-manage-automations-edit-batch.md)

# Edit your automatic dataset update configuration
<a name="canvas-manage-automations-edit-dataset"></a>

You might want to make changes to your auto update configuration for a dataset, such as changing the frequency of the updates. You might also want to turn off your automatic update configuration to pause the updates to your dataset.

To make changes to your auto update configuration for a dataset, do the following:

1. In the left navigation pane of Canvas, choose **ML Ops**.

1. Choose the **Automations** tab.

1. Choose the **Configuration** tab.

1. For your auto update configuration, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. In the dropdown menu, choose **Update configuration**. You are taken to the **Auto updates** tab of the dataset.

1. Make your changes to the configuration. When you’re done making changes, choose **Save**.

To pause your dataset updates, turn off your automatic configuration. One way to turn off auto updates is the following:

1. In the left navigation pane of Canvas, choose **ML Ops**.

1. Choose the **Automations** tab.

1. Choose the **Configuration** tab.

1. Find your configuration from the list and turn off the **Auto update** toggle.

Automatic updates for your dataset are now paused. You can turn this toggle back on at any time to resume the update schedule.

# Edit your automatic batch prediction configuration
<a name="canvas-manage-automations-edit-batch"></a>

When you edit a batch prediction configuration, you can change the target dataset but not the frequency, because automatic batch predictions run whenever the dataset is updated.

To make changes to your automatic batch predictions configuration, do the following:

1. In the left navigation pane of Canvas, choose **ML Ops**.

1. Choose the **Automations** tab.

1. Choose the **Configuration** tab.

1. For your automatic batch prediction configuration, choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. In the dropdown menu, choose **Update configuration**. You are taken to the **Auto updates** tab of the dataset.

1. In the **Automate batch prediction** dialog box that opens, select another dataset and choose **Set up** to save your changes.

Your automatic batch predictions configuration is now updated.

To pause your automatic batch predictions, turn off your automatic configuration. Use the following procedure to turn off your configuration:

1. In the left navigation pane of Canvas, choose **ML Ops**.

1. Choose the **Automations** tab.

1. Choose the **Configuration** tab.

1. Find your configuration from the list and turn off the **Auto update** toggle.

Automatic batch predictions for your dataset are now paused. You can turn this toggle back on at any time to resume automatic batch predictions.

# Delete an automatic configuration
<a name="canvas-manage-automations-delete"></a>

You might want to delete a configuration to stop your automated workflow in SageMaker Canvas.

To delete a configuration for automatic dataset updates or automatic batch predictions, do the following:

1. In the left navigation pane of Canvas, choose **ML Ops**.

1. Choose the **Automations** tab.

1. Choose the **Configuration** tab.

1. Find your auto update configuration, and choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)).

1. Choose **Delete configuration**.

1. In the dialog box that pops up, choose **Delete**.

Your auto update configuration is now deleted.