

# Create an AutoML job for time-series forecasting using the API

Forecasting in machine learning refers to the process of predicting future outcomes or trends based on historical data and patterns. By analyzing past time-series data and identifying underlying patterns, machine learning algorithms can make predictions and provide valuable insights into future behavior. In forecasting, the goal is to develop models that can accurately capture the relationship between input variables and the target variable over time. This involves examining various factors such as trends, seasonality, and other relevant patterns within the data. The collected information is then used to train a machine learning model. The trained model is capable of generating predictions by taking new input data and applying the learned patterns and relationships. It can provide forecasts for a wide range of use cases, such as sales projections, stock market trends, weather forecasts, demand forecasting, and many more.

The following instructions show how to create an Amazon SageMaker Autopilot job for time-series forecasting problem types using the SageMaker [API Reference](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-reference.html).

**Note**  
Tasks such as text and image classification, time-series forecasting, and fine-tuning of large language models are exclusively available through version 2 of the [AutoML REST API](autopilot-reference.md). If your language of choice is Python, you can refer to [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_auto_ml_job_v2.html) or the [AutoMLV2 object](https://sagemaker.readthedocs.io/en/stable/api/training/automlv2.html#sagemaker.automl.automlv2.AutoMLV2) of the Amazon SageMaker Python SDK directly.  
Users who prefer the convenience of a user interface can use [Amazon SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-getting-started.html) to access pre-trained models and generative AI foundation models, or create custom models tailored to specific text classification, image classification, forecasting, or generative AI needs.

You can create an Autopilot time-series forecasting experiment programmatically by calling the [CreateAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html) API in any language supported by Amazon SageMaker Autopilot or the AWS CLI.

For information on how this API action translates into a function in the language of your choice, see the [See Also](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#API_CreateAutoMLJobV2_SeeAlso) section of `CreateAutoMLJobV2` and choose an SDK. As an example, for Python users, see the full request syntax of [create_auto_ml_job_v2](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_auto_ml_job_v2) in the AWS SDK for Python (Boto3).

Autopilot trains several model candidates with your target time-series, then selects an optimal forecasting model for a given objective metric. When your model candidates have been trained, you can find the best candidate metrics in the response to `[DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html)` at `[BestCandidate](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CandidateProperties.html#sagemaker-Type-CandidateProperties-CandidateMetrics)`.
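As a sketch of reading those metrics, the following helper extracts the best candidate's metrics from a `DescribeAutoMLJobV2`-style response. The response shape follows the API reference linked above; the job name and the stubbed response values are placeholders for illustration only.

```python
# Sketch: with credentials configured, the real response would come from:
#   import boto3
#   response = boto3.client("sagemaker").describe_auto_ml_job_v2(
#       AutoMLJobName="my-ts-job")

def best_candidate_metrics(response):
    """Return {metric_name: value} for the job's best candidate."""
    candidate = response.get("BestCandidate", {})
    metrics = candidate.get("CandidateProperties", {}).get("CandidateMetrics", [])
    return {m["MetricName"]: m["Value"] for m in metrics}

# Stubbed response for illustration; the metric value is made up.
stub = {
    "AutoMLJobStatus": "Completed",
    "BestCandidate": {
        "CandidateProperties": {
            "CandidateMetrics": [
                {"MetricName": "AverageWeightedQuantileLoss", "Value": 0.12}
            ]
        }
    },
}
print(best_candidate_metrics(stub))  # {'AverageWeightedQuantileLoss': 0.12}
```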

The following sections define the mandatory and optional input request parameters for the `CreateAutoMLJobV2` API used in time-series forecasting.

**Note**  
Refer to the notebook [Time-Series Forecasting with Amazon SageMaker Autopilot](https://github.com/aws/amazon-sagemaker-examples/blob/main/autopilot/autopilot_time_series.ipynb) for a practical, hands-on time-series forecasting example. In this notebook, you use Amazon SageMaker Autopilot to train a time-series model and produce predictions using the trained model. The notebook provides instructions for retrieving a ready-made dataset of tabular historical data on Amazon S3.

## Prerequisites


Before using Autopilot to create a time-series forecasting experiment in SageMaker AI, make sure to:
+ Prepare your time-series dataset. Dataset preparation involves collecting relevant data from various sources, cleaning and filtering it to remove noise and inconsistencies, and organizing it into a structured format. See [Time-series datasets format and missing values filling methods](timeseries-forecasting-data-format.md) to learn more about time-series formats requirements in Autopilot. Optionally, you can supplement your dataset with the public holiday calendar of the country of your choice to capture associated patterns. For more information on holiday calendars, see [National holiday calendars](autopilot-timeseries-forecasting-holiday-calendars.md).
**Note**  
We recommend providing at least 3-5 historical data points for each 1 future data point you want to predict. For example, to forecast 7 days ahead (horizon of 1 week) based on daily data, train your model on a minimum of 21-35 days of historical data. Make sure to provide enough data to capture seasonal and recurrent patterns. 
+ Place your time-series data in an Amazon S3 bucket.
+ Grant the SageMaker AI execution role used to run your experiment full access to the Amazon S3 bucket containing your input data. Once this is done, you can use the ARN of this execution role in Autopilot API requests.
  + For information on retrieving your SageMaker AI execution role, see [Get your execution role](sagemaker-roles.md#sagemaker-roles-get-execution-role).
  + For information on granting your SageMaker AI execution role permissions to access one or more specific buckets in Amazon S3, see *Add Additional Amazon S3 Permissions to a SageMaker AI Execution Role* in [Create execution role](sagemaker-roles.md#sagemaker-roles-create-execution-role).

## Required parameters

When calling `[CreateAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html)` to create an Autopilot experiment for time-series forecasting, you must provide the following values:
+ An `[AutoMLJobName](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#API_CreateAutoMLJobV2_RequestSyntax)` to specify the name of your job. The name should be of type `string`, and should have a minimum length of 1 character and a maximum length of 32.
+ At least one `[AutoMLJobChannel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobChannel.html)` in `[AutoMLJobInputDataConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#sagemaker-CreateAutoMLJobV2-request-AutoMLJobInputDataConfig)` in which you specify the name of the Amazon S3 bucket that contains your data. Optionally, you can specify the content (CSV or Parquet files) and compression (GZip) types.
+ An `[AutoMLProblemTypeConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#sagemaker-CreateAutoMLJobV2-request-AutoMLProblemTypeConfig)` of type `[TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html)` to configure the settings of your time-series forecasting job. In particular, you must specify:
  + The **frequency** of predictions, which refers to the desired granularity (hourly, daily, monthly, and so on) of your forecast.

    Valid intervals are an integer followed by `Y` (Year), `M` (Month), `W` (Week), `D` (Day), `H` (Hour), or `min` (Minute). For example, `1D` indicates every day and `15min` indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, use a frequency of `1H` instead of `60min`.

    The valid values for each frequency are the following:
    + Minute - 1-59
    + Hour - 1-23
    + Day - 1-6
    + Week - 1-4
    + Month - 1-11
    + Year - 1
  + The **horizon** of predictions in your forecast, which refers to the number of time-steps that the model predicts. The forecast horizon is also called the prediction length. The maximum forecast horizon is the lesser of 500 time-steps or 1/4 of the time-steps in the dataset.
  + A [TimeSeriesConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesConfig.html) in which you define the schema of your dataset to map the column headers to your forecast by specifying:
    + A `TargetAttributeName`: The column that contains historical data of the target field to forecast.
    + A `TimestampAttributeName`: The column that contains a point in time at which the target value of a given item is recorded.
    + An `ItemIdentifierAttributeName`: The column that contains the item identifiers for which you want to predict the target value.

  The following is an example of those request parameters. In this example, you are setting up a daily forecast for the expected quantity or level of demand of specific items over a period of 20 days.

  ```
  "AutoMLProblemTypeConfig": {
      "TimeSeriesForecastingJobConfig": {
          "ForecastFrequency": "D",
          "ForecastHorizon": 20,
          "TimeSeriesConfig": {
              "TargetAttributeName": "demand",
              "TimestampAttributeName": "timestamp",
              "ItemIdentifierAttributeName": "item_id"
          }
      }
  }
  ```
+ An `[OutputDataConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLOutputDataConfig.html)` to specify the Amazon S3 output path to store the artifacts of your AutoML job.
+ A `[RoleArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJob.html#sagemaker-CreateAutoMLJob-request-RoleArn)` to specify the ARN of the role used to access your data. You can use the ARN of the execution role to which you have granted access to your data.
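As a sketch, the required parameters described above can be assembled into a single request body. The bucket names, role ARN, and job name below are placeholders, and the dataset columns mirror the example schema used in this guide.

```python
# Sketch of a minimal CreateAutoMLJobV2 request for time-series forecasting.
# All S3 URIs, the role ARN, and the job name are placeholders.
request = {
    "AutoMLJobName": "ts-demand-forecast",  # 1-32 characters
    "AutoMLJobInputDataConfig": [
        {
            "ChannelType": "training",
            "ContentType": "text/csv;header=present",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://amzn-s3-demo-bucket/input/",
                }
            },
        }
    ],
    "AutoMLProblemTypeConfig": {
        "TimeSeriesForecastingJobConfig": {
            "ForecastFrequency": "D",
            "ForecastHorizon": 20,
            "TimeSeriesConfig": {
                "TargetAttributeName": "demand",
                "TimestampAttributeName": "timestamp",
                "ItemIdentifierAttributeName": "item_id",
            },
        }
    },
    "OutputDataConfig": {"S3OutputPath": "s3://amzn-s3-demo-bucket/output/"},
    "RoleArn": "arn:aws:iam::111122223333:role/sagemaker-execution-role",
}

# With credentials configured, the job could then be created with:
#   import boto3
#   boto3.client("sagemaker").create_auto_ml_job_v2(**request)
```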

All other parameters are optional. For example, you can set specific forecast quantiles, choose a filling method for missing values in the dataset, or define how to aggregate data that does not align with forecast frequency. To learn how to set those additional parameters, see [Optional parameters](#timeseries-forecasting-api-optional-params).
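The frequency and horizon rules above can be sketched as a small validation helper. This is illustrative client-side code based on the rules stated in this section, not part of the API:

```python
import re

# Valid integer ranges per frequency unit, as listed above.
FREQUENCY_RANGES = {
    "min": (1, 59), "H": (1, 23), "D": (1, 6),
    "W": (1, 4), "M": (1, 11), "Y": (1, 1),
}

def validate_forecast_frequency(frequency):
    """Return True if `frequency` is a valid interval such as 'D' or '15min'."""
    match = re.fullmatch(r"(\d*)(min|H|D|W|M|Y)", frequency)
    if not match:
        return False
    value = int(match.group(1) or 1)
    low, high = FREQUENCY_RANGES[match.group(2)]
    # Values overlapping the next larger unit (e.g. 60min) are out of range.
    return low <= value <= high

def max_forecast_horizon(num_time_steps):
    """The horizon may not exceed the lesser of 500 and 1/4 of the time steps."""
    return min(500, num_time_steps // 4)

print(validate_forecast_frequency("15min"))  # True
print(validate_forecast_frequency("60min"))  # False: use 1H instead
print(max_forecast_horizon(365))             # 91
```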

## Optional parameters

The following sections provide details of some optional parameters that you can pass to your time-series forecasting AutoML job.

### How to specify algorithms


By default, your Autopilot job trains a pre-defined list of algorithms on your dataset. However, you can provide a subset of the default selection of algorithms.

For time-series forecasting, you must choose `[TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html)` as the type of `[AutoMLProblemTypeConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#sagemaker-CreateAutoMLJobV2-request-AutoMLProblemTypeConfig)`.

Then, you can specify an array of selected `AutoMLAlgorithms` in the `AlgorithmsConfig` attribute of [CandidateGenerationConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CandidateGenerationConfig.html).

The following is an example of an `AlgorithmsConfig` attribute listing exactly three algorithms ("cnn-qr", "prophet", "arima") in its `AutoMLAlgorithms` field.

```
{
    "AutoMLProblemTypeConfig": {
        "TimeSeriesForecastingJobConfig": {
            "CandidateGenerationConfig": {
                "AlgorithmsConfig": [
                    {"AutoMLAlgorithms": ["cnn-qr", "prophet", "arima"]}
                ]
            }
        }
    }
}
```

For the list of available algorithms for time-series forecasting, see the [AutoMLAlgorithms](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLAlgorithmConfig.html#sagemaker-Type-AutoMLAlgorithmConfig-AutoMLAlgorithms) member of `AutoMLAlgorithmConfig`. For details on each algorithm, see [Algorithms support for time-series forecasting](timeseries-forecasting-algorithms.md).

### How to specify custom quantiles

Autopilot trains 6 model candidates with your target time-series, then combines these models using a stacking ensemble method to create an optimal forecasting model for a given objective metric. Each Autopilot forecasting model generates a probabilistic forecast by producing forecasts at quantiles between P1 and P99. These quantiles are used to account for forecast uncertainty. By default, forecasts are generated for the 0.1 (`p10`), 0.5 (`p50`), and 0.9 (`p90`) quantiles. You can choose to specify your own quantiles.

In Autopilot, you can specify up to five forecast quantiles from 0.01 (`p1`) to 0.99 (`p99`), by increments of 0.01 or higher in the `ForecastQuantiles` attribute of [TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html).

In the following example, you are setting up a daily 10th, 25th, 50th, 75th, and 90th percentile forecast for the expected quantity or level of demand of specific items over a period of 20 days.

```
"AutoMLProblemTypeConfig": {
    "TimeSeriesForecastingJobConfig": {
        "ForecastFrequency": "D",
        "ForecastHorizon": 20,
        "ForecastQuantiles": ["p10", "p25", "p50", "p75", "p90"],
        "TimeSeriesConfig": {
            "TargetAttributeName": "demand",
            "TimestampAttributeName": "timestamp",
            "ItemIdentifierAttributeName": "item_id"
        }
    }
}
```

### How to aggregate data for different forecast frequencies

To create a forecast model (also referred to as the best model candidate from your experiment), you must specify a forecast frequency. The forecast frequency determines the granularity of the predictions in your forecasts (for example, monthly sales forecasts). Autopilot's best model can generate forecasts at a coarser frequency than the one at which your data is recorded.

During training, Autopilot aggregates any data that does not align with the forecast frequency you specify. For example, you might have daily data but specify a weekly forecast frequency. Autopilot aligns the daily data based on the week that it belongs in, then combines it into a single record for each week.

During aggregation, the default transformation method is to sum the data. You can configure the aggregation when you create your AutoML job in the `Transformations` attribute of [TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html). The supported aggregation methods are `sum` (default), `avg`, `first`, `min`, `max`. Aggregation is only supported for the target column.

In the following example, you configure the aggregation to calculate the average of the individual `promo` values when combining them into a single record at the forecast frequency.

```
"Transformations": {
            "Aggregation": {
                "promo": "avg"
            }
        }
```
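To illustrate the behavior described above, the following sketch combines daily records into weekly ones the way the supported aggregation methods would. It is a plain-Python illustration of the concept, not Autopilot's implementation:

```python
from collections import defaultdict
from datetime import date

def aggregate_weekly(records, method="sum"):
    """Combine (date, value) records into one value per ISO week."""
    combine = {"sum": sum, "avg": lambda v: sum(v) / len(v),
               "first": lambda v: v[0], "min": min, "max": max}[method]
    buckets = defaultdict(list)
    for day, value in records:
        year, week, _ = day.isocalendar()  # group each day by its ISO week
        buckets[(year, week)].append(value)
    return {week: combine(values) for week, values in sorted(buckets.items())}

# Two ISO weeks of daily demand (2024-01-01 is a Monday):
daily = [(date(2024, 1, d), d) for d in range(1, 15)]
print(aggregate_weekly(daily))                # {(2024, 1): 28, (2024, 2): 77}
print(aggregate_weekly(daily, method="avg"))  # {(2024, 1): 4.0, (2024, 2): 11.0}
```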

### How to handle missing values in your input datasets

Autopilot provides a number of filling methods to handle missing values in the target and other numeric columns of your time-series datasets. For information on the list of supported filling methods and their available filling logic, see [Handle missing values](timeseries-forecasting-data-format.md#timeseries-missing-values).

You configure your filling strategy in the `Transformations` attribute of [TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html) when creating your AutoML job.

To set a filling method, you need to provide a key-value pair:
+ The key is the name of the column for which you want to specify the filling method.
+ The value associated with the key is an object that defines the filling strategy for that column.

You can specify multiple filling methods for a single column.

To fill with a specific value, set the filling method to `"value"` (for example, `"backfill": "value"`), and define the actual filling value in an additional parameter suffixed with `_value`. For example, to set `backfill` to a value of `2`, you must include two parameters: `"backfill": "value"` and `"backfill_value": "2"`.

In the following example, you specify the filling strategy for the incomplete data column `price` as follows: all missing values between the first and last data points of an item are set to `0`, after which all missing values are filled with the value `2` until the end date of the dataset.

```
"Transformations": {
            "Filling": {
                "price": {
                        "middlefill" : "zero",
                        "backfill" : "value",
                        "backfill_value": "2"
                }
            }
        }
```

### How to specify an objective metric

Autopilot produces accuracy metrics to evaluate the model candidates and help you choose which one to use to generate forecasts. When you run a time-series forecasting experiment, you can either let Autopilot optimize the predictor for you, or you can manually choose an algorithm for your predictor.

By default, Autopilot uses the average weighted quantile loss (`AverageWeightedQuantileLoss`) as its objective metric. However, you can configure the objective metric when you create your AutoML job in the `MetricName` attribute of [AutoMLJobObjective](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html).

For the list of available algorithms, see [Algorithms support for time-series forecasting](timeseries-forecasting-algorithms.md).

### How to incorporate national holiday information to your dataset

In Autopilot, you can incorporate a feature-engineered dataset of national holiday information into your time-series. Autopilot provides native support for the holiday calendars of over 250 countries. After you choose a country, Autopilot applies that country's holiday calendar to every item in your dataset during training. This allows the model to identify patterns associated with specific holidays.

You can enable holiday featurization when you create your AutoML job by passing a [HolidayConfigAttributes](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HolidayConfigAttributes.html) object to the `HolidayConfig` attribute of [TimeSeriesForecastingJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html). The `HolidayConfigAttributes` object contains the two-letter `CountryCode` attribute that determines the country of the public national holiday calendar used to augment your time-series dataset.

Refer to [Country Codes](autopilot-timeseries-forecasting-holiday-calendars.md#holiday-country-codes) for the list of supported calendars and their corresponding country code.
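As a sketch, a holiday calendar can be added to the job configuration as follows; `US` is an example country code, and the schema fields mirror the earlier examples in this guide:

```python
# Sketch: a TimeSeriesForecastingJobConfig augmented with a US holiday calendar.
# "US" is an example two-letter country code from the linked country-code list.
time_series_job_config = {
    "ForecastFrequency": "D",
    "ForecastHorizon": 20,
    "TimeSeriesConfig": {
        "TargetAttributeName": "demand",
        "TimestampAttributeName": "timestamp",
        "ItemIdentifierAttributeName": "item_id",
    },
    # HolidayConfig is an array of HolidayConfigAttributes objects.
    "HolidayConfig": [{"CountryCode": "US"}],
}
print(time_series_job_config["HolidayConfig"])  # [{'CountryCode': 'US'}]
```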

### How to enable automatic deployment

Autopilot allows you to automatically deploy your forecast model to an endpoint. To enable automatic deployment for the best model candidate of an AutoML job, include a `[ModelDeployConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAutoMLJobV2.html#sagemaker-CreateAutoMLJobV2-request-ModelDeployConfig)` in the AutoML job request. This allows the deployment of the best model to a SageMaker AI endpoint. Below are the available configurations for customization.
+ To let Autopilot generate the endpoint name, set `[AutoGenerateEndpointName](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelDeployConfig.html#API_ModelDeployConfig_Contents)` to `True`.
+ To provide your own name for the endpoint, set `AutoGenerateEndpointName` to `False` and provide a name of your choice in `[EndpointName](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelDeployConfig.html#API_ModelDeployConfig_Contents)`.
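The two options above can be sketched as request fragments; the endpoint name below is a placeholder:

```python
# Sketch: two ways to fill in ModelDeployConfig in the CreateAutoMLJobV2
# request. The endpoint name is a placeholder.
auto_named = {"ModelDeployConfig": {"AutoGenerateEndpointName": True}}

custom_named = {
    "ModelDeployConfig": {
        "AutoGenerateEndpointName": False,
        "EndpointName": "ts-demand-endpoint",
    }
}
```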

### How to configure AutoML to initiate a remote job on EMR Serverless for large datasets

You can configure your AutoML job V2 to automatically initiate a remote job on Amazon EMR Serverless when additional compute resources are needed to process large datasets. By seamlessly transitioning to EMR Serverless when required, the AutoML job can handle datasets that would otherwise exceed the initially provisioned resources, without any manual intervention from you. EMR Serverless is available for the tabular and time series problem types. We recommend setting up this option for time-series datasets larger than 30 GB.

To allow your AutoML job V2 to automatically transition to EMR Serverless for large datasets, you need to provide an `EmrServerlessComputeConfig` object, which includes an `ExecutionRoleARN` field, to the `AutoMLComputeConfig` of the AutoML job V2 input request.

The `ExecutionRoleARN` is the ARN of the IAM role granting the AutoML job V2 the necessary permissions to run EMR Serverless jobs.
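As a sketch, the opt-in configuration looks like the following; the role ARN is a placeholder, and the field names follow the description above:

```python
# Sketch: opt in to automatic EMR Serverless fallback for large datasets.
# The role ARN is a placeholder for a role with the required trust policy
# and permissions.
auto_ml_compute_config = {
    "EmrServerlessComputeConfig": {
        "ExecutionRoleARN": "arn:aws:iam::111122223333:role/EMRServerlessExecutionRole"
    }
}
# Passed as the AutoMLComputeConfig member of the CreateAutoMLJobV2 request.
```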

This role should have the following trust relationship:

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "emr-serverless.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

In addition, the role must be granted permissions to:
+ Create, list, and update EMR Serverless applications.
+ Start, list, get, or cancel job runs on an EMR Serverless application.
+ Tag EMR Serverless resources.
+ Pass an IAM role to the EMR Serverless service for execution.

  By granting the `iam:PassRole` permission, the AutoML job V2 can temporarily assume the `EMRServerlessRuntimeRole-*` role and pass it to the EMR Serverless service. These are the IAM roles used by the EMR Serverless job execution environments to access other AWS services and resources needed during runtime, such as Amazon S3 for data access, CloudWatch for logging, the AWS Glue Data Catalog, or other services based on your workload requirements.

  See [Job runtime roles for Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) for details on this role's permissions.

The following IAM policy grants these permissions:

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [{
           "Sid": "EMRServerlessCreateApplicationOperation",
           "Effect": "Allow",
           "Action": "emr-serverless:CreateApplication",
           "Resource": "arn:aws:emr-serverless:*:*:/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessListApplicationOperation",
            "Effect": "Allow",
            "Action": "emr-serverless:ListApplications",
            "Resource": "arn:aws:emr-serverless:*:*:/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessApplicationOperations",
            "Effect": "Allow",
            "Action": [
                "emr-serverless:UpdateApplication",
                "emr-serverless:GetApplication"
            ],
            "Resource": "arn:aws:emr-serverless:*:*:/applications/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessStartJobRunOperation",
            "Effect": "Allow",
            "Action": "emr-serverless:StartJobRun",
            "Resource": "arn:aws:emr-serverless:*:*:/applications/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessListJobRunOperation",
            "Effect": "Allow",
            "Action": "emr-serverless:ListJobRuns",
            "Resource": "arn:aws:emr-serverless:*:*:/applications/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessJobRunOperations",
            "Effect": "Allow",
            "Action": [
                "emr-serverless:GetJobRun",
                "emr-serverless:CancelJobRun"
            ],
            "Resource": "arn:aws:emr-serverless:*:*:/applications/*/jobruns/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "EMRServerlessTagResourceOperation",
            "Effect": "Allow",
            "Action": "emr-serverless:TagResource",
            "Resource": "arn:aws:emr-serverless:*:*:/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/sagemaker:is-canvas-resource": "True",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        },
        {
            "Sid": "IAMPassOperationForEMRServerless",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/EMRServerlessRuntimeRole-*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "emr-serverless.amazonaws.com",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
         }
    ]
}
```

------

# Time-series datasets format and missing values filling methods

Time-series data refers to a collection of observations or measurements recorded over regular intervals of time. In this type of data, each observation is associated with a specific timestamp or time period, creating a sequence of data points ordered chronologically.

The specific columns you include in your time-series dataset depend on the goals of your analysis and the data available to you. At a minimum, the time-series data is composed of a 3-column table where:
+ One column contains unique identifiers assigned to individual items to refer to their value at a specific moment.
+ Another column represents the point-in-time value or **target** to log the value of a given item at a specific moment. After the model is trained on those target values, this target column contains the values that the model predicts at a specified frequency within a defined horizon.
+ A third column contains a timestamp to record the date and time when the value was measured.
+ Additional columns can contain other factors that may influence the forecast performance. For example, in a time-series dataset for retail where the target is the sales or revenue, you might include features that provide information about units sold, product ID, store location, customer count, inventory levels, as well as covariate indicators such as weather data or demographic information.
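As a sketch of this minimal three-column layout, the following writes a tiny dataset using only the Python standard library. The column names mirror the examples used elsewhere in this guide; the rows are made-up sample data:

```python
import csv
import io

# Minimal item_id / timestamp / target layout described above.
rows = [
    {"item_id": "sku-001", "timestamp": "2024-01-01", "demand": 12},
    {"item_id": "sku-001", "timestamp": "2024-01-02", "demand": 15},
    {"item_id": "sku-002", "timestamp": "2024-01-01", "demand": 7},
]

# Write to an in-memory buffer; in practice this would be a file
# uploaded to Amazon S3.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["item_id", "timestamp", "demand"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```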

**Note**  
You can add a feature-engineered dataset of national holiday information to your time-series. By including holidays in your time series model, you can capture the periodic patterns that holidays create. This helps your forecasts better reflect the underlying seasonality of your data. For information on the available calendars per country, see [National holiday calendars](autopilot-timeseries-forecasting-holiday-calendars.md).

## Datasets format for time-series forecasting

Autopilot supports numeric, categorical, text, and datetime data types. The data type of the target column must be numeric.

Autopilot supports time-series data formatted as CSV (default) files or as Parquet files.
+ **CSV** (comma-separated values) is a row-based file format that stores data in human-readable plain text. CSV files are a popular choice for data exchange because they are supported by a wide range of applications.
+ **Parquet** is a column-based file format that stores and processes data more efficiently than row-based file formats, making it a better option for big data problems.

For more information about the resource limits on time-series datasets for forecasting in Autopilot, see [Time-series forecasting resource limits for Autopilot](timeseries-forecasting-limits.md).

## Handle missing values

A common issue in time-series forecasting data is the presence of missing values. Your data might contain missing values for a number of reasons, including measurement failures, formatting problems, human errors, or a lack of information to record. For instance, if you are forecasting product demand for a retail store and an item is sold out or unavailable, there would be no sales data to record while that item is out of stock. If prevalent enough, missing values can significantly impact a model's accuracy.

Autopilot provides a number of filling methods to handle missing values, with distinct approaches for the target column and other additional columns. Filling is the process of adding standardized values to missing entries in your dataset.

Refer to [How to handle missing values in your input datasets](autopilot-create-experiment-timeseries-forecasting.md#timeseries-forecasting-fill-missing-values) to learn how to set the method for filling missing values in your time-series dataset.

Autopilot supports the following filling methods:
+ **Front filling:** Fills any missing values between the earliest recorded data point among all items and the starting point of each item (each item can start at a different time). This ensures that the data for each item is complete and spans from the earliest recorded data point to its respective starting point.
+ **Middle filling:** Fills any missing values between the start and end dates of the items in the dataset.
+ **Back filling:** Fills any missing values between the last data point of each item (each item can stop at a different time) and the last recorded data point among all items.
+ **Future filling:** Fills any missing values between the last recorded data point among all items and the end of the forecast horizon.

The following image provides a visual representation of the different filling methods.

![The different filling methods for time series forecasting in Amazon SageMaker Autopilot.](http://docs.aws.amazon.com/sagemaker/latest/dg/images/autopilot/autopilot-forecast-filling-methods.png)


### Choose a filling logic

When choosing a filling logic, you should consider how the logic will be interpreted by your model. For instance, in a retail scenario, recording 0 sales of an available item is different from recording 0 sales of an unavailable item, as the latter does not imply a lack of customer interest in the item. Because of this, `0` filling in the target column of the time-series might cause the predictor to be under-biased in its predictions, while `NaN` filling might ignore actual occurrences of 0 available items being sold and cause the predictor to be over-biased.

### Filling logic

You can perform filling on the target column and other numeric columns in your datasets. Target columns have different filling guidelines and restrictions than the rest of the numeric columns.

Filling Guidelines


| Column type | Filling by default? | Supported filling methods | Default filling logic | Accepted filling logic | 
| --- | --- | --- | --- | --- | 
| Target column | Yes | Middle and back filling | 0 |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/timeseries-forecasting-data-format.html)  | 
| Other numeric columns | No | Middle, back, and future filling | No default |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/timeseries-forecasting-data-format.html)  | 

**Note**  
For both the target and other numeric columns, `mean`, `median`, `min`, and `max` are calculated based on a rolling window of the 64 most recent data entries before the missing values.
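As a sketch of how these filling methods map to the AutoML V2 API, the following helper builds the `Transformations` fragment of a time-series forecasting job configuration. The exact keys (`middlefill`, `backfill_value`, and so on) and the column names `demand` and `price` are illustrative assumptions here; verify the accepted keys and values against the current `CreateAutoMLJobV2` API reference before use.

```python
def build_filling_config(target_col: str, other_cols: list) -> dict:
    """Return a Transformations dict pairing columns with filling logic."""
    filling = {
        # Target column: fill gaps inside the series with 0, and
        # trailing gaps with a custom value (passed as a string).
        target_col: {
            "middlefill": "zero",
            "backfill": "value",
            "backfill_value": "2",
        },
    }
    for col in other_cols:
        # Other numeric columns also support future filling.
        filling[col] = {"futurefill": "median"}
    return {"Filling": filling}
```

The resulting dict would be passed inside `TimeSeriesForecastingJobConfig` when creating the job.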

# National holiday calendars

Autopilot supports a feature-engineered dataset of national holiday information that provides access to the holiday calendars of over 250 countries. Holiday calendar features are especially useful in the retail domain, where public holidays can significantly affect demand. The following section lists the country codes that you can use to access the holiday calendars of each supported country.

Refer to [How to incorporate national holiday information to your dataset](autopilot-create-experiment-timeseries-forecasting.md#timeseries-forecasting-add-holiday-calendar) to learn how to add a calendar to your dataset.
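As a minimal sketch of how a calendar is attached through the API, the following helper assembles a `TimeSeriesForecastingJobConfig` fragment containing a `HolidayConfig` entry. The forecast frequency, horizon, and column names below are placeholder assumptions for illustration; match them to your own dataset and confirm the field names against the `CreateAutoMLJobV2` API reference.

```python
def holiday_problem_config(country_code: str) -> dict:
    """Return a problem-type config fragment enabling one holiday calendar."""
    return {
        "TimeSeriesForecastingJobConfig": {
            "ForecastFrequency": "D",   # placeholder: daily data
            "ForecastHorizon": 14,      # placeholder horizon
            "HolidayConfig": [{"CountryCode": country_code}],
            "TimeSeriesConfig": {       # placeholder column names
                "TargetAttributeName": "demand",
                "TimestampAttributeName": "ts",
                "ItemIdentifierAttributeName": "item_id",
            },
        }
    }
```

This fragment would go under `AutoMLProblemTypeConfig` in the `create_auto_ml_job_v2` request, using a **Country Code** from the table below.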

## Country Codes


Autopilot provides native support for the public holiday calendars of the following countries. Use the **Country Code** when specifying a country with the API.


| Country | Country Code | 
| --- | --- | 
|   Afghanistan   |   AF   | 
|   Åland Islands   |   AX   | 
|   Albania   |   AL   | 
|   Algeria   |   DZ   | 
|   American Samoa   |   AS   | 
|   Andorra   |   AD   | 
|   Angola   |   AO   | 
|   Anguilla   |   AI   | 
|   Antarctica   |   AQ   | 
|   Antigua and Barbuda   |   AG   | 
|   Argentina   |   AR   | 
|   Armenia   |   AM   | 
|   Aruba   |   AW   | 
|   Australia   |   AU   | 
|   Austria   |   AT   | 
|   Azerbaijan   |   AZ   | 
|   Bahamas   |   BS   | 
|   Bahrain   |   BH   | 
|   Bangladesh   |   BD   | 
|   Barbados   |   BB   | 
|   Belarus   |   BY   | 
|   Belgium   |   BE   | 
|   Belize   |   BZ   | 
|   Benin   |   BJ   | 
|   Bermuda   |   BM   | 
|   Bhutan   |   BT   | 
|   Bolivia   |   BO   | 
|   Bosnia and Herzegovina   |   BA   | 
|   Botswana   |   BW   | 
|   Bouvet Island   |   BV   | 
|   Brazil   |   BR   | 
|   British Indian Ocean Territory   |   IO   | 
|   British Virgin Islands   |   VG   | 
|   Brunei Darussalam   |   BN   | 
|   Bulgaria   |   BG   | 
|   Burkina Faso   |   BF   | 
|   Burundi   |   BI   | 
|   Cambodia   |   KH   | 
|   Cameroon   |   CM   | 
|   Canada   |   CA   | 
|   Cape Verde   |   CV   | 
|   Caribbean Netherlands   |   BQ   | 
|   Cayman Islands   |   KY   | 
|   Central African Republic   |   CF   | 
|   Chad   |   TD   | 
|   Chile   |   CL   | 
|   China   |   CN   | 
|   Christmas Island   |   CX   | 
|   Cocos (Keeling) Islands   |   CC   | 
|   Colombia   |   CO   | 
|   Comoros   |   KM   | 
|   Cook Islands   |   CK   | 
|   Costa Rica   |   CR   | 
|   Croatia   |   HR   | 
|   Cuba   |   CU   | 
|   Curaçao   |   CW   | 
|   Cyprus   |   CY   | 
|   Czechia   |   CZ   | 
|   Democratic Republic of the Congo   |   CD   | 
|   Denmark   |   DK   | 
|   Djibouti   |   DJ   | 
|   Dominica   |   DM   | 
|   Dominican Republic   |   DO   | 
|   Ecuador   |   EC   | 
|   Egypt   |   EG   | 
|   El Salvador   |   SV   | 
|   Equatorial Guinea   |   GQ   | 
|   Eritrea   |   ER   | 
|   Estonia   |   EE   | 
|   Eswatini   |   SZ   | 
|   Ethiopia   |   ET   | 
|   Falkland Islands   |   FK   | 
|   Faroe Islands   |   FO   | 
|   Fiji   |   FJ   | 
|   Finland   |   FI   | 
|   France   |   FR   | 
|   French Guiana   |   GF   | 
|   French Polynesia   |   PF   | 
|   French Southern Territories   |   TF   | 
|   Gabon   |   GA   | 
|   Gambia   |   GM   | 
|   Georgia   |   GE   | 
|   Germany   |   DE   | 
|   Ghana   |   GH   | 
|   Gibraltar   |   GI   | 
|   Greece   |   GR   | 
|   Greenland   |   GL   | 
|   Grenada   |   GD   | 
|   Guadeloupe   |   GP   | 
|   Guam   |   GU   | 
|   Guatemala   |   GT   | 
|   Guernsey   |   GG   | 
|   Guinea   |   GN   | 
|   Guinea-Bissau   |   GW   | 
|   Guyana   |   GY   | 
|   Haiti   |   HT   | 
|   Heard Island and McDonald Islands   |   HM   | 
|   Honduras   |   HN   | 
|   Hong Kong   |   HK   | 
|   Hungary   |   HU   | 
|   Iceland   |   IS   | 
|   India   |   IN   | 
|   Indonesia   |   ID   | 
|   Iran   |   IR   | 
|   Iraq   |   IQ   | 
|   Ireland   |   IE   | 
|   Isle of Man   |   IM   | 
|   Israel   |   IL   | 
|   Italy   |   IT   | 
|   Ivory Coast   |   CI   | 
|   Jamaica   |   JM   | 
|   Japan   |   JP   | 
|   Jersey   |   JE   | 
|   Jordan   |   JO   | 
|   Kazakhstan   |   KZ   | 
|   Kenya   |   KE   | 
|   Kiribati   |   KI   | 
|   Kosovo   |   XK   | 
|   Kuwait   |   KW   | 
|   Kyrgyzstan   |   KG   | 
|   Laos   |   LA   | 
|   Latvia   |   LV   | 
|   Lebanon   |   LB   | 
|   Lesotho   |   LS   | 
|   Liberia   |   LR   | 
|   Libya   |   LY   | 
|   Liechtenstein   |   LI   | 
|   Lithuania   |   LT   | 
|   Luxembourg   |   LU   | 
|   Macao   |   MO   | 
|   Madagascar   |   MG   | 
|   Malawi   |   MW   | 
|   Malaysia   |   MY   | 
|   Maldives   |   MV   | 
|   Mali   |   ML   | 
|   Malta   |   MT   | 
|   Marshall Islands   |   MH   | 
|   Martinique   |   MQ   | 
|   Mauritania   |   MR   | 
|   Mauritius   |   MU   | 
|   Mayotte   |   YT   | 
|   Mexico   |   MX   | 
|   Micronesia   |   FM   | 
|   Moldova   |   MD   | 
|   Monaco   |   MC   | 
|   Mongolia   |   MN   | 
|   Montenegro   |   ME   | 
|   Montserrat   |   MS   | 
|   Morocco   |   MA   | 
|   Mozambique   |   MZ   | 
|   Myanmar   |   MM   | 
|   Namibia   |   NA   | 
|   Nauru   |   NR   | 
|   Nepal   |   NP   | 
|   Netherlands   |   NL   | 
|   New Caledonia   |   NC   | 
|   New Zealand   |   NZ   | 
|   Nicaragua   |   NI   | 
|   Niger   |   NE   | 
|   Nigeria   |   NG   | 
|   Niue   |   NU   | 
|   Norfolk Island   |   NF   | 
|   North Korea   |   KP   | 
|   North Macedonia   |   MK   | 
|   Northern Mariana Islands   |   MP   | 
|   Norway   |   NO   | 
|   Oman   |   OM   | 
|   Pakistan   |   PK   | 
|   Palau   |   PW   | 
|   Palestine   |   PS   | 
|   Panama   |   PA   | 
|   Papua New Guinea   |   PG   | 
|   Paraguay   |   PY   | 
|   Peru   |   PE   | 
|   Philippines   |   PH   | 
|   Pitcairn Islands   |   PN   | 
|   Poland   |   PL   | 
|   Portugal   |   PT   | 
|   Puerto Rico   |   PR   | 
|   Qatar   |   QA   | 
|   Republic of the Congo   |   CG   | 
|   Réunion   |   RE   | 
|   Romania   |   RO   | 
|   Russian Federation   |   RU   | 
|   Rwanda   |   RW   | 
|   Saint Barthélemy   |   BL   | 
|   Saint Helena, Ascension and Tristan da Cunha   |   SH   | 
|   Saint Kitts and Nevis   |   KN   | 
|   Saint Lucia   |   LC   | 
|   Saint Martin   |   MF   | 
|   Saint Pierre and Miquelon   |   PM   | 
|   Saint Vincent and the Grenadines   |   VC   | 
|   Samoa   |   WS   | 
|   San Marino   |   SM   | 
|   Sao Tome and Principe   |   ST   | 
|   Saudi Arabia   |   SA   | 
|   Senegal   |   SN   | 
|   Serbia   |   RS   | 
|   Seychelles   |   SC   | 
|   Sierra Leone   |   SL   | 
|   Singapore   |   SG   | 
|   Sint Maarten   |   SX   | 
|   Slovakia   |   SK   | 
|   Slovenia   |   SI   | 
|   Solomon Islands   |   SB   | 
|   Somalia   |   SO   | 
|   South Africa   |   ZA   | 
|   South Georgia and the South Sandwich Islands   |   GS   | 
|   South Korea   |   KR   | 
|   South Sudan   |   SS   | 
|   Spain   |   ES   | 
|   Sri Lanka   |   LK   | 
|   Sudan   |   SD   | 
|   Suriname   |   SR   | 
|   Svalbard and Jan Mayen   |   SJ   | 
|   Sweden   |   SE   | 
|   Switzerland   |   CH   | 
|   Syrian Arab Republic   |   SY   | 
|   Taiwan   |   TW   | 
|   Tajikistan   |   TJ   | 
|   Tanzania   |   TZ   | 
|   Thailand   |   TH   | 
|   Timor-Leste   |   TL   | 
|   Togo   |   TG   | 
|   Tokelau   |   TK   | 
|   Tonga   |   TO   | 
|   Trinidad and Tobago   |   TT   | 
|   Tunisia   |   TN   | 
|   Turkey   |   TR   | 
|   Turkmenistan   |   TM   | 
|   Turks and Caicos Islands   |   TC   | 
|   Tuvalu   |   TV   | 
|   Uganda   |   UG   | 
|   Ukraine   |   UA   | 
|   United Arab Emirates   |   AE   | 
|   United Kingdom   |   UK   | 
|   United Nations   |   UN   | 
|   United States   |   US   | 
|   United States Minor Outlying Islands   |   UM   | 
|   United States Virgin Islands   |   VI   | 
|   Uruguay   |   UY   | 
|   Uzbekistan   |   UZ   | 
|   Vanuatu   |   VU   | 
|   Vatican City   |   VA   | 
|   Venezuela   |   VE   | 
|   Vietnam   |   VN   | 
|   Wallis and Futuna   |   WF   | 
|   Western Sahara   |   EH   | 
|   Yemen   |   YE   | 
|   Zambia   |   ZM   | 
|   Zimbabwe   |   ZW   | 

# Objective metrics


Autopilot produces accuracy metrics to evaluate the model candidates and help you choose which to use to generate forecasts. You can either let Autopilot optimize the predictor for you, or you can manually choose an algorithm for your predictor. By default, Autopilot uses the Average Weighted Quantile Loss.

The following list contains the names of the metrics that are currently available to measure the performance of models for time-series forecasting.

**`RMSE`**  
Root mean squared error (RMSE) – Measures the square root of the average of the squared differences between predicted and actual values. It's an important metric for indicating the presence of large model errors and outliers. Values range from zero (0) to infinity, with smaller numbers indicating a better model fit to the data. RMSE is scale-dependent and should not be used to compare datasets of different scales.

**`wQL`**  
Weighted Quantile Loss (wQL) – Assesses the accuracy of the forecast by measuring the weighted absolute differences between the predicted and actual values at the P10, P50, and P90 quantiles, with lower values indicating better performance.

**`Average wQL (default)`**  
Average Weighted Quantile Loss (Average wQL) – Evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles. A lower value indicates a more accurate model.

**`MASE`**  
Mean Absolute Scaled Error (MASE) – The mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.

**`MAPE`**  
Mean Absolute Percent Error (MAPE) – The percentage error (percent difference of the mean forecasted value versus the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.

**`WAPE`**  
Weighted Absolute Percent Error (WAPE) – The sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model.
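To make the definitions concrete, the following sketch implements three of the metrics above on plain Python lists. These mirror the standard textbook formulas and are not Autopilot's internal code.

```python
import math

def rmse(actual, pred):
    """Root mean squared error: sqrt of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / n)

def mape(actual, pred):
    """Mean absolute percent error, averaged over all time points."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / n

def wape(actual, pred):
    """Weighted absolute percent error: total absolute error
    divided by total absolute target."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / sum(abs(a) for a in actual)
```

For example, with actuals `[10, 20]` and predictions `[12, 18]`, RMSE is 2.0, while MAPE averages the two percentage errors (20% and 10%) to 0.15.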

# Algorithms support for time-series forecasting

Autopilot trains the following six built-in algorithms with your target time-series. Then, using a stacking ensemble method, it combines these model candidates to create an optimal forecasting model for a given objective metric.
+ **Convolutional Neural Network - Quantile Regression (CNN-QR)** – CNN-QR is a proprietary machine learning algorithm for forecasting time-series using causal convolutional neural networks (CNNs). CNN-QR works best with large datasets containing hundreds of time-series.
+ **DeepAR+** – DeepAR+ is a proprietary machine learning algorithm for forecasting time-series using recurrent neural networks (RNNs). DeepAR+ works best with large datasets containing hundreds of feature time-series.
+ **Prophet** – [Prophet](https://facebook.github.io/prophet/) is a popular local Bayesian structural time series model based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality. The Autopilot Prophet algorithm uses the [Prophet class](https://facebook.github.io/prophet/docs/quick_start.html#python-ap) of the Python implementation of Prophet. It works best with time-series with strong seasonal effects and several seasons of historical data. 
+ **Non-Parametric Time Series (NPTS)** – The NPTS proprietary algorithm is a scalable, probabilistic baseline forecaster. It predicts the future value distribution of a given time-series by sampling from past observations. NPTS is especially useful when working with sparse or intermittent time series. 
+ **Autoregressive Integrated Moving Average (ARIMA)** – ARIMA is a commonly used statistical algorithm for time-series forecasting. The algorithm captures standard temporal structures (patterned organizations of time) in the input dataset. It is especially useful for simple datasets with under 100 time series. 
+ **Exponential Smoothing (ETS)** – ETS is a commonly used statistical algorithm for time-series forecasting. The algorithm is especially useful for simple datasets with under 100 time series, and datasets with seasonality patterns. ETS computes a weighted average over all observations in the time series dataset as its prediction, with exponentially decreasing weights over time.

# Forecast a deployed Autopilot model

After training your models using the AutoML API, you can deploy them for real-time or batch-based forecasting. 

The AutoML API trains several model candidates for your time-series data and selects an optimal forecasting model based on your target objective metric. Once your model candidates have been trained, you can find the best candidate in the [DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html) response under [BestCandidate](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLCandidate.html#sagemaker-Type-AutoMLCandidate-CandidateName).

To get predictions using this best performing model, you can either set up an endpoint to obtain forecasts interactively or use batch forecasting to make predictions on a batch of observations.

**Considerations**
+ When providing input data for forecasting, the schema of your data should remain the same as the one used to train your model, including the number of columns, column headers, and data types. You can forecast for existing or new item IDs within the same or different timestamp range to predict for a different time period.
+ Forecasting models predict the number of forecast horizon points in the future specified in the input request at training, that is, from the *target end date* to the *target end date + forecast horizon*. To use the model for predicting specific dates, provide the data in the same format as the original input data, extending up to a specified *target end date*. In this scenario, the model starts predicting from the new target end date.

  For example, if your dataset had monthly data from January to June with a forecast horizon of 2, the model would predict the target value for the next 2 months, July and August. If in August you want to predict for the next 2 months, your input data should now span January to August, and the model will predict the next 2 months (September and October).
+ When forecasting future data points, there is no set minimum for the amount of historical data to provide. Include enough data to capture seasonal and recurrent patterns in your time-series.
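The horizon arithmetic in the monthly example above can be sketched as a small helper that, given the last month of the input data and the forecast horizon, returns the months the model will predict (wrapping around the year end):

```python
def forecast_months(target_end_month: int, horizon: int) -> list:
    """Months (1-12) predicted after the given target end month."""
    return [(target_end_month + k - 1) % 12 + 1 for k in range(1, horizon + 1)]

# Data through June (month 6) with a horizon of 2 yields July and August;
# extending the input through August shifts the predictions to
# September and October.
```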

**Topics**
+ [Real-time forecasting](timeseries-forecasting-realtime.md)
+ [Batch forecasting](timeseries-forecasting-batch.md)

# Real-time forecasting

Real-time forecasting is useful when you need to generate predictions on-the-fly, such as for applications that require immediate responses or when forecasting for individual data points.

By deploying your AutoML model as a real-time endpoint, you can generate forecasts on-demand and minimize the latency between receiving new data and obtaining predictions. This makes real-time forecasting well-suited for applications that require immediate, personalized, or event-driven forecasting capabilities.

For real-time forecasting, the dataset should be a subset of the input dataset. The real-time endpoint has an input data size limit of approximately 6 MB and a response timeout of 60 seconds. We recommend passing in one or a few items at a time.

You can use SageMaker APIs to retrieve the best candidate of an AutoML job and then create a SageMaker AI endpoint using that candidate.

Alternatively, you can choose the automatic deployment option when creating your Autopilot experiment. For information on setting up the automatic deployment of models, see [How to enable automatic deployment](autopilot-create-experiment-timeseries-forecasting.md#timeseries-forecasting-auto-model-deployment).

**To create a SageMaker AI endpoint using your best model candidate:**

1. 

**Retrieve the details of the AutoML job.**

   The following AWS CLI command example uses the [DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html) API to obtain details of the AutoML job, including the information about the best model candidate.

   ```
   aws sagemaker describe-auto-ml-job-v2 --auto-ml-job-name job-name --region region
   ```

1. 

**Extract the container definition from [InferenceContainers](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLCandidate.html#sagemaker-Type-AutoMLCandidate-InferenceContainers) for the best model candidate.**

   A container definition is the containerized environment used to host the trained SageMaker AI model for making predictions.

   ```
   BEST_CANDIDATE=$(aws sagemaker describe-auto-ml-job-v2 \
     --auto-ml-job-name job-name \
     --region region \
     --query 'BestCandidate.InferenceContainers[0]' \
     --output json)
   ```

   This command extracts the container definition for the best model candidate and stores it in the `BEST_CANDIDATE` variable.

1. 

**Create a SageMaker AI model using the best candidate container definition.**

   Use the container definitions from the previous steps to create a SageMaker AI model by using the [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API.

   ```
   aws sagemaker create-model \
               --model-name 'your-candidate-name' \
               --primary-container "$BEST_CANDIDATE" \
               --execution-role-arn 'execution-role-arn' \
               --region 'region'
   ```

   The `--execution-role-arn` parameter specifies the IAM role that SageMaker AI assumes when using the model for inference. For details on the permissions required for this role, see [CreateModel API: Execution Role Permissions](https://docs.aws.amazon.com/).

1. 

**Create a SageMaker AI endpoint configuration using the model.**

   The following AWS CLI command uses the [CreateEndpointConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) API to create an endpoint configuration.

   ```
   aws sagemaker create-endpoint-config \
     --production-variants file://production-variants.json \
     --region 'region'
   ```

   Where the `production-variants.json` file contains the model configuration, including the model name and instance type.
**Note**  
We recommend using [m5.12xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instances for real-time forecasting.

   ```
   [
       {
         "VariantName": "variant-name",
         "ModelName": "model-name",
         "InitialInstanceCount": 1,
         "InstanceType": "ml.m5.12xlarge"
       }
   ]
   ```

1. 

**Create the SageMaker AI endpoint using the endpoint configuration.**

   The following AWS CLI example uses the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API to create the endpoint.

   ```
   aws sagemaker create-endpoint \
               --endpoint-name 'endpoint-name' \
               --endpoint-config-name 'endpoint-config-name' \
               --region 'region'
   ```

   Check the progress of your real-time inference endpoint deployment by using the [DescribeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html) API. See the following AWS CLI command as an example.

   ```
   aws sagemaker describe-endpoint \
               --endpoint-name 'endpoint-name' \
               --region 'region'
   ```

   After the `EndpointStatus` changes to `InService`, the endpoint is ready to use for real-time inference.

1. 

**Invoke the SageMaker AI endpoint to make predictions.**

   ```
   aws sagemaker-runtime invoke-endpoint \
               --endpoint-name 'endpoint-name' \
               --region 'region' \
               --body file://input-data-in-bytes.json \
               --content-type 'application/json' outfile
   ```

   Where the `input-data-in-bytes.json` file contains the input data for the prediction.
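   For callers using Python instead of the AWS CLI, the invocation goes through the SageMaker runtime client. In this sketch, the payload-building helper and the `instances` key and record fields are illustrative assumptions; the actual request body must follow the schema your model was trained with.

   ```python
   import json

   def build_inference_payload(rows: list) -> str:
       """Serialize records into a JSON request body. Field names must
       match the schema used at training time (assumed here)."""
       return json.dumps({"instances": rows})

   # The endpoint itself is invoked through the SageMaker runtime client:
   #
   #   import boto3
   #   runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")
   #   response = runtime.invoke_endpoint(
   #       EndpointName="endpoint-name",
   #       ContentType="application/json",
   #       Body=build_inference_payload(rows),
   #   )
   #   forecasts = response["Body"].read().decode("utf-8")
   ```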

# Batch forecasting

Batch forecasting, also known as offline inferencing, generates model predictions on a batch of observations. Batch inference is a good option for large datasets or if you don't need an immediate response to a model prediction request.

By contrast, online inference (real-time inferencing) generates predictions in real time. 

You can use SageMaker APIs to retrieve the best candidate of an AutoML job and then submit a batch of input data for inference using that candidate.

1. 

**Retrieve the details of the AutoML job.**

   The following AWS CLI command example uses the [DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html) API to obtain details of the AutoML job, including the information about the best model candidate.

   ```
   aws sagemaker describe-auto-ml-job-v2 --auto-ml-job-name job-name --region region
   ```

1. 

**Extract the container definition from [InferenceContainers](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLCandidate.html#sagemaker-Type-AutoMLCandidate-InferenceContainers) for the best model candidate.**

   A container definition is the containerized environment used to host the trained SageMaker AI model for making predictions.

   ```
   BEST_CANDIDATE=$(aws sagemaker describe-auto-ml-job-v2 \
         --auto-ml-job-name job-name \
         --region region \
         --query 'BestCandidate.InferenceContainers[0]' \
         --output json)
   ```

   This command extracts the container definition for the best model candidate and stores it in the `BEST_CANDIDATE` variable.

1. 

**Create a SageMaker AI model using the best candidate container definition.**

   Use the container definitions from the previous steps to create a SageMaker AI model by using the [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API.

   ```
   aws sagemaker create-model \
         --model-name 'model-name' \
         --primary-container "$BEST_CANDIDATE" \
         --execution-role-arn 'execution-role-arn' \
         --region 'region'
   ```

   The `--execution-role-arn` parameter specifies the IAM role that SageMaker AI assumes when using the model for inference. For details on the permissions required for this role, see [CreateModel API: Execution Role Permissions](https://docs.aws.amazon.com/).

1. 

**Create a batch transform job.**

   The following example creates a transform job using the [CreateTransformJob](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-transform-job.html) API. 

   ```
   aws sagemaker create-transform-job \ 
          --transform-job-name 'transform-job-name' \
          --model-name 'model-name'\
          --transform-input file://transform-input.json \
          --transform-output file://transform-output.json \
          --transform-resources file://transform-resources.json \
          --region 'region'
   ```

   The input, output, and resource details are defined in separate JSON files:
   + `transform-input.json`:

     ```
     {
       "DataSource": {
         "S3DataSource": {
           "S3DataType": "S3Prefix",
           "S3Uri": "s3://my-input-data-bucket/path/to/input/data"
         }
       },
       "ContentType": "text/csv",
       "SplitType": "None"
     }
     ```
   + `transform-output.json`:

     ```
     {
       "S3OutputPath": "s3://my-output-bucket/path/to/output",
       "AssembleWith": "Line"
     }
     ```
   + `transform-resources.json`:
**Note**  
We recommend using [m5.12xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instances for general-purpose workloads and `m5.24xlarge` instances for big data forecasting tasks.

     ```
     {
       "InstanceType": "instance-type",
       "InstanceCount": 1
     }
     ```

1. 

**Monitor the progress of your transform job using the [DescribeTransformJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTransformJob.html) API.**

   See the following AWS CLI command as an example.

   ```
   aws sagemaker describe-transform-job \
         --transform-job-name 'transform-job-name' \
         --region region
   ```

1. 

**Retrieve the batch transform output.**

   After the job is finished, the predicted result is available in the `S3OutputPath`. 

   The output file name has the following format: `input_data_file_name.out`. As an example, if your input file is `test_x.csv`, the output name will be `test_x.csv.out`.

   ```
   aws s3 ls s3://my-output-bucket/path/to/output/
   ```

The following code examples illustrate the use of the AWS SDK for Python (boto3) and the AWS CLI for batch forecasting.

------
#### [ AWS SDK for Python (boto3) ]

 The following example uses **AWS SDK for Python (boto3)** to make predictions in batches.

```
import sagemaker 
import boto3

session = sagemaker.session.Session()

sm_client = boto3.client('sagemaker', region_name='us-west-2')
role = 'arn:aws:iam::1234567890:role/sagemaker-execution-role'
output_path = 's3://test-auto-ml-job/output'
input_data = 's3://test-auto-ml-job/test_X.csv'
job_name = 'test-auto-ml-job'  # name of the completed AutoML job

best_candidate = sm_client.describe_auto_ml_job_v2(AutoMLJobName=job_name)['BestCandidate']
best_candidate_containers = best_candidate['InferenceContainers']
best_candidate_name = best_candidate['CandidateName']

# create model
response = sm_client.create_model(
    ModelName = best_candidate_name,
    ExecutionRoleArn = role,
    Containers = best_candidate_containers 
)

# Launch transform job
response = sm_client.create_transform_job(
    TransformJobName=f'{best_candidate_name}-transform-job',
    ModelName=best_candidate_name,
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': input_data
            }
        },
        'ContentType': "text/csv",
        'SplitType': 'None'
    },
    TransformOutput={
        'S3OutputPath': output_path,
        'AssembleWith': 'Line',
    },
    TransformResources={
        'InstanceType': 'ml.m5.2xlarge',
        'InstanceCount': 1,
    },
)
```

The batch inference job returns a response in the following format.

```
{'TransformJobArn': 'arn:aws:sagemaker:us-west-2:1234567890:transform-job/test-transform-job',
 'ResponseMetadata': {'RequestId': '659f97fc-28c4-440b-b957-a49733f7c2f2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '659f97fc-28c4-440b-b957-a49733f7c2f2',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '96',
   'date': 'Thu, 11 Aug 2022 22:23:49 GMT'},
  'RetryAttempts': 0}}
```

------
#### [ AWS Command Line Interface (AWS CLI) ]

1. **Obtain the best candidate container definitions**.

   ```
   aws sagemaker describe-auto-ml-job-v2 --auto-ml-job-name 'test-automl-job' --region us-west-2
   ```

1. **Create the model**.

   ```
   aws sagemaker create-model --model-name 'test-sagemaker-model' \
   --containers '[{
       "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3",
       "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/data-processor-models/test-job1-dpp0-1-e569ff7ad77f4e55a7e549a/output/model.tar.gz",
       "Environment": {
           "AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1",
           "AUTOML_TRANSFORM_MODE": "feature-transform",
           "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf",
           "SAGEMAKER_PROGRAM": "sagemaker_serve",
           "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
       }
   }, {
       "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3",
       "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/tuning/flicdf10v2-dpp0-xgb/test-job1E9-244-7490a1c0/output/model.tar.gz",
       "Environment": {
           "MAX_CONTENT_LENGTH": "20971520",
           "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv",
           "SAGEMAKER_INFERENCE_OUTPUT": "predicted_label", 
           "SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities" 
       }
   }, {
       "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3", 
       "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/data-processor-models/test-job1-dpp0-1-e569ff7ad77f4e55a7e549a/output/model.tar.gz", 
       "Environment": { 
           "AUTOML_TRANSFORM_MODE": "inverse-label-transform", 
           "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv", 
           "SAGEMAKER_INFERENCE_INPUT": "predicted_label", 
           "SAGEMAKER_INFERENCE_OUTPUT": "predicted_label", 
           "SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities", 
           "SAGEMAKER_PROGRAM": "sagemaker_serve", 
           "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code" 
       } 
   }]' \
   --execution-role-arn 'arn:aws:iam::1234567890:role/sagemaker-execution-role' \
   --region 'us-west-2'
   ```

1. **Create a transform job**.

   ```
   aws sagemaker create-transform-job --transform-job-name 'test-transform-job' \
    --model-name 'test-sagemaker-model' \
    --transform-input '{
           "DataSource": {
               "S3DataSource": {
                   "S3DataType": "S3Prefix",
                   "S3Uri": "s3://amzn-s3-demo-bucket/data.csv"
               }
           },
           "ContentType": "text/csv",
           "SplitType": "None"
       }'\
   --transform-output '{
           "S3OutputPath": "s3://amzn-s3-demo-bucket/output/",
           "AssembleWith": "Line"
       }'\
   --transform-resources '{
           "InstanceType": "ml.m5.2xlarge",
           "InstanceCount": 1
       }'\
   --region 'us-west-2'
   ```

1. **Check the progress of the transform job**. 

   ```
   aws sagemaker describe-transform-job --transform-job-name 'test-transform-job' --region us-west-2
   ```

   The following is the response from the transform job.

   ```
   {
       "TransformJobName": "test-transform-job",
       "TransformJobArn": "arn:aws:sagemaker:us-west-2:1234567890:transform-job/test-transform-job",
       "TransformJobStatus": "InProgress",
       "ModelName": "test-sagemaker-model",
       "TransformInput": {
           "DataSource": {
               "S3DataSource": {
                   "S3DataType": "S3Prefix",
                   "S3Uri": "s3://amzn-s3-demo-bucket/data.csv"
               }
           },
           "ContentType": "text/csv",
           "CompressionType": "None",
           "SplitType": "None"
       },
       "TransformOutput": {
           "S3OutputPath": "s3://amzn-s3-demo-bucket/output/",
           "AssembleWith": "Line",
           "KmsKeyId": ""
       },
       "TransformResources": {
           "InstanceType": "ml.m5.2xlarge",
           "InstanceCount": 1
       },
       "CreationTime": 1662495635.679,
       "TransformStartTime": 1662495847.496,
       "DataProcessing": {
           "InputFilter": "$",
           "OutputFilter": "$",
           "JoinSource": "None"
       }
   }
   ```

   After the `TransformJobStatus` changes to `Completed`, you can check the inference result in the `S3OutputPath`.
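The steps above can also be scripted. The following is a minimal Python sketch, assuming response dicts already parsed from the corresponding describe calls; the helper names are illustrative, and the describe call in the wait loop is injected as a callable so the sketch stays SDK-agnostic:

```python
import time

def containers_from_describe(response):
    # Map the best candidate's inference containers from a parsed
    # DescribeAutoMLJobV2 response into the shape that create-model
    # expects in step 2 (Image, ModelDataUrl, Environment).
    return [
        {
            "Image": c["Image"],
            "ModelDataUrl": c["ModelDataUrl"],
            "Environment": c.get("Environment", {}),
        }
        for c in response["BestCandidate"]["InferenceContainers"]
    ]

def wait_for_transform_job(describe_fn, job_name, poll_seconds=30):
    # Poll until the transform job reaches a terminal status.
    # `describe_fn` stands in for boto3's describe_transform_job.
    while True:
        status = describe_fn(TransformJobName=job_name)["TransformJobStatus"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        time.sleep(poll_seconds)
```

With boto3, you would pass `sagemaker_client.describe_transform_job` as `describe_fn`.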

------

# Amazon SageMaker Autopilot data exploration notebook
Data exploration notebook

Amazon SageMaker Autopilot automatically cleans and pre-processes your dataset. To help users understand their data and uncover patterns, relationships, and anomalies in their time series, Amazon SageMaker Autopilot generates a static **data exploration** report in the form of a notebook for users to reference.

The data exploration notebook is generated for every Autopilot job. The report is stored in an Amazon S3 bucket and can be accessed from the job output path.

You can find the Amazon S3 prefix to the data exploration notebook in the response to `[DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html)` at `[AutoMLJobArtifacts.DataExplorationNotebookLocation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html#sagemaker-DescribeAutoMLJobV2-response-AutoMLJobArtifacts)`.

# Reports generated by Amazon SageMaker Autopilot
Reports generated

In addition to the data exploration notebook, Autopilot generates various reports for the best model candidate of each experiment.
+ An explainability report provides insights into how the model makes forecasts. 
+ A performance report provides a quantitative assessment of the model's forecasting capabilities.
+ A backtest results report is generated after testing the model's performance on historical data. 

## Explainability report
Explainability report

The Autopilot explainability report helps you better understand how the attributes in your dataset impact forecasts for specific time series (item and dimension combinations) and time points. Autopilot uses a metric called *Impact scores* to quantify the relative impact of each attribute and determine whether it increases or decreases forecast values.

For example, consider a forecasting scenario where the target is `sales` and there are two related attributes: `price` and `color`. Autopilot may find that the item’s color has a high impact on sales for certain items, but a negligible effect for other items. It may also find that a promotion in the summer has a high impact on sales, but a promotion in the winter has little effect.

The explainability report is generated only when:
+ The time series dataset includes additional feature columns or is associated with a holiday calendar.
+ The base models CNN-QR and DeepAR+ are included in the final ensemble.

### Interpret Impact scores
Interpret impact scores

Impact scores measure the relative impact attributes have on forecast values. For example, if the `price` attribute has an impact score that is twice as large as that of the `store location` attribute, you can conclude that the price of an item has twice the impact on forecast values as the store location.

Impact scores also provide information on whether attributes increase or decrease forecast values.

The Impact scores range from -1 to 1, where the sign denotes the direction of the impact. A score of 0 indicates no impact, while scores close to 1 or -1 indicate a significant impact.

It is important to note that Impact scores measure the relative impact of attributes, not the absolute impact. Therefore, Impact scores cannot be used to determine whether particular attributes improve model accuracy. If an attribute has a low Impact score, that does not necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor.
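This relative reading can be made concrete with a small sketch; the scores below are made up for the example:

```python
def relative_impact(scores, attribute, baseline):
    # Compare the magnitudes of two impact scores: how many times
    # stronger `attribute`'s effect on forecast values is than
    # `baseline`'s. Signs (direction of impact) are ignored here,
    # because magnitude carries the relative strength.
    return abs(scores[attribute]) / abs(scores[baseline])

# Hypothetical scores: price increases forecast values, store
# location decreases them, and price's effect is twice as strong.
scores = {"price": 0.5, "store location": -0.25}
ratio = relative_impact(scores, "price", "store location")  # 2.0
```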

### Find the explainability report
Find the explainability report

You can find the Amazon S3 prefix to the explainability artifacts generated for the best candidate in the response to `[DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html)` at `[BestCandidate.CandidateProperties.CandidateArtifactLocations.Explainability](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CandidateArtifactLocations.html#sagemaker-Type-CandidateArtifactLocations-Explainability)`.

## Model performance report
Model performance report

The Autopilot model quality report (also referred to as the performance report) provides insights and quality information for the best model candidate (best predictor) generated by an AutoML job. This includes information about the job details, the objective function, and accuracy metrics (`wQL`, `MAPE`, `WAPE`, `RMSE`, `MASE`).

You can find the Amazon S3 prefix to the model quality report artifacts generated for the best candidate in the response to `[DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html)` at `[BestCandidate.CandidateProperties.CandidateArtifactLocations.ModelInsights](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CandidateArtifactLocations.html#sagemaker-Type-CandidateArtifactLocations-ModelInsights)`.

## Backtest results report
Backtest results report

Backtest results provide insights into the performance of a time-series forecasting model by evaluating its predictive accuracy and reliability. They help analysts and data scientists assess the model's performance on historical data and understand its potential performance on future, unseen data.

Autopilot uses backtesting to tune parameters and produce accuracy metrics. During backtesting, Autopilot automatically splits your time-series data into two sets, a training set and a testing set. The training set is used to train a model which is then used to generate forecasts for data points in the testing set. Autopilot uses this testing dataset to evaluate the model's accuracy by comparing forecasted values with observed values in the testing set.
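Conceptually, one backtest window holds out the last `forecast_horizon` observations of a series as the testing set. The following is a simplified sketch of that split for illustration, not Autopilot's internal logic:

```python
def backtest_split(series, forecast_horizon):
    # Hold out the last `forecast_horizon` observations as the testing
    # set; everything earlier becomes the training set.
    if not 0 < forecast_horizon < len(series):
        raise ValueError("forecast horizon must be shorter than the series")
    return series[:-forecast_horizon], series[-forecast_horizon:]

# A 10-point series with a horizon of 3: train on the first 7 points,
# forecast the last 3, then compare forecasts with the observed values.
train, test = backtest_split(list(range(10)), forecast_horizon=3)
```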

You can find the Amazon S3 prefix to the backtest results artifacts generated for the best candidate in the response to `[DescribeAutoMLJobV2](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html)` at `[BestCandidate.CandidateProperties.CandidateArtifactLocations.BacktestResults](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CandidateArtifactLocations.html#sagemaker-Type-CandidateArtifactLocations-BacktestResults)`.
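All of the report locations, along with the data exploration notebook, can be collected from a single `DescribeAutoMLJobV2` response. The following is a minimal sketch over an already-parsed response dict (the helper name is illustrative; missing artifacts come back as `None`):

```python
def report_locations(response):
    # Pull the S3 prefixes of the generated reports out of a parsed
    # DescribeAutoMLJobV2 response.
    artifacts = (response.get("BestCandidate", {})
                         .get("CandidateProperties", {})
                         .get("CandidateArtifactLocations", {}))
    return {
        "data_exploration_notebook": response.get("AutoMLJobArtifacts", {})
                                             .get("DataExplorationNotebookLocation"),
        "explainability": artifacts.get("Explainability"),
        "model_insights": artifacts.get("ModelInsights"),
        "backtest_results": artifacts.get("BacktestResults"),
    }
```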

# Time-series forecasting resource limits for Autopilot
Time-series forecasting resource limits

The following table lists the resource limits for time-series forecasting jobs in Amazon SageMaker Autopilot and whether or not you can adjust each limit.


| **Resource limits** | **Default limit** | **Adjustable** | 
| --- | --- | --- | 
|  Size of input dataset  |  30 GB  |  Yes  | 
|  Size of a single Parquet file  |  2 GB  |  No  | 
|  Maximum number of rows in a dataset  |  3 billion  |  Yes  | 
|  Maximum number of grouping columns  |  5  |  No  | 
|  Maximum number of numerical features  |  13  |  No  | 
|  Maximum number of categorical features  |  10  |  No  | 
|  Maximum number of time-series (unique combinations of item and grouping columns) per dataset  |  5,000,000  |  Yes  | 
|  Maximum forecast horizon  |  500  |  Yes  | 
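Because the non-adjustable limits cannot be raised through a quota request, it can be useful to check a dataset's schema against them before submitting a job. The following is a hypothetical pre-flight check; the limit names and values come from the rows above, and the helper itself is not part of any SDK:

```python
# Non-adjustable default limits from the table above.
NON_ADJUSTABLE = {
    "grouping_columns": 5,
    "numerical_features": 13,
    "categorical_features": 10,
}

def over_limits(schema):
    # Return the names of any non-adjustable limits the dataset
    # schema exceeds; an empty list means the schema is within limits.
    return [name for name, cap in NON_ADJUSTABLE.items()
            if schema.get(name, 0) > cap]
```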