

# Time Series Forecasts in Amazon SageMaker Canvas
<a name="canvas-time-series"></a>

**Note**  
Time series forecasting models are only supported for tabular datasets.

Amazon SageMaker Canvas lets you build machine learning time series forecasts, which predict values that vary over time.

For example, you can use a time series forecast to predict the following:
+ Your inventory in the coming months.
+ The number of items sold in the next four months.
+ The effect of a price reduction on sales during the holiday season.
+ Item inventory in the next 12 months.
+ The number of customers entering a store in the next several hours.
+ How a 10% reduction in the price of a product affects sales over a time period.

To make a time series forecast, your dataset must have the following:
+ A timestamp column in which every value has the `datetime` type.
+ A target column that contains the historical values that Canvas uses to forecast future values.
+ An item ID column that contains unique identifiers for each item in your dataset, such as SKU numbers.

The `datetime` values in the timestamp column must use one of the following formats:
+ `YYYY-MM-DD HH:MM:SS`
+ `YYYY-MM-DDTHH:MM:SSZ`
+ `YYYY-MM-DD`
+ `MM/DD/YY`
+ `MM/DD/YY HH:MM`
+ `MM/DD/YYYY`
+ `YYYY/MM/DD HH:MM:SS`
+ `YYYY/MM/DD`
+ `DD/MM/YYYY`
+ `DD/MM/YY`
+ `DD-MM-YY`
+ `DD-MM-YYYY`
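
Before importing a dataset, it can help to check that every timestamp follows one of the formats listed above. The following is a minimal sketch that uses only the Python standard library. The mapping from the listed formats to `strptime` directives and the helper names `matching_formats` and `consistent_format` are assumptions made for illustration; this is not how Canvas itself parses timestamps.

```python
from datetime import datetime

# Assumed strptime equivalents of the formats listed above, in the same order.
SUPPORTED_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d",
    "%m/%d/%y",
    "%m/%d/%y %H:%M",
    "%m/%d/%Y",
    "%Y/%m/%d %H:%M:%S",
    "%Y/%m/%d",
    "%d/%m/%Y",
    "%d/%m/%y",
    "%d-%m-%y",
    "%d-%m-%Y",
]


def matching_formats(value):
    """Return every listed format that successfully parses a single value."""
    matches = []
    for fmt in SUPPORTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        matches.append(fmt)
    return matches


def consistent_format(values):
    """Return the first listed format that parses every value, or None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


# "03/04/2024" is ambiguous on its own (March 4 or April 3) ...
print(matching_formats("03/04/2024"))
# ... but a column that also contains "25/12/2023" can only be day-first.
print(consistent_format(["25/12/2023", "03/04/2024"]))
```

A single value such as `03/04/2024` can match both a month-first and a day-first format, so checking the whole column with one format is a safer test than checking each value in isolation.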

You can make forecasts for the following intervals:
+ 1 min
+ 5 min
+ 15 min
+ 30 min
+ 1 hour
+ 1 day
+ 1 week
+ 1 month
+ 1 year

## Future values in your input dataset
<a name="canvas-time-series-future"></a>

Canvas automatically detects columns in your dataset that might contain future values. If present, these values can improve the accuracy of predictions. Canvas marks these columns with a `Future values` label. Canvas infers the relationship between the data in these columns and the target column that you are trying to predict, and uses that relationship to generate more accurate forecasts.

For example, you can forecast the amount of ice cream sold by a grocery store. To make a forecast, you must have a timestamp column and a column that indicates how much ice cream the grocery store sold. For a more accurate forecast, your dataset can also include the price, the ambient temperature, the flavor of the ice cream, or a unique identifier for the ice cream.

Ice cream sales might increase when the weather is warmer. A decrease in the price of the ice cream might result in more units sold. Having a column with ambient temperature data and a column with pricing data can improve your ability to forecast the number of units of ice cream the grocery store sells.

Providing future values is optional, but doing so lets you perform what-if analyses directly in the Canvas application, which show you how changes in future values could alter your predictions.
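
To make the idea of future values concrete, the following sketch builds a small version of the ice cream dataset in which the `price` column has values for dates after the last recorded sale, while `temperature` does not. The column names, the sample numbers, and the helper `columns_with_future_values` are all hypothetical. The rule used here (a column has at least one value at a timestamp later than the last known target value) is an assumption for illustration and is not Canvas's actual detection logic.

```python
from datetime import date

# Hypothetical data: sales are known through June 2; price is planned through June 4.
rows = [
    {"timestamp": date(2024, 6, 1), "units_sold": 120, "price": 3.5, "temperature": 24.0},
    {"timestamp": date(2024, 6, 2), "units_sold": 135, "price": 3.5, "temperature": 27.0},
    {"timestamp": date(2024, 6, 3), "units_sold": None, "price": 3.0, "temperature": None},
    {"timestamp": date(2024, 6, 4), "units_sold": None, "price": 3.0, "temperature": None},
]


def columns_with_future_values(rows, timestamp_col, target_col):
    """List columns that have a value after the last timestamp with a known target."""
    known = [r[timestamp_col] for r in rows if r[target_col] is not None]
    if not known:
        return []
    last_known = max(known)
    candidates = [c for c in rows[0] if c not in (timestamp_col, target_col)]
    return [
        c
        for c in candidates
        if any(r[timestamp_col] > last_known and r.get(c) is not None for r in rows)
    ]


print(columns_with_future_values(rows, "timestamp", "units_sold"))  # ['price']
```

In this layout, only `price` carries information about the forecast period, so it is the only column that could support a what-if analysis such as a planned price change.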

## Handling missing values
<a name="canvas-time-series-missing"></a>

You might have missing data for different reasons. The reason for your missing data might inform how you want Canvas to impute it. For example, your organization might use an automatic system that only records data when a sale happens. A dataset that comes from this type of system has missing values in the target column for the periods without sales.

**Important**  
We recommend using a dataset that has no missing values in the target column. SageMaker Canvas uses the target column to forecast future values. Missing values in the target column can greatly reduce the accuracy of the forecast.

By default, Canvas automatically imputes missing values in the dataset by filling the target column with `0` and other numeric columns with the median value of the column.

However, you can select your own filling logic for the target column and other numeric columns in your datasets. Target columns have different filling guidelines and restrictions than the rest of the numeric columns. Target columns are filled up to the end of the historical period, whereas numeric columns are filled across both historical and future periods all the way to the end of the forecast horizon. Canvas only fills future values in a numeric column if your data has at least one record with a future timestamp and a value for that specific column.

You can choose one of the following filling logic options to impute missing values in your data:
+ `zero` – Fill with `0`.
+ `NaN` – Fill with NaN, or not a number. This is only supported for the target column.
+ `mean` – Fill with the mean value from the data series.
+ `median` – Fill with the median value from the data series.
+ `min` – Fill with the minimum value from the data series.
+ `max` – Fill with the maximum value from the data series.
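
The options above can be sketched in a few lines of standard-library Python to show what each one does to a series. The helper `fill_missing` and the use of `None` to mark a missing value are assumptions for illustration; in Canvas you choose the filling logic in the application, and this sketch does not reproduce how Canvas applies it across historical and future periods.

```python
import math
from statistics import mean, median

# Statistics computed from the observed (non-missing) values of the series.
_SERIES_STATS = {"mean": mean, "median": median, "min": min, "max": max}


def fill_missing(values, logic):
    """Replace None entries in a numeric series according to a filling logic."""
    if logic == "zero":
        fill = 0
    elif logic == "NaN":
        fill = math.nan
    elif logic in _SERIES_STATS:
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError("cannot compute a statistic from an empty series")
        fill = _SERIES_STATS[logic](observed)
    else:
        raise ValueError(f"unknown filling logic: {logic!r}")
    return [fill if v is None else v for v in values]


units_sold = [4, None, 10, None, 1]
print(fill_missing(units_sold, "zero"))    # [4, 0, 10, 0, 1]
print(fill_missing(units_sold, "median"))  # [4, 4, 10, 4, 1]
print(fill_missing(units_sold, "max"))     # [4, 10, 10, 10, 1]
```

Note that the statistic is computed from the observed values only, so the filled entries do not influence the mean, median, minimum, or maximum that is used to fill them.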

When choosing a filling logic, consider how your model interprets the filled values. For example, in a retail scenario, recording zero sales of an available item is different from recording zero sales of an unavailable item, because the latter doesn't necessarily imply a lack of customer interest in the item. In this case, filling the target column with `0` might bias the model's predictions downward, because the model infers a lack of customer interest in unavailable items. Conversely, filling with `NaN` might cause the model to ignore true occurrences of zero sales of available items.

## Types of forecasts
<a name="canvas-time-series-types"></a>

You can make one of the following types of forecasts:
+ **Single item**
+ **All items**

For a forecast on all the items in your dataset, SageMaker Canvas returns a forecast for the future values for each item in your dataset.

For a single item forecast, you specify the item and SageMaker Canvas returns a forecast for the future values. The forecast includes a line graph that plots the predicted values over time.

**Topics**
+ [Future values in your input dataset](#canvas-time-series-future)
+ [Handling missing values](#canvas-time-series-missing)
+ [Types of forecasts](#canvas-time-series-types)
+ [Additional options for forecasting insights](canvas-additional-insights.md)

# Additional options for forecasting insights
<a name="canvas-additional-insights"></a>

In Amazon SageMaker Canvas, you can use the following optional methods to get more insights from your forecast:
+ Group column
+ Holiday schedule
+ What-if scenario

You can specify a column in your dataset as a **Group column**. Amazon SageMaker Canvas groups the forecast by each value in the column. For example, you can group the forecast on columns containing price data or unique item identifiers. Grouping a forecast by a column lets you make more specific forecasts. For example, if you group a forecast on a column containing item identifiers, you can see the forecast for each item.

Overall sales of items might be affected by holidays. For example, in the United States, the number of items sold in both November and December might differ greatly from the number of items sold in January. If you use the data from November and December to forecast the sales in January, your results might be inaccurate. Using a holiday schedule helps you avoid these inaccurate results. You can use a holiday schedule for 251 countries.

For a forecast on a single item in your dataset, you can use what-if scenarios. A what-if scenario lets you change values in your data and see how the forecast changes. For example, you can use a what-if scenario to answer questions such as "What if I lowered prices? How would that affect the number of items sold?"