

 Amazon Redshift will no longer support the creation of new Python UDFs starting Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [ blog post ](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# CREATE MODEL
<a name="r_CREATE_MODEL"></a>

**Topics**
+ [Prerequisites](#r_create_model_prereqs)
+ [Required privileges](#r_simple_create_model-privileges)
+ [Cost control](#r_create_model_cost)
+ [Full CREATE MODEL](#r_full_create_model)
+ [Parameters](#r_create_model_parameters)
+ [Usage notes](r_create_model_usage_notes.md)
+ [Use cases](r_create_model_use_cases.md)

## Prerequisites
<a name="r_create_model_prereqs"></a>

Before you use the CREATE MODEL statement, complete the prerequisites in [Cluster setup for using Amazon Redshift ML](getting-started-machine-learning.md#cluster-setup). The following is a high-level summary of the prerequisites.
+ Create an Amazon Redshift cluster with the AWS Management Console or the AWS Command Line Interface (AWS CLI).
+ Attach the AWS Identity and Access Management (IAM) policy while creating the cluster.
+ To allow Amazon Redshift and SageMaker AI to assume the role to interact with other services, add the appropriate trust policy to the IAM role.

For details for the IAM role, trust policy, and other prerequisites, see [Cluster setup for using Amazon Redshift ML](getting-started-machine-learning.md#cluster-setup).

Following, you can find different use cases for the CREATE MODEL statement.
+ [Simple CREATE MODEL](r_create_model_use_cases.md#r_simple_create_model)
+ [CREATE MODEL with user guidance](r_create_model_use_cases.md#r_user_guidance_create_model)
+ [CREATE XGBoost models with AUTO OFF](r_create_model_use_cases.md#r_auto_off_create_model)
+ [Bring your own model (BYOM) - local inference](r_create_model_use_cases.md#r_byom_create_model)
+ [Bring your own model (BYOM) - remote inference](r_create_model_use_cases.md#r_byom_create_model_remote)
+ [CREATE MODEL with K-MEANS](r_create_model_use_cases.md#r_k-means_create_model)
+ [Full CREATE MODEL](#r_full_create_model)

## Required privileges
<a name="r_simple_create_model-privileges"></a>

Following are required privileges for CREATE MODEL:
+ Superuser
+ Users with the CREATE MODEL privilege
+ Roles with the GRANT CREATE MODEL privilege

## Cost control
<a name="r_create_model_cost"></a>

 Amazon Redshift ML uses existing cluster resources to create prediction models, so you don’t have to pay additional costs. However, you might have additional costs if you need to resize your cluster or want to train your models. Amazon Redshift ML uses Amazon SageMaker AI to train models, which does have an additional associated cost. There are ways to control additional costs, such as limiting the maximum amount of time training can take or by limiting the number of training examples used to train your model. For more information, see [Costs for using Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/cost.html). 

## Full CREATE MODEL
<a name="r_full_create_model"></a>

The following summarizes the basic options of the full CREATE MODEL syntax.

### Full CREATE MODEL syntax
<a name="r_auto_off-create-model-synposis"></a>

The following is the full syntax of the CREATE MODEL statement.

**Important**  
When creating a model using the CREATE MODEL statement, follow the order of the keywords in the syntax following.

```
CREATE MODEL model_name
FROM { table_name | ( select_statement )  | 'job_name' }
[ TARGET column_name ]
FUNCTION function_name [ ( data_type [, ...] ) ] 
[ RETURNS data_type ] 
  -- supported only for BYOM
[ SAGEMAKER 'endpoint_name'[:'model_name']] 
  -- supported only for BYOM remote inference
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>' }
[ AUTO ON / OFF ]
  -- default is AUTO ON
[ MODEL_TYPE { XGBOOST | MLP | LINEAR_LEARNER | KMEANS | FORECAST } ]
  -- not required for non AUTO OFF case, default is the list of all supported types
  -- required for AUTO OFF
[ PROBLEM_TYPE ( REGRESSION | BINARY_CLASSIFICATION | MULTICLASS_CLASSIFICATION ) ]
  -- not supported when AUTO OFF
[ OBJECTIVE ( 'MSE' | 'Accuracy' | 'F1' | 'F1_Macro' | 'AUC' |
             'reg:squarederror' | 'reg:squaredlogerror'| 'reg:logistic'|
             'reg:pseudohubererror' | 'reg:tweedie' | 'binary:logistic' | 'binary:hinge',
             'multi:softmax' | 'RMSE' | 'WAPE' | 'MAPE' | 'MASE' | 'AverageWeightedQuantileLoss' ) ]
  -- for AUTO ON: first 5 are valid
  -- for AUTO OFF: 6-13 are valid
  -- for FORECAST: 14-18 are valid
[ PREPROCESSORS 'string' ]
  -- required for AUTO OFF, when it has to be 'none'
  -- optional for AUTO ON
[ HYPERPARAMETERS { DEFAULT | DEFAULT EXCEPT ( Key 'value' (,...) ) } ]
  -- support XGBoost hyperparameters, except OBJECTIVE
  -- required and only allowed for AUTO OFF
  -- default NUM_ROUND is 100
  -- NUM_CLASS is required if objective is multi:softmax (only possible for AUTO OFF)
 [ SETTINGS (
   S3_BUCKET 'amzn-s3-demo-bucket',  |
    -- required
  TAGS 'string', |
    -- optional
  KMS_KEY_ID 'kms_string', |
    -- optional
  S3_GARBAGE_COLLECT on / off, |
    -- optional, defualt is on.
  MAX_CELLS integer, |
    -- optional, default is 1,000,000
  MAX_RUNTIME integer (, ...) |
    -- optional, default is 5400 (1.5 hours)
  HORIZON integer, |
    -- required if creating a forecast model
  FREQUENCY integer, |
    -- required if creating a forecast model
  PERCENTILES string, |
    -- optional if creating a forecast model
  MAX_BATCH_ROWS integer -- optional for BYOM remote inference
    ) ]
```

## Parameters
<a name="r_create_model_parameters"></a>

model\$1name  
The name of the model. The model name in a schema must be unique.

FROM \$1 *table\$1name* \$1 ( *select\$1query* ) \$1 *'job\$1name'*\$1  
The table\$1name or the query that specifies the training data. They can either be an existing table in the system, or an Amazon Redshift-compatible SELECT query enclosed with parentheses, that is (). There must be at least two columns in the query result. 

TARGET *column\$1name*  
The name of the column that becomes the prediction target. The column must exist in the FROM clause. 

FUNCTION *function\$1name* ( *data\$1type* [, ...] )  
The name of the function to be created and the data types of the input arguments. You can provide the schema name of a schema in your database instead of a function name.

RETURNS *data\$1type*  
The data type to be returned from the model's function. The returned `SUPER` data type is applicable only to BYOM with remote inference.

SAGEMAKER '*endpoint\$1name*'[:'*model\$1name*']  
The name of the Amazon SageMaker AI endpoint. If the endpoint name points to a multimodel endpoint, add the name of the model to use. The endpoint must be hosted in the same AWS Region as the Amazon Redshift cluster.

IAM\$1ROLE \$1 default \$1 'arn:aws:iam::<account-id>:role/<role-name>' \$1  
 Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREATE MODEL command runs. Alternatively, you can specify an ARN of an IAM role to use that role.

[ AUTO ON / OFF ]  
 Turns on or off CREATE MODEL automatic discovery of preprocessor, algorithm, and hyper-parameters selection. Specifying on when creating a Forecast model indicates to use an AutoPredictor, where Amazon Forecast applies the optimal combinations of algorithms to each time series in your dataset. 

 *MODEL\$1TYPE \$1 XGBOOST \$1 MLP \$1 LINEAR\$1LEARNER \$1 KMEANS \$1 FORECAST \$1*   
(Optional) Specifies the model type. You can specify if you want to train a model of a specific model type, such as XGBoost, multilayer perceptron (MLP), KMEANS, or Linear Learner, which are all algorithms that Amazon SageMaker AI Autopilot supports. If you don't specify the parameter, then all supported model types are searched during training for the best model. You can also create a forecast model in Redshift ML to create accurate time-series forecasts.

 *PROBLEM\$1TYPE ( REGRESSION \$1 BINARY\$1CLASSIFICATION \$1 MULTICLASS\$1CLASSIFICATION )*   
(Optional) Specifies the problem type. If you know the problem type, you can restrict Amazon Redshift to only search of the best model of that specific model type. If you don't specify this parameter, a problem type is discovered during the training, based on your data.

OBJECTIVE ( 'MSE' \$1 'Accuracy' \$1 'F1' \$1 'F1Macro' \$1 'AUC' \$1 'reg:squarederror' \$1 'reg:squaredlogerror' \$1 'reg:logistic' \$1 'reg:pseudohubererror' \$1 'reg:tweedie' \$1 'binary:logistic' \$1 'binary:hinge' \$1 'multi:softmax' \$1 'RMSE' \$1 'WAPE' \$1 'MAPE' \$1 'MASE' \$1 'AverageWeightedQuantileLoss' )  
(Optional) Specifies the name of the objective metric used to measure the predictive quality of a machine learning system. This metric is optimized during training to provide the best estimate for model parameter values from data. If you don't specify a metric explicitly, the default behavior is to automatically use MSE: for regression, F1: for binary classification, Accuracy: for multiclass classification. For more information about objectives, see [AutoMLJobObjective](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_AutoMLJobObjective.html) in the *Amazon SageMaker AI API Reference* and [Learning task parameters](https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters)in the XGBOOST documentation. The values RMSE, WAPE, MAPE, MASE, and AverageWeightedQuantileLoss are only applicable to Forecast models. For more information, see the [CreateAutoPredictor](https://docs.aws.amazon.com/forecast/latest/dg/API_CreateAutoPredictor.html#forecast-CreateAutoPredictor-request-OptimizationMetric) API operation.

 *PREPROCESSORS 'string' *   
(Optional) Specifies certain combinations of preprocessors to certain sets of columns. The format is a list of columnSets, and the appropriate transforms to be applied to each set of columns. Amazon Redshift applies all the transformers in a specific transformers list to all columns in the corresponding ColumnSet. For example, to apply OneHotEncoder with Imputer to columns t1 and t2, use the sample command following.  

```
CREATE MODEL customer_churn
FROM customer_data
TARGET 'Churn'
FUNCTION predict_churn
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>' }
PROBLEM_TYPE BINARY_CLASSIFICATION
OBJECTIVE 'F1'
PREPROCESSORS '[
...
  {"ColumnSet": [
      "t1",
      "t2"
    ],
    "Transformers": [
      "OneHotEncoder",
      "Imputer"
    ]
  },
  {"ColumnSet": [
      "t3"
    ],
    "Transformers": [
      "OneHotEncoder"
    ]
  },
  {"ColumnSet": [
      "temp"
    ],
    "Transformers": [
      "Imputer",
      "NumericPassthrough"
    ]
  }
]'
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket'
)
```

HYPERPARAMETERS \$1 DEFAULT \$1 DEFAULT EXCEPT ( key ‘value’ (,..) ) \$1  
Specifies whether the default XGBoost parameters are used or overridden by user-specified values. The values must be enclosed with single quotes. Following are examples of parameters for XGBoost and their defaults.      
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_MODEL.html)

SETTINGS ( S3\$1BUCKET *'amzn-s3-demo-bucket'*, \$1 TAGS 'string', \$1 KMS\$1KEY\$1ID *'kms\$1string' *, \$1 S3\$1GARBAGE\$1COLLECT on / off, \$1 MAX\$1CELLS integer , \$1 MAX\$1RUNTIME (,...) , \$1 HORIZON integer, \$1 FREQUENCY forecast\$1frequency, \$1 PERCENTILES array of strings )  
S3\$1BUCKET clause specifies the Amazon S3 location that is used to store intermediate results.  
(Optional) The TAGS parameter is a comma-separated list of key-value pairs that you can use to tag resources created in Amazon SageMaker AI; and Amazon Forecast. Tags help you organize resources and allocate costs. Values in the pair are optional, so you can create tags by using the format `key=value` or just by creating a key. For more information about tags in Amazon Redshift, see [ Tagging overview](https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-tagging.html).  
(Optional) KMS\$1KEY\$1ID specifies if Amazon Redshift uses server-side encryption with an AWS KMS key to protect data at rest. Data in transit is protected with Secure Sockets Layer (SSL).   
(Optional) S3\$1GARBAGE\$1COLLECT \$1 ON \$1 OFF \$1 specifies whether Amazon Redshift performs garbage collection on the resulting datasets used to train models and the models. If set to OFF, the resulting datasets used to train models and the models remains in Amazon S3 and can be used for other purposes. If set to ON, Amazon Redshift deletes the artifacts in Amazon S3 after the training completes. The default is ON.  
(Optional) MAX\$1CELLS specifies the number of cells in the training data. This value is the product of the number of records (in the training query or table) times the number of columns. The default is 1,000,000.  
(Optional) MAX\$1RUNTIME specifies the maximum amount of time to train. Training jobs often complete sooner depending on dataset size. This specifies the maximum amount of time the training should take. The default is 5,400 (90 minutes).  
HORIZON specifies the maximum number of predictions the forecast model can return. Once the model is trained, you can't change this integer. This parameter is required if training a forecast model.  
FREQUENCY specifies how granular in time units you want the forecasts to be. Available options are `Y | M | W | D | H | 30min | 15min | 10min | 5min | 1min`. This parameter is required if training a forecast model.  
(Optional) PERCENTILES is a comma-delimited string that specifies the forecast types used to train a predictor. Forecast types can be quantiles from 0.01 to 0.99, in increments of 0.01 or higher. You can also specify the mean forecast with mean. You can specify a maximum of five forecast types.

 MAX\$1BATCH\$1ROWS *integer*   
(Optional) The maximum number of rows that Amazon Redshift sends in a single batch request for a single SageMaker AI invocation. It is supported only for BYOM with remote inference. This parameter's minimum value is 1. The maximum value is `INT_MAX`, or 2,147,483,647. This parameter is required only when both input and returned data types are *SUPER*. The default value is `INT_MAX`, or 2,147,483,647. 

# Usage notes
<a name="r_create_model_usage_notes"></a>

When using CREATE MODEL, consider the following:
+ The CREATE MODEL statement operates in an asynchronous mode and returns upon the export of training data to Amazon S3. The remaining steps of training in Amazon SageMaker AI occur in the background. While training is in progress, the corresponding inference function is visible but can't be run. You can query [STV\$1ML\$1MODEL\$1INFO](r_STV_ML_MODEL_INFO.md) to see the state of training. 
+ The training can run for up to 90 minutes in the background, by default in the Auto model and can be extended. To cancel the training, simply run the [DROP MODEL](r_DROP_MODEL.md) command.
+ The Amazon Redshift cluster that you use to create the model and the Amazon S3 bucket that is used to stage the training data and model artifacts must be in the same AWS Region.
+ During the model training, Amazon Redshift and SageMaker AI store intermediate artifacts in the Amazon S3 bucket that you provide. By default, Amazon Redshift performs garbage collection at the end of the CREATE MODEL operation. Amazon Redshift removes those objects from Amazon S3. To retain those artifacts on Amazon S3, set the S3\$1GARBAGE COLLECT OFF option.
+ You must use at least 500 rows in the training data provided in the FROM clause.
+ You can only specify up to 256 feature (input) columns in the FROM \$1 table\$1name \$1 ( select\$1query ) \$1 clause when using the CREATE MODEL statement.
+ For AUTO ON, the column types that you can use as the training set are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, BOOLEAN, CHAR, VARCHAR, DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ. For AUTO OFF, the column types that you can use as the training set are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, and BOOLEAN.
+ You can't use DECIMAL, DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, GEOMETRY, GEOGRAPHY, HLLSKETCH, SUPER, or VARBYTE as the target column type.
+ To improve model accuracy, do one of the following:
  + Add as many relevant columns in the CREATE MODEL command as possible when you specify the training data in the FROM clause.
  + Use a larger value for MAX\$1RUNTIME and MAX\$1CELLS. Larger values for this parameter increase the cost of training a model.
+ The CREATE MODEL statement execution returns as soon as the training data is computed and exported to the Amazon S3 bucket. After that point, you can check the status of the training using the SHOW MODEL command. When a model being trained in the background fails, you can check the error using SHOW MODEL. You can't retry a failed model. Use DROP MODEL to remove a failed model and recreate a new model. For more information about SHOW MODEL, see [SHOW MODEL](r_SHOW_MODEL.md).
+ Local BYOM supports the same kind of models that Amazon Redshift ML supports for non-BYOM cases. Amazon Redshift supports plain XGBoost (using XGBoost version 1.0 or later), KMEANS models without preprocessors, and XGBOOST/MLP/Linear Learner models trained by trained by Amazon SageMaker AI Autopilot. It supports the latter with preprocessors that Autopilot has specified that are also supported by Amazon SageMaker AI Neo.
+ If your Amazon Redshift cluster has enhanced routing enabled for your virtual private cloud (VPC), make sure to create an Amazon S3 VPC endpoint and an SageMaker AI VPC endpoint for the VPC that your cluster is in. Doing this enables the traffic to run through your VPC between these services during CREATE MODEL. For more information, see [SageMaker AI Clarify Job Amazon VPC Subnets and Security Groups](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-vpc.html#clarify-vpc-job).

# Use cases
<a name="r_create_model_use_cases"></a>

The following use cases demonstrate how to use CREATE MODEL to suit your needs.

## Simple CREATE MODEL
<a name="r_simple_create_model"></a>

The following summarizes the basic options of the CREATE MODEL syntax.

### Simple CREATE MODEL syntax
<a name="r_simple-create-model-synposis"></a>

```
CREATE MODEL model_name
FROM { table_name | ( select_query ) }
TARGET column_name
FUNCTION prediction_function_name
IAM_ROLE { default }
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket',
  [ MAX_CELLS integer ]
)
```

### Simple CREATE MODEL parameters
<a name="r_simple-create-model-parameters"></a>

 *model\$1name*   
The name of the model. The model name in a schema must be unique.

FROM \$1 *table\$1name* \$1 ( *select\$1query* ) \$1  
The table\$1name or the query that specifies the training data. They can either be an existing table in the system, or an Amazon Redshift-compatible SELECT query enclosed with parentheses, that is (). There must be at least two columns in the query result. 

TARGET *column\$1name*  
The name of the column that becomes the prediction target. The column must exist in the FROM clause. 

FUNCTION *prediction\$1function\$1name*   
A value that specifies the name of the Amazon Redshift machine learning function to be generated by the CREATE MODEL and used to make predictions using this model. The function is created in the same schema as the model object and can be overloaded.  
Amazon Redshift machine learning supports models, such as Xtreme Gradient Boosted tree (XGBoost) models for regression and classification.

IAM\$1ROLE \$1 default \$1 'arn:aws:iam::<account-id>:role/<role-name>' \$1  
 Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREAT MODEL command runs. Alternatively, you can specify the ARN of an IAM role to use that role.

 *S3\$1BUCKET *'amzn-s3-demo-bucket'**   
The name of the Amazon S3 bucket that you previously created used to share training data and artifacts between Amazon Redshift and SageMaker AI. Amazon Redshift creates a subfolder in this bucket prior to unload of the training data. When training is complete, Amazon Redshift deletes the created subfolder and its contents. 

MAX\$1CELLS integer   
The maximum number of cells to export from the FROM clause. The default is 1,000,000.   
The number of cells is the product of the number of rows in the training data (produced by the FROM clause table or query) times the number of columns. If the number of cells in the training data are more than that specified by the max\$1cells parameter, CREATE MODEL downsamples the FROM clause training data to reduce the size of the training set below MAX\$1CELLS. Allowing larger training datasets can produce higher accuracy but also can mean the model takes longer to train and costs more.  
For information about costs of using Amazon Redshift, see [Costs for using Amazon Redshift ML](cost.md).  
For more information about costs associated with various cell numbers and free trial details, see [Amazon Redshift pricing](https://aws.amazon.com/redshift/pricing).

## CREATE MODEL with user guidance
<a name="r_user_guidance_create_model"></a>

Following, you can find a description of options for CREATE MODEL in addition to the options described in [Simple CREATE MODEL](#r_simple_create_model).

By default, CREATE MODEL searches for the best combination of preprocessing and model for your specific dataset. You might want additional control or introduce additional domain knowledge (such as problem type or objective) over your model. In a customer churn scenario, if the outcome “customer is not active” is rare, then the F1 objective is often preferred to the accuracy objective. Because high accuracy models might predict “customer is active” all the time, this results in high accuracy but little business value. For information about F1 objective, see [AutoMLJobObjective](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_AutoMLJobObjective.html) in the *Amazon SageMaker AI API Reference*.

Then the CREATE MODEL follows your suggestions on the specified aspects, such as the objective. At the same time, the CREATE MODEL automatically discovers the best preprocessors and the best hyperparameters. 

### CREATE MODEL with user guidance syntax
<a name="r_user_guidance-create-model-synposis"></a>

CREATE MODEL offers more flexibility on the aspects that you can specify and the aspects that Amazon Redshift automatically discovers.

```
CREATE MODEL model_name
FROM { table_name | ( select_statement ) }
TARGET column_name
FUNCTION function_name
IAM_ROLE { default }
[ MODEL_TYPE { XGBOOST | MLP | LINEAR_LEARNER} ]
[ PROBLEM_TYPE ( REGRESSION | BINARY_CLASSIFICATION | MULTICLASS_CLASSIFICATION ) ]
[ OBJECTIVE ( 'MSE' | 'Accuracy' | 'F1' | 'F1Macro' | 'AUC') ]
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket', |
  S3_GARBAGE_COLLECT { ON | OFF }, |
  KMS_KEY_ID 'kms_key_id', |
  MAX_CELLS integer, |
  MAX_RUNTIME integer (, ...)
)
```

### CREATE MODEL with user guidance parameters
<a name="r_user_guidance-create-model-parameters"></a>

 *MODEL\$1TYPE \$1 XGBOOST \$1 MLP \$1 LINEAR\$1LEARNER \$1*   
(Optional) Specifies the model type. You can specify if you want to train a model of a specific model type, such as XGBoost, multilayer perceptron (MLP), or Linear Learner, which are all algorithms that Amazon SageMaker AI Autopilot supports. If you don't specify the parameter, then all supported model types are searched during training for the best model.

 *PROBLEM\$1TYPE ( REGRESSION \$1 BINARY\$1CLASSIFICATION \$1 MULTICLASS\$1CLASSIFICATION )*   
(Optional) Specifies the problem type. If you know the problem type, you can restrict Amazon Redshift to only search of the best model of that specific model type. If you don't specify this parameter, a problem type is discovered during the training, based on your data.

OBJECTIVE ( 'MSE' \$1 'Accuracy' \$1 'F1' \$1 'F1Macro' \$1 'AUC')  
(Optional) Specifies the name of the objective metric used to measure the predictive quality of a machine learning system. This metric is optimized during training to provide the best estimate for model parameter values from data. If you don't specify a metric explicitly, the default behavior is to automatically use MSE: for regression, F1: for binary classification, Accuracy: for multiclass classification. For more information about objectives, see [AutoMLJobObjective](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_AutoMLJobObjective.html) in the *Amazon SageMaker AI API Reference*.

MAX\$1CELLS integer   
(Optional) Specifies the number of cells in the training data. This value is the product of the number of records (in the training query or table) times the number of columns. The default is 1,000,000.

MAX\$1RUNTIME integer   
(Optional) Specifies the maximum amount of time to train. Training jobs often complete sooner depending on dataset size. This specifies the maximum amount of time the training should take. The default is 5,400 (90 minutes).

S3\$1GARBAGE\$1COLLECT \$1 ON \$1 OFF \$1  
(Optional) Specifies whether Amazon Redshift performs garbage collection on the resulting datasets used to train models and the models. If set to OFF, the resulting datasets used to train models and the models remains in Amazon S3 and can be used for other purposes. If set to ON, Amazon Redshift deletes the artifacts in Amazon S3 after the training completes. The default is ON.

KMS\$1KEY\$1ID 'kms\$1key\$1id'  
(Optional) Specifies if Amazon Redshift uses server-side encryption with an AWS KMS key to protect data at rest. Data in transit is protected with Secure Sockets Layer (SSL). 

 *PREPROCESSORS 'string' *   
(Optional) Specifies certain combinations of preprocessors to certain sets of columns. The format is a list of columnSets, and the appropriate transforms to be applied to each set of columns. Amazon Redshift applies all the transformers in a specific transformers list to all columns in the corresponding ColumnSet. For example, to apply OneHotEncoder with Imputer to columns t1 and t2, use the sample command following.  

```
CREATE MODEL customer_churn
FROM customer_data
TARGET 'Churn'
FUNCTION predict_churn
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>' }
PROBLEM_TYPE BINARY_CLASSIFICATION
OBJECTIVE 'F1'
PREPROCESSORS '[
...
{"ColumnSet": [
    "t1",
    "t2"
  ],
  "Transformers": [
    "OneHotEncoder",
    "Imputer"
  ]
},
{"ColumnSet": [
    "t3"
  ],
  "Transformers": [
    "OneHotEncoder"
  ]
},
{"ColumnSet": [
    "temp"
  ],
  "Transformers": [
    "Imputer",
    "NumericPassthrough"
  ]
}
]'
SETTINGS (
S3_BUCKET 'amzn-s3-demo-bucket'
)
```

Amazon Redshift supports the following transformers:
+ OneHotEncoder – Typically used to encode a discrete value into a binary vector with one nonzero value. This transformer is suitable for many machine learning models. 
+ OrdinalEncoder – Encodes discrete values into a single integer. This transformer is suitable for certain machine learning models, such as MLP and Linear Learner. 
+ NumericPassthrough – Passes input as is into the model.
+ Imputer – Fills in missing values and not a number (NaN) values.
+ ImputerWithIndicator – Fills in missing values and NaN values. This transformer also creates an indicator of whether any values were missing and filled in.
+ Normalizer – Normalizes values, which can improve the performance of many machine learning algorithms.
+ DateTimeVectorizer – Creates a vector embedding, representing a column of datetime data type that can be used in machine learning models.
+ PCA – Projects the data into a lower dimensional space to reduce the number of features while keeping as much information as possible.
+ StandardScaler – Standardizes features by removing the mean and scaling to unit variance. 
+ MinMax – Transforms features by scaling each feature to a given range.

Amazon Redshift ML stores the trained transformers, and automatically applies them as part of the prediction query. You don't need to specify them when generating predictions from your model. 

## CREATE XGBoost models with AUTO OFF
<a name="r_auto_off_create_model"></a>

The AUTO OFF CREATE MODEL has generally different objectives from the default CREATE MODEL.

As an advanced user who already knows the model type that you want and hyperparameters to use when training these models, you can use CREATE MODEL with AUTO OFF to turn off the CREATE MODEL automatic discovery of preprocessors and hyperparameters. To do so, you explicitly specify the model type. XGBoost is currently the only model type supported when AUTO is set to OFF. You can specify hyperparameters. Amazon Redshift uses default values for any hyperparameters that you specified. 

### CREATE XGBoost models with AUTO OFF syntax
<a name="r_auto_off-create-model-synposis"></a>

```
CREATE MODEL model_name
FROM { table_name | (select_statement ) }
TARGET column_name
FUNCTION function_name
IAM_ROLE { default }
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE { 'reg:squarederror' | 'reg:squaredlogerror' | 'reg:logistic' |
            'reg:pseudohubererror' | 'reg:tweedie' | 'binary:logistic' | 'binary:hinge' |
            'multi:softmax' | 'rank:pairwise' | 'rank:ndcg' }
HYPERPARAMETERS DEFAULT EXCEPT (
    NUM_ROUND '10',
    ETA '0.2',
    NUM_CLASS '10',
    (, ...)
)
PREPROCESSORS 'none'
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket', |
  S3_GARBAGE_COLLECT { ON | OFF }, |
  KMS_KEY_ID 'kms_key_id', |
  MAX_CELLS integer, |
  MAX_RUNTIME integer (, ...)
)
```

### CREATE XGBoost models with AUTO OFF parameters
<a name="r_auto_off-create-model-parameters"></a>

 *AUTO OFF*   
Turns off CREATE MODEL automatic discovery of preprocessor, algorithm, and hyper-parameters selection.

MODEL\$1TYPE XGBOOST  
Specifies to use XGBOOST to train the model. 

OBJECTIVE str  
Specifies an objective recognized by the algorithm. Amazon Redshift supports reg:squarederror, reg:squaredlogerror, reg:logistic, reg:pseudohubererror, reg:tweedie, binary:logistic, binary:hinge, multi:softmax. For more information about these objectives, see [Learning task parameters](https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters) in the XGBoost documentation.

HYPERPARAMETERS \$1 DEFAULT \$1 DEFAULT EXCEPT ( key ‘value’ (,..) ) \$1  
Specifies whether the default XGBoost parameters are used or overridden by user-specified values. The values must be enclosed with single quotes. Following are examples of parameters for XGBoost and their defaults.      
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/r_create_model_use_cases.html)

The following example prepares data for XGBoost.

```
DROP TABLE IF EXISTS abalone_xgb;

CREATE TABLE abalone_xgb (
length_val float,
diameter float,
height float,
whole_weight float,
shucked_weight float,
viscera_weight float,
shell_weight float,
rings int,
record_number int);

COPY abalone_xgb
FROM 's3://redshift-downloads/redshift-ml/abalone_xg/'
REGION 'us-east-1'
IAM_ROLE default
IGNOREHEADER 1 CSV;
```

The following example creates an XGBoost model with specified advanced options, such as MODEL\$1TYPE, OBJECTIVE, and PREPROCESSORS.

```
DROP MODEL abalone_xgboost_multi_predict_age;

CREATE MODEL abalone_xgboost_multi_predict_age
FROM ( SELECT length_val,
              diameter,
              height,
              whole_weight,
              shucked_weight,
              viscera_weight,
              shell_weight,
              rings
   FROM abalone_xgb WHERE record_number < 2500 )
TARGET rings FUNCTION ml_fn_abalone_xgboost_multi_predict_age
IAM_ROLE default
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE 'multi:softmax'
PREPROCESSORS 'none'
HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '100', NUM_CLASS '30')
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket');
```

The following example uses an inference query to predict the age of the fish with a record number greater than 2500. It uses the function ml\$1fn\$1abalone\$1xgboost\$1multi\$1predict\$1age created from the above command. 

```
select ml_fn_abalone_xgboost_multi_predict_age(length_val,
                                                   diameter,
                                                   height,
                                                   whole_weight,
                                                   shucked_weight,
                                                   viscera_weight,
                                                   shell_weight)+1.5 as age
from abalone_xgb where record_number > 2500;
```

## Bring your own model (BYOM) - local inference
<a name="r_byom_create_model"></a>

Amazon Redshift ML supports using bring your own model (BYOM) for local inference.

The following summarizes the options for the CREATE MODEL syntax for BYOM. You can use a model trained outside of Amazon Redshift with Amazon SageMaker AI for in-database inference locally in Amazon Redshift.

### CREATE MODEL syntax for local inference
<a name="r_local-create-model"></a>

The following describes the CREATE MODEL syntax for local inference.

```
CREATE MODEL model_name
FROM ('job_name' | 's3_path' )
FUNCTION function_name ( data_type [, ...] )
RETURNS data_type
IAM_ROLE { default }
[ SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket', | --required
  KMS_KEY_ID 'kms_string') --optional
];
```

Amazon Redshift currently only supports pretrained XGBoost, MLP, and Linear Learner models for BYOM. You can import SageMaker AI Autopilot and models directly trained in Amazon SageMaker AI for local inference using this path. 

#### CREATE MODEL parameters for local inference
<a name="r_local-create-model-parameters"></a>

 *model\$1name*   
The name of the model. The model name in a schema must be unique.

FROM (*'job\$1name'* \$1 *'s3\$1path'* )  
The *job\$1name* uses an Amazon SageMaker AI job name as the input. The job name can either be an Amazon SageMaker AI training job name or an Amazon SageMaker AI Autopilot job name. The job must be created in the same AWS account that owns the Amazon Redshift cluster. To find the job name, launch Amazon SageMaker AI. In the **Training** dropdown menu, choose **Training jobs**.  
The *'s3\$1path'* specifies the S3 location of the .tar.gz model artifacts file that is to be used when creating the model.

FUNCTION *function\$1name* ( *data\$1type* [, ...] )  
The name of the function to be created and the data types of the input arguments. You can provide a schema name.

RETURNS *data\$1type*  
The data type of the value returned by the function.

IAM\$1ROLE \$1 default \$1 'arn:aws:iam::<account-id>:role/<role-name>'\$1  
 Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREATE MODEL command runs.  
Use the Amazon Resource Name (ARN) for an IAM role that your cluster uses for authentication and authorization. 

SETTINGS ( S3\$1BUCKET *'amzn-s3-demo-bucket'*, \$1 KMS\$1KEY\$1ID *'kms\$1string'*)  
The S3\$1BUCKET clause specifies the Amazon S3 location that is used to store intermediate results.  
(Optional) The KMS\$1KEY\$1ID clause specifies if Amazon Redshift uses server-side encryption with an AWS KMS key to protect data at rest. Data in transit is protected with Secure Sockets Layer (SSL).  
For more information, see [CREATE MODEL with user guidance](#r_user_guidance_create_model).

#### CREATE MODEL for local inference example
<a name="r_local-create-model-example"></a>

The following example creates a model that has been previously trained in Amazon SageMaker AI, outside of Amazon Redshift. Because the model type is supported by Amazon Redshift ML for local inference, the following CREATE MODEL creates a function that can be used locally in Amazon Redshift. You can provide a SageMaker AI training job name.

```
CREATE MODEL customer_churn
FROM 'training-job-customer-churn-v4'
FUNCTION customer_churn_predict (varchar, int, float, float)
RETURNS int
IAM_ROLE default
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket');
```

After the model is created, you can use the function *customer\$1churn\$1predict* with the specified argument types to make predictions.

## Bring your own model (BYOM) - remote inference
<a name="r_byom_create_model_remote"></a>

Amazon Redshift ML also supports using bring your own model (BYOM) for remote inference.

The following summarizes the options for the CREATE MODEL syntax for BYOM.

### CREATE MODEL syntax for remote inference
<a name="r_remote-create-model"></a>

The following describes the CREATE MODEL syntax for remote inference.

```
CREATE MODEL model_name 
FUNCTION function_name ( data_type [, ...] )
RETURNS data_type
SAGEMAKER 'endpoint_name'[:'model_name']
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>' }
[SETTINGS (MAX_BATCH_ROWS integer)];
```

#### CREATE MODEL parameters for remote inference
<a name="r_remote-create-model-parameters"></a>

 *model\$1name*   
The name of the model. The model name in a schema must be unique.

FUNCTION *fn\$1name* ( [*data\$1type*] [, ...] )  
The name of the function and the data types of the input arguments. See [Data types](https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html) for all of the supported data types. `Geography`, `geometry`, and `hllsketch` aren't supported.   
You can also provide a function name inside a schema using two-part notation, such as `myschema.myfunction`.

RETURNS *data\$1type*  
The data type of the value returned by the function. See [Data types](https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html) for all of the supported data types. `Geography`, `geometry`, and `hllsketch` aren't supported. 

SAGEMAKER *'endpoint\$1name'*[:*'model\$1name'*]   
The name of the Amazon SageMaker AI endpoint. If the endpoint name points to a multimodel endpoint, add the name of the model to use. The endpoint must be hosted in the same AWS Region and AWS account as the Amazon Redshift cluster. To find your endpoint, launch Amazon SageMaker AI. In the **Inference** dropdown menu, choose **Endpoints**.

IAM\$1ROLE \$1 default \$1 'arn:aws:iam::<account-id>:role/<role-name>'\$1  
 Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREATE MODEL command runs. Alternatively, you can specify the ARN of an IAM role to use that role.

MAX\$1BATCH\$1ROWS *integer*  
The maximum number of rows that Amazon Redshift sends in a single batch request for a single SageMaker AI invocation. It is supported only for BYOM with remote inference. The actual number of rows in a batch also depends on the input size, but is less than or equal to this value. This parameter's minimum value is 1. The maximum value is `INT_MAX`, or 2,147,483,647. This parameter is required only when both input and returned data types are `SUPER`. The default value is `INT_MAX`, or 2,147,483,647. 

When the model is deployed to a SageMaker AI endpoint, SageMaker AI creates the information of the model in Amazon Redshift. It then performs inference through the external function. You can use the SHOW MODEL command to view the model information on your Amazon Redshift cluster.

#### CREATE MODEL for remote inference usage notes
<a name="r_remote-create-model-usage-notes"></a>

Before using CREATE MODEL for remote inference, consider the following:
+ The endpoint must be hosted by the same AWS account that owns the Amazon Redshift cluster.
+ Make sure either that the Amazon SageMaker AI endpoint has enough resources to accommodate inference calls from Amazon Redshift or that the Amazon SageMaker AI endpoint can be automatically scaled.
+ If you're not using the `SUPER` data type as input, the model only accepts inputs in the format of comma-separated values (CSV) which corresponds to a content type of `text/CSV` in SageMaker AI.
+ If you're not using the `SUPER` data type as input, the output of models is a single value of the type specified when you create the function. The output is in the format of comma-separated values (CSV) through a content type of `text/CSV` in SageMaker AI. `VARCHAR` data types cannot be in quotes and cannot contain new lines, and each output must be in a new line.
+ Models accept nulls as empty strings.
+ When the input data type is `SUPER`, only one input argument is supported. 
+ When the input data type is `SUPER`, the returned data type must also be `SUPER`. 
+ MAX\$1BATCH\$1ROWS is required when both input and returned data types are SUPER. 
+ When the input data type is `SUPER` the content type of the endpoint invocation is either `application/json` when MAX\$1BATCH\$1ROWS is `1` or `application/jsonlines` in all other cases. 
+ When the returned data type is `SUPER` the accept type of the endpoint invocation is either `application/json` when MAX\$1BATCH\$1ROWS is `1` or `application/jsonlines` in all other cases. 

##### CREATE MODEL for remote inference example
<a name="r_remote-create-model-example"></a>

The following example creates a model that uses a SageMaker AI endpoint to make predictions. Make sure that the endpoint is running to make predictions and specify its name in the CREATE MODEL command.

```
CREATE MODEL remote_customer_churn
FUNCTION remote_fn_customer_churn_predict (varchar, int, float, float)
RETURNS int
SAGEMAKER 'customer-churn-endpoint'
IAM_ROLE default;
```

 The following example creates a BYOM with remote inference with a large language model model (LLM). LLMs hosted on Amazon SageMaker AI Jumpstart accept and return the `application/json` content type and they support a single JSON per invocation. The input and returned data types must be `SUPER` and MAX\$1BATCH\$1ROWS must be set to 1. 

```
CREATE MODEL sample_super_data_model
FUNCTION sample_super_data_model_predict(super)
RETURNS super
SAGEMAKER 'sample_super_data_model_endpoint'
IAM_ROLE default
SETTINGS (MAX_BATCH_ROWS 1);
```

## CREATE MODEL with K-MEANS
<a name="r_k-means_create_model"></a>

Amazon Redshift supports the K-Means algorithm that groups data that isn't labeled. This algorithm solves clustering problems where you want to discover groupings in the data. Unclassified data is grouped and partitioned based on its similarities and differences. 

### CREATE MODEL with K-MEANS syntax
<a name="r_k-means-create-model-synposis"></a>

```
CREATE MODEL model_name
FROM { table_name | ( select_statement ) }
FUNCTION function_name
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>' }
AUTO OFF
MODEL_TYPE KMEANS
PREPROCESSORS 'string'
HYPERPARAMETERS DEFAULT EXCEPT ( K 'val' [, ...] )
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket',
  KMS_KEY_ID 'kms_string', |
    -- optional
  S3_GARBAGE_COLLECT on / off, |
    -- optional
  MAX_CELLS integer, |
    -- optional
  MAX_RUNTIME integer
    -- optional);
```

### CREATE MODEL with K-MEANS parameters
<a name="r_k-means-create-model-parameters"></a>

 *AUTO OFF*   
Turns off CREATE MODEL automatic discovery of preprocessor, algorithm, and hyper-parameters selection.

MODEL\$1TYPE KMEANS  
Specifies to use KMEANS to train the model. 

PREPROCESSORS 'string'  
Specifies certain combinations of preprocessors to certain sets of columns. The format is a list of columnSets, and the appropriate transforms to be applied to each set of columns. Amazon Redshift supports 3 K-Means preprocessors, namely StandardScaler, MinMax, and NumericPassthrough. If you don't want to apply any preprocessing for K-Means, choose NumericPassthrough explicitly as a transformer. For more information about supported transformers, see [CREATE MODEL with user guidance parameters](#r_user_guidance-create-model-parameters).  
The K-Means algorithm uses Euclidean distance to calculate similarity. Preprocessing the data ensures that the features of the model stay on the same scale and produce reliable results.

HYPERPARAMETERS DEFAULT EXCEPT ( K 'val' [, ...] )  
Specifies whether the K-Means parameters are used. You must specify the `K` parameter when using the K-Means algorithm. For more information, see [K-Means Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/k-means-api-config.html) in the *Amazon SageMaker AI Developer Guide*

The following example prepares data for K-Means.

```
CREATE MODEL customers_clusters
FROM customers
FUNCTION customers_cluster
IAM_ROLE default
AUTO OFF
MODEL_TYPE KMEANS
PREPROCESSORS '[
{
  "ColumnSet": [ "*" ],
  "Transformers": [ "NumericPassthrough" ]
}
]'
HYPERPARAMETERS DEFAULT EXCEPT ( K '5' )
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket');

select customer_id, customers_cluster(...) from customers;
customer_id | customers_cluster
--------------------
12345            1
12346            2
12347            4
12348
```

## CREATE MODEL with Forecast
<a name="r_forecast_model"></a>

Forecast models in Redshift ML use Amazon Forecast to create accurate time-series forecasts. Doing so lets you use historical data over a time period to make predictions about future events. Common use cases of Amazon Forecast include using retail product data to decide how to price inventory, manufacturing quantity data to predict how much of one item to order, and web traffic data to forecast how much traffic a web server might receive. 

 [Quota limits from Amazon Forecast](https://docs.aws.amazon.com/forecast/latest/dg/limits.html) are enforced in Amazon Redshift forecast models. For example, the maximum number of forecasts is 100, but it's adjustable. Dropping a forecast model doesn’t automatically delete the associated resources in Amazon Forecast. If you delete a Redshift cluster, all associated models are dropped as well. 

Note that Forecast models are currently only available in the following Regions:
+ US East (Ohio) (us-east-2)
+ US East (N. Virginia) (us-east-1)
+ US West (Oregon) (us-west-2)
+ Asia Pacific (Mumbai) (ap-south-1)
+ Asia Pacific (Seoul) (ap-northeast-2)
+ Asia Pacific (Singapore) (ap-southeast-1)
+ Asia Pacific (Sydney) (ap-southeast-2)
+ Asia Pacific (Tokyo) (ap-northeast-1)
+ Europe (Frankfurt) (eu-central-1)
+ Europe (Ireland) (eu-west-1)

### CREATE MODEL with Forecast syntax
<a name="r_forecast_model-synopsis"></a>

```
CREATE [ OR REPLACE ] MODEL forecast_model_name 
FROM { table_name | ( select_query ) } 
TARGET column_name
IAM_ROLE { default | 'arn:aws:iam::<account-id>:role/<role-name>'} 
AUTO ON
MODEL_TYPE FORECAST
SETTINGS (
  S3_BUCKET 'amzn-s3-demo-bucket',
  HORIZON integer,
  FREQUENCY forecast_frequency
  [PERCENTILES '0.1', '0.5', '0.9']
  )
```

### CREATE MODEL with Forecast parameters
<a name="r_forecast_model-parameters"></a>

 *forecast\$1model\$1name*   
The name of the model. The model name must be unique.

FROM \$1 table\$1name \$1 ( select\$1query ) \$1  
The table\$1name or the query that specifies the training data. This can either be an existing table in the system, or an Amazon Redshift compatible SELECT query enclosed with parentheses. The table or query result must have at least three columns: (1) a varchar column that specifies the name of the time-series. Each dataset can have multiple time-series; (2) a datetime column; and (3) the target column to predict. This target column must be either an int or a float. If you supply a dataset that has more than three columns, Amazon Redshift assumes that all additional columns are part of a related time series. Note that related time series must be of type int or float. For more information about related time series, see [Using Related Time Series Datasets](https://docs.aws.amazon.com/forecast/latest/dg/related-time-series-datasets.html).

TARGET column\$1name  
The name of the column that becomes the prediction target. The column must exist in the FROM clause.

IAM\$1ROLE \$1 default \$1 'arn:aws:iam::<account-id>:role/<role-name>' \$1  
Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREAT MODEL command runs. Alternatively, you can specify an ARN of an IAM role to use that role. 

AUTO ON  
Turns on the CREATE MODEL automatic discovery of algorithm and hyper-parameters selection. Specifying on when creating a Forecast model indicates to use a Forecast AutoPredictor, where Amazon Forecast applies the optimal combinations of algorithms to each time series in your dataset.

MODEL\$1TYPE FORECAST  
Specifies to use FORECAST to train the model.

S3\$1BUCKET 'amzn-s3-demo-bucket'  
The name of the Amazon Simple Storage Service bucket that you previously created and that’s used to share training data and artifacts between Amazon Redshift and Amazon Forecast. Amazon Redshift creates a subfolder in this bucket before unloading the training data. When training is complete, Amazon Redshift deletes the created subfolder and its contents.

HORIZON integer  
The maximum number of predictions the forecast model can return. Once the model is trained, you can't change this integer.

FREQUENCY forecast\$1frequency  
Specifies how granular you want the forecasts to be. Available options are `Y | M | W | D | H | 30min | 15min | 10min | 5min | 1min`. Required if you’re training a forecast model.

PERCENTILES string  
A comma-delimited string that specifies the forecast types used to train a predictor. Forecast types can be quantiles from 0.01 to 0.99, by increments of 0.01 or higher. You can also specify the mean forecast with mean. You can specify a maximum of five forecast types.

The following example demonstrates how to create a simple forecast model.

```
CREATE MODEL forecast_example
FROM forecast_electricity_
TARGET target 
IAM_ROLE 'arn:aws:iam::<account-id>:role/<role-name>'
AUTO ON 
MODEL_TYPE FORECAST
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket',
          HORIZON 24,
          FREQUENCY 'H',
          PERCENTILES '0.25,0.50,0.75,mean',
          S3_GARBAGE_COLLECT OFF);
```

After you create the forecast model, you can create a new table with the prediction data.

```
CREATE TABLE forecast_model_results as SELECT Forecast(forecast_example)
```

You can then query the new table to get predictions.

```
SELECT * FROM forecast_model_results
```