# TabTransformer
TabTransformer Algorithm

[TabTransformer](https://arxiv.org/abs/2012.06678) is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. This page includes information about Amazon EC2 instance recommendations and sample notebooks for TabTransformer.

# How to use SageMaker AI TabTransformer
How to use TabTransformer

You can use TabTransformer as an Amazon SageMaker AI built-in algorithm. The following section describes how to use TabTransformer with the SageMaker Python SDK. For information on how to use TabTransformer from the Amazon SageMaker Studio Classic UI, see [SageMaker JumpStart pretrained models](studio-jumpstart.md).
+ **Use TabTransformer as a built-in algorithm**

  Use the TabTransformer built-in algorithm to build a TabTransformer training container as shown in the following code example. You can automatically spot the TabTransformer built-in algorithm image URI using the SageMaker AI `image_uris.retrieve` API (or the `get_image_uri` API if using [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) version 2). 

  After specifying the TabTransformer image URI, you can use the TabTransformer container to construct an estimator using the SageMaker AI Estimator API and initiate a training job. The TabTransformer built-in algorithm runs in script mode, but the training script is provided for you and there is no need to replace it. If you have extensive experience using script mode to create a SageMaker training job, then you can incorporate your own TabTransformer training scripts.

  ```
  from sagemaker import image_uris, model_uris, script_uris
  
  train_model_id, train_model_version, train_scope = "pytorch-tabtransformerclassification-model", "*", "training"
  training_instance_type = "ml.p3.2xlarge"
  
  # Retrieve the docker image
  train_image_uri = image_uris.retrieve(
      region=None,
      framework=None,
      model_id=train_model_id,
      model_version=train_model_version,
      image_scope=train_scope,
      instance_type=training_instance_type
  )
  
  # Retrieve the training script
  train_source_uri = script_uris.retrieve(
      model_id=train_model_id, model_version=train_model_version, script_scope=train_scope
  )
  
  train_model_uri = model_uris.retrieve(
      model_id=train_model_id, model_version=train_model_version, model_scope=train_scope
  )
  
  # Sample training data is available in this bucket
  training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
  training_data_prefix = "training-datasets/tabular_binary/"
  
  training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}/train"
  validation_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}/validation"
  
  output_bucket = sess.default_bucket()
  output_prefix = "jumpstart-example-tabular-training"
  
  s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"
  
  from sagemaker import hyperparameters
  
  # Retrieve the default hyperparameters for training the model
  hyperparameters = hyperparameters.retrieve_default(
      model_id=train_model_id, model_version=train_model_version
  )
  
  # [Optional] Override default hyperparameters with custom values
  hyperparameters[
      "n_epochs"
  ] = "50"
  print(hyperparameters)
  
  from sagemaker.estimator import Estimator
  from sagemaker.utils import name_from_base
  
  training_job_name = name_from_base(f"built-in-algo-{train_model_id}-training")
  
  # Create SageMaker Estimator instance
  tabular_estimator = Estimator(
      role=aws_role,
      image_uri=train_image_uri,
      source_dir=train_source_uri,
      model_uri=train_model_uri,
      entry_point="transfer_learning.py",
      instance_count=1,
      instance_type=training_instance_type,
      max_run=360000,
      hyperparameters=hyperparameters,
      output_path=s3_output_location
  )
  
  # Launch a SageMaker Training job by passing the S3 path of the training data
  tabular_estimator.fit(
      {
          "training": training_dataset_s3_path,
          "validation": validation_dataset_s3_path,
      }, logs=True, job_name=training_job_name
  )
  ```

  For more information about how to set up the TabTransformer as a built-in algorithm, see the following notebook examples.
  + [Tabular classification with Amazon SageMaker AI TabTransformer algorithm](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb)
  + [Tabular regression with Amazon SageMaker AI TabTransformer algorithm](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb)

# Input and Output interface for the TabTransformer algorithm


TabTransformer operates on tabular data, with the rows representing observations, one column representing the target variable or label, and the remaining columns representing features. 

The SageMaker AI implementation of TabTransformer supports CSV for training and inference:
+ For **Training ContentType**, valid inputs must be *text/csv*.
+ For **Inference ContentType**, valid inputs must be *text/csv*.

**Note**  
For CSV training, the algorithm assumes that the target variable is in the first column and that the CSV does not have a header record.   
For CSV inference, the algorithm assumes that CSV input does not have the label column. 

**Input format for training data, validation data, and categorical features**

Be mindful of how to format your training data for input to the TabTransformer model. You must provide the path to an Amazon S3 bucket that contains your training and validation data. You can also include a list of categorical features. Use both the `training` and `validation` channels to provide your input data. Alternatively, you can use only the `training` channel.

**Use both the `training` and `validation` channels**

You can provide your input data by way of two S3 paths, one for the `training` channel and one for the `validation` channel. Each S3 path can either be an S3 prefix that points to one or more CSV files or a full S3 path pointing to one specific CSV file. The target variables should be in the first column of your CSV file. The predictor variables (features) should be in the remaining columns. If multiple CSV files are provided for the `training` or `validation` channels, the TabTransformer algorithm concatenates the files. The validation data is used to compute a validation score at the end of each boosting iteration. Early stopping is applied when the validation score stops improving.

If your predictors include categorical features, you can provide a JSON file named `categorical_index.json` in the same location as your training data file or files. If you provide a JSON file for categorical features, your `training` channel must point to an S3 prefix and not a specific CSV file. This file should contain a Python dictionary where the key is the string `"cat_index_list"` and the value is a list of unique integers. Each integer in the value list should indicate the column index of the corresponding categorical features in your training data CSV file. Each value should be a positive integer (greater than zero because zero represents the target value), less than the `Int32.MaxValue` (2147483647), and less than the total number of columns. There should only be one categorical index JSON file.

**Use only the `training` channel**:

You can alternatively provide your input data by way of a single S3 path for the `training` channel. This S3 path should point to a directory with a subdirectory named `training/` that contains one or more CSV files. You can optionally include another subdirectory in the same location called `validation/` that also has one or more CSV files. If the validation data is not provided, then 20% of your training data is randomly sampled to serve as the validation data. If your predictors include categorical features, you can provide a JSON file named `categorical_index.json` in the same location as your data subdirectories.

**Note**  
For CSV training input mode, the total memory available to the algorithm (instance count multiplied by the memory available in the `InstanceType`) must be able to hold the training dataset.

## Amazon EC2 instance recommendation for the TabTransformer algorithm


SageMaker AI TabTransformer supports single-instance CPU and single-instance GPU training. Despite higher per-instance costs, GPUs train more quickly, making them more cost effective. To take advantage of GPU training, specify the instance type as one of the GPU instances (for example, P3). SageMaker AI TabTransformer currently does not support multi-GPU training.

## TabTransformer sample notebooks
Sample Notebooks

The following table outlines a variety of sample notebooks that address different use cases of Amazon SageMaker AI TabTransformer algorithm.


****  

| **Notebook Title** | **Description** | 
| --- | --- | 
|  [Tabular classification with Amazon SageMaker AI TabTransformer algorithm](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb)  |  This notebook demonstrates the use of the Amazon SageMaker AI TabTransformer algorithm to train and host a tabular classification model.   | 
|  [Tabular regression with Amazon SageMaker AI TabTransformer algorithm](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb)  |  This notebook demonstrates the use of the Amazon SageMaker AI TabTransformer algorithm to train and host a tabular regression model.   | 

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, choose the **SageMaker AI Examples** tab to see a list of all of the SageMaker AI samples. To open a notebook, choose its **Use** tab and choose **Create copy**.

# How TabTransformer works
How It Works

TabTransformer is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability.

TabTransformer performs well in machine learning competitions because of its robust handling of a variety of data types, relationships, distributions, and the diversity of hyperparameters that you can fine-tune. You can use TabTransformer for regression, classification (binary and multiclass), and ranking problems.

The following diagram illustrates the TabTransformer architecture.

![\[The architecture of TabTransformer.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/tabtransformer_illustration.png)


For more information, see *[TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/abs/2012.06678)*.

# TabTransformer hyperparameters
Hyperparameters

The following table contains the subset of hyperparameters that are required or most commonly used for the Amazon SageMaker AI TabTransformer algorithm. Users set these parameters to facilitate the estimation of model parameters from data. The SageMaker AI TabTransformer algorithm is an implementation of the open-source [TabTransformer](https://github.com/jrzaurin/pytorch-widedeep) package.

**Note**  
The default hyperparameters are based on example datasets in the [TabTransformer sample notebooks](tabtransformer.md#tabtransformer-sample-notebooks).

The SageMaker AI TabTransformer algorithm automatically chooses an evaluation metric and objective function based on the type of classification problem. The TabTransformer algorithm detects the type of classification problem based on the number of labels in your data. For regression problems, the evaluation metric is r square and the objective function is mean square error. For binary classification problems, the evaluation metric and objective function are both binary cross entropy. For multiclass classification problems, the evaluation metric and objective function are both multiclass cross entropy.

**Note**  
The TabTransformer evaluation metric and objective functions are not currently available as hyperparameters. Instead, the SageMaker AI TabTransformer built-in algorithm automatically detects the type of classification task (regression, binary, or multiclass) based on the number of unique integers in the label column and assigns an evaluation metric and objective function.


| Parameter Name | Description | 
| --- | --- | 
| n\$1epochs |  Number of epochs to train the deep neural network. Valid values: integer, range: Positive integer. Default value: `5`.  | 
| patience |  The training will stop if one metric of one validation data point does not improve in the last `patience` round. Valid values: integer, range: (`2`, `60`). Default value: `10`.  | 
| learning\$1rate |  The rate at which the model weights are updated after working through each batch of training examples. Valid values: float, range: Positive floating point number. Default value: `0.001`.  | 
| batch\$1size |  The number of examples propagated through the network.  Valid values: integer, range: (`1`, `2048`). Default value: `256`.  | 
| input\$1dim |  The dimension of embeddings to encode the categorical and/or continuous columns. Valid values: string, any of the following: `"16"`, `"32"`, `"64"`, `"128"`, `"256"`, or `"512"`. Default value: `"32"`.  | 
| n\$1blocks |  The number of Transformer encoder blocks. Valid values: integer, range: (`1`, `12`). Default value: `4`.  | 
| attn\$1dropout |  Dropout rate applied to the Multi-Head Attention layers. Valid values: float, range: (`0`, `1`). Default value: `0.2`.  | 
| mlp\$1dropout |  Dropout rate applied to the FeedForward network within the encoder layers and the final MLP layers on top of Transformer encoders. Valid values: float, range: (`0`, `1`). Default value: `0.1`.  | 
| frac\$1shared\$1embed |  The fraction of embeddings shared by all the different categories for one particular column. Valid values: float, range: (`0`, `1`). Default value: `0.25`.  | 

# Tune a TabTransformer model
Model Tuning

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your training and validation datasets. Model tuning focuses on the following hyperparameters: 

**Note**  
The learning objective function and evaluation metric are both automatically assigned based on the type of classification task, which is determined by the number of unique integers in the label column. For more information, see [TabTransformer hyperparameters](tabtransformer-hyperparameters.md).
+ A learning objective function to optimize during model training
+ An evaluation metric that is used to evaluate model performance during validation
+ A set of hyperparameters and a range of values for each to use when tuning the model automatically

Automatic model tuning searches your chosen hyperparameters to find the combination of values that results in a model that optimizes the chosen evaluation metric.

**Note**  
Automatic model tuning for TabTransformer is only available from the Amazon SageMaker SDKs, not from the SageMaker AI console.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Evaluation metrics computed by the TabTransformer algorithm
Metrics

The SageMaker AI TabTransformer algorithm computes the following metrics to use for model validation. The evaluation metric is automatically assigned based on the type of classification task, which is determined by the number of unique integers in the label column.


| Metric Name | Description | Optimization Direction | Regex Pattern | 
| --- | --- | --- | --- | 
| r2 | r square | maximize | "metrics=\$1'r2': (\$1\$1S\$1)\$1" | 
| f1\$1score | binary cross entropy | maximize | "metrics=\$1'f1': (\$1\$1S\$1)\$1" | 
| accuracy\$1score | multiclass cross entropy | maximize | "metrics=\$1'accuracy': (\$1\$1S\$1)\$1" | 

## Tunable TabTransformer hyperparameters
Tunable Hyperparameters

Tune the TabTransformer model with the following hyperparameters. The hyperparameters that have the greatest effect on optimizing the TabTransformer evaluation metrics are: `learning_rate`, `input_dim`, `n_blocks`, `attn_dropout`, `mlp_dropout`, and `frac_shared_embed`. For a list of all the TabTransformer hyperparameters, see [TabTransformer hyperparameters](tabtransformer-hyperparameters.md).


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| learning\$1rate | ContinuousParameterRanges | MinValue: 0.001, MaxValue: 0.01 | 
| input\$1dim | CategoricalParameterRanges | [16, 32, 64, 128, 256, 512] | 
| n\$1blocks | IntegerParameterRanges | MinValue: 1, MaxValue: 12 | 
| attn\$1dropout | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.8 | 
| mlp\$1dropout | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.8 | 
| frac\$1shared\$1embed | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.5 |