

# BlazingText Hyperparameters

When you start a training job with a `CreateTrainingJob` request, you specify a training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The hyperparameters for the BlazingText algorithm depend on which mode you use: Word2Vec (unsupervised) or Text Classification (supervised).
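
As a minimal sketch of what a string-to-string map looks like in practice, the following Python snippet converts typed values into the strings that the request carries. The helper name `to_string_map` is hypothetical and is not part of any SageMaker API; the parameter names and values are taken from the tables in this topic.

```python
def to_string_map(params):
    """Convert every hyperparameter value to its string form.

    Hypothetical helper for illustration only.
    """
    return {name: str(value) for name, value in params.items()}


# Typed values as you might hold them in Python code.
hyperparameters = to_string_map({
    "mode": "skipgram",
    "epochs": 5,
    "learning_rate": 0.05,
    "subwords": False,
})

# Every value is now a string, for example "5" rather than 5.
print(hyperparameters)
```

Converting in one place keeps the rest of your code free to work with integers, floats, and Booleans.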

## Word2Vec Hyperparameters

The following table lists the hyperparameters for the BlazingText Word2Vec training algorithm provided by Amazon SageMaker AI.


| Parameter Name | Description | 
| --- | --- | 
| mode |  The Word2Vec architecture used for training. **Required** Valid values: `batch_skipgram`, `skipgram`, or `cbow`  | 
| batch_size |  The size of each batch when `mode` is set to `batch_skipgram`. Set to a number between 10 and 20. **Optional** Valid values: Positive integer Default value: 11  | 
| buckets |  The number of hash buckets to use for subwords. **Optional** Valid values: Positive integer Default value: 2000000  | 
| epochs |  The number of complete passes through the training data. **Optional** Valid values: Positive integer Default value: 5  | 
| evaluation |  Whether the trained model is evaluated using the [WordSimilarity-353 Test](http://www.gabrilovich.com/resources/data/wordsim353/wordsim353.html). **Optional** Valid values: (Boolean) `True` or `False` Default value: `True`  | 
| learning_rate |  The step size used for parameter updates. **Optional** Valid values: Positive float Default value: 0.05  | 
| min_char |  The minimum number of characters to use for subwords/character n-grams. **Optional** Valid values: Positive integer Default value: 3  | 
| min_count |  Words that appear fewer than `min_count` times are discarded. **Optional** Valid values: Non-negative integer Default value: 5  | 
| max_char |  The maximum number of characters to use for subwords/character n-grams. **Optional** Valid values: Positive integer Default value: 6  | 
| negative_samples |  The number of negative samples for the negative sample sharing strategy. **Optional** Valid values: Positive integer Default value: 5  | 
| sampling_threshold |  The frequency threshold for down-sampling words. Words that appear in the training data with a frequency above this threshold are randomly down-sampled. **Optional** Valid values: Positive fraction. The recommended range is (0, 1e-3] Default value: 0.0001  | 
| subwords |  Whether to learn subword embeddings or not. **Optional** Valid values: (Boolean) `True` or `False` Default value: `False`  | 
| vector_dim |  The dimension of the word vectors that the algorithm learns. **Optional** Valid values: Positive integer Default value: 100  | 
| window_size |  The size of the context window. The context window is the number of words surrounding the target word used for training. **Optional** Valid values: Positive integer Default value: 5  | 
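
The Word2Vec table above can be turned into a small client-side check before a job is submitted. The following is a sketch only: the function `build_word2vec_hyperparameters` and the two constants are hypothetical names, and the checks cover just the allowed `mode` values and the parameter names listed in the table. It does not reproduce the validation that the service itself performs.

```python
# Allowed values for the required `mode` parameter, from the table above.
WORD2VEC_MODES = {"batch_skipgram", "skipgram", "cbow"}

# Optional parameter names listed in the table above.
WORD2VEC_OPTIONAL = {
    "batch_size", "buckets", "epochs", "evaluation", "learning_rate",
    "min_char", "min_count", "max_char", "negative_samples",
    "sampling_threshold", "subwords", "vector_dim", "window_size",
}


def build_word2vec_hyperparameters(mode, **overrides):
    """Return a string-to-string map for a Word2Vec job (illustrative helper)."""
    if mode not in WORD2VEC_MODES:
        raise ValueError(f"mode must be one of {sorted(WORD2VEC_MODES)}")
    unknown = set(overrides) - WORD2VEC_OPTIONAL
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    # Only explicitly set values are sent; omitted ones use the documented defaults.
    return {"mode": mode, **{k: str(v) for k, v in overrides.items()}}


word2vec_params = build_word2vec_hyperparameters(
    "batch_skipgram", batch_size=12, vector_dim=200
)
print(word2vec_params)
```

Sending only the values you override leaves every other parameter at the default shown in the table.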

## Text Classification Hyperparameters

The following table lists the hyperparameters for the Text Classification training algorithm provided by Amazon SageMaker AI.

**Note**  
Although some of the parameters are common between the Text Classification and Word2Vec modes, they might have different meanings depending on the context.


| Parameter Name | Description | 
| --- | --- | 
| mode |  The training mode. **Required** Valid values: `supervised`  | 
| buckets |  The number of hash buckets to use for word n-grams. **Optional** Valid values: Positive integer Default value: 2000000  | 
| early_stopping |  Whether to stop training if validation accuracy doesn't improve after a `patience` number of epochs. Note that a validation channel is required if early stopping is used. **Optional** Valid values: (Boolean) `True` or `False` Default value: `False`  | 
| epochs |  The maximum number of complete passes through the training data. **Optional** Valid values: Positive integer Default value: 5  | 
| learning_rate |  The step size used for parameter updates. **Optional** Valid values: Positive float Default value: 0.05  | 
| min_count |  Words that appear fewer than `min_count` times are discarded. **Optional** Valid values: Non-negative integer Default value: 5  | 
| min_epochs |  The minimum number of epochs to train before early stopping logic is invoked. **Optional** Valid values: Positive integer Default value: 5  | 
| patience |  The number of epochs to wait before applying early stopping when no progress is made on the validation set. Used only when `early_stopping` is `True`. **Optional** Valid values: Positive integer Default value: 4  | 
| vector_dim |  The dimension of the embedding layer. **Optional** Valid values: Positive integer Default value: 100  | 
| word_ngrams |  The number of word n-gram features to use. **Optional** Valid values: Positive integer Default value: 2  | 
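
The table above states that early stopping requires a validation channel. The following sketch shows one way to enforce that rule on the client side before submitting a job. The function `build_supervised_hyperparameters` is a hypothetical helper, and the channel name `validation` is an assumption made for this example; use whatever name your job's input data configuration gives the validation channel.

```python
def build_supervised_hyperparameters(channels, **overrides):
    """Return a string-to-string map for a Text Classification job.

    `channels` is the collection of input channel names configured for
    the job. Illustrative helper, not part of any SageMaker API.
    """
    params = {"mode": "supervised", **overrides}
    # Assumption: the validation channel is named "validation".
    if params.get("early_stopping") and "validation" not in channels:
        raise ValueError("early_stopping requires a validation channel")
    return {name: str(value) for name, value in params.items()}


supervised_params = build_supervised_hyperparameters(
    channels=["train", "validation"],
    early_stopping=True,
    patience=4,
    min_epochs=5,
    epochs=20,
    word_ngrams=2,
)
print(supervised_params)
```

Here `epochs` acts as the upper bound on training, while `min_epochs` and `patience` control when early stopping is allowed to end the job sooner.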