

# Object Detection Hyperparameters
<a name="object-detection-api-config"></a>

In the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters that are used to help estimate the parameters of the model from a training dataset. The following table lists the hyperparameters provided by Amazon SageMaker AI for training the object detection algorithm. For more information about how object detection training works, see [How Object Detection Works](algo-object-detection-tech-notes.md).


| Parameter Name | Description | 
| --- | --- | 
| num\_classes |  The number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. **Required** Valid values: positive integer  | 
| num\_training\_samples |  The number of training examples in the input dataset.  If this value does not match the number of samples in the training set, the behavior of the `lr_scheduler_step` parameter is undefined and distributed training accuracy may be affected.  **Required** Valid values: positive integer  | 
| base\_network |  The base network architecture to use. **Optional** Valid values: 'vgg-16' or 'resnet-50' Default value: 'vgg-16'  | 
| early\_stopping |  `True` to use early stopping logic during training. `False` not to use it. **Optional** Valid values: `True` or `False` Default value: `False`  | 
| early\_stopping\_min\_epochs |  The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 10  | 
| early\_stopping\_patience |  The number of epochs to wait before ending training if no improvement, as defined by the `early_stopping_tolerance` hyperparameter, is made in the relevant metric. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 5  | 
| early\_stopping\_tolerance |  The tolerance value that the relative improvement in `validation:mAP`, the mean average precision (mAP), is required to exceed to avoid early stopping. If the ratio of the change in the mAP divided by the previous best mAP is smaller than the `early_stopping_tolerance` value set, early stopping considers that there is no improvement. It is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0  | 
| image\_shape |  The image size for input images. We rescale the input image to a square image of this size. We recommend using 300 or 512 for better performance. **Optional** Valid values: positive integer ≥ 300 Default: 300  | 
| epochs |  The number of training epochs.  **Optional** Valid values: positive integer Default: 30  | 
| freeze\_layer\_pattern |  The regular expression (regex) for freezing layers in the base network. For example, if we set `freeze_layer_pattern` = `"^(conv1_\|conv2_).*"`, then any layers with a name that begins with `"conv1_"` or `"conv2_"` are frozen, which means that the weights for these layers are not updated during training. The layer names can be found in the network symbol files [vgg16-symbol.json](http://data.mxnet.io/models/imagenet/vgg/vgg16-symbol.json) and [resnet-50-symbol.json](http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-symbol.json). Freezing layers can reduce training time significantly in exchange for modest losses in accuracy. This technique is commonly used in transfer learning, where the lower layers in the base network do not need to be retrained. **Optional** Valid values: string Default: No layers frozen.  | 
| kv\_store |  The weight update synchronization mode used for distributed training. The weights can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See the [Distributed Training](https://mxnet.apache.org/api/faq/distributed_training) MXNet tutorial for details.  This parameter is not applicable to single-machine training.  **Optional** Valid values: `'dist_sync'` or `'dist_async'` Default: -  | 
| label\_width |  The force padding label width used to sync across training and validation data. For example, if one image in the data contains at most 10 objects, and each object's annotation is specified with 5 numbers, [class\_id, left, top, width, height], then the `label_width` should be no smaller than (10\*5 \+ header information length). The header information length is usually 2. We recommend using a slightly larger `label_width` for the training, such as 60 for this example. **Optional** Valid values: Positive integer large enough to accommodate the largest annotation information length in the data. Default: 350  | 
| learning\_rate |  The initial learning rate. **Optional** Valid values: float in (0, 1] Default: 0.001  | 
| lr\_scheduler\_factor |  The ratio to reduce learning rate. Used in conjunction with the `lr_scheduler_step` parameter defined as `lr_new` = `lr_old` \* `lr_scheduler_factor`. **Optional** Valid values: float in (0, 1) Default: 0.1  | 
| lr\_scheduler\_step |  The epochs at which to reduce the learning rate. The learning rate is reduced by `lr_scheduler_factor` at epochs listed in a comma-delimited string: "epoch1, epoch2, ...". For example, if the value is set to "10, 20" and the `lr_scheduler_factor` is set to 1/2, then the learning rate is halved after the 10th epoch and then halved again after the 20th epoch. **Optional** Valid values: string Default: empty string  | 
| mini\_batch\_size |  The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/`num_gpu` training samples. For multi-machine training in `dist_sync` mode, the actual batch size is `mini_batch_size` \* number of machines. A large `mini_batch_size` usually leads to faster training, but it can cause out-of-memory errors. The memory usage is related to `mini_batch_size`, `image_shape`, and the `base_network` architecture. For example, on a single p3.2xlarge instance, the largest `mini_batch_size` without an out-of-memory error is 32 with the `base_network` set to "resnet-50" and an `image_shape` of 300. With the same instance, you can use 64 as the `mini_batch_size` with the base network `vgg-16` and an `image_shape` of 300. **Optional** Valid values: positive integer Default: 32  | 
| momentum |  The momentum for `sgd`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1] Default: 0.9  | 
| nms\_threshold |  The non-maximum suppression threshold. **Optional** Valid values: float in (0, 1] Default: 0.45  | 
| optimizer |  The optimizer type. For details on optimizer values, see [MXNet's API](https://mxnet.apache.org/api/python/docs/api/). **Optional** Valid values: ['sgd', 'adam', 'rmsprop', 'adadelta'] Default: 'sgd'  | 
| overlap\_threshold |  The evaluation overlap threshold. **Optional** Valid values: float in (0, 1] Default: 0.5  | 
| use\_pretrained\_model |  Indicates whether to use a pre-trained model for training. If set to 1, then the pre-trained model with corresponding architecture is loaded and used for training. Otherwise, the network is trained from scratch. **Optional** Valid values: 0 or 1 Default: 1  | 
| weight\_decay |  The weight decay coefficient for `sgd` and `rmsprop`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1) Default: 0.0005  | 
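
Several rows in the table describe small calculations (`label_width`, the `lr_scheduler_step` and `lr_scheduler_factor` schedule, and the batch size rules for `mini_batch_size`). The following Python sketch works through that arithmetic. The function names are hypothetical helpers written for illustration, not part of any SageMaker API, and the schedule helper assumes that a reduction takes effect once the listed number of epochs has completed.

```python
def min_label_width(max_objects, values_per_object=5, header_length=2):
    """Smallest label_width that fits the largest annotation.

    Mirrors the table's formula: max_objects * 5 + header information length,
    where the header length is usually 2.
    """
    return max_objects * values_per_object + header_length


def learning_rate_after(completed_epochs, learning_rate,
                        lr_scheduler_step, lr_scheduler_factor):
    """Learning rate after `completed_epochs` epochs have finished.

    Assumption: each epoch listed in lr_scheduler_step triggers one
    multiplication by lr_scheduler_factor once that many epochs are done.
    """
    steps = [int(s) for s in lr_scheduler_step.split(",") if s.strip()]
    reductions = sum(1 for step in steps if completed_epochs >= step)
    return learning_rate * lr_scheduler_factor ** reductions


def samples_per_gpu(mini_batch_size, num_gpu):
    """Single-machine multi-GPU: each GPU handles mini_batch_size / num_gpu."""
    return mini_batch_size / num_gpu


def dist_sync_batch_size(mini_batch_size, num_machines):
    """Multi-machine dist_sync: actual batch is mini_batch_size * machines."""
    return mini_batch_size * num_machines


# Example from the label_width row: at most 10 objects per image.
print(min_label_width(10))

# Example from the lr_scheduler_step row: "10, 20" with a factor of 1/2.
for done in (9, 10, 20):
    print(done, learning_rate_after(done, 0.001, "10, 20", 0.5))
```

For the `label_width` example, the helper returns the minimum of 52, which is why the table suggests a slightly larger value such as 60.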
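
As a rough illustration of how these hyperparameters might be assembled for a training request, the sketch below builds a string-to-string map, which is the form that the `HyperParameters` field of `CreateTrainingJob` expects. The `build_hyperparameters` helper, its checks, and the sample values (3 classes, 1000 samples) are assumptions made for this example; the checks cover only a few of the valid-value rules from the table.

```python
VALID_BASE_NETWORKS = ("vgg-16", "resnet-50")
VALID_OPTIMIZERS = ("sgd", "adam", "rmsprop", "adadelta")


def build_hyperparameters(num_classes, num_training_samples, **optional):
    """Return a str -> str dict of object detection hyperparameters.

    Hypothetical helper: validates the two required values and a few
    optional ones against the ranges listed in the table.
    """
    if num_classes <= 0 or num_training_samples <= 0:
        raise ValueError("num_classes and num_training_samples must be positive")

    base_network = optional.get("base_network", "vgg-16")
    if base_network not in VALID_BASE_NETWORKS:
        raise ValueError(f"base_network must be one of {VALID_BASE_NETWORKS}")

    optimizer = optional.get("optimizer", "sgd")
    if optimizer not in VALID_OPTIMIZERS:
        raise ValueError(f"optimizer must be one of {VALID_OPTIMIZERS}")

    learning_rate = float(optional.get("learning_rate", 0.001))
    if not 0 < learning_rate <= 1:
        raise ValueError("learning_rate must be in (0, 1]")

    if int(optional.get("image_shape", 300)) < 300:
        raise ValueError("image_shape must be at least 300")

    params = {
        "num_classes": num_classes,
        "num_training_samples": num_training_samples,
    }
    params.update(optional)
    # Every value is sent as a string.
    return {name: str(value) for name, value in params.items()}


# Placeholder values for illustration only.
hyperparameters = build_hyperparameters(
    num_classes=3,
    num_training_samples=1000,
    base_network="resnet-50",
    epochs=30,
    learning_rate=0.001,
    lr_scheduler_step="10,20",
    lr_scheduler_factor=0.1,
    mini_batch_size=32,
    image_shape=300,
)
print(hyperparameters)
```

Hyperparameters that are omitted fall back to the defaults listed in the table, so only the two required values and any overrides need to be passed.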