Model requirements for training and validation datasets
The following tables list the requirements for training and validation datasets for each model. For information about dataset constraints for Amazon Nova models, see Fine-tuning Amazon Nova models.
| Description | Maximum (Fine-tuning) |
|---|---|
| Sum of input and output tokens when batch size is 1 | 4,096 |
| Sum of input and output tokens when batch size is 2, 3, or 4 | N/A |
| Character quota per sample in dataset | Token quota x 6 (estimated) |
| Training dataset file size | 1 GB |
| Validation dataset file size | 100 MB |
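The character quota is an estimate derived from the token quota. As an illustrative calculation (not an official figure), a 4,096-token sample corresponds to roughly 4,096 x 6 = 24,576 characters.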
| Description | Maximum (Continued Pre-training) | Maximum (Fine-tuning) |
|---|---|---|
| Sum of input and output tokens when batch size is 1 | 4,096 | 4,096 |
| Sum of input and output tokens when batch size is 2, 3, or 4 | 2,048 | 2,048 |
| Character quota per sample in dataset | Token quota x 6 (estimated) | Token quota x 6 (estimated) |
| Training dataset file size | 10 GB | 1 GB |
| Validation dataset file size | 100 MB | 100 MB |
| Description | Maximum (Continued Pre-training) | Maximum (Fine-tuning) |
|---|---|---|
| Sum of input and output tokens when batch size is 1 or 2 | 4,096 | 4,096 |
| Sum of input and output tokens when batch size is 3, 4, 5, or 6 | 2,048 | 2,048 |
| Character quota per sample in dataset | Token quota x 6 (estimated) | Token quota x 6 (estimated) |
| Training dataset file size | 10 GB | 1 GB |
| Validation dataset file size | 100 MB | 100 MB |
| Description | Minimum (Fine-tuning) | Maximum (Fine-tuning) |
|---|---|---|
| Text prompt length in training sample, in characters | 3 | 1,024 |
| Records in a training dataset | 5 | 10,000 |
| Input image size | 0 | 50 MB |
| Input image height in pixels | 512 | 4,096 |
| Input image width in pixels | 512 | 4,096 |
| Input image total pixels | 0 | 12,582,912 |
| Input image aspect ratio | 1:4 | 4:1 |
| Description | Minimum (Fine-tuning) | Maximum (Fine-tuning) |
|---|---|---|
| Text prompt length in training sample, in characters | 0 | 2,560 |
| Records in a training dataset | 1,000 | 500,000 |
| Input image size | 0 | 5 MB |
| Input image height in pixels | 128 | 4,096 |
| Input image width in pixels | 128 | 4,096 |
| Input image total pixels | 0 | 12,582,912 |
| Input image aspect ratio | 1:4 | 4:1 |
| Description | Minimum (Fine-tuning) | Maximum (Fine-tuning) |
|---|---|---|
| Input tokens | 0 | 16,000 |
| Output tokens | 0 | 16,000 |
| Character quota per sample in dataset | 0 | Token quota x 6 (estimated) |
| Sum of input and output tokens | 0 | 16,000 |
| Sum of training and validation records | 100 | 10,000 (adjustable using service quotas) |
Supported image formats for Meta Llama 3.2 11B Vision Instruct and Meta Llama 3.2 90B Vision Instruct are gif, jpeg, png, and webp. To estimate the image-to-token conversion during fine-tuning of these models, you can use the following formula as an approximation: Tokens = min(2, max(Height // 560, 1)) * min(2, max(Width // 560, 1)) * 1601. Depending on its size, an image converts to approximately 1,601 to 6,404 tokens.
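The following Python sketch applies that approximation; the function name and example dimensions are illustrative and are not part of the quota tables.

```python
def estimate_image_tokens(height_px: int, width_px: int) -> int:
    """Approximate tokens for one image using the formula above:
    Tokens = min(2, max(H // 560, 1)) * min(2, max(W // 560, 1)) * 1601."""
    h_tiles = min(2, max(height_px // 560, 1))
    w_tiles = min(2, max(width_px // 560, 1))
    return h_tiles * w_tiles * 1601

print(estimate_image_tokens(512, 512))    # 1601 (small image)
print(estimate_image_tokens(1120, 1400))  # 6404 (large image)
```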
| Description | Minimum (Fine-tuning) | Maximum (Fine-tuning) |
|---|---|---|
| Sum of input and output tokens | 0 | 16,000 (10,000 for Meta Llama 3.2 90B) |
| Sum of training and validation records | 100 | 10,000 (adjustable using service quotas) |
| Input image size for Meta Llama 3.2 11B and 90B Vision Instruct models | 0 | 10 MB |
| Input image height in pixels for Meta Llama 3.2 11B and 90B Vision Instruct models | 10 | 8,192 |
| Input image width in pixels for Meta Llama 3.2 11B and 90B Vision Instruct models | 10 | 8,192 |
| Description | Minimum (Fine-tuning) | Maximum (Fine-tuning) |
|---|---|---|
| Sum of input and output tokens | 0 | 16,000 |
| Sum of training and validation records | 100 | 10,000 (adjustable using service quotas) |
| Description | Maximum (Fine-tuning) |
|---|---|
| Input tokens | 4,096 |
| Output tokens | 2,048 |
| Character quota per sample in dataset | Token quota x 6 (estimated) |
| Records in a training dataset | 10,000 |
| Records in a validation dataset | 1,000 |
| Description | Quota (Fine-tuning) |
|---|---|
| Minimum number of records | 32 |
| Maximum training records | 10,000 |
| Maximum validation records | 1,000 |
| Maximum total records | 10,000 (adjustable using service quotas) |
| Maximum tokens | 32,000 |
| Maximum training dataset size | 10 GB |
| Maximum validation dataset size | 1 GB |
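As a rough pre-flight check against the record-count and file-size limits in the table above, the following is a minimal sketch. It assumes a JSON Lines dataset where each non-empty line is one record; the file name and hard-coded limits are placeholders to adjust for the model you are tuning.

```python
import os

MAX_TRAINING_RECORDS = 10_000            # from the table above
MAX_TRAINING_FILE_BYTES = 10 * 1024**3   # 10 GB

def check_training_dataset(path: str) -> None:
    """Report record count and file size against the quota table above."""
    size = os.path.getsize(path)
    with open(path, "r", encoding="utf-8") as f:
        records = sum(1 for line in f if line.strip())
    if size > MAX_TRAINING_FILE_BYTES:
        print(f"File is {size} bytes, above the 10 GB limit")
    if records > MAX_TRAINING_RECORDS:
        print(f"{records} records, above the 10,000-record limit")
    print(f"{records} records, {size} bytes")

check_training_dataset("train.jsonl")  # hypothetical file name
```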