


# Supported Frameworks, AWS Regions, Instance Types, and Tested Models
<a name="training-compiler-support"></a>

**Important**  
Amazon Web Services (AWS) has announced that there will be no new releases or versions of SageMaker Training Compiler. You can continue to use SageMaker Training Compiler for training through the existing AWS Deep Learning Containers (DLCs). It is important to note that while the existing DLCs remain accessible, they will no longer receive patches or updates from AWS, in accordance with the [AWS Deep Learning Containers Framework Support Policy](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/support-policy.html).

Before you use SageMaker Training Compiler, check whether your framework of choice is supported, whether the instance types are available in your AWS account, and whether your AWS account is in one of the supported AWS Regions.

**Note**  
SageMaker Training Compiler is available in the SageMaker Python SDK v2.70.0 or later.
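
For reference, the following is a minimal sketch of turning on SageMaker Training Compiler through the SageMaker Python SDK, using the Hugging Face estimator's `compiler_config` parameter. The entry point script, IAM role ARN, S3 URI, and hyperparameter values are illustrative placeholders, not values from this guide.

```
# A minimal sketch of enabling SageMaker Training Compiler with the
# SageMaker Python SDK (v2.70.0 or later). The script name, role ARN,
# S3 URI, and hyperparameters are illustrative placeholders.
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",          # placeholder training script
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.g5.4xlarge",   # one of the supported instance types below
    transformers_version="4.21.1",
    pytorch_version="1.11.0",
    py_version="py38",
    hyperparameters={"epochs": 3, "train_batch_size": 24},
    compiler_config=TrainingCompilerConfig(),  # turns on Training Compiler
)

estimator.fit({"train": "s3://amzn-s3-demo-bucket/train"})  # placeholder S3 URI
```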

## Supported Frameworks
<a name="training-compiler-supported-frameworks"></a>

SageMaker Training Compiler supports the following deep learning frameworks and is available through AWS Deep Learning Containers.

**Topics**
+ [PyTorch](#training-compiler-supported-frameworks-pytorch)
+ [TensorFlow](#training-compiler-supported-frameworks-tensorflow)

### PyTorch
<a name="training-compiler-supported-frameworks-pytorch"></a>

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/training-compiler-support.html)

### TensorFlow
<a name="training-compiler-supported-frameworks-tensorflow"></a>

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/training-compiler-support.html)

For more information, see [Available Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) in the *AWS Deep Learning Containers GitHub repository*.

## AWS Regions
<a name="training-compiler-availablity-zone"></a>

The [SageMaker Training Compiler containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-training-compiler-containers) are available in the AWS Regions where [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) are in service, except the China Regions.

## Supported Instance Types
<a name="training-compiler-supported-instance-types"></a>

SageMaker Training Compiler is tested on and supports the following ML instance types.
+ P4 instances
+ P3 instances
+ G4dn instances
+ G5 instances

For specs of the instance types, see the **Accelerated Computing** section on the [Amazon EC2 Instance Types](https://aws.amazon.com/ec2/instance-types/) page. For information about instance pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

If you encounter an error message similar to the following, follow the instructions at [Request a service quota increase for SageMaker AI resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure).

```
ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling
the CreateTrainingJob operation: The account-level service limit 'ml.p3dn.24xlarge
for training job usage' is 0 Instances, with current utilization of 0 Instances
and a request delta of 1 Instances.
Please contact AWS support to request an increase for this limit.
```
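
If you prefer to inspect your current quota programmatically before filing the request, the following is a hedged sketch using the AWS Service Quotas API through boto3. The quota-name filter string is an assumption based on the error text above, and the Region is a placeholder.

```
# A sketch (under stated assumptions) of locating the SageMaker training quota
# that the error above refers to, using boto3's Service Quotas API.
import boto3

client = boto3.client("service-quotas", region_name="us-west-2")  # placeholder Region

paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        # Assumption: the quota name contains the instance name from the error message.
        if "ml.p3dn.24xlarge for training job usage" in quota["QuotaName"]:
            print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])
            # To file the increase request programmatically (requires IAM permissions):
            # client.request_service_quota_increase(
            #     ServiceCode="sagemaker",
            #     QuotaCode=quota["QuotaCode"],
            #     DesiredValue=1.0,
            # )
```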

## Tested Models
<a name="training-compiler-tested-models"></a>

The following tables list the models that have been tested with SageMaker Training Compiler. For reference, the largest batch size that fits into memory is included alongside the other training parameters. SageMaker Training Compiler can change the memory footprint of the model training process; as a result, a larger batch size can often fit during training, which further decreases total training time. In some cases, SageMaker Training Compiler intelligently promotes caching, which reduces the largest batch size that can fit on the GPU. You must retune your model hyperparameters and find an optimal batch size for your case. To save time, use the following reference tables to look up a batch size that can be a good starting point for your use case.

**Note**  
The batch sizes are the local batch size that fits into each individual GPU of the respective instance type. When you change the batch size, you should also adjust the learning rate; see the sketch following this note for a common heuristic.
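
As a rough guide, a common heuristic is to scale the learning rate linearly with the batch size. The helper below is an illustrative sketch, not part of SageMaker, and the base learning rate is a placeholder.

```
# A minimal sketch of the linear-scaling heuristic for readjusting the learning
# rate after changing the batch size. The helper and values are illustrative.
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Scale the learning rate linearly with the batch size."""
    return base_lr * (new_batch_size / base_batch_size)

# Example: moving gpt2 on g5.4xlarge from the native batch size (84) to the
# Training Compiler batch size (240) listed in the tables below.
new_lr = scale_learning_rate(base_lr=5e-5, base_batch_size=84, new_batch_size=240)
print(f"adjusted learning rate: {new_lr:.2e}")
```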

### PyTorch 1.13.1
<a name="training-compiler-tested-models-pt1131"></a>

**Natural language processing (NLP) models**

The following models were tested for training jobs for all combinations of single-node/multi-node and single-GPU/multi-GPU, with automatic mixed precision (AMP) as indicated.


| Model | Dataset | Instance type | Precision | Sequence length | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | --- | 
| albert-base-v2 | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 80 | 192 | 
| albert-base-v2 | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 128 | 332 | 
| albert-base-v2 | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 80 | 224 | 
| bert-base-uncased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 160 | 288 | 
| camembert-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 160 | 280 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 240 | 472 | 
| distilgpt2 | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 77 | 128 | 
| distilgpt2 | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 138 | 390 | 
| distilgpt2 | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 96 | 256 | 
| distilroberta-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 96 | 192 | 
| distilroberta-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 171 | 380 | 
| distilroberta-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 112 | 256 | 
| gpt2 | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 52 | 152 | 
| gpt2 | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 84 | 240 | 
| gpt2 | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 58 | 164 | 
| microsoft/deberta-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 48 | 128 | 
| microsoft/deberta-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 84 | 207 | 
| microsoft/deberta-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 53 | 133 | 
| roberta-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 125 | 224 | 
| xlm-roberta-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 16 | 31 | 
| xlm-roberta-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 18 | 50 | 
| xlnet-base-cased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 128 | 240 | 
| bert-base-uncased | wikitext-103-v1 | g5.48xlarge | float16 | 512 | 29 | 50 | 
| distilbert-base-uncased | wikitext-103-v1 | g5.48xlarge | float16 | 512 | 45 | 64 | 
| gpt2 | wikitext-103-v1 | g5.48xlarge | float16 | 512 | 18 | 45 | 
| roberta-base | wikitext-103-v1 | g5.48xlarge | float16 | 512 | 23 | 44 | 
| gpt2 | wikitext-103-v1 | p4d.24xlarge | float16 | 512 | 36 | 64 | 

**Computer vision (CV) models**

Tested on single-node/multi-node, single-GPU/multi-GPU configurations with [TensorFlow Model Garden](https://github.com/tensorflow/models) models and automatic mixed precision (AMP), as shown.


| Model | Dataset | Instance type | Precision | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | 
| ResNet152 | food101 | g4dn.16xlarge | float16 | 128 | 144 | 
| ResNet152 | food101 | g5.4xlarge | float16 | 128 | 192 | 
| ResNet152 | food101 | p3.2xlarge | float16 | 152 | 156 | 
| ViT | food101 | g4dn.16xlarge | float16 | 512 | 512 | 
| ViT | food101 | g5.4xlarge | float16 | 992 | 768 | 
| ViT | food101 | p3.2xlarge | float16 | 848 | 768 | 

### PyTorch 1.12.0
<a name="training-compiler-tested-models-pt1120"></a>

**Natural language processing (NLP) models**

The following models were tested for training jobs for all combinations of single-node/multi-node and single-GPU/multi-GPU, with automatic mixed precision (AMP) as indicated.


| Model | Dataset | Instance type | Precision | Sequence length | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | --- | 
| albert-base-v2 | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 128 | 248 | 
| bert-base-uncased | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 160 | 288 | 
| camembert-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 160 | 279 | 
| camembert-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 105 | 164 | 
| distilgpt2 | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 136 | 256 | 
| distilgpt2 | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 80 | 118 | 
| gpt2 | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 84 | 240 | 
| gpt2 | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 80 | 119 | 
| microsoft/deberta-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 93 | 197 | 
| microsoft/deberta-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 113 | 130 | 
| roberta-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 125 | 224 | 
| roberta-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 78 | 112 | 
| xlnet-base-cased | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 138 | 240 | 
| bert-base-uncased | wikitext-103-v1 | ml.p4d.24xlarge | float16 | 512 |  | 52 | 
| distilbert-base-uncased | wikitext-103-v1 | ml.p4d.24xlarge | float16 | 512 |  | 160 | 
| gpt2 | wikitext-103-v1 | ml.p4d.24xlarge | float16 | 512 |  | 25 | 
| roberta-base | wikitext-103-v1 | ml.p4d.24xlarge | float16 | 512 |  | 64 | 

### TensorFlow 2.11.0
<a name="training-compiler-tested-models-tf2110"></a>

**Computer vision (CV) models**

Tested on single-node/multi-node, single-GPU/multi-GPU configurations with [TensorFlow Model Garden](https://github.com/tensorflow/models) models and automatic mixed precision (AMP), as shown.


| Model | Dataset | Instance type | Precision | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.g5.2xlarge | float16 | 6 | 8 | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.p3.2xlarge | float16 | 4 | 6 | 
| ResNet50 | ImageNet | ml.g5.2xlarge | float16 | 192 | 256 | 
| ResNet50 | ImageNet | ml.p3.2xlarge | float16 | 256 | 256 | 
| ResNet101 | ImageNet | ml.g5.2xlarge | float16 | 128 | 256 | 
| ResNet101 | ImageNet | ml.p3.2xlarge | float16 | 128 | 128 | 
| ResNet152 | ImageNet | ml.g5.2xlarge | float16 | 128 | 224 | 
| ResNet152 | ImageNet | ml.p3.2xlarge | float16 | 128 | 128 | 
| VisionTransformer | ImageNet | ml.g5.2xlarge | float16 | 112 | 144 | 
| VisionTransformer | ImageNet | ml.p3.2xlarge | float16 | 96 | 128 | 

**Natural language processing (NLP) models**

Tested with [Transformers models](https://github.com/huggingface/transformers) with `Sequence_Len=128` and automatic mixed precision (AMP), as shown below.


| Model | Dataset | Instance type | Precision | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | 
| albert-base-v2 | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 160 | 197 | 
| albert-base-v2 | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 95 | 127 | 
| bert-base-uncased | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 160 | 128 | 
| bert-base-uncased | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 104 | 111 | 
| bert-large-uncased | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 65 | 48 | 
| bert-large-uncased | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 40 | 35 | 
| camembert-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 162 | 
| camembert-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 105 | 111 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 256 | 264 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 128 | 169 | 
| gpt2 | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 120 | 
| gpt2 | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 80 | 83 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 32 | 32 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 32 | 36 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 144 | 160 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 106 | 110 | 
| roberta-base | wikitext-2-raw-v1 | ml.g5.2xlarge | float16 | 128 | 128 | 
| roberta-base | wikitext-2-raw-v1 | ml.p3.2xlarge | float16 | 72 | 98 | 
| albert-base-v2 | wikitext-2-raw-v1 | ml.g5.48xlarge | float16 | 128 | 192 | 
| albert-base-v2 | wikitext-2-raw-v1 | ml.p3.16xlarge | float16 | 95 | 96 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | ml.g5.48xlarge | float16 | 256 | 256 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | ml.p3.16xlarge | float16 | 140 | 184 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | ml.g5.48xlarge | float16 | 256 | 384 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | ml.p3.16xlarge | float16 | 256 | 268 | 
| gpt2 | wikitext-2-raw-v1 | ml.g5.48xlarge | float16 | 116 | 116 | 
| gpt2 | wikitext-2-raw-v1 | ml.p3.16xlarge | float16 | 85 | 83 | 
| gpt2 | wikitext-2-raw-v1 | ml.p4d.24xlarge | float16 | 94 | 110 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | ml.g5.48xlarge | float16 | 187 | 164 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | ml.p3.16xlarge | float16 | 106 | 111 | 

### TensorFlow 2.10.0
<a name="training-compiler-tested-models-tf2100"></a>

**Computer vision (CV) models**

Tested on single-node, single-GPU/multi-GPU configurations with [TensorFlow Model Garden](https://github.com/tensorflow/models) models and automatic mixed precision (AMP), as shown.


| Model | Dataset | Instance type | Precision | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | 
| DetectionTransformer-ResNet50 | COCO-2017 | ml.g4dn.2xlarge | float32 | 2 | 4 | 
| DetectionTransformer-ResNet50 | COCO-2017 | ml.g5.2xlarge | float32 | 3 | 6 | 
| DetectionTransformer-ResNet50 | COCO-2017 | ml.p3.2xlarge | float32 | 2 | 4 | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.g4dn.2xlarge | float16 | 4 | 6 | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.g5.2xlarge | float16 | 6 | 8 | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.g5.48xlarge | float16 | 48 | 64 | 
| MaskRCNN-ResNet50-FPN | COCO-2017 | ml.p3.2xlarge | float16 | 4 | 6 | 
| ResNet50 | ImageNet | ml.g4dn.2xlarge | float16 | 224 | 256 | 
| ResNet50 | ImageNet | ml.g5.2xlarge | float16 | 192 | 160 | 
| ResNet50 | ImageNet | ml.g5.48xlarge | float16 | 2048 | 2048 | 
| ResNet50 | ImageNet | ml.p3.2xlarge | float16 | 224 | 160 | 
| ResNet101 | ImageNet | ml.g4dn.2xlarge | float16 | 160 | 128 | 
| ResNet101 | ImageNet | ml.g5.2xlarge | float16 | 192 | 256 | 
| ResNet101 | ImageNet | ml.g5.48xlarge | float16 | 2048 | 2048 | 
| ResNet101 | ImageNet | ml.p3.2xlarge | float16 | 160 | 224 | 
| ResNet152 | ImageNet | ml.g4dn.2xlarge | float16 | 128 | 128 | 
| ResNet152 | ImageNet | ml.g5.2xlarge | float16 | 192 | 224 | 
| ResNet152 | ImageNet | ml.g5.48xlarge | float16 | 1536 | 1792 | 
| ResNet152 | ImageNet | ml.p3.2xlarge | float16 | 128 | 160 | 
| VisionTransformer | ImageNet | ml.g4dn.2xlarge | float16 | 80 | 128 | 
| VisionTransformer | ImageNet | ml.g5.2xlarge | float16 | 112 | 144 | 
| VisionTransformer | ImageNet | ml.g5.48xlarge | float16 | 896 | 1152 | 
| VisionTransformer | ImageNet | ml.p3.2xlarge | float16 | 80 | 128 | 

**Natural language processing (NLP) models**

Tested on single-node, single-GPU/multi-GPU configurations with [Transformers models](https://github.com/huggingface/transformers) with `Sequence_Len=128` and automatic mixed precision (AMP), as shown below.


| Model | Dataset | Instance type | Precision | Batch size for native framework | Batch size for SageMaker Training Compiler | 
| --- | --- | --- | --- | --- | --- | 
| albert-base-v2 | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 128 | 112 | 
| albert-base-v2 | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 128 | 
| albert-base-v2 | wikitext-2-raw-v1 | p3.8xlarge | float16 | 128 | 135 | 
| albert-base-v2 | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 191 | 
| bert-base-uncased | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 64 | 94 | 
| bert-base-uncased | wikitext-2-raw-v1 | p3.2xlarge | float16 | 96 | 101 | 
| bert-base-uncased | wikitext-2-raw-v1 | p3.8xlarge | float16 | 96 | 96 | 
| bert-base-uncased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 128 | 
| bert-large-uncased | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 35 | 21 | 
| bert-large-uncased | wikitext-2-raw-v1 | p3.2xlarge | float16 | 39 | 26 | 
| bert-large-uncased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 60 | 50 | 
| camembert-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 96 | 90 | 
| camembert-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 96 | 98 | 
| camembert-base | wikitext-2-raw-v1 | p3.8xlarge | float16 | 96 | 96 | 
| camembert-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 128 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 256 | 160 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | p3.2xlarge | float16 | 128 | 176 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | p3.8xlarge | float16 | 128 | 160 | 
| distilbert-base-uncased | wikitext-2-raw-v1 | g5.4xlarge | float16 | 256 | 258 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 256 | 216 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | p3.2xlarge | float16 | 256 | 230 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | p3.8xlarge | float16 | 256 | 224 | 
| google/electra-small-discriminator | wikitext-2-raw-v1 | g5.4xlarge | float16 | 256 | 320 | 
| gpt2 | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 80 | 64 | 
| gpt2 | wikitext-2-raw-v1 | p3.2xlarge | float16 | 80 | 77 | 
| gpt2 | wikitext-2-raw-v1 | p3.8xlarge | float16 | 80 | 72 | 
| gpt2 | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 120 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 28 | 24 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 32 | 24 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | p3.8xlarge | float16 | 32 | 26 | 
| jplu/tf-xlm-roberta-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 66 | 52 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 96 | 92 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 96 | 101 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | p3.8xlarge | float16 | 96 | 101 | 
| microsoft/mpnet-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 152 | 
| roberta-base | wikitext-2-raw-v1 | g4dn.16xlarge | float16 | 64 | 72 | 
| roberta-base | wikitext-2-raw-v1 | p3.2xlarge | float16 | 64 | 84 | 
| roberta-base | wikitext-2-raw-v1 | p3.8xlarge | float16 | 64 | 86 | 
| roberta-base | wikitext-2-raw-v1 | g5.4xlarge | float16 | 128 | 128 | 

### TensorFlow 2.9.1
<a name="training-compiler-tested-models-tf291"></a>

Tested with [TensorFlow Model Garden](https://github.com/tensorflow/models) models and automatic mixed precision (AMP).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/training-compiler-support.html)

\* Batch sizes marked with an asterisk (\*) indicate the largest batch size tested by the SageMaker Training Compiler developer team. For the marked cells, the instance might be able to fit a larger batch size than what is shown.

### Transformers 4.21.1 and PyTorch 1.11.0
<a name="training-compiler-tested-models-hf421-pt111"></a>

Tested with `Sequence_Len=512` and automatic mixed precision (AMP).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/training-compiler-support.html)

### Transformers 4.17.0 and PyTorch 1.10.2
<a name="training-compiler-tested-models-hf417-pt110"></a>

Tested with `Sequence_Len=512` and automatic mixed precision (AMP).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/training-compiler-support.html)

### Transformers 4.11.0 and PyTorch 1.9.0
<a name="training-compiler-tested-models-hf411-pt190"></a>

Tested with `Sequence_Len=512` and automatic mixed precision (AMP).


**Single-node single-GPU**

| Model | Instance type | Batch size for native framework | Batch size for Training Compiler | 
| --- | --- | --- | --- | 
| albert-base-v2  | ml.p3.2xlarge | 12 | 32 | 
| bert-base-cased  | ml.p3.2xlarge | 14 | 24 | 
| bert-base-chinese | ml.p3.2xlarge | 16 | 24 | 
| bert-base-multilingual-cased  | ml.p3.2xlarge | 4 | 16 | 
| bert-base-multilingual-uncased  | ml.p3.2xlarge | 8 | 16 | 
| bert-base-uncased  | ml.p3.2xlarge | 12 | 24 | 
| cl-tohoku/bert-base-japanese-whole-word-masking | ml.p3.2xlarge | 12 | 24 | 
| cl-tohoku/bert-base-japanese | ml.p3.2xlarge | 12 | 24 | 
| distilbert-base-uncased  | ml.p3.2xlarge | 28 | 32 | 
| distilbert-base-uncased-finetuned-sst-2-english | ml.p3.2xlarge | 28 | 32 | 
| distilgpt2  | ml.p3.2xlarge | 16 | 32 | 
| facebook/bart-base  | ml.p3.2xlarge | 4 | 8 | 
| gpt2 | ml.p3.2xlarge | 6 | 20 | 
| nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large | ml.p3.2xlarge | 20 | 32 | 
| roberta-base  | ml.p3.2xlarge | 12 | 20 | 


**Single-node multi-GPU**

| Model | Instance type | Batch size for native framework | Batch size for Training Compiler | 
| --- | --- | --- | --- | 
| bert-base-chinese  | ml.p3.8xlarge | 16 | 26 | 
| bert-base-multilingual-cased  | ml.p3.8xlarge | 6 | 16 | 
| bert-base-multilingual-uncased | ml.p3.8xlarge | 6 | 16 | 
| bert-base-uncased  | ml.p3.8xlarge | 14 | 24 | 
| distilbert-base-uncased  | ml.p3.8xlarge | 14 | 32 | 
| distilgpt2 | ml.p3.8xlarge | 6 | 32 | 
| facebook/bart-base | ml.p3.8xlarge | 8 | 16 | 
| gpt2  | ml.p3.8xlarge | 8 | 20 | 
| roberta-base  | ml.p3.8xlarge | 12 | 20 | 

### Transformers 4.17.0 and TensorFlow 2.6.3
<a name="training-compiler-tested-models-hf417-tf263"></a>

Tested with `Sequence_Len=128` and automatic mixed precision (AMP).


| Model | Instance type | Batch size for native framework | Batch size for Training Compiler | 
| --- | --- | --- | --- | 
| albert-base-v2 | ml.g4dn.16xlarge | 136 | 208 | 
| albert-base-v2 | ml.g5.4xlarge | 219 | 312 | 
| albert-base-v2 | ml.p3.2xlarge | 152 | 208 | 
| albert-base-v2 | ml.p3.8xlarge | 152 | 192 | 
| bert-base-uncased | ml.g4dn.16xlarge | 120 | 101 | 
| bert-base-uncased | ml.g5.4xlarge | 184 | 160 | 
| bert-base-uncased | ml.p3.2xlarge | 128 | 108 | 
| bert-large-uncased | ml.g4dn.16xlarge | 37 | 28 | 
| bert-large-uncased | ml.g5.4xlarge | 64 | 55 | 
| bert-large-uncased | ml.p3.2xlarge | 40 | 32 | 
| camembert-base | ml.g4dn.16xlarge | 96 | 100 | 
| camembert-base | ml.g5.4xlarge | 190 | 160 | 
| camembert-base | ml.p3.2xlarge | 129 | 108 | 
| camembert-base | ml.p3.8xlarge | 128 | 104 | 
| distilbert-base-uncased | ml.g4dn.16xlarge | 210 | 160 | 
| distilbert-base-uncased | ml.g5.4xlarge | 327 | 288 | 
| distilbert-base-uncased | ml.p3.2xlarge | 224 | 196 | 
| distilbert-base-uncased | ml.p3.8xlarge | 192 | 182 | 
| google/electra-small-discriminator | ml.g4dn.16xlarge | 336 | 288 | 
| google/electra-small-discriminator | ml.g5.4xlarge | 504 | 384 | 
| google/electra-small-discriminator | ml.p3.2xlarge | 352 | 323 | 
| gpt2 | ml.g4dn.16xlarge | 89 | 64 | 
| gpt2 | ml.g5.4xlarge | 140 | 146 | 
| gpt2 | ml.p3.2xlarge | 94 | 96 | 
| gpt2 | ml.p3.8xlarge | 96 | 88 | 
| jplu/tf-xlm-roberta-base | ml.g4dn.16xlarge | 52 | 16 | 
| jplu/tf-xlm-roberta-base | ml.g5.4xlarge | 64 | 44 | 
| microsoft/mpnet-base | ml.g4dn.16xlarge | 120 | 100 | 
| microsoft/mpnet-base | ml.g5.4xlarge | 192 | 160 | 
| microsoft/mpnet-base | ml.p3.2xlarge | 128 | 104 | 
| microsoft/mpnet-base | ml.p3.8xlarge | 130 | 92 | 
| roberta-base | ml.g4dn.16xlarge | 108 | 64 | 
| roberta-base | ml.g5.4xlarge | 176 | 142 | 
| roberta-base | ml.p3.2xlarge | 118 | 100 | 
| roberta-base | ml.p3.8xlarge | 112 | 88 | 

### Transformers 4.11.0 and TensorFlow 2.5.1
<a name="training-compiler-tested-models-hf411-tf251"></a>

Tested with `Sequence_Len=128` and automatic mixed precision (AMP).


**Single-node single-GPU**

| Model | Instance type | Batch size for native framework | Batch size for Training Compiler | 
| --- | --- | --- | --- | 
| albert-base-v2  | ml.p3.2xlarge | 128 | 128 | 
| bart-base  | ml.p3.2xlarge | 12 | 64 | 
| bart-large  | ml.p3.2xlarge | 4 | 28 | 
| bert-base-cased  | ml.p3.2xlarge | 16 | 128 | 
| bert-base-chinese | ml.p3.2xlarge | 16 | 128 | 
| bert-base-multilingual-cased  | ml.p3.2xlarge | 12 | 64 | 
| bert-base-multilingual-uncased  | ml.p3.2xlarge | 16 | 96 | 
| bert-base-uncased | ml.p3.2xlarge | 16 | 96 | 
| bert-large-uncased  | ml.p3.2xlarge | 4 | 24 | 
| cl-tohoku/bert-base-japanese | ml.p3.2xlarge | 16 | 128 | 
| cl-tohoku/bert-base-japanese-whole-word-masking | ml.p3.2xlarge | 16 | 128 | 
| distilbert-base-sst2  | ml.p3.2xlarge | 32 | 128 | 
| distilbert-base-uncased  | ml.p3.2xlarge | 32 | 128 | 
| distilgpt2 | ml.p3.2xlarge | 32 | 128 | 
| gpt2  | ml.p3.2xlarge | 12 | 64 | 
| gpt2-large  | ml.p3.2xlarge | 2 | 24 | 
| jplu/tf-xlm-roberta-base | ml.p3.2xlarge | 12 | 32 | 
| roberta-base  | ml.p3.2xlarge | 4 | 64 | 
| roberta-large  | ml.p3.2xlarge | 4 | 64 | 
| t5-base  | ml.p3.2xlarge | 64 | 64 | 
| t5-small  | ml.p3.2xlarge | 128 | 128 | 