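# TensorFlow

Bring your own TensorFlow model to SageMaker AI, and run the training job with SageMaker Training Compiler.

## TensorFlow Models

SageMaker Training Compiler automatically optimizes model training workloads that are built on top of the native TensorFlow API or the high-level Keras API.

**Tip**

When you preprocess your input dataset, make sure that you use a static input shape. A dynamic input shape can trigger recompilation of the model and might increase total training time.

### Using Keras (Recommended)

For the best compiler acceleration, we recommend using models that are subclasses of TensorFlow Keras ([tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model)).

#### For single GPU training

You don't need to make any additional changes to the training script.

### Without Keras

SageMaker Training Compiler does not support eager execution in TensorFlow. Accordingly, you should wrap your model and training loops with the TensorFlow function decorator (`@tf.function`) to take advantage of compiler acceleration.

SageMaker Training Compiler performs graph-level optimization, and uses the decorator to make sure that your TensorFlow functions are set to run in [graph mode](https://www.tensorflow.org/guide/intro_to_graphs).

#### For single GPU training

TensorFlow 2.0 or later has eager execution on by default, so you should add the `@tf.function` decorator in front of every function that you use to construct a TensorFlow model.

## TensorFlow Models with Hugging Face Transformers

TensorFlow models with [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) are based on TensorFlow's [tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model) API. Hugging Face Transformers also provides pretrained model classes for TensorFlow, which help reduce the effort of configuring natural language processing (NLP) models. After you create your own training script using the Transformers library, you can run the training script using the SageMaker AI `HuggingFace` estimator with the SageMaker Training Compiler configuration class, as shown in the previous topic at [Run TensorFlow Training Jobs with SageMaker Training Compiler](training-compiler-enable-tensorflow.md).

SageMaker Training Compiler automatically optimizes model training workloads that are built on top of the native TensorFlow API or the high-level Keras API, such as the TensorFlow Transformers models.

**Tip**

When you create a tokenizer for an NLP model using Transformers in your training script, make sure that you use a static input tensor shape by specifying `padding='max_length'`. Do not use `padding='longest'`, because padding to the longest sequence in the batch can change the tensor shape for each training batch. A dynamic input shape can trigger recompilation of the model and might increase total training time. For more information about the padding options of the Transformers tokenizers, see [Padding and truncation](https://huggingface.co/docs/transformers/pad_truncation) in the *Hugging Face Transformers documentation*.

**Topics**

+ [Using Keras](#training-compiler-tensorflow-models-transformers-keras)
+ [Without Keras](#training-compiler-tensorflow-models-transformers-no-keras)

### Using Keras

For the best compiler acceleration, we recommend using models that are subclasses of TensorFlow Keras ([tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model)). As noted on the [Quick tour](https://huggingface.co/docs/transformers/quicktour) page in the *Hugging Face Transformers documentation*, you can use the models as regular TensorFlow Keras models.

#### For single GPU training

You don't need to make any additional changes to the training script.

#### For distributed training

SageMaker Training Compiler acceleration works transparently for multi-GPU workloads when the model is constructed and trained using Keras APIs within the scope of a [`tf.distribute.Strategy`](https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy) `scope()` call.

1. Choose the right distributed training strategy.

   1. For single-node multi-GPU, use `tf.distribute.MirroredStrategy` to set the strategy.

      ```
      strategy = tf.distribute.MirroredStrategy()
      ```

   1. For multi-node multi-GPU, add the following code before you create the strategy, so that the TensorFlow distributed training configuration is set properly.

      ```
      import json
      import os

      def set_sm_dist_config():
          DEFAULT_PORT = '8890'
          DEFAULT_CONFIG_FILE = '/opt/ml/input/config/resourceconfig.json'
          # Read the host list that SageMaker writes for the training job.
          with open(DEFAULT_CONFIG_FILE) as f:
              config = json.load(f)
              current_host = config['current_host']
          tf_config = {
              'cluster': {
                  'worker': []
              },
              'task': {'type': 'worker', 'index': -1}
          }
          # Register every host as a worker and record this host's index.
          for i, host in enumerate(config['hosts']):
              tf_config['cluster']['worker'].append("%s:%s" % (host, DEFAULT_PORT))
              if current_host == host:
                  tf_config['task']['index'] = i
          os.environ['TF_CONFIG'] = json.dumps(tf_config)

      set_sm_dist_config()
      ```

      Use `tf.distribute.MultiWorkerMirroredStrategy` to set the strategy.

      ```
      strategy = tf.distribute.MultiWorkerMirroredStrategy()
      ```

1. Wrap the model using the strategy of your choice.

   ```
   with strategy.scope():
       # create a model and do fit
   ```

### Without Keras

If you want to bring custom models with custom training loops using TensorFlow without Keras, you should wrap the model and the training loop with the TensorFlow function decorator (`@tf.function`) to take advantage of compiler acceleration.

SageMaker Training Compiler performs graph-level optimization, and uses the decorator to make sure that your TensorFlow functions are set to run in graph mode.

#### For single GPU training

TensorFlow 2.0 or later has eager execution on by default, so you should add the `@tf.function` decorator in front of every function that you use to construct a TensorFlow model.

#### For distributed training

In addition to the changes needed for [Using Keras for distributed training](https://docs.aws.amazon.com/sagemaker/latest/dg/training-compiler-tensorflow-models.html#training-compiler-tensorflow-models-transformers-keras), you need to make sure that the functions to be run on each GPU are annotated with `@tf.function`, while the cross-GPU communication functions are not annotated. An example training code should look like the following:

```
# model, loss_object, optimizer, args, train_loss, train_accuracy,
# and strategy are assumed to be defined earlier in the training script.

@tf.function()
def compiled_step(inputs, outputs):
    # Per-GPU forward and backward pass, compiled as a graph.
    with tf.GradientTape() as tape:
        pred = model(inputs, training=True)
        total_loss = loss_object(outputs, pred) / args.batch_size
    gradients = tape.gradient(total_loss, model.trainable_variables)
    return total_loss, pred, gradients

def train_step(inputs, outputs):
    # Not annotated: applying gradients involves cross-GPU communication.
    total_loss, pred, gradients = compiled_step(inputs, outputs)
    if args.weight_decay > 0.:
        gradients = [g + v * args.weight_decay
                     for g, v in zip(gradients, model.trainable_variables)]

    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss.update_state(total_loss)
    train_accuracy.update_state(outputs, pred)

@tf.function()
def train_step_dist(inputs, outputs):
    strategy.run(train_step, args=(inputs, outputs))
```

Note that this instruction can be used for both single-node multi-GPU and multi-node multi-GPU.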
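
To check the multi-node `TF_CONFIG` layout from the Keras distributed training steps without launching a training job, the following minimal sketch separates the dictionary construction from the file and environment handling. `build_tf_config` is a hypothetical helper introduced here for illustration. It is not part of SageMaker or TensorFlow, and the host names are example values shaped like the `hosts` and `current_host` fields that the configuration code reads.

```python
import json


def build_tf_config(hosts, current_host, port="8890"):
    """Hypothetical helper: build the TF_CONFIG dictionary for one worker.

    hosts        -- list of host names, in the same order on every node
    current_host -- name of the host this process runs on
    port         -- port each worker listens on (assumed default: 8890)
    """
    if current_host not in hosts:
        raise ValueError(f"{current_host!r} is not listed in hosts {hosts!r}")
    return {
        # Every host is registered as a worker at host:port.
        "cluster": {"worker": [f"{host}:{port}" for host in hosts]},
        # The task index is this host's position in the host list.
        "task": {"type": "worker", "index": hosts.index(current_host)},
    }


# Example values only; a real job reads these from the resource configuration file.
example = build_tf_config(["algo-1", "algo-2"], "algo-2")
print(json.dumps(example))
# {"cluster": {"worker": ["algo-1:8890", "algo-2:8890"]}, "task": {"type": "worker", "index": 1}}
```

Unlike the configuration code in the steps, which leaves the task index at `-1` when the current host is not found, this sketch raises an error in that case. That is a design choice made for the illustration so that a misconfigured host list fails early.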