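# Run your local code as a SageMaker training job

You can run your local machine learning (ML) Python code as a large single-node Amazon SageMaker training job or as multiple parallel jobs. To do this, annotate your code with the @remote decorator, as shown in the following code example. Remote functions do not support [distributed training](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) (across multiple instances).

```
@remote(**settings)
def divide(x, y):
    return x / y
```

The SageMaker Python SDK automatically translates your existing workspace environment, along with any associated data processing code and datasets, into a SageMaker training job that runs on the SageMaker training platform. You can also turn on the persistent cache feature, which further reduces job start latency by caching previously downloaded dependency packages. This reduces job latency more than SageMaker AI managed warm pools alone. For more information, see [Using persistent cache](train-warm-pools.md#train-warm-pools-persistent-cache).

**Note**  
Remote functions do not support distributed training jobs.

The following sections show how to annotate your local ML code with the @remote decorator and tailor the experience to your use case. This includes customizing your environment and integrating with SageMaker Experiments.

**Topics**
+ [Set up your environment](#train-remote-decorator-env)
+ [Invoke a remote function](train-remote-decorator-invocation.md)
+ [Configuration file](train-remote-decorator-config.md)
+ [Customize your runtime environment](train-remote-decorator-customize.md)
+ [Container image compatibility](train-remote-decorator-container.md)
+ [Logging parameters and metrics with Amazon SageMaker Experiments](train-remote-decorator-experiments.md)
+ [Using modular code with the @remote decorator](train-remote-decorator-modular.md)
+ [Private repository for runtime dependencies](train-remote-decorator-private.md)
+ [Example notebooks](train-remote-decorator-examples.md)

## Set up your environment

Choose one of the following three options to set up your environment.

### Run your code from Amazon SageMaker Studio Classic

You can annotate and run your local ML code from SageMaker Studio Classic by creating a SageMaker notebook and attaching any of the available SageMaker Studio Classic images. The following instructions help you create a SageMaker notebook, install the SageMaker Python SDK, and annotate your code with the decorator.

1. Create a SageMaker notebook and attach a SageMaker Studio Classic image as follows:

   1. Follow the instructions in [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html) in the *Amazon SageMaker AI Developer Guide*.

   1. Select **Studio** from the left navigation pane. This opens a new window.

   1. In the **Get Started** dialog box, select a user profile from the down arrow. This opens a new window.

   1. Select **Open Studio Classic**.

   1. Select **Open Launcher** from the main working area. This opens a new page.

   1. Select **Create notebook** from the main working area.

   1. In the **Change environment** dialog box, select **Base Python 3.0** from the down arrow next to **Image**.

      The @remote decorator automatically detects the image attached to the SageMaker Studio Classic notebook and uses it to run the SageMaker training job. If `image_uri` is specified as an argument in the decorator or in the configuration file, the value of `image_uri` is used instead of the detected image.

      For more information about how to create a notebook in SageMaker Studio Classic, see the **Create a Notebook from the File Menu** section in [Create or Open an Amazon SageMaker Studio Classic Notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-create-open.html#notebooks-create-file-menu).

      For a list of available images, see [Supported Docker images](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-container.html).

1. Install the SageMaker Python SDK.

   To annotate your code with the @remote function inside a SageMaker Studio Classic notebook, you must have the SageMaker Python SDK installed. Install it as shown in the following code example.

   ```
   !pip install sagemaker
   ```

1. Use the @remote decorator to run functions in a SageMaker training job.

   To run your local ML code, first create a dependencies file that tells SageMaker AI where to find your local code. To do so, follow these steps:

   1. From the SageMaker Studio Classic Launcher main working area, in **Utilities and files**, choose **Text file**. This opens a new tab with a text file called `untitled.txt`.

      For more information about the SageMaker Studio Classic user interface (UI), see [Amazon SageMaker Studio Classic UI Overview](https://docs.aws.amazon.com//sagemaker/latest/dg/studio-ui.html).

   1. Rename `untitled.txt` to `requirements.txt`.

   1. Add all the dependencies required by the code, along with the SageMaker AI library, to `requirements.txt`.

      A minimal `requirements.txt` for the example `divide` function follows.

      ```
      sagemaker
      ```

   1. Run your code with the remote decorator by passing the dependencies file, as follows.

      ```
      from sagemaker.remote_function import remote

      @remote(instance_type="ml.m5.xlarge", dependencies='./requirements.txt')
      def divide(x, y):
          return x / y

      divide(2, 3.0)
      ```

      For additional code examples, see the sample notebook [quick_start.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-remote-function/quick_start/quick_start.ipynb).

      If you are already running a SageMaker Studio Classic notebook and you install the Python SDK as instructed under **2. Install the SageMaker Python SDK**, you must restart your kernel. For more information, see [Use the SageMaker Studio Classic Notebook Toolbar](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-menu.html) in the *Amazon SageMaker AI Developer Guide*.

### Run your code from an Amazon SageMaker notebook

You can annotate your local ML code from a SageMaker notebook instance. The following instructions show how to create a notebook instance with a custom kernel, install the SageMaker Python SDK, and annotate your code with the decorator.

1. Create a notebook instance with a custom `conda` kernel.

   You can annotate your local ML code with the @remote decorator to use it inside a SageMaker training job. First, you must create and customize a SageMaker notebook instance so that it uses a kernel with Python version 3.7 or higher, up to 3.10.x. To do so, follow these steps:

   1. Open the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

   1. In the left navigation panel, choose **Notebook** to expand its options.

   1. Choose **Notebook Instances** from the expanded options.

   1. Choose the **Create Notebook Instance** button. This opens a new page.

   1. For **Notebook instance name**, enter a name with a maximum of 63 characters and no spaces. Valid characters: **A-Z**, **a-z**, **0-9**, and **. : + = @ _ % -** (hyphen).

   1. In the **Notebook instance settings** dialog box, expand the right arrow next to **Additional Configuration**.

   1. Under **Lifecycle configuration - optional**, expand the down arrow and select **Create a new lifecycle configuration**. This opens a new dialog box.

   1. Under **Name**, enter a name for your configuration setting.

   1. In the **Scripts** dialog box, in the **Start notebook** tab, replace the existing contents of the text box with the following script.

      ```
      #!/bin/bash

      set -e

      sudo -u ec2-user -i <<'EOF'
      unset SUDO_UID

      WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda/
      source "$WORKING_DIR/miniconda/bin/activate"

      for env in $WORKING_DIR/miniconda/envs/*; do
          BASENAME=$(basename "$env")
          source activate "$BASENAME"
          python -m ipykernel install --user --name "$BASENAME" --display-name "Custom ($BASENAME)"
      done
      EOF

      echo "Restarting the Jupyter server.."
      # restart command is dependent on current running Amazon Linux and JupyterLab
      CURR_VERSION_AL=$(cat /etc/system-release)
      CURR_VERSION_JS=$(jupyter --version)

      if [[ $CURR_VERSION_JS == *$"jupyter_core     : 4.9.1"* ]] && [[ $CURR_VERSION_AL == *$" release 2018"* ]]; then
          sudo initctl restart jupyter-server --no-wait
      else
          sudo systemctl --no-block restart jupyter-server.service
      fi
      ```

   1. In the **Scripts** dialog box, in the **Create notebook** tab, replace the existing contents of the text box with the following script.

      ```
      #!/bin/bash

      set -e

      sudo -u ec2-user -i <<'EOF'
      unset SUDO_UID

      # Install a separate conda installation via Miniconda
      WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
      mkdir -p "$WORKING_DIR"
      wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh"
      bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda"
      rm -rf "$WORKING_DIR/miniconda.sh"

      # Create a custom conda environment
      source "$WORKING_DIR/miniconda/bin/activate"
      KERNEL_NAME="custom_python310"
      PYTHON="3.10"
      conda create --yes --name "$KERNEL_NAME" python="$PYTHON" pip
      conda activate "$KERNEL_NAME"
      pip install --quiet ipykernel

      # Customize these lines as necessary to install the required packages
      EOF
      ```

   1. Choose the **Create configuration** button on the bottom right of the window.

   1. Choose the **Create notebook instance** button on the bottom right of the window.

   1. Wait for the notebook instance **Status** to change from **Pending** to **InService**.

1. Create a Jupyter notebook in the notebook instance.

   The following instructions show how to create a Jupyter notebook that uses Python 3.10 in your newly created SageMaker instance.

   1. After the notebook instance **Status** from the previous step is **InService**, do the following:

      1. In the row containing your newly created notebook instance **Name**, select **Open Jupyter** under **Actions**. This opens a new Jupyter server.

      1. In the Jupyter server, select **New** from the top right menu.

      1. From the down arrow, select **conda_custom_python310**. This creates a new Jupyter notebook that uses a Python 3.10 kernel. You can now use this notebook much as you would a local Jupyter notebook.

1. Install the SageMaker Python SDK.

   After your virtual environment is running, install the SageMaker Python SDK by using the following code example.

   ```
   !pip install sagemaker
   ```

1. Use the @remote decorator to run functions in a SageMaker training job.

   When you annotate your local ML code with the @remote decorator inside a SageMaker notebook, SageMaker training automatically interprets the function in your code and runs it as a SageMaker training job. Set up your notebook by doing the following:

   1. In the notebook menu, select the kernel name from the SageMaker notebook instance that you created in step 1, **Create a notebook instance with a custom `conda` kernel**.

      For more information, see [Change an Image or a Kernel](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-run-and-manage-change-image.html).

   1. From the down arrow, choose a custom `conda` kernel that uses Python 3.7 or higher.

      For example, selecting `conda_custom_python310` chooses the kernel for Python 3.10.

   1. Choose **Select**.

   1. Wait for the kernel status to show as idle, which indicates that the kernel has started.

   1. In the Jupyter server home page, select **New** from the top right menu.

   1. Next to the down arrow, select **Text file**. This creates a new text file called `untitled.txt`.

   1. Rename `untitled.txt` to `requirements.txt` and add any dependencies required by the code, along with `sagemaker`.

   1. Run your code with the remote decorator by passing the dependencies file, as follows.

      ```
      from sagemaker.remote_function import remote

      @remote(instance_type="ml.m5.xlarge", dependencies='./requirements.txt')
      def divide(x, y):
          return x / y

      divide(2, 3.0)
      ```

      For additional code examples, see the sample notebook [quick_start.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-remote-function/quick_start/quick_start.ipynb).

### Run your code from within your local IDE

You can annotate your local ML code with the @remote decorator inside your preferred local IDE. The following steps show the necessary prerequisites, how to install the Python SDK, and how to annotate your code with the @remote decorator.

1. Install the prerequisites by setting up the AWS Command Line Interface (AWS CLI) and creating a role, as follows:
   + Onboard to a SageMaker AI domain by following the instructions in the **AWS CLI Prerequisites** section of [Set Up Amazon SageMaker AI Prerequisites](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html#gs-cli-prereq).
   + Create an IAM role by following the **Create execution role** section of [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).

1. Create a virtual environment by using either PyCharm or `conda` with Python version 3.7 or higher, up to 3.10.x.
   + Set up a virtual environment using PyCharm as follows:

     1. Select **File** from the main menu.

     1. Choose **New Project**.

     1. Choose **Conda** from the down arrow under **New environment using**.

     1. In the **Python version** field, use the down arrow to select a Python version that is 3.7 or higher. The highest version you can select from the list is 3.10.x.  
![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/training-pycharm-ide.png)
   + If you have Anaconda installed, you can set up a virtual environment using `conda` as follows:
     + Open an Anaconda prompt terminal interface.
     + Create and activate a new `conda` environment with Python 3.7 or higher, up to 3.10.x. The following code example shows how to create a `conda` environment that uses Python 3.10.

       ```
       conda create -n sagemaker_jobs_quick_start python=3.10 pip
       conda activate sagemaker_jobs_quick_start
       ```

1. Install the SageMaker Python SDK.

   To package your code from your preferred IDE, you must have a virtual environment set up with Python 3.7 or higher, up to 3.10.x. You also need a compatible container image. Install the SageMaker Python SDK by using the following code example.

   ```
   pip install sagemaker
   ```

1. Wrap your code inside the @remote decorator. The SageMaker Python SDK automatically interprets the function in your code and runs it as a SageMaker training job. The following code examples show how to import the necessary libraries, set up a SageMaker session, and annotate a function with the @remote decorator.

   You can run your code either by providing the dependencies directly or by using the dependencies from the active `conda` environment.
   + To provide the dependencies directly, do the following:
     + Create a `requirements.txt` file in the working directory where the code resides.
     + Add all of the dependencies required by the code, along with the SageMaker library. A minimal `requirements.txt` for the example `divide` function follows.

       ```
       sagemaker
       ```
     + Run your code with the @remote decorator by passing the dependencies file. In the following code example, replace `The IAM role name` with the ARN of the AWS Identity and Access Management (IAM) role that you want SageMaker to use to run your job.

       ```
       import boto3
       import sagemaker
       from sagemaker.remote_function import remote

       sm_session = sagemaker.Session(boto_session=boto3.session.Session(region_name="us-west-2"))
       settings = dict(
           sagemaker_session=sm_session,
           role="<The IAM role name>",  # placeholder: replace with your IAM role ARN
           instance_type="ml.m5.xlarge",
           dependencies='./requirements.txt'
       )

       @remote(**settings)
       def divide(x, y):
           return x / y


       if __name__ == "__main__":
           print(divide(2, 3.0))
       ```
   + To use the dependencies from the active `conda` environment, use the value `auto_capture` for the `dependencies` parameter, as follows.

     ```
     import boto3
     import sagemaker
     from sagemaker.remote_function import remote

     sm_session = sagemaker.Session(boto_session=boto3.session.Session(region_name="us-west-2"))
     settings = dict(
         sagemaker_session=sm_session,
         role="<The IAM role name>",  # placeholder: replace with your IAM role ARN
         instance_type="ml.m5.xlarge",
         dependencies="auto_capture"
     )

     @remote(**settings)
     def divide(x, y):
         return x / y


     if __name__ == "__main__":
         print(divide(2, 3.0))
     ```

   **Note**  
   You can also implement the previous code inside a Jupyter notebook. PyCharm Professional Edition supports Jupyter natively. For more guidance, see [Jupyter notebook support](https://www.jetbrains.com/help/pycharm/ipython-notebook-support.html) in the PyCharm documentation.

# Invoke a remote function

To invoke a function inside the @remote decorator, use either of the following methods:
+ [Use an @remote decorator to invoke a function](#train-remote-decorator-invocation-decorator).
+ [Use the `RemoteExecutor` API to invoke a function](#train-remote-decorator-invocation-api).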
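
The two invocation styles differ mainly in whether the caller waits for each result. As a rough local analogy only, the following sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for `RemoteExecutor`, which this guide describes as an implementation of `concurrent.futures.Executor`. No SageMaker job is launched here, and the variable names `blocking_result` and `parallel_results` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(x, y):
    """Plain local function; stands in for a function you would run remotely."""
    return x / y


# Style 1: a direct call blocks until the result is available.
# This mirrors how a call to an @remote-decorated function waits for its job.
blocking_result = divide(2, 4.0)

# Style 2: submit() returns a future immediately, so several calls can be
# in flight at once. ThreadPoolExecutor is only a local stand-in (assumption);
# it runs threads on this machine, not training jobs.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(divide, a, 2) for a in [3, 5, 7, 9]]
    parallel_results = [future.result() for future in futures]

print(blocking_result)
print(parallel_results)
```

In this sketch, `max_workers=2` plays a role loosely similar to `max_parallel_job=2`: four tasks are submitted, but at most two run at the same time.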
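
If you use the @remote decorator method to invoke a function, the training job waits for the function to complete before a new task starts. If you use the `RemoteExecutor` API, however, you can run more than one job in parallel. The following sections show both ways of invoking a function.

## Use an @remote decorator to invoke a function

You can use the @remote decorator to annotate a function. SageMaker AI transforms the code inside the decorator into a SageMaker training job. The training job then invokes the function inside the decorator and waits for the job to complete. The following code example shows how to import the required libraries, start a SageMaker AI instance, and annotate a matrix multiplication with the @remote decorator.

```
from sagemaker.remote_function import remote
import numpy as np

@remote(instance_type="ml.m5.large")
def matrix_multiply(a, b):
    return np.matmul(a, b)

a = np.array([[1, 0],
              [0, 1]])
b = np.array([1, 2])

assert (matrix_multiply(a, b) == np.array([1,2])).all()
```

The decorator is defined as follows.

```
def remote(
    *,
    **kwarg):
        ...
```

When you invoke a decorated function, the SageMaker Python SDK loads any exception raised by an error into local memory. In the following code example, the first call to the divide function completes successfully and the result is loaded into local memory. In the second call, the code raises an error, and this error is loaded into local memory.

```
from sagemaker.remote_function import remote
import pytest

@remote()
def divide(a, b):
    return a/b

# the underlying job is completed successfully
# and the function return is loaded
assert divide(10, 5) == 2

# the underlying job fails with "AlgorithmError"
# and the function exception is loaded into local memory
with pytest.raises(ZeroDivisionError):
    divide(10, 0)
```

**Note**  
The decorated function runs as a remote job. If the thread is interrupted, the underlying job is not stopped.

### How to change the value of a local variable

The decorated function runs on a remote machine. Changing a non-local variable or an input argument inside a decorated function does not change the local value.

In the following code example, a list and a dictionary are modified inside the decorated function. Neither local value changes when the decorated function is invoked.

```
a = []

@remote
def func():
    a.append(1)

# when func is invoked, a in the local memory is not modified
func()
func()

# a stays as []

a = {}

@remote
def func(a):
    # append new values to the input dictionary
    a["key-2"] = "value-2"

a = {"key": "value"}
func(a)

# a stays as {"key": "value"}
```

To change the value of a local variable from inside a decorated function, return the variable from the function and reassign it. The following code example shows that the value of a local variable changes when it is returned from the function.

```
a = {"key-1": "value-1"}

@remote
def func(a):
    a["key-2"] = "value-2"
    return a

a = func(a)

# a is now {"key-1": "value-1", "key-2": "value-2"}
```

### Data serialization and deserialization

When you invoke a remote function, SageMaker AI automatically serializes your function arguments during the input and output stages. Function arguments and return values are serialized with [cloudpickle](https://github.com/cloudpipe/cloudpickle). SageMaker AI supports serializing the following Python objects and functions.
+ Built-in Python objects, including dictionaries, lists, floats, integers, strings, Boolean values, and tuples
+ NumPy arrays
+ Pandas DataFrames
+ Scikit-learn datasets and estimators
+ PyTorch models
+ TensorFlow models
+ The Booster class for XGBoost

The following can be used with some limitations.
+ Dask DataFrames
+ The XGBoost DMatrix class
+ TensorFlow datasets and subclasses
+ PyTorch models

The following sections cover best practices for using the previous Python classes that have some limitations in a remote function, where SageMaker AI stores your serialized data, and how to manage access to it.

#### Best practices for Python classes with limited support for remote data serialization

You can use the Python classes listed in this section with limitations. The following sections discuss best practices for using these Python classes.
+ [Dask](https://www.dask.org/) DataFrames
+ The XGBoost DMatrix class
+ TensorFlow datasets and subclasses
+ PyTorch models

##### Best practices for Dask

[Dask](https://www.dask.org/) is an open-source library used for parallel computing in Python. This section shows the following.
+ How to pass a Dask DataFrame into your remote function
+ How to convert summary statistics from a Dask DataFrame into a Pandas DataFrame

##### How to pass a Dask DataFrame into your remote function

[Dask DataFrames](https://docs.dask.org/en/latest/dataframe.html) are often used to process large datasets because they can hold datasets that require more memory than is available. This is because a Dask DataFrame does not load your local data into memory. If you pass a Dask DataFrame as a function argument to your remote function, Dask may pass a reference to the data on your local disk or cloud storage instead of the data itself. The following code example passes a Dask DataFrame into a remote function, which then operates on an empty DataFrame.

```
#Do not pass a Dask DataFrame to your remote function as follows
def clean(df: dask.DataFrame ):
    cleaned = df[] \
    ...
```

Dask loads the data from a Dask DataFrame into memory only when you use the DataFrame. If you want to use a Dask DataFrame inside a remote function, provide the path to the data instead. Dask then reads the dataset directly from the data path that you specify when the code runs.

The following code example shows how to use a Dask DataFrame inside the remote function `clean`. In the code example, `raw_data_path` is passed to `clean` instead of the Dask DataFrame. When the code runs, the dataset is read directly from the Amazon S3 bucket location specified in `raw_data_path`. The `persist` function then keeps the dataset in memory to facilitate the subsequent `random_split` function, and the results are written back to the output data path in an S3 bucket by using Dask DataFrame API functions.

```
import os

import dask.dataframe as dd
from sagemaker.remote_function import remote

@remote(
   instance_type='ml.m5.24xlarge',
   volume_size=300,
   keep_alive_period_in_seconds=600)
#pass the data path to your remote function rather than the Dask DataFrame itself
def clean(raw_data_path: str, output_data_path: str, split_ratio: list[float]):
    df = dd.read_parquet(raw_data_path) #pass the path to your DataFrame
    cleaned = df[(df.column_a >= 1) & (df.column_a < 5)]\
        .drop(['column_b', 'column_c'], axis=1)\
        .persist() #keep the data in memory to facilitate the following random_split operation

    train_df, test_df = cleaned.random_split(split_ratio, random_state=10)

    train_df.to_parquet(os.path.join(output_data_path, 'train'))
    test_df.to_parquet(os.path.join(output_data_path, 'test'))

clean("s3://amzn-s3-demo-bucket/raw/", "s3://amzn-s3-demo-bucket/cleaned/", split_ratio=[0.7, 0.3])
```

##### How to convert summary statistics from a Dask DataFrame into a Pandas DataFrame

Summary statistics from a Dask DataFrame can be converted into a Pandas DataFrame by invoking the `compute` method, as shown in the following example code. In the example, the S3 bucket contains a large Dask DataFrame that cannot fit into memory or into a Pandas DataFrame. The remote function scans the dataset and returns the output statistics from `describe` as a Pandas DataFrame.

```
import dask.dataframe as dd
from sagemaker.remote_function import RemoteExecutor

executor = RemoteExecutor(
    instance_type='ml.m5.24xlarge',
    volume_size=300,
    keep_alive_period_in_seconds=600)

future = executor.submit(lambda: dd.read_parquet("s3://amzn-s3-demo-bucket/raw/").describe().compute())

future.result()
```

##### Best practices for the XGBoost DMatrix class

DMatrix is an internal data structure that XGBoost uses to load data. A DMatrix object can't be pickled, so it can't be moved easily between compute sessions. Directly passing DMatrix instances fails with a `SerializationError`.

##### How to pass a data object to your remote function and train with XGBoost

To train with XGBoost in your remote function, pass the Pandas DataFrame directly to the remote function and convert it into a DMatrix instance there, as shown in the following code example.

```
import xgboost as xgb
from sagemaker.remote_function import remote

@remote
def train(df, params):
    #Convert a pandas dataframe into a DMatrix and use it for training
    dtrain = xgb.DMatrix(df)
    return xgb.train(params, dtrain)
```

##### Best practices for TensorFlow datasets and subclasses

TensorFlow datasets and subclasses are internal objects that TensorFlow uses to load data during training. They can't be pickled, so they can't be moved easily between compute sessions. Directly passing TensorFlow datasets or subclasses fails with a `SerializationError`. Use the TensorFlow I/O APIs to load data from storage, as shown in the following code example.

```
import tensorflow as tf
import tensorflow_io as tfio
from sagemaker.remote_function import remote

@remote
def train(data_path: str, params):

    dataset = tf.data.TextLineDataset(tf.data.Dataset.list_files(f"{data_path}/*.txt"))
    ...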

train("s3://amzn-s3-demo-bucket/data", {})
```

##### Best practices for PyTorch models

PyTorch models are serializable and can be passed between your local environment and a remote function. If your local environment and remote environment use different device types, such as GPUs and CPUs, you cannot return a trained model to your local environment. For example, if the following code is developed in a local environment without GPUs but runs on an instance with GPUs, returning the trained model directly leads to a `DeserializationError`.

```
# Do not return a model trained on GPUs to a CPU-only environment as follows

@remote(instance_type='ml.g4dn.xlarge')
def train(...):
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu") # a device without GPU capabilities

    model = Net().to(device)

    # train the model
    ...

    return model

model = train(...) #raises a DeserializationError if the local environment has no GPU
```

To return a model trained in a GPU environment to an environment that has only CPU capabilities, use the PyTorch model I/O APIs directly, as shown in the following code example.

```
import os

import s3fs
import torch

model_path = "s3://amzn-s3-demo-bucket/folder/"

@remote(instance_type='ml.g4dn.xlarge')
def train(...):
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

    model = Net().to(device)

    # train the model
    ...

    fs = s3fs.S3FileSystem()
    with fs.open(os.path.join(model_path, 'model.pt'), 'wb') as file:
        torch.save(model.state_dict(), file) #this writes the model in a device-agnostic way (CPU vs GPU)

train(...)

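#use the model to train on either CPUs or GPUs

model = Net()
fs = s3fs.S3FileSystem()
with fs.open(os.path.join(model_path, 'model.pt'), 'rb') as file:
    model.load_state_dict(torch.load(file, map_location=torch.device('cpu')))
```

#### Where SageMaker AI stores your serialized data

When you invoke a remote function, SageMaker AI automatically serializes your function arguments and return values during the input and output stages. This serialized data is stored under a root directory in your S3 bucket. You specify the root directory, `<s3_root_uri>`, in a configuration file. The parameter `job_name` is generated for you automatically.

Under the root directory, SageMaker AI creates a `<job_name>` folder, which holds your current working directory, the serialized function, the arguments for the serialized function, the results, and any exceptions that arose from invoking the serialized function.

Under `<job_name>`, the directory `workdir` contains a zipped archive of your current working directory. The zipped archive includes any Python files in your working directory and the `requirements.txt` file, which specifies any dependencies needed to run your remote function.

The following example shows the folder structure for an S3 bucket that you specify in your configuration file.

```
<s3_root_uri>/ # specified by s3_root_uri or S3RootUri
    <job_name>/ #automatically generated for you
        workdir/workspace.zip # archive of the current working directory (workdir)
        function/ # serialized function
        arguments/ # serialized function arguments
        results/ # returned output from the serialized function including the model
        exception/ # any exceptions from invoking the serialized function
```

The root directory that you specify in your S3 bucket is not meant for long-term storage. The serialized data is tightly coupled to the Python version and ML framework version that were used during serialization. If you upgrade the Python version or the ML framework, you may not be able to use your serialized data. Instead, do the following.
+ Store your model and model artifacts in a format that is independent of your Python version and ML framework.
+ If you upgrade your Python version or ML framework, access your model results from your long-term storage.

**Important**  
To delete your serialized data after a specified amount of time, set a [lifecycle configuration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html) on your S3 bucket.

**Note**  
Files that are serialized with the Python [pickle](https://docs.python.org/3/library/pickle.html) module can be less portable than other data formats, including CSV, Parquet, and JSON. Be careful when loading pickled files from unknown sources.

For more information about what to include in a configuration file for a remote function, see [Configuration File](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html).

#### Access to your serialized data

Administrators can provide settings for your serialized data, including its location and any encryption settings, in a configuration file. By default, the serialized data is encrypted with an AWS Key Management Service (AWS KMS) key. Administrators can also use a [bucket policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html) to restrict access to the root directory that you specify in your configuration file. The configuration file can be shared and used across projects and jobs. For more information, see [Configuration File](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html).

## Use the `RemoteExecutor` API to invoke a function

You can use the `RemoteExecutor` API to invoke a function. The SageMaker AI Python SDK takes the code inside the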
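`RemoteExecutor` call and transforms it into a SageMaker AI training job. The training job then invokes the function as an asynchronous operation and returns a future. If you use the `RemoteExecutor` API, you can run more than one training job in parallel. For more information about futures in Python, see [Futures](https://docs.python.org/3/library/asyncio-future.html).

The following code example shows how to import the required libraries, define a function, start a SageMaker AI instance, and use the API to submit a request to run `2` jobs in parallel.

```
import numpy as np
from sagemaker.remote_function import RemoteExecutor

def matrix_multiply(a, b):
    return np.matmul(a, b)


a = np.array([[1, 0],
              [0, 1]])
b = np.array([1, 2])

with RemoteExecutor(max_parallel_job=2, instance_type="ml.m5.large") as e:
    future = e.submit(matrix_multiply, a, b)

assert (future.result() == np.array([1,2])).all()
```

The `RemoteExecutor` class is an implementation of [concurrent.futures.Executor](https://docs.python.org/3/library/concurrent.futures.html).

The following code example shows how to define a function and call it by using the `RemoteExecutor` API. In this example, the `RemoteExecutor` submits `4` jobs in total, but only `2` run in parallel. The last two jobs reuse the clusters with minimal overhead.

```
from sagemaker.remote_function.client import RemoteExecutor

def divide(a, b):
    return a/b

with RemoteExecutor(max_parallel_job=2, keep_alive_period_in_seconds=60) as e:
    futures = [e.submit(divide, a, 2) for a in [3, 5, 7, 9]]

for future in futures:
    print(future.result())
```

The `max_parallel_job` parameter serves only as a rate-limiting mechanism and does not optimize compute resource allocation. In the previous code example, `RemoteExecutor` does not reserve compute resources for the two parallel jobs before any jobs are submitted. For more information about `max_parallel_job` or other parameters of the @remote decorator, see [Remote function classes and methods specification](https://sagemaker.readthedocs.io/en/stable/remote_function/sagemaker.remote_function.html).

### Future class for the `RemoteExecutor` API

The future class is a public class that represents the return value of a training job that is invoked asynchronously. The future class implements the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html) class. You can use this class to perform operations on the underlying job and to load data into memory.

# Configuration file

The Amazon SageMaker Python SDK supports setting default values for AWS infrastructure primitive types. After administrators configure these defaults, they are passed automatically when the SageMaker Python SDK calls supported APIs. The arguments for the decorator function can be placed inside configuration files, so that you can separate infrastructure-related settings from the code base. For more information about parameters and arguments for the remote function and methods, see [Remote function classes and methods specification](https://sagemaker.readthedocs.io/en/stable/remote_function/sagemaker.remote_function.html).

Inside the configuration file, you can set infrastructure settings for the network configuration, IAM roles, the Amazon S3 folder for input and output data, and tags. The configuration file can be used when you invoke a function with either the @remote decorator or the `RemoteExecutor` API.

The following example configuration file defines the dependencies, resources, and other arguments. This example configuration file is used to invoke a function that is initialized with either the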
@remote decorator or the `RemoteExecutor` API.

```
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: 'path/to/requirements.txt'
        EnableInterContainerTrafficEncryption: true
        EnvironmentVariables: {'EnvVarKey': 'EnvVarValue'}
        ImageUri: '366666666666.dkr.ecr.us-west-2.amazonaws.com/my-image:latest'
        IncludeLocalWorkDir: true
        CustomFileFilter:
          IgnoreNamePatterns:
          - "*.ipynb"
          - "data"
        InstanceType: 'ml.m5.large'
        JobCondaEnvironment: 'your_conda_env'
        PreExecutionCommands:
          - 'command_1'
          - 'command_2'
        PreExecutionScript: 'path/to/script.sh'
        RoleArn: 'arn:aws:iam::366666666666:role/MyRole'
        S3KmsKeyId: 'yourkmskeyid'
        S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
        VpcConfig:
          SecurityGroupIds:
          - 'sg123'
          Subnets:
          - 'subnet-1234'
        Tags: [{'Key': 'yourTagKey', 'Value':'yourTagValue'}]
        VolumeKmsKeyId: 'yourkmskeyid'
```

The @remote decorator and `RemoteExecutor` look for `Dependencies` in the following configuration files:
+ An admin-defined configuration file.
+ A user-defined configuration file.

The default locations for these configuration files depend on, and are relative to, your environment. The following code example returns the default locations of your admin and user configuration files. These commands must be run in the same environment where you use the SageMaker Python SDK.

```
import os
from platformdirs import site_config_dir, user_config_dir

#Prints the location of the admin config file
print(os.path.join(site_config_dir("sagemaker"), "config.yaml"))

#Prints the location of the user config file
print(os.path.join(user_config_dir("sagemaker"), "config.yaml"))
```

You can override the default locations of these files by setting the `SAGEMAKER_ADMIN_CONFIG_OVERRIDE` and `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variables for the admin-defined and user-defined configuration file paths, respectively.

If a key exists in both the admin-defined and user-defined configuration files, the value in the user-defined file is used.

# Customize your runtime environment

You can customize your runtime environment so that you can write your ML code in your preferred local integrated development environment (IDE), SageMaker notebooks, or SageMaker Studio Classic notebooks. SageMaker AI helps package and submit your functions and their dependencies as a SageMaker training job. This gives you access to the capacity of the SageMaker training servers to run your training jobs.

Both the remote decorator and the `RemoteExecutor` methods for invoking a function let users define and customize their runtime environment. You can use either a `requirements.txt` file or a conda environment YAML file.

To customize a runtime environment with either a conda environment YAML file or a `requirements.txt` file, see the following code example.

```
# specify a conda environment inside a yaml file
@remote(instance_type="ml.m5.large",
        image_uri = "my_base_python:latest",
        dependencies = "./environment.yml")
def matrix_multiply(a, b):
    return np.matmul(a, b)

# use a requirements.txt file to import dependencies
@remote(instance_type="ml.m5.large",
        image_uri = "my_base_python:latest",
        dependencies = './requirements.txt')
def matrix_multiply(a, b):
    return np.matmul(a, b)
```

Alternatively, you can set `dependencies` to `auto_capture` to let the SageMaker Python SDK capture the dependencies installed in the active conda environment. The following are required for `auto_capture` to work reliably:
+ You must have an active conda environment. We recommend not using the `base` conda environment for remote jobs, so that you can reduce potential dependency conflicts. Not using the `base` conda environment also allows for faster environment setup in the remote job.
+ You must not have any dependencies installed using pip with a value for the parameter `--extra-index-url`.
+ You must not have any dependency conflicts between packages installed with conda and packages installed with pip in the local development environment.
+ Your local development environment must not contain operating system-specific dependencies that are not compatible with Linux.

If `auto_capture` does not work, we recommend that you pass in your dependencies as a `requirements.txt` or conda `environment.yaml` file, as described in the first code example in this section.

# Container image compatibility

The following table lists the SageMaker training images that are compatible with the @remote decorator.

| Name | Python version | Image URI - CPU | Image URI - GPU |
| --- | --- | --- | --- |
| Data Science | 3.7(py37) | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. |
| Data Science 2.0 | 3.8(py38) | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. |
| Data Science 3.0 | 3.10(py310) | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. |
| Base Python 2.0 | 3.8(py38) | The Python SDK selects this image when it detects that the development environment is using the Python 3.8 runtime. Otherwise, the Python SDK automatically selects this image when it is used as the SageMaker Studio Classic notebook kernel image. | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the SageMaker Studio Classic notebook kernel image. |
| Base Python 3.0 | 3.10(py310) | The Python SDK selects this image when it detects that the development environment is using the Python 3.10 runtime. Otherwise, the Python SDK automatically selects this image when it is used as the SageMaker Studio Classic notebook kernel image. | For SageMaker Studio Classic notebooks only. The Python SDK automatically selects the image URI when it is used as the Studio Classic notebook kernel image. |
| DLC-TensorFlow 2.12.0 for SageMaker Training | 3.10(py310) | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.12.0-cpu-py310-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.12.0-gpu-py310-cu118-ubuntu20.04-sagemaker` |
| DLC-TensorFlow 2.11.0 for SageMaker Training | 3.9(py39) | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.11.0-cpu-py39-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.11.0-gpu-py39-cu112-ubuntu20.04-sagemaker` |
| DLC-TensorFlow 2.10.1 for SageMaker Training | 3.9(py39) | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.10.1-cpu-py39-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.10.1-gpu-py39-cu112-ubuntu20.04-sagemaker` |
| DLC-TensorFlow 2.9.2 for SageMaker Training | 3.9(py39) | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.9.2-cpu-py39-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.9.2-gpu-py39-cu112-ubuntu20.04-sagemaker` |
| DLC-TensorFlow 2.8.3 for SageMaker Training | 3.9(py39) | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.8.3-cpu-py39-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.8.3-gpu-py39-cu112-ubuntu20.04-sagemaker` |
| DLC-PyTorch 2.0.0 for SageMaker Training | 3.10(py310) | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker` |
| DLC-PyTorch 1.13.1 for SageMaker Training | 3.9(py39) | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-cpu-py39-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker` |
| DLC-PyTorch 1.12.1 for SageMaker Training | 3.8(py38) | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker` |
| DLC-PyTorch 1.11.0 for SageMaker Training | 3.8(py38) | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-cpu-py38-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker` |
| DLC-MXNet 1.9.0 for SageMaker Training | 3.8(py38) | `763104351884.dkr.ecr.<region>.amazonaws.com/mxnet-training:1.9.0-cpu-py38-ubuntu20.04-sagemaker` | `763104351884.dkr.ecr.<region>.amazonaws.com/mxnet-training:1.9.0-gpu-py38-cu112-ubuntu20.04-sagemaker` |

**Note**  
To run jobs locally with AWS Deep Learning Containers (DLC) images, use the image URIs found in the [DLC documentation](https://github.com/aws/deep-learning-containers/blob/master/available_images.md). DLC images do not support the `auto_capture` value for dependencies.

Jobs with [SageMaker AI Distribution in SageMaker Studio](https://github.com/aws/sagemaker-distribution#amazon-sagemaker-studio) run in a container as a non-root user named `sagemaker-user`. This user needs full permission to access `/opt/ml` and `/tmp`. Grant this permission by adding `sudo chmod -R 777 /opt/ml /tmp` to the `pre_execution_commands` list, as shown in the following snippet:

```
@remote(pre_execution_commands=["sudo chmod -R 777 /opt/ml /tmp"])
def func():
    pass
```

You can also run remote functions with your own custom images. For compatibility with remote functions, custom images should be built with Python version 3.7.x-3.10.x. The following minimal Dockerfile example shows how to use a Docker image with Python 3.10.

```
FROM python:3.10

#... Rest of the Dockerfile
```

To create a `conda` environment in your image and use it to run jobs, set the environment variable `SAGEMAKER_JOB_CONDA_ENV` to the name of the `conda` environment. If your image has the `SAGEMAKER_JOB_CONDA_ENV` value set, the remote function cannot create a new conda environment during the training job runtime. See the following Dockerfile example, which uses a `conda` environment with Python version 3.10.

```
FROM continuumio/miniconda3:4.12.0

ENV SHELL=/bin/bash \
    CONDA_DIR=/opt/conda \
    SAGEMAKER_JOB_CONDA_ENV=sagemaker-job-env

RUN conda create -n $SAGEMAKER_JOB_CONDA_ENV \
   && conda install -n $SAGEMAKER_JOB_CONDA_ENV python=3.10 -y \
   && conda clean --all -f -y
```

For SageMaker AI to use [mamba](https://mamba.readthedocs.io/en/latest/user_guide/mamba.html) to manage your Python virtual environment in the container image, install the [mamba toolkit from miniforge](https://github.com/conda-forge/miniforge). To use mamba, add the following code example to your Dockerfile. SageMaker AI then detects the availability of `mamba` at runtime and uses it instead of `conda`.

```
#Mamba Installation
RUN curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh" \
    && bash Mambaforge-Linux-x86_64.sh -b -p "/opt/conda"  \
    && /opt/conda/bin/conda init bash
```

When you use a remote function, a custom conda channel on an Amazon S3 bucket is not compatible with mamba. If you choose to use mamba, make sure that you are not using a custom conda channel on Amazon S3. For more information, see the **Prerequisites** section under **Custom conda repository using Amazon S3**.

The following complete Dockerfile example shows how to create a compatible Docker image.

```
FROM python:3.10

RUN apt-get update -y \
    # Needed for awscli to work
    # See: https://github.com/aws/aws-cli/issues/1957#issuecomment-687455928
    && apt-get install -y groff unzip curl \
    && pip install --upgrade \
        'boto3>1.0,<2' \
        'awscli>1.0,<2' \
        'ipykernel>6.0.0,<7.0.0' \
    #Use ipykernel with --sys-prefix flag, so that the absolute path to
    #/usr/local/share/jupyter/kernels/python3/kernel.json python is used
    # in kernelspec.json file
    && python -m ipykernel install --sys-prefix

#Install Mamba
RUN curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh" \
    && bash Mambaforge-Linux-x86_64.sh -b -p "/opt/conda"  \
    && /opt/conda/bin/conda init bash

#cleanup
RUN apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf ${HOME}/.cache/pip \
    && rm Mambaforge-Linux-x86_64.sh

ENV SHELL=/bin/bash \
    PATH=$PATH:/opt/conda/bin
```

The image that results from building the previous Dockerfile example can also be used as a [SageMaker Studio Classic kernel image](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html).

# Logging parameters and metrics with Amazon SageMaker Experiments

This guide shows how to log parameters and metrics with Amazon SageMaker Experiments. A SageMaker AI experiment consists of runs, and each run contains all the inputs, parameters, configurations, and results for a single model training interaction.

You can log parameters and metrics from a remote function by using either the @remote decorator or the `RemoteExecutor` API.

To log parameters and metrics from a remote function, choose one of the following methods:
+ Instantiate a SageMaker experiment run inside the remote function by using `Run` from the SageMaker AI Experiments library. For more information, see [Create an Amazon SageMaker AI Experiment](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html).
+ Use the `load_run` function from the SageMaker AI Experiments library inside the remote function. This loads a `Run` instance that is declared outside of the remote function.

The following sections show how to create and track lineage with SageMaker AI experiment runs by using the methods listed previously. They also describe cases that SageMaker training does not support.

## Use the @remote decorator to integrate with SageMaker Experiments

You can either instantiate an experiment in SageMaker AI, or load a current SageMaker AI experiment from inside a remote function. The following sections show how to use either method.

### Create an experiment with SageMaker Experiments

You can create an experiment run in a SageMaker AI experiment. To do this, pass your experiment name, run name, and other parameters into your remote function.

The following code example takes in the name of your experiment, the name of the run, and the parameters to log during each run. The parameters `param_1` and `param_2` are logged over time inside a training loop. Common parameters may include batch size or epochs. In this example, the metrics `metric_a` and `metric_b` are logged for a run over time inside a training loop. Other common metrics may include `accuracy` or `loss`.

```
from sagemaker.remote_function import remote
from sagemaker.experiments.run import Run

# Define your remote function
@remote
def train(value_1, value_2, exp_name, run_name):
    ...
    ...
    #Creates the experiment
    with Run(
        experiment_name=exp_name,
        run_name=run_name,
    ) as run:
        ...
        #Define values for the parameters to log
        run.log_parameter("param_1", value_1)
        run.log_parameter("param_2", value_2)
        ...
        #Define metrics to log
        run.log_metric("metric_a", 0.5)
        run.log_metric("metric_b", 0.1)


# Invoke your remote function
train(1.0, 2.0, "my-exp-name", "my-run-name")
```

### Load current SageMaker Experiments with a job initiated by the @remote decorator

Use the `load_run()` function from the SageMaker Experiments library to load the current run object from the run context. You can also use the `load_run()` function within your remote function. In the following code example, the remote function loads the run object that was initialized locally by the `with` statement.

```
from sagemaker.remote_function import remote
from sagemaker.experiments.run import Run, load_run

# Define your remote function
@remote
def train(value_1, value_2):
    ...
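    ...
    with load_run() as run:
        run.log_metric("metric_a", value_1)
        run.log_metric("metric_b", value_2)

# Invoke your remote function
with Run(
    experiment_name="my-exp-name",
    run_name="my-run-name",
) as run:
    train(0.5, 1.0)
```

## Load a current experiment run within a job initiated with the `RemoteExecutor` API

You can also load a current SageMaker AI experiment run if your jobs were initiated with the `RemoteExecutor` API. The following code example shows how to use the `RemoteExecutor` API with the SageMaker Experiments `load_run` function. You do this to load the current SageMaker AI experiment run and capture metrics in the job submitted by `RemoteExecutor`.

```
from sagemaker.remote_function import RemoteExecutor
from sagemaker.experiments.run import Run, load_run

def square(x):
    # Load the run that is active where the job was submitted
    with load_run() as run:
        result = x * x
        run.log_metric("result", result)
    return result

with RemoteExecutor(
    max_parallel_jobs=2,
    instance_type="ml.m5.large"
) as e:
    with Run(
        experiment_name="my-exp-name",
        run_name="my-run-name",
    ):
        future_1 = e.submit(square, 2)
```

## Unsupported uses of SageMaker Experiments when annotating your code with an @remote decorator

SageMaker AI does not support passing a `Run` type object to an @remote function or using global `Run` objects. The following examples show code that throws a `SerializationError`.

The following code example attempts to pass a `Run` type object to a function annotated with the @remote decorator, and it generates an error.

```
@remote
def func(run: Run):
    run.log_metric("metric_a", 1.0)

with Run(...) as run:
    func(run)  # ---> SerializationError caused by NotImplementedError
```

The following code example attempts to use a global `run` object instantiated outside of the remote function. In the code example, the `train()` function is defined inside the `with Run` context and references the global run object from within. When `train()` is called, it generates an error.

```
with Run(...) as run:
    @remote
    def train(metric_1, value_1, metric_2, value_2):
        run.log_parameter(metric_1, value_1)
        run.log_parameter(metric_2, value_2)

    train("p1", 1.0, "p2", 0.5)  # ---> SerializationError caused by NotImplementedError
```

# Using modular code with the @remote decorator

You can organize your code into modules for ease of workspace management during development and still use the @remote decorator to invoke a function. You can also replicate the local modules from your development environment to the remote job environment. To do so, set the parameter `include_local_workdir` to `True`, as shown in the following code example.

```
@remote(
    include_local_workdir=True,
)
```

**Note**
The @remote decorator and parameter must appear in the main file, rather than in any of the dependent files.

When `include_local_workdir` is set to `True`, SageMaker AI packages all of the Python scripts while maintaining the directory structure of the process's current directory. It also makes the dependencies available in the job's working directory.

For example, suppose your Python script that processes the MNIST dataset is divided into a `main.py` script and a dependent `pytorch_mnist.py` script. `main.py` calls the dependent script. Also, the `main.py` script contains code to import the dependency, as shown.

```
from mnist_impl.pytorch_mnist import ...
```

The `main.py` file must also contain the `@remote` decorator, and it must set the `include_local_workdir` parameter to `True`.

By default, the `include_local_workdir` parameter includes all the Python scripts in the directory. You can use this parameter together with the `custom_file_filter` parameter to customize which files are uploaded to the job. You can pass either a function that filters the job dependencies to be uploaded to S3, or a `CustomFileFilter` object that specifies the local directories and files that the remote function should ignore. You can use `custom_file_filter` only if `include_local_workdir` is set to `True`; otherwise the parameter is ignored.

The following example uses `CustomFileFilter` to ignore all notebook files and any folder or file named `data` when uploading files to S3.

```
@remote(
    include_local_workdir=True,
    custom_file_filter=CustomFileFilter(
        ignore_name_patterns=[ # files or directories to ignore
            "*.ipynb", # all notebook files
            "data", # folder or file named data
        ]
    )
)
```

The following example demonstrates how to package an entire workspace.

```
@remote(
    include_local_workdir=True,
    custom_file_filter=CustomFileFilter(
        ignore_name_patterns=[] # package whole workspace
    )
)
```

The following example shows how to use a function to filter files.

```
from typing import List

def my_filter(path: str, files: List[str]) -> List[str]:
    # Return the names in `files` that should not be uploaded
    to_ignore = []
    for file in files:
        if file.endswith(".txt") or file.endswith(".ipynb"):
            to_ignore.append(file)
    return to_ignore

@remote(
    include_local_workdir=True,
    custom_file_filter=my_filter
)
```

## Best practices for structuring your working directory

The following best practices suggest how to organize your directory structure when you use the `@remote` decorator in your modular code.
+ Put the @remote decorator in a file that resides at the root level directory of the workspace.
+ Structure the local modules at the root level.

The following example shows the recommended directory structure. In this example structure, the `main.py` script is located at the root level directory.

```
.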
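├── config.yaml
├── data/
├── main.py <----------------- @remote used here
├── mnist_impl
│   ├── __pycache__/
│   │   └── pytorch_mnist.cpython-310.pyc
│   └── pytorch_mnist.py <-------- dependency of main.py
└── requirements.txt
```

The following example shows a directory structure that results in inconsistent behavior when it is used to annotate your code with an @remote decorator. In this example structure, the `main.py` script that contains the @remote decorator is **not** located at the root level directory. The following structure is **not** recommended.

```
.
├── config.yaml
├── entrypoint
│   ├── data
│   └── main.py <----------------- @remote used here
├── mnist_impl
│   ├── __pycache__
│   │   └── pytorch_mnist.cpython-310.pyc
│   └── pytorch_mnist.py <-------- dependency of main.py
└── requirements.txt
```

# Private repository for runtime dependencies

You can use pre-execution commands or a script to configure a dependency manager like pip or conda in your job environment. To achieve network isolation, use either of these options to redirect your dependency managers to access your private repositories and run remote functions within a VPC. The pre-execution commands or script run before your remote function runs. You can define them with the @remote decorator, the `RemoteExecutor` API, or within a configuration file.

The following sections show how to access a private Python Package Index (PyPI) repository managed with AWS CodeArtifact. The sections also show how to access a custom conda channel hosted on Amazon Simple Storage Service (Amazon S3).

## How to use a custom PyPI repository managed with AWS CodeArtifact

To use CodeArtifact to manage a custom PyPI repository, the following prerequisites are required:
+ Your private PyPI repository should already have been created. You can use AWS CodeArtifact to create and manage your private package repositories. To learn more about CodeArtifact, see the [CodeArtifact User Guide](https://docs.aws.amazon.com/codeartifact/latest/ug/welcome.html).
+ Your VPC should have access to your CodeArtifact repository. To allow a connection from your VPC to your CodeArtifact repository, you must do the following:
  + [Create VPC endpoints for CodeArtifact](https://docs.aws.amazon.com/codeartifact/latest/ug/create-vpc-endpoints.html).
  + [Create an Amazon S3 gateway endpoint](https://docs.aws.amazon.com/codeartifact/latest/ug/create-s3-gateway-endpoint.html) for your VPC, which allows CodeArtifact to store package assets.

The following pre-execution command example shows how to configure pip in the SageMaker AI training job to point to your CodeArtifact repository. For more information, see [Configure and use pip with CodeArtifact](https://docs.aws.amazon.com/codeartifact/latest/ug/python-configure-pip.html).

```
import numpy as np
from sagemaker.remote_function import remote

# use a requirements.txt file to import dependencies
@remote(
    instance_type="ml.m5.large",
    image_uri = "my_base_python:latest",
    dependencies = './requirements.txt',
    pre_execution_commands=[
        "aws codeartifact login --tool pip --domain my-org --domain-owner <000000000000> --repository my-codeartifact-python-repo --endpoint-url https://vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com"
    ]
)
def matrix_multiply(a, b):
    return np.matmul(a, b)
```

## How to use a custom conda channel hosted on Amazon S3

To use Amazon S3 to manage a custom conda repository, the following prerequisites are required:
+ Your private conda channel must already be set up in your Amazon S3 bucket, and all dependent packages must be indexed and uploaded to your Amazon S3 bucket. For instructions on how to index your conda packages, see [Creating custom channels](https://conda.io/projects/conda/en/latest/user-guide/tasks/create-custom-channels.html).
+ Your VPC should have access to the Amazon S3 bucket. For more information, see [Endpoints for Amazon S3](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html).
+ The base conda environment in your job image should have `boto3` installed. To check your environment, enter the following in your Anaconda prompt and confirm that `boto3` appears in the resulting list.

  ```
  conda list -n base
  ```
+ Your job image should be installed with conda, not [mamba](https://mamba.readthedocs.io/en/latest/installation.html). To check your environment, make sure that the output of the previous command does not include `mamba`.

The following pre-execution commands example shows how to configure conda in the SageMaker training job to point to your private channel on Amazon S3. The pre-execution commands remove the defaults channel and add the custom channels to the `.condarc` conda configuration file.

```
import numpy as np
from sagemaker.remote_function import remote

# specify your dependencies inside a conda yaml file
@remote(
    instance_type="ml.m5.large",
    image_uri = "my_base_python:latest",
    dependencies = "./environment.yml",
    pre_execution_commands=[
        "conda config --remove channels 'defaults'",
        "conda config --add channels 's3://my_bucket/my-conda-repository/conda-forge/'",
        "conda config --add channels 's3://my_bucket/my-conda-repository/main/'"
    ]
)
def matrix_multiply(a, b):
    return np.matmul(a, b)
```

# Example notebooks

You can transform training code in an existing workspace environment, along with any associated data processing code and datasets, into a SageMaker training job. The following notebooks show how to customize your environment, job settings, and more for an image classification problem, using the XGBoost algorithm and Hugging Face.

The [quick_start notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-remote-function/quick_start/quick_start.ipynb) contains the following code examples:
+ How to customize your job settings with a configuration file.
+ How to invoke Python functions as jobs, asynchronously.
+ How to customize the job runtime environment by bringing in additional dependencies.
+ How to use local dependencies with the @remote function method.

The following notebooks provide additional code examples for different ML problem types and implementations.
+ To see code examples that use the @remote decorator to solve an image classification problem, open the [pytorch_mnist.ipynb](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-remote-function/pytorch_mnist_sample_notebook) notebook. This classification problem recognizes handwritten digits using the Modified National Institute of Standards and Technology (MNIST) sample dataset.
+ To see code examples that use the @remote decorator with a script to solve the previous image classification problem, see the PyTorch MNIST sample script, [train.py](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-remote-function/pytorch_mnist_sample_script).
+ To see how the XGBoost algorithm is implemented with an @remote decorator, open the [xgboost_abalone.ipynb](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-remote-function/xgboost_abalone) notebook.
+ To see how Hugging Face is integrated with an @remote decorator, open the [huggingface.ipynb](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-remote-function/huggingface_text_classification) notebook.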
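
Before you run any of these notebooks with your own workspace, it can help to preview which files a function-based `custom_file_filter` (described in the modular code section) would keep. The following is a minimal local sketch, not part of the SageMaker Python SDK. It assumes that the filter is called once per directory with the directory path and the names it contains, and that it returns the names to skip, which is the same shape as the `ignore` callback of `shutil.copytree`. The helper name `preview_packaged_files` is hypothetical and introduced only for illustration.

```python
import os
from typing import Callable, List


def my_filter(path: str, files: List[str]) -> List[str]:
    """Return the names in `path` to skip: text files and notebooks."""
    return [name for name in files if name.endswith((".txt", ".ipynb"))]


def preview_packaged_files(
    root: str, file_filter: Callable[[str, List[str]], List[str]]
) -> List[str]:
    """Hypothetical helper: list the files under `root` that the filter keeps.

    Assumption: the filter receives (directory path, names in that directory)
    and returns the names to ignore, like shutil.copytree's `ignore` argument.
    """
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        ignored = set(file_filter(dirpath, dirnames + filenames))
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        for name in filenames:
            if name not in ignored:
                kept.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(kept)
```

For example, calling `preview_packaged_files(".", my_filter)` from the workspace root returns the relative paths that would remain after filtering, which you can compare against the modules that your `main.py` imports.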