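Model training using the `modeltraining` command

You use the Neptune ML `modeltraining` command to create a model-training job, check its status, stop it, or list all active model-training jobs.

Creating a model-training job using the Neptune ML `modeltraining` command

A Neptune ML `modeltraining` command for creating a completely new job looks like this: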
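
The example for a completely new job does not appear at this point. The following is a minimal sketch, assuming the request uses the same fields as the incremental-update examples in this section with `previousModelTrainingJobId` left out. The helper name `build_new_job_request` and the sample IDs are hypothetical, and the code only assembles the request without sending it.

```python
import json


def build_new_job_request(data_processing_job_id, train_model_s3_location, job_id=None):
    """Assemble the request body for a completely new model-training job.

    Hypothetical helper: it only builds a dict from the documented fields
    and makes no network call.
    """
    body = {
        "dataProcessingJobId": data_processing_job_id,    # required
        "trainModelS3Location": train_model_s3_location,  # required
    }
    if job_id is not None:
        # Optional: per the parameter list, a UUID is autogenerated if omitted.
        body["id"] = job_id
    return body


# Placeholder values for illustration only.
new_job_request = build_new_job_request(
    data_processing_job_id="example-data-processing-job",
    train_model_s3_location="s3://example-bucket/neptune-model-graph-autotrainer",
    job_id="example-training-job",
)
print(json.dumps(new_job_request, indent=2))
```

Because the keys match the field names used in the SDK and curl examples in this section, the dict could be serialized as the JSON body of a POST to `/ml/modeltraining`, or unpacked as keyword arguments to `start_ml_model_training_job`.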

A Neptune ML `modeltraining` command for creating an update job for incremental model training looks like this:

AWS CLI


aws neptunedata start-ml-model-training-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-training job ID)" \
  --data-processing-job-id "(the data-processing job-id of a completed job)" \
  --train-model-s3-location "s3://(your S3 bucket)/neptune-model-graph-autotrainer" \
  --previous-model-training-job-id "(the job ID of a completed model-training job to update)"

For more information, see start-ml-model-training-job in the AWS CLI Command Reference.

SDK


import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_training_job(
    id='(a unique model-training job ID)',
    dataProcessingJobId='(the data-processing job-id of a completed job)',
    trainModelS3Location='s3://(your S3 bucket)/neptune-model-graph-autotrainer',
    previousModelTrainingJobId='(the job ID of a completed model-training job to update)'
)

print(response)

awscurl


awscurl https://your-neptune-endpoint:port/ml/modeltraining \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your S3 bucket)/neptune-model-graph-autotrainer",
        "previousModelTrainingJobId" : "(the job ID of a completed model-training job to update)"
      }'

Note

This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.

curl


curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltraining \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your S3 bucket)/neptune-model-graph-autotrainer",
        "previousModelTrainingJobId" : "(the job ID of a completed model-training job to update)"
      }'

A Neptune ML `modeltraining` command for creating a new job with a user-provided custom model implementation looks like this:

AWS CLI


aws neptunedata start-ml-model-training-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-training job ID)" \
  --data-processing-job-id "(the data-processing job-id of a completed job)" \
  --train-model-s3-location "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer" \
  --model-name "custom" \
  --custom-model-training-parameters '{
    "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
    "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
    "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
  }'

For more information, see start-ml-model-training-job in the AWS CLI Command Reference.

SDK


import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_training_job(
    id='(a unique model-training job ID)',
    dataProcessingJobId='(the data-processing job-id of a completed job)',
    trainModelS3Location='s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer',
    modelName='custom',
    customModelTrainingParameters={
        'sourceS3DirectoryPath': 's3://(your Amazon S3 bucket)/(path to your Python module)',
        'trainingEntryPointScript': '(your training script entry-point name in the Python module)',
        'transformEntryPointScript': '(your transform script entry-point name in the Python module)'
    }
)

print(response)

awscurl


awscurl https://your-neptune-endpoint:port/ml/modeltraining \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'

Note

This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.

curl


curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltraining \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'

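Parameters for `modeltraining` job creation

id – (Optional) A unique identifier for the new job.

Type: string. Default: an autogenerated UUID.
dataProcessingJobId – (Required) The job ID of the completed data-processing job that created the data the training will use.

Type: string.
trainModelS3Location – (Required) The location in Amazon S3 where the model artifacts are to be stored.

Type: string.
previousModelTrainingJobId – (Optional) The job ID of a completed model-training job that you want to update incrementally based on updated data.

Type: string. Default: none.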
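sagemakerIamRoleArn – (Optional) The ARN of an IAM role for SageMaker AI execution.

Type: string. Note: This role must be listed in your DB cluster parameter group or an error will occur.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that gives Neptune access to SageMaker AI and Amazon S3 resources.

Type: string. Note: This role must be listed in your DB cluster parameter group or an error will occur.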
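modelName – (Optional) The model type for training. By default, the ML model is chosen automatically based on the modelType used in data processing, but you can specify a different model type here.

Type: string. Default: rgcn for heterogeneous graphs and kge for knowledge graphs. Valid values: for heterogeneous graphs: rgcn. For kge graphs: transe, distmult, or rotate. For a custom model implementation: custom.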
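baseProcessingInstanceType – (Optional) The type of ML instance used to prepare and manage training of ML models.

Type: string. Note: This is a CPU instance chosen based on the memory requirements for processing the training data and model. See Selecting an instance for model training and model transform.
trainingInstanceType – (Optional) The type of ML instance used for model training. All Neptune ML models support CPU, GPU, and multi-GPU training.

Type: string. Default: ml.p3.2xlarge.

Note: Choosing the right instance type for training depends on the task type, the graph size, and your budget. See Selecting an instance for model training and model transform.
trainingInstanceVolumeSizeInGB – (Optional) The disk volume size of the training instance. Both the input data and the output model are stored on disk, so the volume must be large enough to hold both data sets.

Type: integer. Default: 0.

Note: If not specified or 0, Neptune ML selects a disk volume size based on the recommendation generated in the data-processing step. See Selecting an instance for model training and model transform.
trainingTimeOutInSeconds – (Optional) The timeout, in seconds, for the training job.

Type: integer. Default: 86,400 (1 day).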
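maxHPONumberOfTrainingJobs – The maximum total number of training jobs to start for the hyperparameter tuning job.

Type: integer. Default: 2.

Note: Neptune ML automatically tunes the hyperparameters of the machine learning model. To obtain a model that performs well, use at least 10 jobs (in other words, set maxHPONumberOfTrainingJobs to 10). In general, the more tuning runs, the better the results.
maxHPOParallelTrainingJobs – The maximum number of parallel training jobs to start for the hyperparameter tuning job.

Type: integer. Default: 2.

Note: The number of parallel jobs you can run is limited by the resources available on your training instance.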
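subnets – (Optional) The IDs of the subnets in the Neptune VPC.

Type: list of strings. Default: none.
securityGroupIds – (Optional) The VPC security group IDs.

Type: list of strings. Default: none.
volumeEncryptionKMSKey – (Optional) The AWS Key Management Service (AWS KMS) key that SageMaker AI uses to encrypt data on the storage volume attached to the ML compute instances that run the training job.

Type: string. Default: none.
s3OutputEncryptionKMSKey – (Optional) The AWS Key Management Service (AWS KMS) key that SageMaker AI uses to encrypt the output of the processing job.

Type: string. Default: none.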
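enableInterContainerTrafficEncryption – (Optional) Enables or disables inter-container traffic encryption in training or hyperparameter tuning jobs.

Type: Boolean. Default: True.

Note
The enableInterContainerTrafficEncryption parameter is only available in engine release 1.2.0.2.R3.
enableManagedSpotTraining – (Optional) Optimizes the cost of training machine learning models by using Amazon Elastic Compute Cloud Spot Instances. For more information, see Managed Spot Training in Amazon SageMaker.

Type: Boolean. Default: False.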
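customModelTrainingParameters – (Optional) The configuration for custom model training. This is a JSON object with the following fields:
- sourceS3DirectoryPath – (Required) The path to the Amazon S3 location where the Python module implementing your model is located. This must point to a valid, existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a model-hpo-configuration.json file.
- trainingEntryPointScript – (Optional) The name of the entry point in your module of a script that performs model training and takes hyperparameters as command-line arguments, including fixed hyperparameters.
  
  Default: training.py.
- transformEntryPointScript – (Optional) The name of the entry point in your module of a script that should be run after the best model from the hyperparameter search has been identified, to compute the model artifacts needed for model deployment. It should be able to run with no command-line arguments.
  
  Default: transform.py.
maxWaitTime – (Optional) The maximum time to wait, in seconds, when performing model training using Spot Instances. Should be greater than trainingTimeOutInSeconds.

Type: integer.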
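
Several of the constraints in the parameter list can be checked before a request is sent. The following is a minimal sketch, assuming the request is held in a Python dict keyed by the parameter names above. The function name `check_training_request` and the sample values are hypothetical, and the checks cover only rules stated in this list, not everything the service may enforce.

```python
# Model names listed as valid values for modelName in the parameter list.
VALID_MODEL_NAMES = {"rgcn", "transe", "distmult", "rotate", "custom"}

# Documented default for trainingTimeOutInSeconds (1 day).
DEFAULT_TRAINING_TIMEOUT = 86400


def check_training_request(params):
    """Return a list of problems found in a model-training request dict."""
    problems = []

    # dataProcessingJobId and trainModelS3Location are the two required fields.
    for required in ("dataProcessingJobId", "trainModelS3Location"):
        if not params.get(required):
            problems.append(f"{required} is required")

    model_name = params.get("modelName")
    if model_name is not None and model_name not in VALID_MODEL_NAMES:
        problems.append(f"modelName {model_name!r} is not a documented value")

    # sourceS3DirectoryPath is required whenever the custom block is present.
    custom = params.get("customModelTrainingParameters")
    if custom is not None and not custom.get("sourceS3DirectoryPath"):
        problems.append("customModelTrainingParameters.sourceS3DirectoryPath is required")

    # maxWaitTime should be greater than the training timeout.
    timeout = params.get("trainingTimeOutInSeconds", DEFAULT_TRAINING_TIMEOUT)
    max_wait = params.get("maxWaitTime")
    if max_wait is not None and max_wait <= timeout:
        problems.append("maxWaitTime should be greater than trainingTimeOutInSeconds")

    return problems


# Placeholder request for illustration only.
example_request = {
    "dataProcessingJobId": "example-data-processing-job",
    "trainModelS3Location": "s3://example-bucket/neptune-model-graph-autotrainer",
    "modelName": "rgcn",
    "enableManagedSpotTraining": True,
    "trainingTimeOutInSeconds": 3600,
    "maxWaitTime": 1800,
}
for problem in check_training_request(example_request):
    print(problem)
```

In this example the only reported problem is the maxWaitTime value, which is smaller than the training timeout.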

Getting the status of a model-training job using the Neptune ML `modeltraining` command

A sample Neptune ML `modeltraining` command for the status of a job looks like this:
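
The status example does not appear at this point. The following is a minimal sketch, assuming the status is read with an HTTP GET on `/ml/modeltraining/(the job ID)` and that `neptuneIamRoleArn` is passed as a query parameter. The helper name `build_status_url`, the port 8182, and the sample job ID are assumptions for illustration, and the code only builds the URL.

```python
from urllib.parse import quote, urlencode


def build_status_url(endpoint, job_id, neptune_iam_role_arn=None):
    """Build the URL used to read the status of one model-training job."""
    # Percent-encode the job ID so it is safe as a single path segment.
    url = f"{endpoint.rstrip('/')}/ml/modeltraining/{quote(job_id, safe='')}"
    if neptune_iam_role_arn:
        url += "?" + urlencode({"neptuneIamRoleArn": neptune_iam_role_arn})
    return url


# Placeholder endpoint and job ID for illustration only.
status_url = build_status_url(
    "https://your-neptune-endpoint:8182", "example-training-job"
)
print(status_url)
```

With a boto3 `neptunedata` client configured as in the SDK examples earlier in this section, the corresponding call would be `client.get_ml_model_training_job(id='(the job ID)')`.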

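Parameters for `modeltraining` job status

id – (Required) The unique identifier of the model-training job.

Type: string.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that gives Neptune access to SageMaker AI and Amazon S3 resources.

Type: string. Note: This role must be listed in your DB cluster parameter group or an error will occur.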

Stopping a model-training job using the Neptune ML `modeltraining` command

A sample Neptune ML `modeltraining` command for stopping a job looks like this:
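
The stop example does not appear at this point. The following is a minimal sketch, assuming a job is stopped with an HTTP DELETE on `/ml/modeltraining/(the job ID)` and that `clean` and `neptuneIamRoleArn` are passed as query parameters. The helper name `build_stop_request`, the port 8182, and the sample job ID are assumptions for illustration, and the code only builds the request.

```python
from urllib.parse import quote, urlencode


def build_stop_request(endpoint, job_id, clean=False, neptune_iam_role_arn=None):
    """Return the (HTTP method, URL) pair for stopping a model-training job."""
    query = {}
    if clean:
        # Ask for the job's Amazon S3 artifacts to be deleted as well.
        query["clean"] = "true"
    if neptune_iam_role_arn:
        query["neptuneIamRoleArn"] = neptune_iam_role_arn

    url = f"{endpoint.rstrip('/')}/ml/modeltraining/{quote(job_id, safe='')}"
    if query:
        url += "?" + urlencode(query)
    return "DELETE", url


# Placeholder endpoint and job ID for illustration only.
method, stop_url = build_stop_request(
    "https://your-neptune-endpoint:8182", "example-training-job", clean=True
)
print(method, stop_url)
```

With a boto3 `neptunedata` client configured as in the SDK examples earlier in this section, the corresponding call would be `client.cancel_ml_model_training_job(id='(the job ID)', clean=True)`.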

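Parameters for `modeltraining` stop job

id – (Required) The unique identifier of the model-training job.

Type: string.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that gives Neptune access to SageMaker AI and Amazon S3 resources.

Type: string. Note: This role must be listed in your DB cluster parameter group or an error will occur.
clean – (Optional) This flag specifies that all Amazon S3 artifacts should be deleted when the job is stopped.

Type: Boolean. Default: FALSE.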

Listing active model-training jobs using the Neptune ML `modeltraining` command

A sample Neptune ML `modeltraining` command for listing active jobs looks like this:
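
The list example does not appear at this point. The following is a minimal sketch, assuming active jobs are listed with an HTTP GET on `/ml/modeltraining` and that `maxItems` and `neptuneIamRoleArn` are passed as query parameters. The helper name `build_list_url` and the port 8182 are assumptions for illustration, and the code only builds the URL.

```python
from urllib.parse import urlencode

# Maximum allowed value for maxItems, as stated in the parameter list.
MAX_ITEMS_LIMIT = 1024


def build_list_url(endpoint, max_items=None, neptune_iam_role_arn=None):
    """Build the URL used to list active model-training jobs."""
    query = {}
    if max_items is not None:
        if max_items > MAX_ITEMS_LIMIT:
            raise ValueError(f"maxItems must not exceed {MAX_ITEMS_LIMIT}")
        query["maxItems"] = max_items
    if neptune_iam_role_arn:
        query["neptuneIamRoleArn"] = neptune_iam_role_arn

    url = f"{endpoint.rstrip('/')}/ml/modeltraining"
    if query:
        url += "?" + urlencode(query)
    return url


# Placeholder endpoint for illustration only.
list_url = build_list_url("https://your-neptune-endpoint:8182", max_items=3)
print(list_url)
```

With a boto3 `neptunedata` client configured as in the SDK examples earlier in this section, the corresponding call would be `client.list_ml_model_training_jobs(maxItems=3)`.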

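Parameters for `modeltraining` list jobs

maxItems – (Optional) The maximum number of items to return.

Type: integer. Default: 10. Maximum allowed value: 1024.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that gives Neptune access to SageMaker AI and Amazon S3 resources.

Type: string. Note: This role must be listed in your DB cluster parameter group or an error will occur.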
