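Processing the graph data exported from Neptune for training
The data-processing step takes the Neptune graph data created by the export process and creates the information that the Deep Graph Library (DGL) uses during training. This includes performing various data mappings and transformations.
Managing the data-processing step for Neptune ML
After you have exported the data from Neptune that you want to use for model training, you can start a data-processing job using a command like the following: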
- AWS CLI
-
aws neptunedata start-ml-data-processing-job \
--endpoint-url https://your-neptune-endpoint:port \
--input-data-s3-location "s3://(S3 bucket name)/(path to your input folder)" \
--id "(a job ID for the new job)" \
--processed-data-s3-location "s3://(S3 bucket name)/(path to your output folder)" \
--config-file-name "training-job-configuration.json"
For more information, see start-ml-data-processing-job in the AWS CLI Command Reference.
- SDK
-
import boto3
from botocore.config import Config
client = boto3.client(
'neptunedata',
endpoint_url='https://your-neptune-endpoint:port',
config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)
response = client.start_ml_data_processing_job(
inputDataS3Location='s3://(S3 bucket name)/(path to your input folder)',
id='(a job ID for the new job)',
processedDataS3Location='s3://(S3 bucket name)/(path to your output folder)',
configFileName='training-job-configuration.json'
)
print(response)
- awscurl
-
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
--region us-east-1 \
--service neptune-db \
-X POST \
-H 'Content-Type: application/json' \
-d '{
"inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
"id" : "(a job ID for the new job)",
"processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
"configFileName" : "training-job-configuration.json"
}'
This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.
- curl
-
curl \
-X POST https://your-neptune-endpoint:port/ml/dataprocessing \
-H 'Content-Type: application/json' \
-d '{
"inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
"id" : "(a job ID for the new job)",
"processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
"configFileName" : "training-job-configuration.json"
}'
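For details about how to use this command, see The dataprocessing command. That topic also explains how to get the status of a running job, how to stop a running job, and how to list all running jobs.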
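As a rough illustration of checking job status, the following sketch polls a data-processing job until it finishes. It assumes a boto3 neptunedata client like the one created in the SDK example above and calls its get_ml_data_processing_job operation. The helper name wait_for_data_processing_job, the polling defaults, and the set of terminal status strings are assumptions made for this example, so verify the status values against what your cluster actually returns.

```python
import time

# Assumed terminal status strings; check them against real responses.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_data_processing_job(client, job_id, poll_seconds=30.0,
                                 timeout_seconds=3600.0, sleep=time.sleep):
    """Poll a data-processing job until it reaches a terminal status.

    `client` is expected to be a boto3 'neptunedata' client, or any object
    that exposes get_ml_data_processing_job(id=...). Returns the final
    status string, or raises TimeoutError if the job does not finish in time.
    """
    waited = 0.0
    while True:
        response = client.get_ml_data_processing_job(id=job_id)
        status = response.get("status")
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Job {job_id!r} still has status {status!r} after {waited:.0f}s"
            )
        # Wait before asking again so the endpoint is not flooded with requests.
        sleep(poll_seconds)
        waited += poll_seconds
```

With a real client, a call such as wait_for_data_processing_job(client, '(a job ID for the new job)') would block until the job finishes or the timeout expires. The same client also exposes cancel_ml_data_processing_job and list_ml_data_processing_jobs for stopping and listing jobs.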
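Processing updated graph data for Neptune ML
You can also supply a previousDataProcessingJobId to the API to ensure that a new data-processing job uses the same processing method as a previous job. This is required when you want to get predictions for updated graph data in Neptune, either by retraining the old model on the new data or by recomputing the model artifacts on the new data.
You can do this using a command like the following: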
- AWS CLI
-
aws neptunedata start-ml-data-processing-job \
--endpoint-url https://your-neptune-endpoint:port \
--input-data-s3-location "s3://(Amazon S3 bucket name)/(path to your input folder)" \
--id "(a job ID for the new job)" \
--processed-data-s3-location "s3://(Amazon S3 bucket name)/(path to your output folder)" \
--previous-data-processing-job-id "(the job ID of the previous data-processing job)"
For more information, see start-ml-data-processing-job in the AWS CLI Command Reference.
- SDK
-
import boto3
from botocore.config import Config
client = boto3.client(
'neptunedata',
endpoint_url='https://your-neptune-endpoint:port',
config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)
response = client.start_ml_data_processing_job(
inputDataS3Location='s3://(Amazon S3 bucket name)/(path to your input folder)',
id='(a job ID for the new job)',
processedDataS3Location='s3://(Amazon S3 bucket name)/(path to your output folder)',
previousDataProcessingJobId='(the job ID of the previous data-processing job)'
)
print(response)
- awscurl
-
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
--region us-east-1 \
--service neptune-db \
-X POST \
-H 'Content-Type: application/json' \
-d '{
"inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
"id" : "(a job ID for the new job)",
"processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
"previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
}'
This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.
- curl
-
curl \
-X POST https://your-neptune-endpoint:port/ml/dataprocessing \
-H 'Content-Type: application/json' \
-d '{
"inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
"id" : "(a job ID for the new job)",
"processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
"previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
}'
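Set the value of the previousDataProcessingJobId parameter to the job ID of the previous data-processing job that corresponds to the trained model.
Node deletions in the updated graph are currently not supported. If nodes have been removed in an updated graph, you must start a completely new data-processing job rather than using previousDataProcessingJobId.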
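The choice between an incremental job and a completely new job can be captured in a small helper that builds the request parameters. This is only a sketch: the function name build_data_processing_request and the nodes_removed flag are hypothetical, and nothing in the API detects removed nodes for you, so the caller has to determine that from its own knowledge of the graph update.

```python
def build_data_processing_request(job_id, input_s3, output_s3,
                                  previous_job_id=None, nodes_removed=False,
                                  config_file_name=None):
    """Build keyword arguments for start_ml_data_processing_job.

    `nodes_removed` is a hypothetical flag supplied by the caller. When it is
    True, the previous job ID is left out so that a new job is started.
    """
    params = {
        "id": job_id,
        "inputDataS3Location": input_s3,
        "processedDataS3Location": output_s3,
    }
    if config_file_name is not None:
        params["configFileName"] = config_file_name
    # Reuse the previous processing method only if no nodes were removed.
    if previous_job_id is not None and not nodes_removed:
        params["previousDataProcessingJobId"] = previous_job_id
    return params
```

The resulting dictionary can be passed to the SDK call shown earlier as client.start_ml_data_processing_job(**params).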