Job completion logs in AWS PCS

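Job completion logs give you key details about your AWS Parallel Computing Service (AWS PCS) jobs when they complete, at no additional cost. You can use other AWS services to access and process your log data, such as Amazon CloudWatch Logs, Amazon Simple Storage Service (Amazon S3), and Amazon Data Firehose. AWS PCS records metadata about your jobs, such as the following.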

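  • Job ID and name

  • User and group information

  • Job state (for example, COMPLETED, FAILED, CANCELLED)

  • Partition used

  • Time limits

  • Start, end, submit, and eligible times

  • Node list and count

  • Processor count

  • Working directory

  • Resource usage (CPU, memory)

  • Exit codes

  • Node details (names, instance IDs, instance types)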

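Prerequisites

The IAM principal that manages the AWS PCS cluster must be allowed the pcs:AllowVendedLogDeliveryForResource action.

The following example IAM policy grants the required permissions.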

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PcsAllowVendedLogsDelivery",
            "Effect": "Allow",
            "Action": ["pcs:AllowVendedLogDeliveryForResource"],
            "Resource": [
                "arn:aws:pcs:::cluster/*"
            ]
        }
    ]
}
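Before you attach a policy, it can help to confirm that it actually names the required action. The following Python sketch is a simplified check, not an IAM policy evaluator: it only looks for an Allow statement that lists the action literally, and it ignores wildcards, Deny statements, and the Resource and Condition elements. The function name allows_vended_log_delivery is hypothetical.

```python
import json

REQUIRED_ACTION = "pcs:AllowVendedLogDeliveryForResource"


def _as_list(value):
    """IAM accepts either a single value or a list for Statement and Action."""
    return value if isinstance(value, list) else [value]


def allows_vended_log_delivery(policy_json: str) -> bool:
    """Return True if an Allow statement lists the required action literally.

    Simplification (assumption): wildcards, Deny statements, Resource, and
    Condition elements are not evaluated.
    """
    policy = json.loads(policy_json)
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive, so compare in lowercase.
        actions = [a.lower() for a in _as_list(statement.get("Action", []))]
        if REQUIRED_ACTION.lower() in actions:
            return True
    return False


# The example policy from this page.
example_policy = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PcsAllowVendedLogsDelivery",
      "Effect": "Allow",
      "Action": ["pcs:AllowVendedLogDeliveryForResource"],
      "Resource": ["arn:aws:pcs:::cluster/*"]
    }
  ]
}
"""

print(allows_vended_log_delivery(example_policy))
```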

Set up job completion logs

You can use the AWS Management Console or the AWS CLI to set up job completion logs for your AWS PCS cluster.

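AWS Management Console
To set up job completion logs using the console
  1. Open the AWS PCS console.

  2. In the navigation pane, choose Clusters.

  3. Choose the cluster where you want to add job completion logs.

  4. On the cluster details page, choose the Logs tab.

  5. Under Job Completion Logs, choose Add to add up to 3 log delivery destinations from CloudWatch Logs, Amazon S3, and Firehose.

  6. Choose Update log deliveries.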

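AWS CLI
To set up job completion logs using the AWS CLI
  1. Create a log delivery destination:

    aws logs put-delivery-destination --region region \
        --name pcs-logs-destination \
        --delivery-destination-configuration \
        destinationResourceArn=resource-arn

    Replace the following:

    • region: The AWS Region where you want to create the destination, such as us-east-1

    • pcs-logs-destination: A name for the destination

    • resource-arn: The Amazon Resource Name (ARN) of the CloudWatch Logs log group, S3 bucket, or Firehose delivery stream.

    For more information, see PutDeliveryDestination in the Amazon CloudWatch Logs API Reference.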

  2. Set up the PCS cluster as a log delivery source:

    aws logs put-delivery-source --region region \
        --name cluster-logs-source-name \
        --resource-arn cluster-arn \
        --log-type PCS_JOBCOMP_LOGS

    Replace the following:

    • region: The AWS Region of your cluster, such as us-east-1

    • cluster-logs-source-name: A name for the source

    • cluster-arn: The ARN of your AWS PCS cluster

    For more information, see PutDeliverySource in the Amazon CloudWatch Logs API Reference.

  3. Connect the delivery source to the delivery destination:

    aws logs create-delivery --region region \
        --delivery-source-name cluster-logs-source \
        --delivery-destination-arn destination-arn

    Replace the following:

    • region: The AWS Region, such as us-east-1

    • cluster-logs-source: The name of your delivery source

    • destination-arn: The ARN of your delivery destination

    For more information, see CreateDelivery in the Amazon CloudWatch Logs API Reference.
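The three CLI steps above can also be scripted. The following Python sketch chains them, assuming a boto3 CloudWatch Logs client whose put_delivery_destination, put_delivery_source, and create_delivery methods mirror the CLI commands, and assuming the first response carries the destination ARN under deliveryDestination.arn. The function name set_up_job_completion_logs is hypothetical, and the client is passed in so the sketch itself has no dependencies.

```python
def set_up_job_completion_logs(logs_client, destination_name, resource_arn,
                               source_name, cluster_arn):
    """Run the three delivery setup steps in order and return the last response.

    logs_client is assumed to behave like boto3.client("logs").
    """
    # Step 1: create the log delivery destination.
    destination = logs_client.put_delivery_destination(
        name=destination_name,
        deliveryDestinationConfiguration={
            "destinationResourceArn": resource_arn,
        },
    )
    # Assumption: the response exposes the new destination's ARN here.
    destination_arn = destination["deliveryDestination"]["arn"]

    # Step 2: register the PCS cluster as a log delivery source.
    logs_client.put_delivery_source(
        name=source_name,
        resourceArn=cluster_arn,
        logType="PCS_JOBCOMP_LOGS",
    )

    # Step 3: connect the delivery source to the delivery destination.
    return logs_client.create_delivery(
        deliverySourceName=source_name,
        deliveryDestinationArn=destination_arn,
    )


# Example call (requires boto3 and AWS credentials, so it is left commented):
# import boto3
# set_up_job_completion_logs(
#     boto3.client("logs", region_name="us-east-1"),
#     "pcs-logs-destination", "resource-arn",
#     "cluster-logs-source", "cluster-arn",
# )
```

Passing the destination ARN from step 1 straight into step 3 avoids copying it by hand between commands.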

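Where to find job completion logs

You can configure log destinations in CloudWatch Logs and Amazon S3. AWS PCS uses the following structured path names and file names.

CloudWatch Logs

AWS PCS uses the following name format for the CloudWatch Logs log stream: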

AWSLogs/PCS/cluster-id/jobcomp.log

Example: AWSLogs/PCS/pcs_abc123de45/jobcomp.log

Amazon S3

AWS PCS uses the following name format for the S3 path:

AWSLogs/account-id/PCS/region/cluster-id/jobcomp/year/month/day/hour/

Example: AWSLogs/111122223333/PCS/us-east-1/pcs_abc123de45/jobcomp/2025/06/19/11/
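To list the objects for a given hour, you can build the prefix from its parts. The following Python sketch follows the S3 path format shown above. It assumes the month, day, and hour are zero-padded to two digits, as in the example, and the function name jobcomp_s3_prefix is hypothetical.

```python
from datetime import datetime


def jobcomp_s3_prefix(account_id: str, region: str, cluster_id: str,
                      hour: datetime) -> str:
    """Build the S3 key prefix for one hour of job completion logs."""
    # datetime supports strftime codes directly in format specifications.
    return (
        f"AWSLogs/{account_id}/PCS/{region}/{cluster_id}/jobcomp/"
        f"{hour:%Y/%m/%d/%H}/"
    )


prefix = jobcomp_s3_prefix(
    "111122223333", "us-east-1", "pcs_abc123de45", datetime(2025, 6, 19, 11)
)
print(prefix)
```

With these inputs the sketch reproduces the example path above.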

AWS PCS uses the following name format for log files:

PCS_jobcomp_year-month-day-hour_cluster-id_random-id.log.gz

Example: PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz
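When you process many log objects, it can be useful to recover the hour and cluster from a file name. The following Python sketch parses the file name format shown above. Because the cluster ID itself contains an underscore, the pattern treats the last underscore-separated segment as the random ID, which assumes the random ID never contains an underscore. The names JobcompFileName and parse_jobcomp_file_name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class JobcompFileName(NamedTuple):
    year: int
    month: int
    day: int
    hour: int
    cluster_id: str
    random_id: str


# Greedy (.+) captures the cluster ID, leaving the final segment as random ID.
_PATTERN = re.compile(
    r"^PCS_jobcomp_(\d{4})-(\d{2})-(\d{2})-(\d{2})_(.+)_([^_]+)\.log\.gz$"
)


def parse_jobcomp_file_name(file_name: str) -> Optional[JobcompFileName]:
    """Split a log file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(file_name)
    if match is None:
        return None
    year, month, day, hour, cluster_id, random_id = match.groups()
    return JobcompFileName(
        int(year), int(month), int(day), int(hour), cluster_id, random_id
    )


parsed = parse_jobcomp_file_name(
    "PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz"
)
print(parsed)
```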

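Job completion log fields

AWS PCS writes job completion log data as JSON objects. The JSON jobcomp container holds the job details. The following table describes the fields inside the jobcomp container. Some fields are only present in specific situations, such as for array jobs or heterogeneous jobs.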

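Job completion log fields
| Name | Example value | Required / Notes |
|---|---|---|
| job_id | 11 | Always present with a value |
| user | "root" | Always present with a value |
| user_id | 0 | Always present with a value |
| group | "root" | Always present with a value |
| group_id | 0 | Always present with a value |
| name | "wrap" | Always present with a value |
| job_state | "COMPLETED" | Always present with a value |
| partition | "Hydra-MpiQueue-abcdef01-7" | Always present with a value |
| time_limit | "UNLIMITED" | Always present, but may be "UNLIMITED" |
| start_time | "2025-06-19T10:58:57" | Always present, but may be "Unknown" |
| end_time | "2025-06-19T10:58:57" | Always present, but may be "Unknown" |
| node_list | "Hydra-MpiNG-abcdef01-2345-1" | Always present with a value |
| node_cnt | 1 | Always present with a value |
| proc_cnt | 1 | Always present with a value |
| work_dir | "/root" | Always present, but may be "Unknown" |
| reservation_name | "weekly_maintenance" | Always present, but may be an empty string "" |
| tres.cpu | 1 | Always present with a value |
| tres.mem.val | 600 | Always present with a value |
| tres.mem.unit | "M" | Can be "M" or "bb" |
| tres.node | 1 | Always present with a value |
| tres.billing | 1 | Always present with a value |
| account | "finance" | Always present, but may be an empty string "" |
| qos | "normal" | Always present, but may be an empty string "" |
| wc_key | "project_1" | Always present, but may be an empty string "" |
| cluster | "unknown" | Always present, but may be "unknown" |
| submit_time | "2025-06-19T10:55:46" | Always present, but may be "Unknown" |
| eligible_time | "2025-06-19T10:55:46" | Always present, but may be "Unknown" |
| array_job_id | 12 | Only present when the job is an array job |
| array_task_id | 1 | Only present when the job is an array job |
| het_job_id | 10 | Only present when the job is a heterogeneous job |
| het_job_offset | 0 | Only present when the job is a heterogeneous job |
| derived_exit_code_status | 0 | Always present with a value |
| derived_exit_code_signal | 0 | Always present with a value |
| exit_code_status | 0 | Always present with a value |
| exit_code_signal | 0 | Always present with a value |
| node_details[0].name | "Hydra-MpiNG-abcdef01-2345-1" | Always present, but node_details may be "[]" |
| node_details[0].instance_id | "i-0abcdef01234567a" | Always present, but node_details may be "[]" |
| node_details[0].instance_type | "t4g.micro" | Always present, but node_details may be "[]" |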
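Code that reads these fields has to handle the optional fields and the "Unknown" placeholders described in the table. The following Python sketch shows one way to do that, assuming timestamps always use the YYYY-MM-DDTHH:MM:SS form seen in the example values. The helper names parse_time and job_kind are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_time(value: str) -> Optional[datetime]:
    """Parse a jobcomp timestamp, returning None for the "Unknown" placeholder."""
    if value == "Unknown":
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


def job_kind(jobcomp: dict) -> str:
    """Classify a record by which optional fields are present."""
    if "array_job_id" in jobcomp:
        return "array"
    if "het_job_id" in jobcomp:
        return "heterogeneous"
    return "regular"


# Abbreviated record using example values from the table.
record = {
    "job_id": 11,
    "submit_time": "2025-06-19T10:55:46",
    "start_time": "2025-06-19T10:58:57",
    "end_time": "Unknown",
    "het_job_id": 10,
    "het_job_offset": 0,
}

submitted = parse_time(record["submit_time"])
started = parse_time(record["start_time"])
print(job_kind(record))
if submitted and started:
    # Time the job spent waiting between submission and start.
    print(started - submitted)
```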

Example job completion logs

The following examples show job completion logs for various job types and states:

{ "jobcomp": { "job_id": 1, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:32:57", "end_time": "2025-06-19T16:33:03", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:29:40", "eligible_time": "2025-06-19T16:29:41", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 2, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:33:13", "end_time": "2025-06-19T16:33:14", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:33:13", "eligible_time": "2025-06-19T16:33:13", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 3, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T22:58:57", "end_time": "2025-06-19T22:58:57", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T22:55:46", "eligible_time": "2025-06-19T22:55:46", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 4, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "525600", "start_time": "2025-06-19T23:04:27", "end_time": "2025-06-19T23:04:27", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:01:38", "eligible_time": "2025-06-19T23:01:38", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 5, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "FAILED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:00", "end_time": "2025-06-19T23:09:00", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 1, "unit": "G" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:00", "eligible_time": "2025-06-19T23:09:00", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }
{ "jobcomp": { "job_id": 6, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:36", "end_time": "2025-06-19T23:09:36", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:35", "eligible_time": "2025-06-19T23:09:36", "het_job_id": 6, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }
{ "jobcomp": { "job_id": 7, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:10:03", "end_time": "2025-06-19T23:10:03", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:10:03", "eligible_time": "2025-06-19T23:10:03", "het_job_id": 7, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }
{ "jobcomp": { "job_id": 8, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 9, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 10, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 11, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 600, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 13, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:57", "end_time": "2025-06-19T23:47:58", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56", "array_job_id": 12, "array_task_id": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }
{ "jobcomp": { "job_id": 12, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:58", "end_time": "2025-06-19T23:47:58", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56", "array_job_id": 12, "array_task_id": 2, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }
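Log data like the examples above arrives as a sequence of JSON objects rather than one JSON array, so a plain json.loads call on the whole text fails. The following Python sketch reads the objects one at a time with json.JSONDecoder.raw_decode and counts jobs by state. It assumes the objects are separated only by whitespace. The sample text is abbreviated to two fields per record, and the function name iter_jobcomp is hypothetical.

```python
import json
from collections import Counter
from typing import Iterator


def iter_jobcomp(text: str) -> Iterator[dict]:
    """Yield the jobcomp container of each JSON object in the text."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            return
        record, pos = decoder.raw_decode(text, pos)
        yield record["jobcomp"]


# Abbreviated sample: real records carry all the fields listed in the table.
sample = (
    '{ "jobcomp": { "job_id": 1, "job_state": "COMPLETED" } } '
    '{ "jobcomp": { "job_id": 5, "job_state": "FAILED" } }\n'
    '{ "jobcomp": { "job_id": 6, "job_state": "CANCELLED" } }'
)

states = Counter(job["job_state"] for job in iter_jobcomp(sample))
print(states)
```

The same loop works whether the objects are on one line or spread across several, because it tracks the decoder position instead of splitting on newlines.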