本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
AWS PCS 中的 Job 完成日志
任务完成日志可在 AWS 并行计算服务 (AWS PCS) 任务完成时为您提供有关这些任务的关键细节,无需支付额外费用。您可以使用其他 AWS 服务来访问和处理您的日志数据,例如亚马逊 CloudWatch 日志、亚马逊简单存储服务 (Amazon S3) Service 和 Amazon Data Firehose AWS ;PCS 会记录有关您的任务的元数据,例如以下内容。
-
Job ID 和名称
-
用户和群组信息
-
Job 状态(例如
COMPLETED
、FAILED
、CANCELLED
) -
使用的分区
-
时间限制
-
开始、结束、提交和符合条件的时间
-
节点列表和计数
-
处理器数量
-
工作目录
-
资源使用情况(CPU、内存)
-
退出代码
-
节点详情(名称、实例 IDs、实例类型)
先决条件
管理 AWS PCS 集群的 IAM 委托人必须允许该pcs:AllowVendedLogDeliveryForResource
操作。
以下示例 IAM 策略授予所需的权限。
{ "Version": "2012-10-17", "Statement": [ { "Sid": "PcsAllowVendedLogsDelivery", "Effect": "Allow", "Action": ["pcs:AllowVendedLogDeliveryForResource"], "Resource": [ "arn:aws:pcs:::cluster/*" ] } ] }
设置任务完成日志
您可以使用 AWS Management Console 或为 AWS PCS 集群设置任务完成日志 AWS CLI。
如何查找任务完成日志
您可以在日志和 Amazon S3 中配置 CloudWatch 日志目标。 AWS PCS 使用以下结构化路径名和文件名。
CloudWatch 日志
AWS PCS 对 CloudWatch 日志流使用以下名称格式:
AWSLogs/PCS/
cluster-id
/jobcomp.log
例如:AWSLogs/PCS/pcs_abc123de45/jobcomp.log
Amazon S3
AWS PCS 对 S3 路径使用以下名称格式:
AWSLogs/
account-id
/PCS/region
/cluster-id
/jobcomp/year
/month
/day
/hour
/
例如:AWSLogs/111122223333/PCS/us-east-1/pcs_abc123de45/jobcomp/2025/06/19/11/
AWS PCS 对日志文件使用以下名称格式:
PCS_jobcomp_
year
-month
-day
-hour
_cluster-id
_random-id
.log.gz
例如:PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz
Job 完成日志字段
AWS PCS 将任务完成日志数据写为 JSON 对象。JSON jobcomp
容器包含任务详细信息。下表描述了jobcomp
容器内的字段。有些字段仅在特定情况下才会出现,例如用于数组作业或异构作业。
名称 | 示例值 | 必需 | 备注 |
---|---|---|---|
job_id |
11 |
是 | 始终以价值为本 |
user |
"root" |
是 | 始终以价值为本 |
user_id |
0 |
是 | 始终以价值为本 |
group |
"root" |
是 | 始终以价值为本 |
group_id |
0 |
是 | 始终以价值为本 |
name |
"wrap" |
是 | 始终以价值为本 |
job_state |
"COMPLETED" |
是 | 始终以价值为本 |
partition |
"Hydra-MpiQueue-abcdef01-7" |
是 | 始终以价值为本 |
time_limit |
"UNLIMITED" |
是 | 永远在场,但可能是 "UNLIMITED" |
start_time |
"2025-06-19T10:58:57" |
是 | 永远在场,但可能是 "Unknown" |
end_time |
"2025-06-19T10:58:57" |
是 | 永远在场,但可能是 "Unknown" |
node_list |
"Hydra-MpiNG-abcdef01-2345-1" |
是 | 始终以价值为本 |
node_cnt |
1 |
是 | 始终以价值为本 |
proc_cnt |
1 |
是 | 始终以价值为本 |
work_dir |
"/root" |
是 | 永远在场,但可能是 "Unknown" |
reservation_name |
"weekly_maintenance" |
是 | 始终存在,但可能是一个空字符串 "" |
tres.cpu |
1 |
是 | 始终以价值为本 |
tres.mem.val |
600 |
是 | 始终以价值为本 |
tres.mem.unit |
"M" |
是 | 可以是"M" 或 "bb" |
tres.node |
1 |
是 | 始终以价值为本 |
tres.billing |
1 |
是 | 始终以价值为本 |
account |
"finance" |
是 | 始终存在,但可能是一个空字符串 "" |
qos |
"normal" |
是 | 始终存在,但可能是一个空字符串 "" |
wc_key |
"project_1" |
是 | 始终存在,但可能是一个空字符串 "" |
cluster |
"unknown" |
是 | 永远在场,但可能是 "unknown" |
submit_time |
"2025-06-19T10:55:46" |
是 | 永远在场,但可能是 "Unknown" |
eligible_time |
"2025-06-19T10:55:46" |
是 | 永远在场,但可能是 "Unknown" |
array_job_id |
12 |
否 | 仅当作业是阵列作业时才会出现 |
array_task_id |
1 |
否 | 仅当作业是阵列作业时才会出现 |
het_job_id |
10 |
否 | 仅当作业是异构作业时才会出现 |
het_job_offset |
0 |
否 | 仅当作业是异构作业时才会出现 |
derived_exit_code_status |
0 |
是 | 始终以价值为本 |
derived_exit_code_signal |
0 |
是 | 始终以价值为本 |
exit_code_status |
0 |
是 | 始终以价值为本 |
exit_code_signal |
0 |
是 | 始终以价值为本 |
node_details[0].name |
"Hydra-MpiNG-abcdef01-2345-1" |
否 | 永远在场,但node_details 可能是 "[]" |
node_details[0].instance_id |
"i-0abcdef01234567a" |
否 | 永远在场,但node_details 可能是 "[]" |
node_details[0].instance_type |
"t4g.micro" |
否 | 永远在场,但node_details 可能是 "[]" |
作业完成日志示例
以下示例显示了各种作业类型和状态的任务完成日志:
{ "jobcomp": { "job_id": 1, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:32:57", "end_time": "2025-06-19T16:33:03", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:29:40", "eligible_time": "2025-06-19T16:29:41", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 2, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:33:13", "end_time": "2025-06-19T16:33:14", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:33:13", "eligible_time": "2025-06-19T16:33:13", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 3, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T22:58:57", "end_time": "2025-06-19T22:58:57", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T22:55:46", "eligible_time": "2025-06-19T22:55:46", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 4, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "525600", "start_time": "2025-06-19T23:04:27", "end_time": "2025-06-19T23:04:27", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:01:38", "eligible_time": "2025-06-19T23:01:38", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 5, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "FAILED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:00", "end_time": "2025-06-19T23:09:00", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 1, "unit": "G" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:00", "eligible_time": "2025-06-19T23:09:00", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 6, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:36", "end_time": "2025-06-19T23:09:36", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:35", "eligible_time": "2025-06-19T23:09:36", "het_job_id": 6, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 7, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:10:03", "end_time": "2025-06-19T23:10:03", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:10:03", "eligible_time": "2025-06-19T23:10:03", "het_job_id": 7, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 8, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 9, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 10, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 11, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list":"Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 600, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 13, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:57", "end_time": "2025-06-19T23:47:58", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56" , "array_job_id": 12, "array_task_id": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 12, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:58", "end_time": "2025-06-19T23:47:58", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56" , "array_job_id": 12, "array_task_id": 2, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }