AWS PCS 中的任務完成日誌 - AWS PCS

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

AWS PCS 中的任務完成日誌

任務完成日誌提供您平行 AWS 運算服務 (AWS PCS) 任務完成時的金鑰詳細資訊,無需額外費用。您可以使用其他 AWS 服務來存取和處理您的日誌資料,例如 Amazon CloudWatch Logs、Amazon Simple Storage Service (Amazon S3) 和 Amazon Data Firehose; AWS PCS 會記錄任務的中繼資料,例如下列項目。

  • 任務 ID 和名稱

  • 使用者和群組資訊

  • 任務狀態 (例如 COMPLETEDFAILEDCANCELLED)

  • 使用的分割區

  • 時間限制

  • 開始、結束、提交和合格時間

  • 節點清單和計數

  • 處理器計數

  • 工作目錄

  • 資源用量 (CPU、記憶體)

  • 結束代碼

  • 節點詳細資訊 (名稱、執行個體 IDs、執行個體類型)

先決條件

管理 AWS PCS 叢集的 IAM 主體必須允許 pcs:AllowVendedLogDeliveryForResource動作。

下列範例 IAM 政策會授予所需的許可。

{ "Version": "2012-10-17", "Statement": [ { "Sid": "PcsAllowVendedLogsDelivery", "Effect": "Allow", "Action": ["pcs:AllowVendedLogDeliveryForResource"], "Resource": [ "arn:aws:pcs:::cluster/*" ] } ] }

設定任務完成日誌

您可以使用 AWS Management Console 或 為您的 AWS PCS 叢集設定任務完成日誌 AWS CLI。

AWS Management Console
使用主控台設定任務完成日誌
  1. 開啟 AWS PCS 主控台

  2. 在導覽窗格中,選擇叢集

  3. 選擇您要新增任務完成日誌的叢集。

  4. 在叢集詳細資訊頁面上,選擇日誌索引標籤。

  5. 任務完成日誌下,選擇新增以在 CloudWatch Logs、Amazon S3 和 Firehose 之間新增最多 3 個日誌交付目的地。

  6. 選擇更新日誌交付

AWS CLI
使用 設定任務完成日誌 AWS CLI
  1. 建立日誌交付目的地:

    aws logs put-delivery-destination --region region \ --name pcs-logs-destination \ --delivery-destination-configuration \ destinationResourceArn=resource-arn

    取代:

    • region — 您要建立目的地 AWS 區域 的 ,例如 us-east-1

    • pcs-logs-destination — 目的地的名稱

    • resource-arn — CloudWatch Logs 日誌群組、S3 儲存貯體或 Firehose 交付串流的 Amazon Resource Name (ARN)。

    如需詳細資訊,請參閱《Amazon CloudWatch Logs API 參考》中的 PutDeliveryDestination

  2. 將 PCS 叢集設定為日誌交付來源:

    aws logs put-delivery-source --region region \ --name cluster-logs-source-name \ --resource-arn cluster-arn \ --log-type PCS_JOBCOMP_LOGS

    取代:

    • region — 叢集 AWS 區域 的 ,例如 us-east-1

    • cluster-logs-source-name — 來源的名稱

    • cluster-arn — AWS PCS 叢集的 ARN

    如需詳細資訊,請參閱《Amazon CloudWatch Logs API 參考》中的 PutDeliverySource

  3. 將交付來源連接至交付目的地:

    aws logs create-delivery --region region \ --delivery-source-name cluster-logs-source \ --delivery-destination-arn destination-arn

    取代:

    • region — AWS 區域,例如 us-east-1

    • cluster-logs-source — 交付來源的名稱

    • destination-arn — 交付目的地的 ARN

    如需詳細資訊,請參閱《Amazon CloudWatch Logs API 參考》中的 CreateDelivery

如何尋找任務完成日誌

您可以在 CloudWatch Logs 和 Amazon S3. AWS PCS 中設定日誌目的地,並使用下列結構化路徑名稱和檔案名稱。

CloudWatch Logs

AWS PCS 使用 CloudWatch Logs 串流的下列名稱格式:

AWSLogs/PCS/cluster-id/jobcomp.log

例如:AWSLogs/PCS/pcs_abc123de45/jobcomp.log

Amazon S3

AWS PCS 會針對 S3 路徑使用下列名稱格式:

AWSLogs/account-id/PCS/region/cluster-id/jobcomp/year/month/day/hour/

例如:AWSLogs/111122223333/PCS/us-east-1/pcs_abc123de45/jobcomp/2025/06/19/11/

AWS PCS 對日誌檔案使用以下名稱格式:

PCS_jobcomp_year-month-day-hour_cluster-id_random-id.log.gz

例如:PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz

任務完成日誌欄位

AWS PCS 會將任務完成日誌資料寫入為 JSON 物件。JSON 容器jobcomp會保留任務詳細資訊。下表說明jobcomp容器內的欄位。某些欄位僅在特定情況下存在,例如陣列任務或異質任務。

任務完成日誌欄位
名稱 範例值 必要 備註
job_id 11 永遠具有 值
user "root" 永遠具有 值
user_id 0 永遠具有 值
group "root" 永遠具有 值
group_id 0 永遠具有 值
name "wrap" 永遠具有 值
job_state "COMPLETED" 永遠具有 值
partition "Hydra-MpiQueue-abcdef01-7" 永遠具有 值
time_limit "UNLIMITED" 永遠存在,但可能是 "UNLIMITED"
start_time "2025-06-19T10:58:57" 永遠存在,但可能是 "Unknown"
end_time "2025-06-19T10:58:57" 永遠存在,但可能是 "Unknown"
node_list "Hydra-MpiNG-abcdef01-2345-1" 永遠具有 值
node_cnt 1 永遠具有 值
proc_cnt 1 永遠具有 值
work_dir "/root" 永遠存在,但可能是 "Unknown"
reservation_name "weekly_maintenance" 永遠存在,但可能是空字串 ""
tres.cpu 1 永遠具有 值
tres.mem.val 600 永遠具有 值
tres.mem.unit "M" 可以是 "M""bb"
tres.node 1 永遠具有 值
tres.billing 1 永遠具有 值
account "finance" 永遠存在,但可能是空字串 ""
qos "normal" 永遠存在,但可能是空字串 ""
wc_key "project_1" 永遠存在,但可能是空字串 ""
cluster "unknown" 永遠存在,但可能是 "unknown"
submit_time "2025-06-19T10:55:46" 永遠存在,但可能是 "Unknown"
eligible_time "2025-06-19T10:55:46" 永遠存在,但可能是 "Unknown"
array_job_id 12 編號 只有在任務是陣列任務時才會出現
array_task_id 1 編號 只有在任務是陣列任務時才會出現
het_job_id 10 編號 只有在任務是異質任務時才會出現
het_job_offset 0 編號 只有在任務是異質任務時才會出現
derived_exit_code_status 0 永遠具有 值
derived_exit_code_signal 0 永遠具有 值
exit_code_status 0 永遠具有 值
exit_code_signal 0 永遠具有 值
node_details[0].name "Hydra-MpiNG-abcdef01-2345-1" 編號 永遠存在,但node_details可能是 "[]"
node_details[0].instance_id "i-0abcdef01234567a" 編號 永遠存在,但node_details可能是 "[]"
node_details[0].instance_type "t4g.micro" 編號 永遠存在,但node_details可能是 "[]"

任務完成日誌範例

下列範例顯示各種任務類型和狀態的任務完成日誌:

{ "jobcomp": { "job_id": 1, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:32:57", "end_time": "2025-06-19T16:33:03", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:29:40", "eligible_time": "2025-06-19T16:29:41", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 2, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:33:13", "end_time": "2025-06-19T16:33:14", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:33:13", "eligible_time": "2025-06-19T16:33:13", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 3, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T22:58:57", "end_time": "2025-06-19T22:58:57", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T22:55:46", "eligible_time": "2025-06-19T22:55:46", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 4, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "525600", "start_time": "2025-06-19T23:04:27", "end_time": "2025-06-19T23:04:27", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:01:38", "eligible_time": "2025-06-19T23:01:38", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 5, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "FAILED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:00", "end_time": "2025-06-19T23:09:00", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 1, "unit": "G" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:00", "eligible_time": "2025-06-19T23:09:00", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 6, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:36", "end_time": "2025-06-19T23:09:36", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:35", "eligible_time": "2025-06-19T23:09:36", "het_job_id": 6, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 7, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:10:03", "end_time": "2025-06-19T23:10:03", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:10:03", "eligible_time": "2025-06-19T23:10:03", "het_job_id": 7, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } } { "jobcomp": { "job_id": 8, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 9, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 10, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 11, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list":"Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 600, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 13, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:57", "end_time": "2025-06-19T23:47:58", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56" , "array_job_id": 12, "array_task_id": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } } { "jobcomp": { "job_id": 12, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:58", "end_time": "2025-06-19T23:47:58", "node_list":"Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56" , "array_job_id": 12, "array_task_id": 2, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }