Job completion logs in AWS PCS

Job completion logs give you key details about your AWS Parallel Computing Service (AWS PCS) jobs when they complete, at no additional cost. You can use other AWS services to access and process your log data, such as Amazon CloudWatch Logs, Amazon Simple Storage Service (Amazon S3), and Amazon Data Firehose. AWS PCS records metadata about your jobs, such as the following.

  • Job ID and name

  • User and group information

  • Job state (such as COMPLETED, FAILED, CANCELLED)

  • Partition used

  • Time limit

  • Start, end, submit, and eligible times

  • Node list and count

  • Processor count

  • Working directory

  • Resource usage (CPU, memory)

  • Exit codes

  • Node details (names, instance IDs, instance types)

Prerequisites

The IAM principal that manages the AWS PCS cluster must allow the pcs:AllowVendedLogDeliveryForResource action.

The following example IAM policy grants the required permissions.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PcsAllowVendedLogsDelivery",
          "Effect": "Allow",
          "Action": ["pcs:AllowVendedLogDeliveryForResource"],
          "Resource": [
            "arn:aws:pcs:::cluster/*"
          ]
        }
      ]
    }
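Before you attach a policy, it can help to confirm that it names the required action. The following Python sketch is a rough local check only: it looks for the action string verbatim in an Allow statement and does not evaluate wildcards, conditions, or resources the way IAM does. The function name allows_vended_log_delivery is hypothetical.

```python
import json

REQUIRED_ACTION = "pcs:AllowVendedLogDeliveryForResource"


def allows_vended_log_delivery(policy_json: str) -> bool:
    """Return True if any Allow statement lists the required action verbatim."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        if REQUIRED_ACTION in actions:
            return True
    return False


# The example policy from this topic.
example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PcsAllowVendedLogsDelivery",
        "Effect": "Allow",
        "Action": ["pcs:AllowVendedLogDeliveryForResource"],
        "Resource": ["arn:aws:pcs:::cluster/*"],
    }],
})

print(allows_vended_log_delivery(example_policy))  # True
```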

Set up job completion logs

You can set up job completion logs for your AWS PCS cluster with the AWS Management Console or the AWS CLI.

AWS Management Console
To set up job completion logs with the console
  1. Open the AWS PCS console.

  2. In the navigation pane, choose Clusters.

  3. Choose the cluster where you want to add job completion logs.

  4. On the cluster details page, choose the Logs tab.

  5. Under Job Completion Logs, choose Add to add up to 3 log delivery destinations from among CloudWatch Logs, Amazon S3, and Firehose.

  6. Choose Update log deliveries.

AWS CLI
To set up job completion logs with the AWS CLI
  1. Create a log delivery destination:

    aws logs put-delivery-destination --region region \
        --name pcs-logs-destination \
        --delivery-destination-configuration \
        destinationResourceArn=resource-arn

    Replace:

    • region: The AWS Region where you want to create the destination, such as us-east-1

    • pcs-logs-destination: A name for the destination

    • resource-arn: The Amazon Resource Name (ARN) of a CloudWatch Logs log group, S3 bucket, or Firehose delivery stream.

    For more information, see PutDeliveryDestination in the Amazon CloudWatch Logs API Reference.

  2. Set the PCS cluster as a log delivery source:

    aws logs put-delivery-source --region region \
        --name cluster-logs-source-name \
        --resource-arn cluster-arn \
        --log-type PCS_JOBCOMP_LOGS

    Replace:

    • region: The AWS Region of your cluster, such as us-east-1

    • cluster-logs-source-name: A name for the source

    • cluster-arn: The ARN of your AWS PCS cluster

    For more information, see PutDeliverySource in the Amazon CloudWatch Logs API Reference.

  3. Connect the delivery source to the delivery destination:

    aws logs create-delivery --region region \
        --delivery-source-name cluster-logs-source \
        --delivery-destination-arn destination-arn

    Replace:

    • region: The AWS Region, such as us-east-1

    • cluster-logs-source: The name of your delivery source (the name you set in step 2)

    • destination-arn: The ARN of your delivery destination

    For more information, see CreateDelivery in the Amazon CloudWatch Logs API Reference.
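The same three calls can also be made from Python. The following is a minimal sketch, assuming a boto3 CloudWatch Logs client is passed in as logs_client; the function name and its default resource names are illustrative and not part of AWS PCS. It reads the delivery destination ARN from the first response so that the third call can use it.

```python
def set_up_job_completion_logs(logs_client, cluster_arn, resource_arn,
                               source_name="cluster-logs-source",
                               destination_name="pcs-logs-destination"):
    """Create a destination, register the cluster as a source, and link them."""
    # Step 1: create the log delivery destination.
    destination = logs_client.put_delivery_destination(
        name=destination_name,
        deliveryDestinationConfiguration={"destinationResourceArn": resource_arn},
    )
    destination_arn = destination["deliveryDestination"]["arn"]

    # Step 2: set the PCS cluster as a log delivery source.
    logs_client.put_delivery_source(
        name=source_name,
        resourceArn=cluster_arn,
        logType="PCS_JOBCOMP_LOGS",
    )

    # Step 3: connect the delivery source to the delivery destination.
    return logs_client.create_delivery(
        deliverySourceName=source_name,
        deliveryDestinationArn=destination_arn,
    )
```

To try it, you could create the client with boto3.client("logs", region_name="us-east-1") and pass your cluster ARN and the ARN of the log group, bucket, or delivery stream. Passing the client as an argument keeps the sketch easy to exercise with a stub before you run it against an account.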

How to find job completion logs

You can configure log destinations in CloudWatch Logs and Amazon S3. AWS PCS uses the following structured path names and file names.

CloudWatch Logs

AWS PCS uses the following name format for the CloudWatch Logs stream:

AWSLogs/PCS/cluster-id/jobcomp.log

For example: AWSLogs/PCS/pcs_abc123de45/jobcomp.log

Amazon S3

AWS PCS uses the following name format for the S3 path:

AWSLogs/account-id/PCS/region/cluster-id/jobcomp/year/month/day/hour/

For example: AWSLogs/111122223333/PCS/us-east-1/pcs_abc123de45/jobcomp/2025/06/19/11/

AWS PCS uses the following name format for log files:

PCS_jobcomp_year-month-day-hour_cluster-id_random-id.log.gz

For example: PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz
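The naming formats above can be sketched as small Python helpers, for example to build a prefix to list or to pull the cluster ID out of a file name. The function names are hypothetical. The file name pattern assumes that the date parts are zero-padded, as in the examples, and that the random ID contains no underscore.

```python
import re
from datetime import datetime


def cloudwatch_stream_name(cluster_id: str) -> str:
    """Build the CloudWatch Logs stream name for a cluster."""
    return f"AWSLogs/PCS/{cluster_id}/jobcomp.log"


def s3_prefix(account_id: str, region: str, cluster_id: str, hour: datetime) -> str:
    """Build the S3 path for the given hour."""
    return (f"AWSLogs/{account_id}/PCS/{region}/{cluster_id}/jobcomp/"
            f"{hour:%Y/%m/%d/%H}/")


# Cluster IDs contain underscores, so the cluster part is matched greedily and
# the random ID is taken as the last underscore-separated piece (an assumption).
_FILE_NAME = re.compile(
    r"^PCS_jobcomp_(?P<hour>\d{4}-\d{2}-\d{2}-\d{2})"
    r"_(?P<cluster_id>.+)_(?P<random_id>[^_]+)\.log\.gz$"
)


def parse_log_file_name(file_name: str) -> dict:
    """Split a log file name into its hour, cluster ID, and random ID."""
    match = _FILE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected log file name: {file_name}")
    return {
        "hour": datetime.strptime(match["hour"], "%Y-%m-%d-%H"),
        "cluster_id": match["cluster_id"],
        "random_id": match["random_id"],
    }


print(cloudwatch_stream_name("pcs_abc123de45"))
print(s3_prefix("111122223333", "us-east-1", "pcs_abc123de45",
                datetime(2025, 6, 19, 11)))
print(parse_log_file_name(
    "PCS_jobcomp_2025-06-19-11_pcs_abc123de45_04be080b.log.gz")["cluster_id"])
```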

Job completion log fields

AWS PCS writes job completion log data as JSON objects. The jobcomp JSON container holds the job details. The following table describes the fields inside the jobcomp container. Some fields are only present in certain circumstances, such as for array jobs or heterogeneous jobs.

Name | Example value | Required | Notes
job_id | 11 | Yes | Always present with a value
user | "root" | Yes | Always present with a value
user_id | 0 | Yes | Always present with a value
group | "root" | Yes | Always present with a value
group_id | 0 | Yes | Always present with a value
name | "wrap" | Yes | Always present with a value
job_state | "COMPLETED" | Yes | Always present with a value
partition | "Hydra-MpiQueue-abcdef01-7" | Yes | Always present with a value
time_limit | "UNLIMITED" | Yes | Always present, but might be "UNLIMITED"
start_time | "2025-06-19T10:58:57" | Yes | Always present, but might be "Unknown"
end_time | "2025-06-19T10:58:57" | Yes | Always present, but might be "Unknown"
node_list | "Hydra-MpiNG-abcdef01-2345-1" | Yes | Always present with a value
node_cnt | 1 | Yes | Always present with a value
proc_cnt | 1 | Yes | Always present with a value
work_dir | "/root" | Yes | Always present, but might be "Unknown"
reservation_name | "weekly_maintenance" | Yes | Always present, but might be an empty string ""
tres.cpu | 1 | Yes | Always present with a value
tres.mem.val | 600 | Yes | Always present with a value
tres.mem.unit | "M" | Yes | Can be "M" or "bb" (the examples in this topic also show "G")
tres.node | 1 | Yes | Always present with a value
tres.billing | 1 | Yes | Always present with a value
account | "finance" | Yes | Always present, but might be an empty string ""
qos | "normal" | Yes | Always present, but might be an empty string ""
wc_key | "project_1" | Yes | Always present, but might be an empty string ""
cluster | "unknown" | Yes | Always present, but might be "unknown"
submit_time | "2025-06-19T10:55:46" | Yes | Always present, but might be "Unknown"
eligible_time | "2025-06-19T10:55:46" | Yes | Always present, but might be "Unknown"
array_job_id | 12 | No | Only present if the job is an array job
array_task_id | 1 | No | Only present if the job is an array job
het_job_id | 10 | No | Only present if the job is a heterogeneous job
het_job_offset | 0 | No | Only present if the job is a heterogeneous job
derived_exit_code_status | 0 | Yes | Always present with a value
derived_exit_code_signal | 0 | Yes | Always present with a value
exit_code_status | 0 | Yes | Always present with a value
exit_code_signal | 0 | Yes | Always present with a value
node_details[0].name | "Hydra-MpiNG-abcdef01-2345-1" | No | node_details is always present, but might be an empty array []
node_details[0].instance_id | "i-0abcdef01234567a" | No | node_details is always present, but might be an empty array []
node_details[0].instance_type | "t4g.micro" | No | node_details is always present, but might be an empty array []
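If you process these records in code, a quick check against the table can catch malformed input early. The following Python sketch lists the top-level fields that the table marks as required and uses the optional fields to tell array and heterogeneous jobs apart. The function names are hypothetical, and the check covers only the top level of jobcomp (it checks that tres exists, not its nested keys).

```python
import json

# Top-level jobcomp fields that the table marks as required.
REQUIRED_FIELDS = (
    "job_id", "user", "user_id", "group", "group_id", "name", "job_state",
    "partition", "time_limit", "start_time", "end_time", "node_list",
    "node_cnt", "proc_cnt", "work_dir", "reservation_name", "tres",
    "account", "qos", "wc_key", "cluster", "submit_time", "eligible_time",
    "derived_exit_code_status", "derived_exit_code_signal",
    "exit_code_status", "exit_code_signal",
)


def missing_required_fields(record: dict) -> list:
    """Return the required field names that are absent from the record."""
    jobcomp = record.get("jobcomp", {})
    return [name for name in REQUIRED_FIELDS if name not in jobcomp]


def job_kind(record: dict) -> str:
    """Classify a record by the optional fields it carries."""
    jobcomp = record.get("jobcomp", {})
    if "array_job_id" in jobcomp:
        return "array"
    if "het_job_id" in jobcomp:
        return "heterogeneous"
    return "regular"


# One of the array job records from the examples in this topic.
record = json.loads("""
{ "jobcomp": { "job_id": 13, "user": "root", "user_id": 0, "group": "root",
  "group_id": 0, "name": "wrap", "job_state": "COMPLETED",
  "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED",
  "start_time": "2025-06-19T23:47:57", "end_time": "2025-06-19T23:47:58",
  "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1,
  "work_dir": "/root", "reservation_name": "",
  "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 },
  "account": "", "qos": "", "wc_key": "", "cluster": "unknown",
  "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56",
  "array_job_id": 12, "array_task_id": 1,
  "derived_exit_code_status": 0, "derived_exit_code_signal": 0,
  "exit_code_status": 0, "exit_code_signal": 0,
  "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1",
    "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }
""")

print(missing_required_fields(record))  # []
print(job_kind(record))                 # array
```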

Example job completion logs

The following examples show job completion logs for different job types and states:

{ "jobcomp": { "job_id": 1, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:32:57", "end_time": "2025-06-19T16:33:03", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:29:40", "eligible_time": "2025-06-19T16:29:41", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 2, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T16:33:13", "end_time": "2025-06-19T16:33:14", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/usr/bin", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T16:33:13", "eligible_time": "2025-06-19T16:33:13", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc123def45678", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def456abc78901", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 3, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T22:58:57", "end_time": "2025-06-19T22:58:57", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T22:55:46", "eligible_time": "2025-06-19T22:55:46", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 4, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "525600", "start_time": "2025-06-19T23:04:27", "end_time": "2025-06-19T23:04:27", "node_list": "Hydra-MpiNG-abcdef01-2345-[1-2]", "node_cnt": 2, "proc_cnt": 2, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 2, "mem": { "val": 1944, "unit": "M" }, "node": 2, "billing": 2 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:01:38", "eligible_time": "2025-06-19T23:01:38", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" }, { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 5, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "FAILED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:00", "end_time": "2025-06-19T23:09:00", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 1, "unit": "G" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:00", "eligible_time": "2025-06-19T23:09:00", "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }

{ "jobcomp": { "job_id": 6, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:09:36", "end_time": "2025-06-19T23:09:36", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:09:35", "eligible_time": "2025-06-19T23:09:36", "het_job_id": 6, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }

{ "jobcomp": { "job_id": 7, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "CANCELLED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:10:03", "end_time": "2025-06-19T23:10:03", "node_list": "(null)", "node_cnt": 0, "proc_cnt": 0, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:10:03", "eligible_time": "2025-06-19T23:10:03", "het_job_id": 7, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 1, "node_details": [] } }

{ "jobcomp": { "job_id": 8, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 9, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:11:24", "end_time": "2025-06-19T23:11:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:11:23", "eligible_time": "2025-06-19T23:11:23", "het_job_id": 8, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 10, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 400, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 0, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc234def56789", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 11, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:12:24", "end_time": "2025-06-19T23:12:24", "node_list": "Hydra-MpiNG-abcdef01-2345-2", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 600, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:12:14", "eligible_time": "2025-06-19T23:12:14", "het_job_id": 10, "het_job_offset": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-2", "instance_id": "i-0def345abc67890", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 13, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:57", "end_time": "2025-06-19T23:47:58", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56", "array_job_id": 12, "array_task_id": 1, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }

{ "jobcomp": { "job_id": 12, "user": "root", "user_id": 0, "group": "root", "group_id": 0, "name": "wrap", "job_state": "COMPLETED", "partition": "Hydra-MpiQueue-abcdef01-7", "time_limit": "UNLIMITED", "start_time": "2025-06-19T23:47:58", "end_time": "2025-06-19T23:47:58", "node_list": "Hydra-MpiNG-abcdef01-2345-1", "node_cnt": 1, "proc_cnt": 1, "work_dir": "/root", "reservation_name": "", "tres": { "cpu": 1, "mem": { "val": 972, "unit": "M" }, "node": 1, "billing": 1 }, "account": "", "qos": "", "wc_key": "", "cluster": "unknown", "submit_time": "2025-06-19T23:43:56", "eligible_time": "2025-06-19T23:43:56", "array_job_id": 12, "array_task_id": 2, "derived_exit_code_status": 0, "derived_exit_code_signal": 0, "exit_code_status": 0, "exit_code_signal": 0, "node_details": [ { "name": "Hydra-MpiNG-abcdef01-2345-1", "instance_id": "i-0abc345def67890", "instance_type": "t4g.micro" } ] } }
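To work with records like these in Python, one option is to decode the JSON objects one after another and then compute simple metrics. The following sketch assumes that the log text holds JSON objects separated only by whitespace, as in the examples above; the actual layout of delivered files might differ. The function names are hypothetical, and the sample keeps only a few fields from two of the example records.

```python
import json
from collections import Counter
from datetime import datetime


def iter_jobcomp_records(text: str):
    """Yield each jobcomp object from text that holds JSON objects back to back."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so step over it first.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        record, position = decoder.raw_decode(text, position)
        yield record["jobcomp"]


def seconds_between(start: str, end: str):
    """Return the seconds from start to end, or None if either is not a timestamp."""
    try:
        started = datetime.fromisoformat(start)
        ended = datetime.fromisoformat(end)
    except ValueError:  # for example, the value "Unknown"
        return None
    return (ended - started).total_seconds()


# Abbreviated copies of two example records (jobs 1 and 5).
sample = """
{ "jobcomp": { "job_id": 1, "job_state": "COMPLETED", "start_time": "2025-06-19T16:32:57", "end_time": "2025-06-19T16:33:03", "submit_time": "2025-06-19T16:29:40" } }
{ "jobcomp": { "job_id": 5, "job_state": "FAILED", "start_time": "2025-06-19T23:09:00", "end_time": "2025-06-19T23:09:00", "submit_time": "2025-06-19T23:09:00" } }
"""

jobs = list(iter_jobcomp_records(sample))
states = Counter(job["job_state"] for job in jobs)
print(dict(states))
for job in jobs:
    runtime = seconds_between(job["start_time"], job["end_time"])
    wait = seconds_between(job["submit_time"], job["start_time"])
    print(job["job_id"], job["job_state"], runtime, wait)
```

For a compressed log file downloaded from Amazon S3, gzip.open(path, "rt").read() from the Python standard library could supply the text, under the same assumption about the file contents.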