Example of an array job workflow
A common workflow for AWS Batch customers is to run a prerequisite setup job, run a series of commands against a large number of input tasks, and then conclude with a job that aggregates results and writes summary data to Amazon S3, DynamoDB, Amazon Redshift, or Aurora.
For example:
- 
   JobA: A standard, non-array job that performs a quick listing and metadata validation of objects in an Amazon S3 bucket,BucketA. The SubmitJob JSON syntax is as follows.{ "jobName": "JobA", "jobQueue": "ProdQueue", "jobDefinition": "JobA-list-and-validate:1" }
- 
   JobB: An array job with 10,000 copies that is dependent uponJobAthat runs CPU-intensive commands against each object inBucketAand uploads results toBucketB. The SubmitJob JSON syntax is as follows.{ "jobName": "JobB", "jobQueue": "ProdQueue", "jobDefinition": "JobB-CPU-Intensive-Processing:1", "containerOverrides": { "resourceRequirements": [ { "type": "MEMORY", "value": "4096" }, { "type": "VCPU", "value": "32" } ] } "arrayProperties": { "size": 10000 }, "dependsOn": [ { "jobId": "JobA_job_ID" } ] }
- 
   JobC: Another 10,000 copy array job that's dependent uponJobBwith anN_TO_Ndependency model, that runs memory-intensive commands against each item inBucketB, writes metadata to DynamoDB, and uploads the resulting output toBucketC. The SubmitJob JSON syntax is as follows.{ "jobName": "JobC", "jobQueue": "ProdQueue", "jobDefinition": "JobC-Memory-Intensive-Processing:1", "containerOverrides": { "resourceRequirements": [ { "type": "MEMORY", "value": "32768" }, { "type": "VCPU", "value": "1" } ] } "arrayProperties": { "size": 10000 }, "dependsOn": [ { "jobId": "JobB_job_ID", "type": "N_TO_N" } ] }
- 
   JobD: An array job that performs 10 validation steps that each need to query DynamoDB and might interact with any of the above Amazon S3 buckets. Each of the steps inJobDrun the same command. However, the behavior is different based on the value of theAWS_BATCH_JOB_ARRAY_INDEXenvironment variable within the job's container. These validation steps run sequentially (for example,JobD:0and thenJobD:1). The SubmitJob JSON syntax is as follows.{ "jobName": "JobD", "jobQueue": "ProdQueue", "jobDefinition": "JobD-Sequential-Validation:1", "containerOverrides": { "resourceRequirements": [ { "type": "MEMORY", "value": "32768" }, { "type": "VCPU", "value": "1" } ] } "arrayProperties": { "size": 10 }, "dependsOn": [ { "jobId": "JobC_job_ID" }, { "type": "SEQUENTIAL" }, ] }
- 
   JobE: A final, non-array job that performs some simple cleanup operations and sends an Amazon SNS notification with a message that the pipeline has completed and a link to the output URL. The SubmitJob JSON syntax is as follows.{ "jobName": "JobE", "jobQueue": "ProdQueue", "jobDefinition": "JobE-Cleanup-and-Notification:1", "parameters": { "SourceBucket": "s3://amzn-s3-demo-source-bucket", "Recipient": "pipeline-notifications@mycompany.com" }, "dependsOn": [ { "jobId": "JobD_job_ID" } ] }