WDL workflow definition specifics - AWS HealthOmics

The following topics provide details about types and directives available for WDL workflow definitions in HealthOmics.

Namespace definition in input.json

HealthOmics supports fully qualified variables in input.json. For example, if you declare two input variables named number1 and number2 in workflow SumWorkflow:

workflow SumWorkflow {
    input {
        Int number1
        Int number2
    }
}

You can use them as fully qualified variables in input.json:

{ "SumWorkflow.number1": 15, "SumWorkflow.number2": 27 }

Primitive types in WDL

The following table shows how primitive input types in WDL map to their matching JSON types. HealthOmics provides limited support for type coercion, so we recommend that you set explicit types. A combined sketch follows the table.

Primitive types
WDL type JSON type Example WDL Example JSON key and value Notes
Boolean boolean Boolean b "b": true The value must be lower case and unquoted.
Int integer Int i "i": 7 Must be unquoted.
Float number Float f "f": 42.2 Must be unquoted.
String string String s "s": "characters" JSON strings that are a URI must be mapped to a WDL file to be imported.
File string File f "f": "s3://amzn-s3-demo-bucket1/path/to/file" Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes are supported (such as file://, https://, and ftp://). The URI must specify an object; it cannot be a directory, meaning it cannot end with a /.
Directory string Directory d "d": "s3://bucket/path/" The Directory type isn't included in WDL 1.0 or 1.1, so you need to add version development to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends with a '/'. All contents of the directory are recursively copied to the workflow as a single download. The directory should contain only files related to the workflow.
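
Taken together, the rows in this table can be combined into one input block and a matching input.json. The following sketch assumes a hypothetical workflow named PrimitivesExample and reuses the example values from the table:

workflow PrimitivesExample {
    input {
        Boolean b
        Int i
        Float f
        String s
        File in_file
    }
}

The corresponding input.json uses fully qualified names, as described earlier:

{
    "PrimitivesExample.b": true,
    "PrimitivesExample.i": 7,
    "PrimitivesExample.f": 42.2,
    "PrimitivesExample.s": "characters",
    "PrimitivesExample.in_file": "s3://amzn-s3-demo-bucket1/path/to/file"
}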

Complex types in WDL

The following table shows how complex input types in WDL map to the matching JSON types. Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays. A combined sketch follows the table.

Complex types
WDL type JSON type Example WDL Example JSON key and value Notes
Array array Array[Int] nums "nums": [1, 2, 3] The members of the array must follow the format of the WDL array type.
Pair object Pair[String, Int] str_to_i "str_to_i": {"left": "0", "right": 1} Each value of the pair must use the JSON format of its matching WDL type.
Map object Map[Int, String] int_to_string "int_to_string": { "2": "hello", "1": "goodbye" } Each entry in the map must use the JSON format of its matching WDL type. Because JSON object keys are always strings, the map keys are quoted.
Struct object
struct SampleBamAndIndex { String sample_name File bam File bam_index } SampleBamAndIndex b_and_i
"b_and_i": { "sample_name": "NA12878", "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam", "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai" }
The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of the matching WDL type.
Object N/A N/A N/A The WDL Object type is outdated and should be replaced by Struct in all cases.
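
As with the primitive types, the rows in this table can be combined into a single sketch. The following example assumes a hypothetical workflow named ComplexExample; the struct definition and example values are taken from the table:

version 1.0

struct SampleBamAndIndex {
    String sample_name
    File bam
    File bam_index
}

workflow ComplexExample {
    input {
        Array[Int] nums
        Pair[String, Int] str_to_i
        Map[Int, String] int_to_string
        SampleBamAndIndex b_and_i
    }
}

The matching input.json uses fully qualified names:

{
    "ComplexExample.nums": [1, 2, 3],
    "ComplexExample.str_to_i": { "left": "0", "right": 1 },
    "ComplexExample.int_to_string": { "2": "hello", "1": "goodbye" },
    "ComplexExample.b_and_i": {
        "sample_name": "NA12878",
        "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam",
        "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai"
    }
}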

Directives in WDL

HealthOmics supports the following directives in all of the WDL versions that it supports.

acceleratorType and acceleratorCount

HealthOmics supports runtime attributes acceleratorType and acceleratorCount with all supported GPU instances. HealthOmics also supports aliases named gpuType and gpuCount, which have the same functionality as their accelerator counterparts. If the WDL definition contains both directives, HealthOmics uses the accelerator values.

The following example shows how to use these directives:

runtime {
    gpuCount: 2
    gpuType: "nvidia-tesla-t4"
}
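
Because the accelerator attributes and the gpu aliases are equivalent, the same runtime section can be written with acceleratorCount and acceleratorType instead:

runtime {
    acceleratorCount: 2
    acceleratorType: "nvidia-tesla-t4"
}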

returnCodes

The returnCodes attribute provides a mechanism to specify a return code, or a set of return codes, that indicates a successful execution of a task. The WDL engine honors the return codes that you specify in the runtime section of the WDL definition, and sets the task status accordingly.

runtime {
    returnCodes: 1
}
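
To accept a set of return codes as successful, you can supply an array instead of a single value. The following sketch treats both 0 and 1 as success:

runtime {
    returnCodes: [0, 1]
}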

WDL workflow definition example

The following examples show private workflow definitions for converting from CRAM to BAM in WDL. The CRAM to BAM workflow defines two tasks and uses tools from the genomes-in-the-cloud container, which is shown in the example and is publicly available.

The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions to your container before it starts the run.

{ ... "gotc_docker":"<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710" }

The following example shows how to specify which files to use in your run, when the files are in an Amazon S3 bucket.

{ "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram", "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict", "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta", "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai", "sample_name": "NA12878" }

To specify files from a sequence store, use the sequence store URI, as shown in the following example.

{ "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1", "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict", "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta", "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai", "sample_name": "NA12878" }

You can then define your workflow in WDL as shown in the following example.

version 1.0

workflow CramToBamFlow {
    input {
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
    }

    #Converts CRAM to SAM to BAM and makes BAI.
    call CramToBamTask {
        input:
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            input_cram = input_cram,
            sample_name = sample_name,
            docker_image = gotc_docker,
    }

    #Validates BAM.
    call ValidateSamFile {
        input:
            input_bam = CramToBamTask.outputBam,
            docker_image = gotc_docker,
    }

    #Outputs BAM, BAI, and validation report to the FireCloud data model.
    output {
        File outputBam = CramToBamTask.outputBam
        File outputBai = CramToBamTask.outputBai
        File validation_report = ValidateSamFile.report
    }
}

#Task definitions.
task CramToBamTask {
    input {
        #Command parameters
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name

        #Runtime parameters
        String docker_image
    }

    #Calls samtools view to do the conversion.
    command {
        set -eo pipefail

        samtools view -h -T ~{ref_fasta} ~{input_cram} |
            samtools view -b -o ~{sample_name}.bam -
        samtools index -b ~{sample_name}.bam
        mv ~{sample_name}.bam.bai ~{sample_name}.bai
    }

    #Runtime attributes.
    runtime {
        docker: docker_image
    }

    #Outputs a BAM and BAI with the same sample name.
    output {
        File outputBam = "~{sample_name}.bam"
        File outputBai = "~{sample_name}.bai"
    }
}

#Validates BAM output to ensure it wasn't corrupted during the file conversion.
task ValidateSamFile {
    input {
        File input_bam
        Int machine_mem_size = 4
        String docker_image
    }

    String output_name = basename(input_bam, ".bam") + ".validation_report"
    Int command_mem_size = machine_mem_size - 1

    command {
        java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
            ValidateSamFile \
            INPUT=~{input_bam} \
            OUTPUT=~{output_name} \
            MODE=SUMMARY \
            IS_BISULFITE_SEQUENCED=false
    }

    runtime {
        docker: docker_image
    }

    #A text file is generated that lists errors or warnings that apply.
    output {
        File report = "~{output_name}"
    }
}