WDL workflow definition specifics
The following topics provide details about types and directives available for WDL workflow definitions in HealthOmics.
Namespace definition in input.json
HealthOmics supports fully qualified variables in input.json. For example, if you declare two input variables named number1 and number2 in workflow SumWorkflow:
```wdl
workflow SumWorkflow {
    input {
        Int number1
        Int number2
    }
}
```
You can use them as fully qualified variables in input.json:
{ "SumWorkflow.number1": 15, "SumWorkflow.number2": 27 }
Primitive types in WDL
The following table shows how inputs in WDL map to the matching primitive types. HealthOmics provides limited support for type coercion, so we recommend that you set explicit types.
| WDL type | JSON type | Example WDL | Example JSON key and value | Notes |
|---|---|---|---|---|
| Boolean | boolean | `Boolean b` | `"b": true` | The value must be lowercase and unquoted. |
| Int | integer | `Int i` | `"i": 7` | Must be unquoted. |
| Float | number | `Float f` | `"f": 42.2` | Must be unquoted. |
| String | string | `String s` | `"s": "characters"` | JSON strings that are URIs must be mapped to a WDL `File` to be imported. |
| File | string | `File f` | `"f": "s3://amzn-s3-demo-bucket1/path/to/file"` | Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes are supported (such as `file://`, `https://`, and `ftp://`). The URI must specify an object; it can't be a directory, so it can't end with a `/`. |
| Directory | string | `Directory d` | `"d": "s3://bucket/path/"` | The `Directory` type isn't included in WDL 1.0 or 1.1, so you need to add `version development` to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends with a `/`. All contents of the directory are recursively copied to the workflow as a single download. The directory should only contain files related to the workflow. |
Complex types in WDL
The following table shows how inputs in WDL map to the matching complex JSON types. Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays.
| WDL type | JSON type | Example WDL | Example JSON key and value | Notes |
|---|---|---|---|---|
| Array | array | `Array[Int] nums` | `"nums": [1, 2, 3]` | The members of the array must follow the format of the WDL array type. |
| Pair | object | `Pair[String, Int] str_to_i` | `"str_to_i": {"left": "0", "right": 1}` | Each value of the pair must use the JSON format of its matching WDL type. |
| Map | object | `Map[Int, String] int_to_string` | `"int_to_string": { "2": "hello", "1": "goodbye" }` | Each entry in the map must use the JSON format of its matching WDL type. Because JSON object keys are always strings, the map keys are quoted even when the WDL key type is `Int`. |
| Struct | object | See the sketch after this table. | See the sketch after this table. | The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of the matching WDL type. |
| Object | N/A | N/A | N/A | The WDL `Object` type is outdated and should be replaced by `Struct` in all cases. |
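The following is a minimal sketch of the `Struct` mapping; the struct name and its members are hypothetical. Note that each JSON object key matches a struct member name exactly, and each value uses the JSON format of its WDL type.

```wdl
version 1.1

struct SampleInfo {
    String name
    Int age
}

workflow StructExample {
    input {
        SampleInfo info
    }
}
```

```json
{
    "StructExample.info": {
        "name": "NA12878",
        "age": 42
    }
}
```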
Directives in WDL
HealthOmics supports the following directives in all supported WDL versions.
acceleratorType and acceleratorCount
HealthOmics supports runtime attributes acceleratorType and acceleratorCount with all supported GPU instances. HealthOmics also supports aliases named gpuType and gpuCount, which have the same functionality as their accelerator counterparts. If the WDL definition contains both directives, HealthOmics uses the accelerator values.
The following example shows how to use these directives:
```wdl
runtime {
    gpuCount: 2
    gpuType: "nvidia-tesla-t4"
}
```
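The same configuration can be expressed with the accelerator attribute names. As noted above, if both pairs appear in a task, HealthOmics uses the accelerator values:

```wdl
runtime {
    acceleratorCount: 2
    acceleratorType: "nvidia-tesla-t4"
}
```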
returnCodes
The returnCodes attribute provides a mechanism to specify a return code, or a set of return codes, that indicates successful execution of a task. The WDL engine honors the return codes that you specify in the runtime section of the WDL definition and sets the task's status accordingly.
```wdl
runtime {
    returnCodes: 1
}
```
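To accept a set of return codes, you can provide an array. This is a sketch that assumes the array form of the returnCodes attribute described in the WDL specification:

```wdl
runtime {
    # Treat either exit code 0 or exit code 1 as success (assumes the
    # WDL spec's array form of returnCodes is honored).
    returnCodes: [0, 1]
}
```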
WDL workflow definition example
The following examples show private workflow definitions for converting from CRAM to BAM in WDL. The CRAM-to-BAM workflow defines two tasks and uses tools from the genomes-in-the-cloud container, which is shown in the example and is publicly available.
The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions to your container before it starts the run.
{ ... "gotc_docker":"<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710" }
The following example shows how to specify which files to use in your run when the files are in an Amazon S3 bucket.
{ "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram", "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict", "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta", "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai", "sample_name": "NA12878" }
To specify files from a sequence store, use the sequence store URI, as shown in the following example.
{ "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1", "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict", "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta", "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai", "sample_name": "NA12878" }
You can then define your workflow in WDL, as shown in the following example.
```wdl
version 1.0

workflow CramToBamFlow {
    input {
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
    }

    #Converts CRAM to SAM to BAM and makes BAI.
    call CramToBamTask {
        input:
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            input_cram = input_cram,
            sample_name = sample_name,
            docker_image = gotc_docker
    }

    #Validates BAM.
    call ValidateSamFile {
        input:
            input_bam = CramToBamTask.outputBam,
            docker_image = gotc_docker
    }

    #Outputs BAM, BAI, and validation report to the FireCloud data model.
    output {
        File outputBam = CramToBamTask.outputBam
        File outputBai = CramToBamTask.outputBai
        File validation_report = ValidateSamFile.report
    }
}

#Task definitions.
task CramToBamTask {
    input {
        # Command parameters
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        # Runtime parameters
        String docker_image
    }

    #Calls samtools view to do the conversion.
    command {
        set -eo pipefail

        samtools view -h -T ~{ref_fasta} ~{input_cram} | samtools view -b -o ~{sample_name}.bam -
        samtools index -b ~{sample_name}.bam
        mv ~{sample_name}.bam.bai ~{sample_name}.bai
    }

    #Runtime attributes:
    runtime {
        docker: docker_image
    }

    #Outputs a BAM and BAI with the same sample name.
    output {
        File outputBam = "~{sample_name}.bam"
        File outputBai = "~{sample_name}.bai"
    }
}

#Validates BAM output to ensure it wasn't corrupted during the file conversion.
task ValidateSamFile {
    input {
        File input_bam
        Int machine_mem_size = 4
        String docker_image
    }

    String output_name = basename(input_bam, ".bam") + ".validation_report"
    Int command_mem_size = machine_mem_size - 1

    command {
        java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
            ValidateSamFile \
            INPUT=~{input_bam} \
            OUTPUT=~{output_name} \
            MODE=SUMMARY \
            IS_BISULFITE_SEQUENCED=false
    }

    runtime {
        docker: docker_image
    }

    #A text file is generated that lists errors or warnings that apply.
    output {
        File report = "~{output_name}"
    }
}
```