WDL workflow definition specifics
The following topics provide details about types and directives available for WDL workflow definitions in HealthOmics.
Implicit type conversion in WDL lenient
HealthOmics supports implicit type conversion in the input.json file and the workflow definition. To use implicit type casting, specify the workflow engine as WDL lenient when you create the workflow. WDL lenient is designed to handle workflows migrated from Cromwell. It supports custom Cromwell directives and some non-conformant logic.
WDL lenient supports type conversion for the following items from WDL's list of limited exceptions:

- Float to Int, where the coercion results in no loss of precision (such as 1.0 maps to 1).
- String to Int/Float, where the coercion results in no loss of precision.
- Map[W, X] to Array[Pair[Y, Z]], in the case where W is coercible to Y and X is coercible to Z.
- Array[Pair[W, X]] to Map[Y, Z], in the case where W is coercible to Y and X is coercible to Z.
To use implicit type casting, specify the workflow engine as WDL_LENIENT when you create the workflow or workflow version.
In the console, the workflow engine parameter is named Language. In the API, the workflow engine parameter is named engine. For more information, see Create a private workflow or Create a workflow version.
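As an illustration, the following sketch (the workflow and input names are hypothetical) shows a coercion that WDL lenient accepts: a JSON string supplied for an Int input.

```
workflow CoercionExample {
    input {
        Int sample_count
    }
}
```

```
{ "CoercionExample.sample_count": "3" }
```

Under the standard WDL engine this input fails type checking; with WDL_LENIENT the string "3" is coerced to the integer 3 because the coercion loses no precision.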
Namespace definition in input.json
HealthOmics supports fully qualified variables in input.json. For example, if you declare two input variables named number1 and number2 in workflow SumWorkflow:
```
workflow SumWorkflow {
    input {
        Int number1
        Int number2
    }
}
```
You can use them as fully qualified variables in input.json:
```
{
    "SumWorkflow.number1": 15,
    "SumWorkflow.number2": 27
}
```
Primitive types in WDL
The following table shows how inputs in WDL map to the matching primitive types. HealthOmics provides limited support for type coercion, so we recommend that you set explicit types.
| WDL type | JSON type | Example WDL | Example JSON key and value | Notes |
|---|---|---|---|---|
| Boolean | boolean | Boolean b | "b": true | The value must be lowercase and unquoted. |
| Int | integer | Int i | "i": 7 | Must be unquoted. |
| Float | number | Float f | "f": 42.2 | Must be unquoted. |
| String | string | String s | "s": "characters" | JSON strings that are a URI must be mapped to a WDL File to be imported. |
| File | string | File f | "f": "s3://amzn-s3-demo-bucket1/path/to/file" | Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes are supported (such as file://, https://, and ftp://). The URI must specify an object; it can't be a directory, meaning it can't end with a /. |
| Directory | string | Directory d | "d": "s3://bucket/path/" | The Directory type isn't included in WDL 1.0 or 1.1, so you need to add version development to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends with a /. All contents of the directory are recursively copied to the workflow as a single download. The directory should only contain files related to the workflow. |
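To tie the rows above together, the following sketch (the workflow and input names are illustrative) declares one input of each primitive type and the input.json that satisfies it:

```
workflow PrimitiveExample {
    input {
        Boolean b
        Int i
        Float f
        String s
        File data
    }
}
```

```
{
    "PrimitiveExample.b": true,
    "PrimitiveExample.i": 7,
    "PrimitiveExample.f": 42.2,
    "PrimitiveExample.s": "characters",
    "PrimitiveExample.data": "s3://amzn-s3-demo-bucket1/path/to/file"
}
```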
Complex types in WDL
The following table shows how inputs in WDL map to the matching complex JSON types. Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays.
| WDL type | JSON type | Example WDL | Example JSON key and value | Notes |
|---|---|---|---|---|
| Array | array | Array[Int] nums | "nums": [1, 2, 3] | The members of the array must follow the format of the WDL array type. |
| Pair | object | Pair[String, Int] str_to_i | "str_to_i": {"left": "0", "right": 1} | Each value of the pair must use the JSON format of its matching WDL type. |
| Map | object | Map[Int, String] int_to_string | "int_to_string": { "2": "hello", "1": "goodbye" } | JSON object keys are always strings; each value must use the JSON format of its matching WDL type. |
| Struct | object | | | The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of the matching WDL type. |
| Object | N/A | N/A | N/A | The WDL Object type is outdated and should be replaced by Struct in all cases. |
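The table leaves the Struct example empty, so the following sketch (the struct, workflow, and member names are illustrative) shows one way a struct declaration pairs with its JSON object:

```
struct SampleInfo {
    String name
    Int read_count
}

workflow StructExample {
    input {
        SampleInfo sample
    }
}
```

```
{
    "StructExample.sample": { "name": "NA12878", "read_count": 42 }
}
```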
Directives in WDL
HealthOmics supports the following directives in all supported WDL versions.
Configure GPU resources
HealthOmics supports runtime attributes acceleratorType and acceleratorCount with all supported GPU instances. HealthOmics also supports aliases named gpuType and gpuCount, which have the same functionality as their accelerator counterparts. If the WDL definition contains both directives, HealthOmics uses the accelerator values.
The following example shows how to use these directives:
```
runtime {
    gpuCount: 2
    gpuType: "nvidia-tesla-t4"
}
```
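The same configuration can be written with the accelerator attributes, which take precedence when both forms appear. This sketch assumes the same GPU type as the preceding example:

```
runtime {
    acceleratorCount: 2
    acceleratorType: "nvidia-tesla-t4"
}
```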
Configure task retry for service errors
HealthOmics supports up to two retries for a task that failed because of service errors (5XX HTTP status codes). You can configure the maximum number of retries (1 or 2) and you can opt out of retries for service errors. By default, HealthOmics attempts a maximum of two retries.
The following example sets preemptible to opt out of retries for service errors:
```
runtime {
    preemptible: 0
}
```
For more information about task retries in HealthOmics, see Task Retries.
Configure task retry for out of memory
HealthOmics supports retries for a task that failed because it ran out of memory (container exit code 137, 4XX HTTP status code). HealthOmics doubles the amount of memory for each retry attempt.
By default, HealthOmics doesn't retry for this type of failure. Use the maxRetries directive to
specify the maximum number of retries.
The following example sets maxRetries to 3, so that HealthOmics makes a maximum of four attempts to complete the task (the initial attempt plus three retries):
```
runtime {
    maxRetries: 3
}
```
Note
Task retry for out of memory requires GNU findutils 4.2.3+. The default HealthOmics image container includes this package. If you specify a custom image in your WDL definition, make sure that the image includes GNU findutils 4.2.3+.
Configure return codes
The returnCodes attribute provides a mechanism to specify a return code, or a set of return codes, that indicates a successful execution of a task. The WDL engine honors the return codes that you specify in the runtime section of the WDL definition and sets the task's status accordingly.
```
runtime {
    returnCodes: 1
}
```
HealthOmics also supports an alias named continueOnReturnCode, which has the same capabilities as returnCodes. If you specify both attributes, HealthOmics uses the returnCodes value.
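To accept a set of return codes rather than a single value, you can supply an array; this sketch (the specific codes are illustrative) treats exit codes 0 and 1 as success:

```
runtime {
    returnCodes: [0, 1]
}
```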
WDL workflow definition example
The following examples show private workflow definitions for converting from
CRAM to BAM in WDL. The CRAM to
BAM workflow defines two tasks and uses tools from the
genomes-in-the-cloud container, which is shown in the example and is
publicly available.
The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions to your container before it starts the run.
```
{
    ...
    "gotc_docker": "<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710"
}
```
The following example shows how to specify which files to use in your run, when the files are in an Amazon S3 bucket.
```
{
    "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```
If you want to specify files from a sequence store, indicate that as shown in the following example, using the URI for the sequence store.
```
{
    "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```
You can then define your workflow in WDL as shown in the following example.
```
version 1.0

workflow CramToBamFlow {
    input {
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
    }

    #Converts CRAM to SAM to BAM and makes BAI.
    call CramToBamTask {
        input:
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            input_cram = input_cram,
            sample_name = sample_name,
            docker_image = gotc_docker,
    }

    #Validates BAM.
    call ValidateSamFile {
        input:
            input_bam = CramToBamTask.outputBam,
            docker_image = gotc_docker,
    }

    #Outputs BAM, BAI, and validation report to the FireCloud data model.
    output {
        File outputBam = CramToBamTask.outputBam
        File outputBai = CramToBamTask.outputBai
        File validation_report = ValidateSamFile.report
    }
}

#Task definitions.
task CramToBamTask {
    input {
        #Command parameters
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name

        #Runtime parameters
        String docker_image
    }

    #Calls samtools view to do the conversion.
    command {
        set -eo pipefail

        samtools view -h -T ~{ref_fasta} ~{input_cram} | samtools view -b -o ~{sample_name}.bam -
        samtools index -b ~{sample_name}.bam
        mv ~{sample_name}.bam.bai ~{sample_name}.bai
    }

    #Runtime attributes:
    runtime {
        docker: docker_image
    }

    #Outputs a BAM and BAI with the same sample name.
    output {
        File outputBam = "~{sample_name}.bam"
        File outputBai = "~{sample_name}.bai"
    }
}

#Validates BAM output to ensure it wasn't corrupted during the file conversion.
task ValidateSamFile {
    input {
        File input_bam
        Int machine_mem_size = 4
        String docker_image
    }

    String output_name = basename(input_bam, ".bam") + ".validation_report"
    Int command_mem_size = machine_mem_size - 1

    command {
        java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
            ValidateSamFile \
            INPUT=~{input_bam} \
            OUTPUT=~{output_name} \
            MODE=SUMMARY \
            IS_BISULFITE_SEQUENCED=false
    }

    runtime {
        docker: docker_image
    }

    #A text file is generated that lists errors or warnings that apply.
    output {
        File report = "~{output_name}"
    }
}
```