Parameter template files for HealthOmics workflows - AWS HealthOmics

Parameter template files for HealthOmics workflows

Parameter templates define the input parameters for a workflow. You can define input parameters to make your workflow more flexible and versatile. For example, you can define a parameter for the Amazon S3 location of the reference genome files. Users can then run the workflow using various data sets.

You can create the parameter template for your workflow, or HealthOmics can generate the parameter template for you.

The parameter template is a JSON file. In the file, each input parameter is a named object whose name must match the name of the corresponding workflow input. When you start a run, the run fails if you don't provide values for all the required parameters.

The input parameter object includes the following attributes:

  • description – This required attribute is a string that the console displays on the Start run page. This description is also retained as run metadata.

  • optional – This optional attribute indicates whether the input parameter is optional. If you don't specify the optional field, the input parameter is required.

The following example parameter template shows how to specify the input parameters.

{
  "myRequiredParameter1": {
    "description": "this parameter is required"
  },
  "myRequiredParameter2": {
    "description": "this parameter is also required",
    "optional": false
  },
  "myOptionalParameter": {
    "description": "this parameter is optional",
    "optional": true
  }
}
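The required/optional rule can be sketched in a few lines of Python. This is a minimal illustration, not HealthOmics code: the function name missing_required and the sample S3 path are hypothetical, and the check only mirrors the documented rule that a parameter is required unless optional is set to true.

```python
import json

# Parameter template from the example above.
TEMPLATE = json.loads("""
{
  "myRequiredParameter1": {"description": "this parameter is required"},
  "myRequiredParameter2": {"description": "this parameter is also required", "optional": false},
  "myOptionalParameter": {"description": "this parameter is optional", "optional": true}
}
""")


def missing_required(template: dict, run_inputs: dict) -> list:
    """Return the names of required parameters that have no value in run_inputs."""
    return sorted(
        name
        for name, spec in template.items()
        # A parameter is required unless "optional" is explicitly true.
        if not spec.get("optional", False) and name not in run_inputs
    )


# Hypothetical run inputs: only one of the two required parameters is set.
print(missing_required(TEMPLATE, {"myRequiredParameter1": "s3://bucket/ref.fasta"}))
```

Running a check like this before you start a run can catch a missing required parameter locally, before the run fails.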

Generating parameter templates

HealthOmics generates the parameter template by parsing the workflow definition to detect input parameters. If you provide a parameter template file for a workflow, the parameters in your file override the parameters detected in the workflow definition.

There are slight differences between the parsing logic of the CWL, WDL, and Nextflow engines, as described in the following sections.

Parameter detection for CWL

In the CWL workflow engine, the parsing logic makes the following assumptions:

  • Any nullable supported type is marked as an optional input parameter.

  • Any non-null supported type is marked as a required input parameter.

  • Descriptions are extracted from the label section of the main workflow definition. If label is not specified, the description is blank (an empty string).

The following tables show CWL interpolation examples. For each example, the parameter name is x. If the parameter is required, you must provide a value for the parameter. If the parameter is optional, you don't need to provide a value.

This table shows CWL interpolation examples for primitive types.

Input                        | Example input/output                                     | Required
-----------------------------|----------------------------------------------------------|---------
x: {type: int}               | 1 or 2 or ...                                            | Yes
x: {type: int, default: 2}   | Default value is 2. Valid input is 1 or 2 or ...         | Yes
x: {type: int?}              | Valid input is None or 1 or 2 or ...                     | No
x: {type: int?, default: 2}  | Default value is 2. Valid input is None or 1 or 2 or ... | No

The following table shows CWL interpolation examples for complex types. A complex type is a collection of primitive types.

Input                           | Example input/output                              | Required
--------------------------------|---------------------------------------------------|---------
x: {type: array, items: int}    | [] or [1,2,3]                                     | Yes
x: {type: array?, items: int}   | None or [] or [1,2,3]                             | No
x: {type: array, items: int?}   | [] or [None, 3, None]                             | Yes
x: {type: array?, items: int?}  | [None] or None or [1,2,3] or [None, 3] but not [] | No
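The CWL rules above can be approximated as follows. This is a sketch under stated assumptions, not the HealthOmics parser: the function names are hypothetical, the inputs are assumed to be already loaded from YAML into Python dictionaries, and nullability is recognized only through the CWL "?" shorthand or a union that contains "null".

```python
def cwl_type_is_nullable(cwl_type) -> bool:
    """Return True if a CWL type accepts null."""
    if isinstance(cwl_type, str):
        # "int?" is CWL shorthand for the union ["null", "int"].
        return cwl_type.endswith("?") or cwl_type == "null"
    if isinstance(cwl_type, list):
        # A union is nullable if any member is nullable.
        return any(cwl_type_is_nullable(member) for member in cwl_type)
    # Nested type objects (for example, an array with nullable items)
    # don't make the parameter itself nullable.
    return False


def template_from_cwl_inputs(inputs: dict) -> dict:
    """Build a parameter template from a CWL workflow's inputs mapping."""
    template = {}
    for name, spec in inputs.items():
        if isinstance(spec, dict):
            cwl_type, description = spec.get("type"), spec.get("label", "")
        else:
            # Short form, for example {"x": "File?"}.
            cwl_type, description = spec, ""
        template[name] = {
            "description": description,
            "optional": cwl_type_is_nullable(cwl_type),
        }
    return template


print(template_from_cwl_inputs({
    "a": {"type": "int", "default": 2},
    "b": {"type": "int?", "label": "sample count"},
}))
```

Note that a default value doesn't change the result: as in the tables above, only nullability decides whether a CWL input is optional.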

Parameter detection for WDL

In the WDL workflow engine, the parsing logic makes the following assumptions:

  • Any nullable supported types are marked as optional input parameters.

  • For non-nullable supported types:

    • Any input variable that is assigned a literal or an expression is marked as an optional parameter. For example:

      Int x = 2
      Float f0 = 1.0 + f1

    • If no value or expression has been assigned to an input parameter, it is marked as a required parameter.

  • Descriptions are extracted from parameter_meta in the main workflow definition. If parameter_meta is not specified, the description will be blank (an empty string). For more information, see the WDL specification for Parameter metadata.

The following tables show WDL interpolation examples. For each example, the parameter name is x. If the parameter is required, you must provide a value for the parameter. If the parameter is optional, you don't need to provide a value.

This table shows WDL interpolation examples for primitive types.

Input          | Example input/output  | Required
---------------|-----------------------|---------
Int x          | 1 or 2 or ...         | Yes
Int x = 2      | 2                     | No
Int x = 1+2    | 3                     | No
Int x = y+z    | y+z                   | No
Int? x         | None or 1 or 2 or ... | No
Int? x = 2     | None or 2             | No
Int? x = 1+2   | None or 3             | No
Int? x = y+z   | None or y+z           | No

The following table shows WDL interpolation examples for complex types. A complex type is a collection of primitive types.

Input                                                               | Example input/output                                                  | Required
--------------------------------------------------------------------|-----------------------------------------------------------------------|---------
Array[Int] x                                                        | [1,2,3] or []                                                         | Yes
Array[Int]+ x                                                       | [1], but not []                                                       | Yes
Array[Int]? x                                                       | None or [] or [1,2,3]                                                 | No
Array[Int?] x                                                       | [] or [None, 3, None]                                                 | Yes
Array[Int?]? x                                                      | [None] or None or [1,2,3] or [None, 3] but not []                     | No
struct Sample {String a, Int y}; later in inputs: Sample mySample   | String a = mySample.a; Int y = mySample.y                             | Yes
struct Sample {String a, Int y}; later in inputs: Sample? mySample  | if (defined(mySample)) { String a = mySample.a; Int y = mySample.y }  | No
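The WDL rules above (nullable types and assigned values make an input optional) can be illustrated with a small regular-expression sketch. It is not a WDL parser and not the HealthOmics implementation: it assumes one declaration per line, and the names DECLARATION and classify_wdl_inputs are hypothetical.

```python
import re

# One input declaration per line: <type> <name> [= <expression>]
DECLARATION = re.compile(
    r"^\s*(?P<type>[A-Za-z_]\w*(?:\[.*\])?[+?]*)"  # e.g. Int, Int?, Array[Int?]+
    r"\s+(?P<name>[A-Za-z_]\w*)"                   # variable name
    r"\s*(?P<default>=.*)?$"                       # optional literal or expression
)


def classify_wdl_inputs(declarations: str) -> dict:
    """Map each declared input name to True if it would be marked optional."""
    optional = {}
    for line in declarations.splitlines():
        match = DECLARATION.match(line)
        if not match:
            continue  # blank lines, braces, and anything that isn't a declaration
        nullable = match["type"].endswith("?")
        has_value = match["default"] is not None
        optional[match["name"]] = nullable or has_value
    return optional


print(classify_wdl_inputs("""
    Int a
    Int b = 2
    Float f0 = 1.0 + f1
    Int? c
    Array[Int?] e
    Array[Int?]? f
"""))
```

Only a trailing "?" on the outermost type counts as nullable, so Array[Int?] stays required while Array[Int?]? is optional.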

Parameter detection for Nextflow

For Nextflow, HealthOmics generates the parameter template by parsing the nextflow_schema.json file. If the workflow definition doesn't include a schema file, HealthOmics parses the main workflow definition file.

Parsing the schema file

For parsing to work correctly, make sure the schema file meets the following requirements:

HealthOmics parses the nextflow_schema.json file to generate the parameter template:

  • Extracts all properties that are defined in the schema.

  • Includes the property description if available for the property.

  • Identifies whether each parameter is optional or required, based on the required field of the property.

The following example shows a schema file and the generated parameter template.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "$defs": {
    "input_options": {
      "title": "Input options",
      "type": "object",
      "required": ["input_file"],
      "properties": {
        "input_file": {
          "type": "string",
          "format": "file-path",
          "pattern": "^s3://[a-z0-9.-]{3,63}(?:/\\S*)?$",
          "description": "description for input_file"
        },
        "input_num": {
          "type": "integer",
          "default": 42,
          "description": "description for input_num"
        }
      }
    },
    "output_options": {
      "title": "Output options",
      "type": "object",
      "required": ["output_dir"],
      "properties": {
        "output_dir": {
          "type": "string",
          "format": "file-path",
          "description": "description for output_dir"
        }
      }
    }
  },
  "properties": {
    "ungrouped_input_bool": {
      "type": "boolean",
      "default": true
    }
  },
  "required": ["ungrouped_input_bool"],
  "allOf": [
    { "$ref": "#/$defs/input_options" },
    { "$ref": "#/$defs/output_options" }
  ]
}

The generated parameter template:

{
  "input_file": {
    "description": "description for input_file",
    "optional": false
  },
  "input_num": {
    "description": "description for input_num",
    "optional": true
  },
  "output_dir": {
    "description": "description for output_dir",
    "optional": false
  },
  "ungrouped_input_bool": {
    "description": null,
    "optional": false
  }
}
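The three parsing steps listed above can be sketched in Python. This is an illustration under assumptions rather than the HealthOmics implementation: the function name template_from_schema is hypothetical, the sketch reads every group under "$defs" without resolving the "allOf" references, and it works on an abridged copy of the example schema.

```python
# Abridged copy of the example schema; "title", "format", and "pattern"
# are omitted because this sketch doesn't use them.
SCHEMA = {
    "$defs": {
        "input_options": {
            "required": ["input_file"],
            "properties": {
                "input_file": {"type": "string", "description": "description for input_file"},
                "input_num": {"type": "integer", "default": 42, "description": "description for input_num"},
            },
        },
        "output_options": {
            "required": ["output_dir"],
            "properties": {
                "output_dir": {"type": "string", "description": "description for output_dir"},
            },
        },
    },
    "properties": {"ungrouped_input_bool": {"type": "boolean", "default": True}},
    "required": ["ungrouped_input_bool"],
}


def template_from_schema(schema: dict) -> dict:
    """Build a parameter template from a nextflow_schema.json document."""
    # Visit each group under "$defs", then the ungrouped top-level properties.
    groups = [*schema.get("$defs", {}).values(), schema]
    template = {}
    for group in groups:
        required = set(group.get("required", []))
        for name, prop in group.get("properties", {}).items():
            template[name] = {
                # None when the property has no description.
                "description": prop.get("description"),
                # Optional unless the name is in the group's "required" list.
                "optional": name not in required,
            }
    return template


print(template_from_schema(SCHEMA))
```

In this sketch a default value doesn't affect the result: input_num is optional because it is absent from the required list, and ungrouped_input_bool is required even though it has a default.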

Parsing the main file

If the workflow definition doesn't include a nextflow_schema.json file, HealthOmics parses the main workflow definition file.

HealthOmics analyzes the params expressions found in the main workflow definition file and in the nextflow.config file. All params with default values are marked as optional.

For parsing to work correctly, note the following requirements:

  • HealthOmics parses only the main workflow definition file. To ensure all parameters are captured, we recommend that you wire all params through to any submodules and imported workflows.

  • The config file is optional. If you define one, name it nextflow.config and place it in the same directory as the main workflow definition file.

The following example shows a definition file and the generated parameter template.

params.input_file = "default.txt"
params.threads = 4
params.memory = "8GB"

workflow {
    if (params.version) {
        println "Using version: ${params.version}"
    }
}

The generated parameter template:

{
  "input_file": {
    "description": null,
    "optional": true
  },
  "threads": {
    "description": null,
    "optional": true
  },
  "memory": {
    "description": null,
    "optional": true
  },
  "version": {
    "description": null,
    "optional": false
  }
}
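The behavior in this example (params with a default are optional, params that are only referenced are required) can be approximated with a regular expression. This is a rough sketch, not the HealthOmics parser: the names PARAM_REF and params_from_main are hypothetical, comments in the source aren't stripped, and a method call such as params.x.size() would be mistaken for a nested parameter.

```python
import re

# params.<name>, optionally followed by a nested segment and/or an assignment.
PARAM_REF = re.compile(
    r"\bparams\.([A-Za-z_]\w*)"  # top-level parameter name
    r"(\.[A-Za-z_]\w*)?"         # nested segment, e.g. params.nested.beta
    r"(\s*=(?![=~]))?"           # "=" but not "==" or "=~"
)


def params_from_main(source: str) -> dict:
    """Build a parameter template from params expressions in a main file."""
    optional = {}
    for match in PARAM_REF.finditer(source):
        name, nested, assigned = match.groups()
        if nested:
            continue  # nested params are skipped
        # A param is optional once any default assignment has been seen.
        optional[name] = optional.get(name, False) or assigned is not None
    return {
        name: {"description": None, "optional": is_optional}
        for name, is_optional in optional.items()
    }


MAIN_NF = '''
params.input_file = "default.txt"
params.threads = 4
params.memory = "8GB"

workflow {
    if (params.version) {
        println "Using version: ${params.version}"
    }
}
'''

print(params_from_main(MAIN_NF))
```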

For default values that are defined in nextflow.config, HealthOmics collects params assignments and parameters declared within params {}, as shown in the following example. In assignment statements, params must appear on the left side of the statement.

params.alpha = "alpha"
params.beta = "beta"

params {
    gamma = "gamma"
    delta = "delta"
}

env {
    // ignored, as this assignment isn't in the params block
    VERSION = "TEST"
}

// ignored, as params is not on the left side
interpolated_image = "${params.cli_image}"

The generated parameter template:

{
  // other params in your main workflow definition
  "alpha": {
    "description": null,
    "optional": true
  },
  "beta": {
    "description": null,
    "optional": true
  },
  "gamma": {
    "description": null,
    "optional": true
  },
  "delta": {
    "description": null,
    "optional": true
  }
}

Nested parameters

Both nextflow_schema.json and nextflow.config allow nested parameters. However, the HealthOmics parameter template requires only the top-level parameters. If your workflow uses a nested parameter, you must provide a JSON object as the input for that parameter.

Nested parameters in schema files

HealthOmics skips nested params when parsing a nextflow_schema.json file. For example, if you define the following nextflow_schema.json file:

{
  "properties": {
    "input": {
      "properties": {
        "input_file": { ... },
        "input_num": { ... }
      }
    },
    "input_bool": { ... }
  }
}

HealthOmics ignores input_file and input_num when it generates the parameter template:

{
  "input": {
    "description": null,
    "optional": true
  },
  "input_bool": {
    "description": null,
    "optional": true
  }
}

When you run this workflow, HealthOmics expects an input.json file similar to the following:

{
  "input": {
    "input_file": "s3://bucket/obj",
    "input_num": 2
  },
  "input_bool": false
}

Nested parameters in config files

HealthOmics doesn't collect nested params in a nextflow.config file, and skips them during parsing. For example, if you define the following nextflow.config file:

params.alpha = "alpha"
params.nested.beta = "beta"

params {
    gamma = "gamma"
    group {
        delta = "delta"
    }
}

HealthOmics ignores params.nested.beta and params.group.delta when it generates the parameter template:

{
  "alpha": {
    "description": null,
    "optional": true
  },
  "gamma": {
    "description": null,
    "optional": true
  }
}
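The collection rules for nextflow.config (top-level params assignments and direct members of the params block are collected; nested params, other blocks, and statements without params on the left side are ignored) can be sketched as a line-based scan. This is an illustration only, not the HealthOmics parser: the function name params_from_config is hypothetical, and the sketch assumes one statement per line with each closing brace on its own line.

```python
import re

BLOCK_OPEN = re.compile(r"^([A-Za-z_]\w*)\s*\{$")          # e.g. "params {"
DOTTED = re.compile(r"^params\.([A-Za-z_]\w*)\s*=(?!=)")   # params.alpha = ...
MEMBER = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)")           # gamma = ...


def params_from_config(config: str) -> dict:
    """Collect top-level params that have defaults in a nextflow.config file."""
    template = {}
    scope = []  # names of the blocks that enclose the current line
    for raw in config.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        opened = BLOCK_OPEN.match(line)
        if opened:
            scope.append(opened.group(1))
        elif line == "}":
            if scope:
                scope.pop()
        else:
            if not scope:
                match = DOTTED.match(line)   # params.nested.beta doesn't match
            elif scope == ["params"]:
                match = MEMBER.match(line)   # direct member of params { }
            else:
                match = None                 # nested group or another block
            if match:
                # Every collected param has a default, so it is optional.
                template[match.group(1)] = {"description": None, "optional": True}
    return template


CONFIG = '''
params.alpha = "alpha"
params.nested.beta = "beta"
params {
    gamma = "gamma"
    group {
        delta = "delta"
    }
}
env {
    VERSION = "TEST"
}
interpolated_image = "${params.cli_image}"
'''

print(sorted(params_from_config(CONFIG)))
```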

Examples of Nextflow interpolation

The following table shows Nextflow interpolation examples for params in the main file.

Parameters                                          | Required
----------------------------------------------------|---------
params.input_file                                   | Yes
params.input_file = "s3://bucket/data.json"         | No
params.nested.input_file                            | N/A
params.nested.input_file = "s3://bucket/data.json"  | N/A

The following table shows Nextflow interpolation examples for params in the nextflow.config file.

Parameters                                                  | Required
------------------------------------------------------------|---------
params.input_file = "s3://bucket/data.json"                 | No
params { input_file = "s3://bucket/data.json" }             | No
params { nested { input_file = "s3://bucket/data.json" } }  | N/A
input_file = params.input_file                              | N/A