

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/big-data/migrate-workloads-from-aws-data-pipeline/)

# Pipeline Expressions and Functions
<a name="dp-expressions-functions"></a>

This section explains the syntax for using expressions and functions in pipelines, including the associated data types.

## Simple Data Types
<a name="dp-pipeline-datatypes"></a>

The following types of data can be set as field values.

**Topics**
+ [DateTime](#dp-datatype-datetime)
+ [Numeric](#dp-datatype-numeric)
+ [Object References](#dp-datatype-object-reference)
+ [Period](#dp-datatype-period)
+ [String](#dp-datatype-section)

### DateTime
<a name="dp-datatype-datetime"></a>

 AWS Data Pipeline supports the date and time expressed in "YYYY-MM-DDTHH:MM:SS" format in UTC/GMT only. The following example sets the `startDateTime` field of a `Schedule` object to `1/15/2012, 11:59 p.m.`, in the UTC/GMT timezone. 

```
"startDateTime" : "2012-01-15T23:59:00"
```
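If you generate pipeline definitions from a script, a value in this format can be produced with a standard date library. The following Python sketch is illustrative only and is not part of AWS Data Pipeline; the helper name `to_pipeline_datetime` is hypothetical.

```python
from datetime import datetime, timezone


def to_pipeline_datetime(moment: datetime) -> str:
    """Render a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS in UTC."""
    # Convert to UTC first, because the format itself carries no offset.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


start = datetime(2012, 1, 15, 23, 59, tzinfo=timezone.utc)
print(to_pipeline_datetime(start))  # 2012-01-15T23:59:00
```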

### Numeric
<a name="dp-datatype-numeric"></a>

 AWS Data Pipeline supports both integers and floating-point values. 

### Object References
<a name="dp-datatype-object-reference"></a>

An object in the pipeline definition. This can either be the current object, the name of an object defined elsewhere in the pipeline, or an object that lists the current object in a field, referenced by the `node` keyword. For more information about `node`, see [Referencing Fields and Objects](dp-pipeline-expressions.md#dp-pipeline-expressions-reference). For more information about the pipeline object types, see [Pipeline Object Reference](dp-pipeline-objects.md). 

### Period
<a name="dp-datatype-period"></a>

Indicates how often a scheduled event should run. It's expressed in the format "*N* [`years` | `months` | `weeks` | `days` | `hours` | `minutes`]", where *N* is a positive integer value.

The minimum period is 15 minutes and the maximum period is 3 years.

The following example sets the `period` field of the `Schedule` object to 3 hours. This creates a schedule that runs every three hours.

```
"period" : "3 hours"
```
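The format and bounds described above can be checked locally before you submit a definition. The following Python sketch is a rough pre-check under stated assumptions, not the validation that the service performs: the function name `is_valid_period` is hypothetical, a month is approximated as 30 days and a year as 365 days, and accepting singular unit names such as `1 day` is an assumption.

```python
import re

# Approximate unit lengths in minutes. Treating a month as 30 days and a
# year as 365 days is an assumption made only for this bounds check.
_UNIT_MINUTES = {
    "minute": 1,
    "hour": 60,
    "day": 24 * 60,
    "week": 7 * 24 * 60,
    "month": 30 * 24 * 60,
    "year": 365 * 24 * 60,
}
# "N unit" with an optional plural "s" (singular acceptance is an assumption).
_PERIOD_RE = re.compile(r"(\d+) (minute|hour|day|week|month|year)s?")


def is_valid_period(period: str) -> bool:
    """Return True if the string looks like a period from 15 minutes to 3 years."""
    match = _PERIOD_RE.fullmatch(period)
    if match is None:
        return False
    count, unit = int(match.group(1)), match.group(2)
    minutes = count * _UNIT_MINUTES[unit]
    return 15 <= minutes <= 3 * _UNIT_MINUTES["year"]


print(is_valid_period("3 hours"))     # True
print(is_valid_period("10 minutes"))  # False: below the 15-minute minimum
print(is_valid_period("4 years"))     # False: above the 3-year maximum
```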

### String
<a name="dp-datatype-section"></a>

Standard string values. Strings must be surrounded by double quotes ("). You can use the backslash character (`\`) to escape characters in a string. Multiline strings are not supported.

The following are examples of valid string values for the `id` field.

```
"id" : "My Data Object"

"id" : "My \"Data\" Object"
```

Strings can also contain expressions that evaluate to string values. These are inserted into the string, and are delimited with: "#{" and "}". The following example uses an expression to insert the name of the current object into a path.

```
"filePath" : "s3://amzn-s3-demo-bucket/#{name}.csv"
```
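To make the substitution concrete, the following Python sketch models it for plain field names only. It is a simplified illustration, not the service's evaluator: the function name `substitute_fields` and the sample object are hypothetical, and functions and operators inside the delimiters are not handled.

```python
import re

# Matches "#{...}" and captures the text between the delimiters.
_EXPRESSION_RE = re.compile(r"#\{([^}]*)\}")


def substitute_fields(template: str, fields: dict) -> str:
    """Replace each #{field} placeholder with the matching value from fields."""
    def lookup(match):
        return str(fields[match.group(1).strip()])
    return _EXPRESSION_RE.sub(lookup, template)


# Hypothetical object whose "name" field is inserted into the path.
node = {"name": "ExampleDataNode"}
print(substitute_fields("s3://amzn-s3-demo-bucket/#{name}.csv", node))
# s3://amzn-s3-demo-bucket/ExampleDataNode.csv
```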

For more information about using expressions, see [Referencing Fields and Objects](dp-pipeline-expressions.md#dp-pipeline-expressions-reference) and [Expression Evaluation](dp-pipeline-expressions.md#dp-datatype-functions).

# Expressions
<a name="dp-pipeline-expressions"></a>

Expressions enable you to share a value across related objects. The AWS Data Pipeline web service processes expressions at runtime and replaces each expression with its evaluated value.

Expressions are delimited by: "#{" and "}". You can use an expression in any pipeline definition object where a string is legal. If a slot is a reference or is of type ID, NAME, TYPE, or SPHERE, its value is not evaluated and is used verbatim.

The following expression calls one of the AWS Data Pipeline functions. For more information, see [Expression Evaluation](#dp-datatype-functions).

```
#{format(myDateTime,'YYYY-MM-dd hh:mm:ss')}
```

## Referencing Fields and Objects
<a name="dp-pipeline-expressions-reference"></a>

Expressions can use fields of the current object where the expression exists, or fields of another object that is linked by a reference.

A slot ID consists of the object name followed by the object creation time, such as `@S3BackupLocation_2018-01-31T11:05:33`.

 You can also reference the exact slot ID specified in the pipeline definition, such as the slot ID of the Amazon S3 backup location. To reference the slot ID, use `#{parent.@id}`.

In the following example, the `filePath` field references the `id` field in the same object to form a file name. The value of `filePath` evaluates to "`s3://amzn-s3-demo-bucket/ExampleDataNode.csv`". 

```
{
  "id" : "ExampleDataNode",
  "type" : "S3DataNode",
  "schedule" : {"ref" : "ExampleSchedule"},
  "filePath" : "s3://amzn-s3-demo-bucket/#{parent.@id}.csv",
  "precondition" : {"ref" : "ExampleCondition"},
  "onFail" : {"ref" : "FailureNotify"}
}
```

To use a field that exists on another object linked by a reference, use the `node` keyword. This keyword is only available with alarm and precondition objects.

Continuing with the previous example, an expression in an `SnsAlarm` can refer to the date and time range in a `Schedule`, because the `S3DataNode` references both.

 Specifically, `FailureNotify`'s `message` field can use the `@scheduledStartTime` and `@scheduledEndTime` runtime fields from `ExampleSchedule`, because `ExampleDataNode`'s `onFail` field references `FailureNotify` and its `schedule` field references `ExampleSchedule`.

```
{  
    "id" : "FailureNotify",
    "type" : "SnsAlarm",
    "subject" : "Failed to run pipeline component",
    "message": "Error for interval #{node.@scheduledStartTime}..#{node.@scheduledEndTime}.",
    "topicArn":"arn:aws:sns:us-east-1:28619EXAMPLE:ExampleTopic"
},
```

**Note**  
You can create pipelines that have dependencies, such as tasks in your pipeline that depend on the work of other systems or tasks. If your pipeline requires certain resources, add those dependencies to the pipeline using preconditions that you associate with data nodes and tasks. This makes your pipelines easier to debug and more resilient. Additionally, keep your dependencies within a single pipeline when possible, because cross-pipeline troubleshooting is difficult.

## Nested Expressions
<a name="dp-datatype-nested"></a>

 AWS Data Pipeline allows you to nest values to create more complex expressions. For example, to perform a time calculation (subtract 30 minutes from the `scheduledStartTime`) and format the result to use in a pipeline definition, you could use the following expression in an activity: 

```
#{format(minusMinutes(@scheduledStartTime,30),'YYYY-MM-dd hh:mm:ss')}
```

If the expression is part of an `SnsAlarm` or `Precondition`, use the `node` prefix:

```
#{format(minusMinutes(node.@scheduledStartTime,30),'YYYY-MM-dd hh:mm:ss')}
```
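The nesting reads inside out: the inner call shifts the time, and the outer call formats the result. The following Python sketch mirrors those two steps with the standard library, assuming that `format` follows Joda-Time pattern letters as noted under Date and Time Functions. The sample start time is hypothetical and matches the sample value used in that section.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical scheduled start time: May 24, 2011 at 5:10 pm UTC.
scheduled_start = datetime(2011, 5, 24, 17, 10, tzinfo=timezone.utc)

# Inner call: minusMinutes(@scheduledStartTime,30)
shifted = scheduled_start - timedelta(minutes=30)

# Outer call: format(...,'YYYY-MM-dd hh:mm:ss').
# In Joda-Time patterns, 'hh' is the 12-hour clock hour, so %I is the
# closest strftime directive; 'HH' would correspond to %H.
formatted = shifted.strftime("%Y-%m-%d %I:%M:%S")
print(formatted)  # 2011-05-24 04:40:00
```

Because `hh` is a 12-hour field, 16:40 is rendered as `04:40`. A pattern with `HH` keeps the 24-hour value.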

## Lists
<a name="dp-datatype-list-function"></a>

Expressions, including function calls, can be evaluated on lists. For example, assume that a list is defined as follows: `"myList":["one","two"]`. If this list is used in the expression `#{'this is ' + myList}`, it evaluates to `["this is one", "this is two"]`. If you have two lists, AWS Data Pipeline flattens them during evaluation. For example, if `myList1` is defined as `[1,2]` and `myList2` is defined as `[3,4]`, then the expression `[#{myList1}, #{myList2}]` evaluates to `[1,2,3,4]`.
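The two list behaviors can be modeled in a few lines of Python. This is a sketch of the documented results only, not the service's implementation; the function names `concat_over_list` and `flatten` are hypothetical.

```python
def concat_over_list(prefix: str, values: list) -> list:
    """Model #{'prefix' + myList}: apply the concatenation to each element."""
    return [prefix + str(value) for value in values]


def flatten(*lists: list) -> list:
    """Model [#{myList1}, #{myList2}]: merge the lists into one flat list."""
    return [item for one_list in lists for item in one_list]


my_list = ["one", "two"]
print(concat_over_list("this is ", my_list))  # ['this is one', 'this is two']
print(flatten([1, 2], [3, 4]))                # [1, 2, 3, 4]
```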

## Node Expression
<a name="dp-datatype-node"></a>

AWS Data Pipeline uses the `#{node.*}` expression in either `SnsAlarm` or `PreCondition` for a back-reference to a pipeline component's parent object. Because `SnsAlarm` and `PreCondition` are referenced from an activity or resource, with no reference back from them, `node` provides a way to refer to the referrer. For example, the following pipeline definition demonstrates how a failure notification can use `node` to refer to its parent, in this case a `ShellCommandActivity`, and include the parent's scheduled start and end times in the `SnsAlarm` message. The `@scheduledStartTime` reference in the `ShellCommandActivity` does not require the `node` prefix because it refers to a field of the activity itself.

**Note**  
The fields preceded by the AT (@) sign indicate those fields are runtime fields.

```
{
  "id" : "ShellOut",
  "type" : "ShellCommandActivity",
  "input" : {"ref" : "HourlyData"},
  "command" : "/home/userName/xxx.sh #{@scheduledStartTime} #{@scheduledEndTime}",
  "schedule" : {"ref" : "HourlyPeriod"},
  "stderr" : "/tmp/stderr:#{@scheduledStartTime}",
  "stdout" : "/tmp/stdout:#{@scheduledStartTime}",
  "onFail" : {"ref" : "FailureNotify"}
},
{  
  "id" : "FailureNotify",
  "type" : "SnsAlarm",
  "subject" : "Failed to run pipeline component",
  "message": "Error for interval #{node.@scheduledStartTime}..#{node.@scheduledEndTime}.",
  "topicArn":"arn:aws:sns:us-east-1:28619EXAMPLE:ExampleTopic"
},
```
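The back-reference can be pictured as a lookup against the object that points at the alarm. The following Python sketch is a simplified model under that assumption, not the service's evaluator: the function name `render` and the runtime values are hypothetical, and only plain field lookups are handled.

```python
import re

# Matches "#{...}" and captures the text between the delimiters.
_EXPRESSION_RE = re.compile(r"#\{([^}]*)\}")


def render(template: str, current: dict, referrer: dict) -> str:
    """Resolve #{field} against current and #{node.field} against referrer."""
    def lookup(match):
        path = match.group(1).strip()
        if path.startswith("node."):
            return str(referrer[path[len("node."):]])
        return str(current[path])
    return _EXPRESSION_RE.sub(lookup, template)


# Hypothetical runtime values for one scheduled run of the ShellOut activity.
shell_out = {
    "id": "ShellOut",
    "@scheduledStartTime": "2011-05-24T17:00:00",
    "@scheduledEndTime": "2011-05-24T18:00:00",
}
failure_notify = {
    "id": "FailureNotify",
    "message": "Error for interval #{node.@scheduledStartTime}..#{node.@scheduledEndTime}.",
}

print(render(failure_notify["message"], failure_notify, shell_out))
# Error for interval 2011-05-24T17:00:00..2011-05-24T18:00:00.
```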

AWS Data Pipeline supports transitive references for user-defined fields, but not runtime fields. A transitive reference is a reference between two pipeline components that depends on another pipeline component as the intermediary. The following example shows a reference to a transitive user-defined field and a reference to a non-transitive runtime field, both of which are valid. For more information, see [User-defined fields](dp-writing-pipeline-definition.md#dp-userdefined-fields). 

```
{
  "name": "DefaultActivity1",
  "type": "CopyActivity",
  "schedule": {"ref": "Once"},
  "input": {"ref": "s3nodeOne"},  
  "onSuccess": {"ref": "action"},
  "workerGroup": "test",  
  "output": {"ref": "s3nodeTwo"}
},
{
  "name": "action",
  "type": "SnsAlarm",
  "message": "S3 bucket '#{node.output.directoryPath}' succeeded at #{node.@actualEndTime}.",
  "subject": "Testing",  
  "topicArn": "arn:aws:sns:us-east-1:28619EXAMPLE:ExampleTopic",
  "role": "DataPipelineDefaultRole"
}
```

## Expression Evaluation
<a name="dp-datatype-functions"></a>

 AWS Data Pipeline provides a set of functions that you can use to calculate the value of a field. The following example uses the `makeDate` function to set the `startDateTime` field of a `Schedule` object to `"2011-05-24T0:00:00"` GMT/UTC. 

```
"startDateTime" : "makeDate(2011,5,24)"
```

# Mathematical Functions
<a name="dp-pipeline-reference-functions-math"></a>

The following functions are available for working with numerical values. 



| Function | Description | 
| --- | --- | 
|  \+  |  Addition. Example: `#{1 + 2}` Result: `3`  | 
|  -  |  Subtraction. Example: `#{1 - 2}` Result: `-1`  | 
|  \*  |  Multiplication. Example: `#{1 * 2}` Result: `2`  | 
|  /  |  Division. If you divide two integers, the result is truncated. Example: `#{1 / 2}`, Result: `0` Example: `#{1.0 / 2}`, Result: `.5`  | 
|  ^  |  Exponent. Example: `#{2 ^ 2}` Result: `4.0`  | 
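
The division and exponent rows are the two that most often surprise readers. The following Python sketch models the results shown in the table; it is not the service's implementation. The function names `divide` and `power` are hypothetical, and truncation toward zero for negative operands is an assumption, because the table shows only a positive example.

```python
def divide(a, b):
    """Model '/': truncate when both operands are integers, else divide as floats."""
    if isinstance(a, int) and isinstance(b, int):
        # Truncation toward zero for negative results is an assumption;
        # the table only shows a positive example.
        quotient = abs(a) // abs(b)
        return quotient if (a >= 0) == (b > 0) else -quotient
    return a / b


def power(base, exponent):
    """Model '^': the table shows a floating-point result even for integers."""
    return float(base) ** exponent


print(divide(1, 2))    # 0
print(divide(1.0, 2))  # 0.5
print(power(2, 2))     # 4.0
```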

# String Functions
<a name="dp-pipeline-reference-functions-string"></a>

 The following functions are available for working with string values. 



| Function | Description | 
| --- | --- | 
|  \+  |  Concatenation. Non-string values are first converted to strings. Example: `#{"hel" + "lo"}` Result: `"hello"`  | 

# Date and Time Functions
<a name="dp-pipeline-reference-functions-datetime"></a>

 The following functions are available for working with DateTime values. For the examples, the value of `myDateTime` is `May 24, 2011 @ 5:10 pm GMT`. 

**Note**  
The date/time format for AWS Data Pipeline is Joda Time, which is a replacement for the Java date and time classes. For more information, see [Joda Time - Class DateTimeFormat](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html).



| Function | Description | 
| --- | --- | 
|  `int day(DateTime myDateTime)`  |  Gets the day of the DateTime value as an integer. Example: `#{day(myDateTime)}` Result: `24`  | 
|  `int dayOfYear(DateTime myDateTime)`  |  Gets the day of the year of the DateTime value as an integer. Example: `#{dayOfYear(myDateTime)}` Result: `144`  | 
|  `DateTime firstOfMonth(DateTime myDateTime)`  |  Creates a DateTime object for the start of the month in the specified DateTime. Example: `#{firstOfMonth(myDateTime)}` Result: `"2011-05-01T17:10:00z"`  | 
|  `String format(DateTime myDateTime,String format)`  |  Creates a String object that is the result of converting the specified DateTime using the specified format string. Example: `#{format(myDateTime,'YYYY-MM-dd HH:mm:ss z')}` Result: `"2011-05-24 17:10:00 UTC"`  | 
|  `int hour(DateTime myDateTime)`  |  Gets the hour of the DateTime value as an integer. Example: `#{hour(myDateTime)}` Result: `17`  | 
|  `DateTime makeDate(int year,int month,int day)`  |  Creates a DateTime object, in UTC, with the specified year, month, and day, at midnight. Example: `#{makeDate(2011,5,24)}` Result: `"2011-05-24T0:00:00z"`  | 
|  `DateTime makeDateTime(int year,int month,int day,int hour,int minute)`  |  Creates a DateTime object, in UTC, with the specified year, month, day, hour, and minute. Example: `#{makeDateTime(2011,5,24,14,21)}` Result: `"2011-05-24T14:21:00z"`  | 
|  `DateTime midnight(DateTime myDateTime)`  |  Creates a DateTime object for the current midnight, relative to the specified DateTime. For example, where `myDateTime` is `2011-05-25T17:10:00z`, the result is as follows.  Example: `#{midnight(myDateTime)}` Result: `"2011-05-25T0:00:00z"`  | 
|  `DateTime minusDays(DateTime myDateTime,int daysToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of days from the specified DateTime. Example: `#{minusDays(myDateTime,1)}` Result: `"2011-05-23T17:10:00z"`  | 
|  `DateTime minusHours(DateTime myDateTime,int hoursToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of hours from the specified DateTime. Example: `#{minusHours(myDateTime,1)}` Result: `"2011-05-24T16:10:00z"`  | 
|  `DateTime minusMinutes(DateTime myDateTime,int minutesToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of minutes from the specified DateTime. Example: `#{minusMinutes(myDateTime,1)}` Result: `"2011-05-24T17:09:00z"`  | 
|  `DateTime minusMonths(DateTime myDateTime,int monthsToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of months from the specified DateTime. Example: `#{minusMonths(myDateTime,1)}` Result: `"2011-04-24T17:10:00z"`  | 
|  `DateTime minusWeeks(DateTime myDateTime,int weeksToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of weeks from the specified DateTime. Example: `#{minusWeeks(myDateTime,1)}` Result: `"2011-05-17T17:10:00z"`  | 
|  `DateTime minusYears(DateTime myDateTime,int yearsToSub)`  |  Creates a DateTime object that is the result of subtracting the specified number of years from the specified DateTime. Example: `#{minusYears(myDateTime,1)}` Result: `"2010-05-24T17:10:00z"`  | 
|  `int minute(DateTime myDateTime)`  |  Gets the minute of the DateTime value as an integer. Example: `#{minute(myDateTime)}` Result: `10`  | 
|  `int month(DateTime myDateTime)`  |  Gets the month of the DateTime value as an integer. Example: `#{month(myDateTime)}` Result: `5`  | 
|  `DateTime plusDays(DateTime myDateTime,int daysToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of days to the specified DateTime. Example: `#{plusDays(myDateTime,1)}` Result: `"2011-05-25T17:10:00z"`  | 
|  `DateTime plusHours(DateTime myDateTime,int hoursToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of hours to the specified DateTime. Example: `#{plusHours(myDateTime,1)}` Result: `"2011-05-24T18:10:00z"`  | 
|  `DateTime plusMinutes(DateTime myDateTime,int minutesToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of minutes to the specified DateTime. Example: `#{plusMinutes(myDateTime,1)}` Result: `"2011-05-24T17:11:00z"`  | 
|  `DateTime plusMonths(DateTime myDateTime,int monthsToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of months to the specified DateTime. Example: `#{plusMonths(myDateTime,1)}` Result: `"2011-06-24T17:10:00z"`  | 
|  `DateTime plusWeeks(DateTime myDateTime,int weeksToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of weeks to the specified DateTime. Example: `#{plusWeeks(myDateTime,1)}` Result: `"2011-05-31T17:10:00z"`  | 
|  `DateTime plusYears(DateTime myDateTime,int yearsToAdd)`  |  Creates a DateTime object that is the result of adding the specified number of years to the specified DateTime. Example: `#{plusYears(myDateTime,1)}` Result: `"2012-05-24T17:10:00z"`  | 
|  `DateTime sunday(DateTime myDateTime)`  |  Creates a DateTime object for the previous Sunday, relative to the specified DateTime. If the specified DateTime is a Sunday, the result is the specified DateTime. Example: `#{sunday(myDateTime)}` Result: `"2011-05-22T17:10:00z"`  | 
|  `int year(DateTime myDateTime)`  |  Gets the year of the DateTime value as an integer. Example: `#{year(myDateTime)}` Result: `2011`  | 
|  `DateTime yesterday(DateTime myDateTime)`  |  Creates a DateTime object for the previous day, relative to the specified DateTime. The result is the same as `minusDays(myDateTime,1)`. Example: `#{yesterday(myDateTime)}` Result: `"2011-05-23T17:10:00z"`  | 
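
Several of the results in the table can be reproduced locally with Python's standard library, which is a convenient way to sanity-check an expression before deploying it. The following sketch models the documented behavior of a few functions; it is not the service's implementation, and the Python function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Sample value from this section: May 24, 2011 at 5:10 pm GMT (a Tuesday).
my_date_time = datetime(2011, 5, 24, 17, 10, tzinfo=timezone.utc)


def first_of_month(value: datetime) -> datetime:
    """Model firstOfMonth: same time of day on day 1 of the month."""
    return value.replace(day=1)


def midnight(value: datetime) -> datetime:
    """Model midnight: 00:00:00 on the same calendar day."""
    return value.replace(hour=0, minute=0, second=0, microsecond=0)


def sunday(value: datetime) -> datetime:
    """Model sunday: the previous Sunday, or the value itself on a Sunday."""
    # weekday() is 0 for Monday and 6 for Sunday.
    return value - timedelta(days=(value.weekday() + 1) % 7)


def yesterday(value: datetime) -> datetime:
    """Model yesterday: equivalent to subtracting one day."""
    return value - timedelta(days=1)


print(my_date_time.timetuple().tm_yday)  # 144, as for dayOfYear
print(sunday(my_date_time).date())       # 2011-05-22
print(yesterday(my_date_time).date())    # 2011-05-23
```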

# Special Characters
<a name="dp-pipeline-characters"></a>

AWS Data Pipeline uses certain characters that have a special meaning in pipeline definitions, as shown in the following table. 



| Special Character | Description | Examples | 
| --- | --- | --- | 
| @ | Runtime field. This character is a field name prefix for a field that is only available when a pipeline runs. | @actualStartTime @failureReason @resourceStatus | 
| # | Expression. Expressions are delimited by: "#{" and "}" and the contents of the braces are evaluated by AWS Data Pipeline. For more information, see [Expressions](dp-pipeline-expressions.md). | #{format(myDateTime,'YYYY-MM-dd hh:mm:ss')} s3://amzn-s3-demo-bucket/#{id}.csv | 
| \* | Encrypted field. This character is a field name prefix to indicate that AWS Data Pipeline should encrypt the contents of this field in transit between the console or CLI and the AWS Data Pipeline service. | \*password | 