

# Data structure recipe steps


Use these recipe steps to tabulate and summarize data from different perspectives, or to perform advanced functions.

**Topics**
+ [NEST_TO_ARRAY](recipe-actions.NEST_TO_ARRAY.md)
+ [NEST_TO_MAP](recipe-actions.NEST_TO_MAP.md)
+ [NEST_TO_STRUCT](recipe-actions.NEST_TO_STRUCT.md)
+ [UNNEST_ARRAY](recipe-actions.UNNEST_ARRAY.md)
+ [UNNEST_MAP](recipe-actions.UNNEST_MAP.md)
+ [UNNEST_STRUCT](recipe-actions.UNNEST_STRUCT.md)
+ [UNNEST_STRUCT_N](recipe-actions.UNNEST_STRUCT_N.md)
+ [GROUP_BY](recipe-actions.GROUP_BY.md)
+ [JOIN](recipe-actions.JOIN.md)
+ [PIVOT](recipe-actions.PIVOT.md)
+ [SCALE](recipe-actions.SCALE.md)
+ [TRANSPOSE](recipe-actions.TRANSPOSE.md)
+ [UNION](recipe-actions.UNION.md)
+ [UNPIVOT](recipe-actions.UNPIVOT.md)

# NEST_TO_ARRAY


Converts user-selected columns into array values. The order of the selected columns is maintained while creating the resultant array. The different column data types are typecast to a common type that supports the data types of all columns.

**Parameters**
+ `sourceColumns` — List of the source columns.
+ `targetColumn` — The name of the target column.
+ `removeSourceColumns` — A Boolean value (`true` or `false`) indicating whether to remove the selected source columns.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "NEST_TO_ARRAY",
        "Parameters": {
            "sourceColumns": "[\"age\",\"weight_kg\",\"height_cm\"]",
            "targetColumn": "columnName",
            "removeSourceColumns": "true"
        }
    }
}
```
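The effect of this step can be sketched in plain Python (an illustrative model only, not the DataBrew implementation; the row and column names are hypothetical):

```python
# Sketch of NEST_TO_ARRAY semantics: selected columns are collected into one
# array-valued column per row, in column order, cast to a common type.
def nest_to_array(rows, source_columns, target_column, remove_source=True):
    out = []
    for row in rows:
        new_row = dict(row)
        # float is used here as the common type covering int and float columns.
        new_row[target_column] = [float(new_row[c]) for c in source_columns]
        if remove_source:
            for c in source_columns:
                del new_row[c]
        out.append(new_row)
    return out

rows = [{"name": "Ann", "age": 34, "weight_kg": 60, "height_cm": 170}]
result = nest_to_array(rows, ["age", "weight_kg", "height_cm"], "metrics")
# result[0]["metrics"] == [34.0, 60.0, 170.0]
```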

# NEST_TO_MAP


Converts user-selected columns into key-value pairs, each with a key representing the column name and a value representing the row value. The order of the selected columns is not maintained while creating the resultant map. The different column data types are typecast to a common type that supports the data types of all columns.

**Parameters**
+ `sourceColumns` — List of the source columns.
+ `targetColumn` — The name of the target column.
+ `removeSourceColumns` — A Boolean value (`true` or `false`) indicating whether to remove the selected source columns.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "NEST_TO_MAP",
        "Parameters": {
            "sourceColumns": "[\"age\",\"weight_kg\",\"height_cm\"]",
            "targetColumn": "columnName",
            "removeSourceColumns": "true"
        }
    }
}
```

# NEST_TO_STRUCT


Converts user-selected columns into key-value pairs, each with a key representing the column name and a value representing the row value. The order of the selected columns and the data type of each column are maintained in the resultant struct.

**Parameters**
+ `sourceColumns` — List of the source columns.
+ `targetColumn` — The name of the target column.
+ `removeSourceColumns` — A Boolean value (`true` or `false`) indicating whether to remove the selected source columns.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "NEST_TO_STRUCT",
        "Parameters": {
            "sourceColumns": "[\"age\",\"weight_kg\",\"height_cm\"]",
            "targetColumn": "columnName",
            "removeSourceColumns": "true"
        }
    }
}
```
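In contrast to NEST_TO_ARRAY, a struct keeps both column order and each column's original data type. A plain-Python sketch (illustrative only; names are hypothetical):

```python
# Sketch of NEST_TO_STRUCT semantics: selected columns become key-value
# pairs; insertion order and each column's original type are preserved.
def nest_to_struct(rows, source_columns, target_column, remove_source=True):
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[target_column] = {c: row[c] for c in source_columns}
        if remove_source:
            for c in source_columns:
                del new_row[c]
        out.append(new_row)
    return out

rows = [{"name": "Ann", "age": 34, "weight_kg": 60.5, "height_cm": 170}]
result = nest_to_struct(rows, ["age", "weight_kg", "height_cm"], "metrics")
# "metrics" keeps the int/float types instead of casting to a common type.
```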

# UNNEST_ARRAY


Unnests a column of type `array` into a new column. If the array contains more than one value, then a row corresponding to each element is generated. This function only unnests one level of an array column.

**Parameters**
+ `sourceColumn` — The name of an existing column. This column must be of `array` type.
+ `targetColumn` — The name of the target column that is generated.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "UNNEST_ARRAY",
        "Parameters": {
            "sourceColumn": "address",
            "targetColumn": "address"
        }
    }
}
```
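The row-per-element behavior can be sketched in plain Python (an illustrative model, not the DataBrew implementation; sample data is hypothetical):

```python
# Sketch of UNNEST_ARRAY semantics: each element of the array column
# produces its own row; all other columns are duplicated.
def unnest_array(rows, source_column, target_column):
    out = []
    for row in rows:
        for element in row[source_column]:
            new_row = {k: v for k, v in row.items() if k != source_column}
            new_row[target_column] = element
            out.append(new_row)
    return out

rows = [{"id": 1, "address": ["12 Oak St", "9 Elm St"]}]
result = unnest_array(rows, "address", "address")
# Two rows are produced, one per array element.
```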

# UNNEST_MAP


Unnests a column of type `map` and generates a column for the key and a column for the value. If the map contains more than one key-value pair, a row corresponding to each pair is generated. This function only unnests one level of a map column.

**Parameters**
+ `sourceColumn` — The name of an existing column. This column must be of `map` type.
+ `removeSourceColumn` — If `true`, the source column is deleted after the function is complete.
+ `targetColumn` — If provided, each generated column name starts with this prefix.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "UNNEST_MAP",
        "Parameters": {
            "sourceColumn": "address",
            "removeSourceColumn": "false",
            "targetColumn": "address"
        }
    }
}
```
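A plain-Python sketch of the row-per-pair behavior (illustrative only; the `_key`/`_value` column-name suffixes here are an assumption for the sketch, not the documented naming):

```python
# Sketch of UNNEST_MAP semantics: each key-value pair in the map produces
# its own row, with separate key and value columns.
def unnest_map(rows, source_column, remove_source=False, prefix=None):
    base = prefix or source_column
    out = []
    for row in rows:
        for key, value in row[source_column].items():
            new_row = dict(row)
            if remove_source:
                del new_row[source_column]
            new_row[base + "_key"] = key      # assumed naming convention
            new_row[base + "_value"] = value  # assumed naming convention
            out.append(new_row)
    return out

rows = [{"id": 1, "address": {"home": "12 Oak St", "work": "9 Elm St"}}]
result = unnest_map(rows, "address", remove_source=True, prefix="address")
```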

# UNNEST_STRUCT


Unnests a column of type `struct` and generates a column for each key present in the struct. This function only unnests one level of a struct column.

**Parameters**
+ `sourceColumn` — The name of an existing column. This column must be of `struct` type.
+ `removeSourceColumn` — If `true`, the source column is deleted after the function is complete.
+ `targetColumn` — If provided, each generated column name starts with this prefix.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "UNNEST_STRUCT",
        "Parameters": {
            "sourceColumn": "address",
            "removeSourceColumn": "false",
            "targetColumn": "add"
        }
    }
}
```
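The key-to-column expansion can be sketched in plain Python (illustrative only; the "." joining the prefix to the key name is an assumption of this sketch):

```python
# Sketch of UNNEST_STRUCT semantics: one new column per struct key,
# optionally prefixed, with the source column kept or dropped.
def unnest_struct(rows, source_column, remove_source=False, prefix=None):
    out = []
    for row in rows:
        new_row = dict(row)
        for key, value in row[source_column].items():
            name = prefix + "." + key if prefix else key  # "." is assumed
            new_row[name] = value
        if remove_source:
            del new_row[source_column]
        out.append(new_row)
    return out

rows = [{"id": 1, "address": {"state": "CA", "zipcode": 12345}}]
result = unnest_struct(rows, "address", remove_source=False, prefix="add")
```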

# UNNEST_STRUCT_N


Creates a new column for each field of a selected column of type `struct`.

For example, given the following struct:

```
user {
   name: "Ammy"
   address: {
      state: "CA",
      zipcode: 12345
   }
}
```

This function creates 3 columns:


| user.name | user.address.state | user.address.zipcode | 
| --- | --- | --- | 
|  Ammy  |  CA  |  12345  | 

**Parameters**
+ `sourceColumns` — List of the source columns.
+ `regexColumnSelector` — A regular expression to select the columns to unnest.
+ `removeSourceColumn` — A Boolean value. If `true`, the source columns are removed; otherwise they are kept.
+ `unnestLevel` — The number of levels to unnest.
+ `delimiter` — The delimiter used in the generated column names to separate the levels of the struct. For example, if the delimiter is "/", a column name takes the form "user/address/state".
+ `conditionExpressions` — Condition expressions.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "UNNEST_STRUCT_N",
        "Parameters": {
            "sourceColumns": "[\"address\"]",
            "removeSourceColumn": "true",
            "unnestLevel": "2",
            "delimiter": "/"
        }
    }
}
```
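The level-limited flattening described above can be sketched as a small recursive function (illustrative only, using the `user` struct from the example; not the DataBrew implementation):

```python
# Sketch of UNNEST_STRUCT_N semantics: recursively flatten nested structs
# up to `level` levels, joining names with the configured delimiter.
def flatten(value, name, level, delimiter):
    if isinstance(value, dict) and level > 0:
        out = {}
        for key, inner in value.items():
            out.update(flatten(inner, name + delimiter + key,
                               level - 1, delimiter))
        return out
    # Either a scalar, or a struct deeper than the unnest level: keep as-is.
    return {name: value}

user = {"name": "Ammy", "address": {"state": "CA", "zipcode": 12345}}
columns = flatten(user, "user", level=2, delimiter="/")
# {'user/name': 'Ammy', 'user/address/state': 'CA', 'user/address/zipcode': 12345}
```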

# GROUP_BY


Summarizes the data by grouping rows by one or more columns, and then applying an aggregation function to each group. 

**Parameters**
+ `sourceColumns` — A JSON-encoded string representing a list of columns that form the basis of each group.
+ `groupByAggFunctionOptions` — A JSON-encoded string representing a list of aggregation functions to apply. (If you don't want aggregation, specify `UNAGGREGATED`.)
+ `useNewDataFrame` — If `true`, the results from GROUP_BY are made available in the project session, replacing its current contents.

**Example**  
  

```
[
  {
    "Action": {
      "Operation": "GROUP_BY",
      "Parameters": {
        "groupByAggFunctionOptions": "[{\"sourceColumnName\":\"all_votes\",\"targetColumnName\":\"all_votes_count\",\"targetColumnDataType\":\"number\",\"functionName\":\"COUNT\"}]",
        "sourceColumns": "[\"year\",\"state_name\"]",
        "useNewDataFrame": "true"
      }
    }
  }
]
```
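The grouping plus COUNT aggregation from this example can be sketched in plain Python (an illustrative model with hypothetical sample rows, not the DataBrew implementation):

```python
# Sketch of GROUP_BY with a COUNT aggregation: rows are bucketed by the
# source columns, then each group is reduced to a single output row.
from collections import defaultdict

def group_by_count(rows, source_columns, count_column, target_column):
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in source_columns)
        if row[count_column] is not None:  # COUNT skips missing values
            groups[key] += 1
    return [
        dict(zip(source_columns, key)) | {target_column: count}
        for key, count in groups.items()
    ]

rows = [
    {"year": 1946, "state_name": "France", "all_votes": 10},
    {"year": 1946, "state_name": "France", "all_votes": 12},
    {"year": 1947, "state_name": "France", "all_votes": 9},
]
result = group_by_count(rows, ["year", "state_name"],
                        "all_votes", "all_votes_count")
# One output row per (year, state_name) group.
```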

# JOIN


Performs a join operation on two datasets.

**Parameters**
+ `joinKeys` — A JSON-encoded string representing a list of columns from each dataset to act as join keys.
+ `joinType` — The type of join to perform. Must be one of: `INNER_JOIN` | `LEFT_JOIN` | `RIGHT_JOIN` | `OUTER_JOIN` | `LEFT_EXCLUDING_JOIN` | `RIGHT_EXCLUDING_JOIN` | `OUTER_EXCLUDING_JOIN`
+ `leftColumns` — A JSON-encoded string representing a list of columns from the current active dataset.
+ `rightColumns` — A JSON-encoded string representing a list of columns from another (secondary) dataset to join to the current one.
+ `secondInputLocation` — An Amazon S3 URL that resolves to the data file for the secondary dataset.
+ `secondaryDatasetName` — The name of the secondary dataset.

**Example**  
  

```
{
    "Action": {
        "Operation": "JOIN",
        "Parameters": {
            "joinKeys": "[{\"key\":\"assembly_session\",\"value\":\"assembly_session\"},{\"key\":\"state_code\",\"value\":\"state_code\"}]",
            "joinType": "INNER_JOIN",
            "leftColumns": "[\"year\",\"assembly_session\",\"state_code\",\"state_name\",\"all_votes\",\"yes_votes\",\"no_votes\",\"abstain\",\"idealpoint_estimate\",\"affinityscore_usa\",\"affinityscore_russia\",\"affinityscore_china\",\"affinityscore_india\",\"affinityscore_brazil\",\"affinityscore_israel\"]",
            "rightColumns": "[\"assembly_session\",\"vote_id\",\"resolution\",\"state_code\",\"state_name\",\"member\",\"vote\"]",
            "secondInputLocation": "s3://databrew-public-datasets-us-east-1/votes.csv",
            "secondaryDatasetName": "votes"
        }
    }
}
```
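An INNER_JOIN on the join keys can be sketched as a simple nested-loop join in plain Python (illustrative only; the sample rows are hypothetical, and DataBrew performs the join server-side):

```python
# Sketch of INNER_JOIN semantics: keep only row pairs whose join-key
# values match, merging the left and right columns.
def inner_join(left_rows, right_rows, join_keys):
    """join_keys is a list of (left_column, right_column) pairs."""
    out = []
    for left in left_rows:
        for right in right_rows:
            if all(left[lk] == right[rk] for lk, rk in join_keys):
                out.append({**left, **right})
    return out

left = [{"state_code": "FR", "year": 1946}]
right = [{"state_code": "FR", "vote": "yes"},
         {"state_code": "KE", "vote": "no"}]
result = inner_join(left, right, [("state_code", "state_code")])
# Only the matching FR pair survives.
```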

# PIVOT


Converts all the row values in a selected column into individual columns with values.

![Diagram showing pivot column transformation: original table to new table with columns as values.](http://docs.aws.amazon.com/databrew/latest/dg/images/pivot.png)


**Parameters**
+ `sourceColumn` — The name of an existing column. The column can have a maximum of 10 distinct values.
+ `valueColumn` — The name of an existing column whose values are aggregated to populate the new columns.
+ `aggregateFunction` — The name of an aggregation function. If you don't want aggregation, use the keyword `COLLECT_LIST`.

**Example**  
  

```
{
    "Action": {
        "Operation": "PIVOT",
        "Parameters": {
            "aggregateFunction": "SUM",
            "sourceColumn": "state_name",
            "valueColumn": "all_votes"
        }
    }
}
```
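A simplified single-dimension sketch of this pivot in plain Python (illustrative only; it omits any additional grouping columns and uses hypothetical sample rows):

```python
# Sketch of PIVOT with SUM: each distinct value of sourceColumn becomes a
# new column holding the aggregated valueColumn.
from collections import defaultdict

def pivot_sum(rows, source_column, value_column):
    totals = defaultdict(float)
    for row in rows:
        totals[row[source_column]] += row[value_column]
    return dict(totals)

rows = [
    {"state_name": "France", "all_votes": 10},
    {"state_name": "Kenya", "all_votes": 7},
    {"state_name": "France", "all_votes": 5},
]
result = pivot_sum(rows, "state_name", "all_votes")
# {"France": 15.0, "Kenya": 7.0}
```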

# SCALE


Scales or normalizes the range of data in a numeric column.

**Parameters**
+ `sourceColumn` — The name of an existing column.
+ `strategy` — The operation to be applied to the column values:
  + `MIN_MAX` — Rescales the values into a range of [0,1].
  + `SCALE_BETWEEN` — Rescales the values into a range of two specified values.
  + `MEAN_NORMALIZATION` — Rescales the values to have a mean (μ) of 0, within a range of [-1, 1].
  + `Z_SCORE` — Linearly scales data values to have a mean (μ) of 0 and standard deviation (σ) of 1. Best for handling outliers.
+ `targetColumn` — The name of a column to contain the results.

**Example**  
  

```
{
    "Action": {
        "Operation": "NORMALIZATION",
        "Parameters": {
            "sourceColumn": "all_votes",
            "strategy": "MIN_MAX",
            "targetColumn": "all_votes_normalized"
        }
    }
}
```
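The MIN_MAX and Z_SCORE strategies follow standard formulas, which can be sketched in plain Python (illustrative formulas only, not DataBrew's implementation; this sketch uses population standard deviation):

```python
# MIN_MAX: x' = (x - min) / (max - min), mapping values into [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Z_SCORE: x' = (x - mean) / std, giving mean 0 and standard deviation 1.
def z_score_scale(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

votes = [10.0, 20.0, 30.0]
scaled = min_max_scale(votes)
# [0.0, 0.5, 1.0]
```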

# TRANSPOSE


Converts all selected rows to columns and columns to rows.

![Table transformation from rows to columns, showing data reorganization for improved analysis.](http://docs.aws.amazon.com/databrew/latest/dg/images/transpose.png)


**Parameters**
+ `pivotColumns` — A JSON-encoded string representing a list of columns whose rows will be converted to column names.
+ `valueColumns` — A JSON-encoded string representing a list of one or more columns to be converted to rows.
+ `aggregateFunction` — The name of an aggregation function. If you don't want aggregation, use the keyword `COLLECT_LIST`.
+ `newColumn` — The column to hold transposed columns as values.

**Example**  
  

```
{
    "Action": {
        "Operation": "TRANSPOSE",
        "Parameters": {
            "pivotColumns": "[\"Teacher\"]",
            "valueColumns": "[\"Tom\",\"John\",\"Harry\"]",
            "aggregateFunction": "COLLECT_LIST",
            "newColumn": "Student"
        }
    }
}
```
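Using the column names from the example, the transpose can be sketched in plain Python (illustrative only; a single-row dataset, so no aggregation is needed):

```python
# Sketch of TRANSPOSE semantics: values in the pivot column become column
# names, and each value column becomes a row under newColumn.
def transpose(rows, pivot_column, value_columns, new_column):
    out = []
    for value_col in value_columns:
        new_row = {new_column: value_col}
        for row in rows:
            new_row[row[pivot_column]] = row[value_col]
        out.append(new_row)
    return out

rows = [{"Teacher": "Smith", "Tom": "A", "John": "B", "Harry": "C"}]
result = transpose(rows, "Teacher", ["Tom", "John", "Harry"], "Student")
# [{'Student': 'Tom', 'Smith': 'A'}, {'Student': 'John', 'Smith': 'B'}, ...]
```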

# UNION


Combines the rows from two or more datasets into a single result.

**Parameters**
+ `datasetsColumns` — A JSON-encoded string representing a list of all the columns in the datasets.
+ `secondaryDatasetNames` — A JSON-encoded string representing a list of one or more secondary datasets.
+ `secondaryInputs` — A JSON-encoded string representing a list of Amazon S3 buckets and object key names that tell DataBrew where to find the secondary dataset(s).
+ `targetColumnNames` — A JSON-encoded string representing a list of column names for the results.

**Example**  
  

```
{
    "Action": {
        "Operation": "UNION",
        "Parameters": {
            "datasetsColumns": "[[\"assembly_session\",\"state_code\",\"state_name\",\"year\",\"all_votes\",\"yes_votes\",\"no_votes\",\"abstain\",\"idealpoint_estimate\",\"affinityscore_usa\",\"affinityscore_russia\",\"affinityscore_china\",\"affinityscore_india\",\"affinityscore_brazil\",\"affinityscore_israel\"],[\"assembly_session\",\"state_code\",\"state_name\",null,null,null,null,null,null,null,null,null,null,null,null]]",
            "secondaryDatasetNames": "[\"votes\"]",
            "secondaryInputs": "[{\"S3InputDefinition\":{\"Bucket\":\"databrew-public-datasets-us-east-1\",\"Key\":\"votes.csv\"}}]",
            "targetColumnNames": "[\"assembly_session\",\"state_code\",\"state_name\",\"year\",\"all_votes\",\"yes_votes\",\"no_votes\",\"abstain\",\"idealpoint_estimate\",\"affinityscore_usa\",\"affinityscore_russia\",\"affinityscore_china\",\"affinityscore_india\",\"affinityscore_brazil\",\"affinityscore_israel\"]"
        }
    }
}
```
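The stacking behavior, including the `null` padding for columns missing from a dataset, can be sketched in plain Python (illustrative only; sample data is hypothetical):

```python
# Sketch of UNION semantics: rows from every dataset are stacked into one
# result; columns absent from a dataset are filled with None (null).
def union(datasets, target_columns):
    out = []
    for rows in datasets:
        for row in rows:
            out.append({c: row.get(c) for c in target_columns})
    return out

primary = [{"state_code": "FR", "year": 1946}]
secondary = [{"state_code": "KE"}]  # has no "year" column
result = union([primary, secondary], ["state_code", "year"])
# The secondary row gets year=None.
```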

# UNPIVOT


Converts the values of the selected columns into individual rows, pairing each value with the name of the column it came from.

![Table transformation showing column values converted to individual rows with values.](http://docs.aws.amazon.com/databrew/latest/dg/images/unpivot.png)


**Parameters**
+ `sourceColumns` — A JSON-encoded string representing a list of one or more columns to be unpivoted.
+ `unpivotColumn` — The name of the column to hold the original column names.
+ `valueColumn` — The column to hold unpivoted values.

**Example**  
  

```
{
    "Action": {
        "Operation": "UNPIVOT",
        "Parameters": {
            "sourceColumns": "[\"idealpoint_estimate\"]",
            "unpivotColumn": "unpivoted_idealpoint_estimate",
            "valueColumn": "unpivoted_column_values"
        }
    }
}
```
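The unpivot (melt) behavior can be sketched in plain Python (illustrative only, using the column names from the example with a hypothetical sample row):

```python
# Sketch of UNPIVOT semantics: each selected column becomes one output row
# pairing the column name with its value.
def unpivot(rows, source_columns, unpivot_column, value_column):
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in source_columns}
        for col in source_columns:
            out.append(kept | {unpivot_column: col, value_column: row[col]})
    return out

rows = [{"state_name": "France", "idealpoint_estimate": 0.52}]
result = unpivot(rows, ["idealpoint_estimate"],
                 "unpivoted_idealpoint_estimate", "unpivoted_column_values")
# One row per unpivoted column, keeping the untouched columns.
```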