

# Writing the blueprint code
<a name="developing-blueprints-code"></a>

Each blueprint project that you create must contain, at a minimum, the following files:
+ A Python layout script that defines the workflow. The script contains a function that defines the entities (jobs and crawlers) in a workflow, and the dependencies between them.
+ A configuration file, `blueprint.cfg`, which defines:
  + The full path of the workflow layout definition function.
  + The parameters that the blueprint accepts.

**Topics**
+ [Creating the blueprint layout script](developing-blueprints-code-layout.md)
+ [Creating the configuration file](developing-blueprints-code-config.md)
+ [Specifying blueprint parameters](developing-blueprints-code-parameters.md)

# Creating the blueprint layout script
<a name="developing-blueprints-code-layout"></a>

The blueprint layout script must include a function that generates the entities in your workflow. You can name this function whatever you like. AWS Glue uses the configuration file to determine the fully qualified name of the function.

Your layout function does the following:
+ (Optional) Instantiates the `Job` class to create `Job` objects, and passes arguments such as `Command` and `Role`. These are job properties that you would specify if you were creating the job using the AWS Glue console or API.
+ (Optional) Instantiates the `Crawler` class to create `Crawler` objects, and passes name, role, and target arguments.
+ Passes the additional `DependsOn` and `WaitForDependencies` arguments to `Job()` and `Crawler()` to indicate dependencies between the objects (workflow entities). These arguments are explained later in this section.
+ Instantiates the `Workflow` class to create the workflow object that is returned to AWS Glue, passing a `Name` argument, an `Entities` argument, and an optional `OnSchedule` argument. The `Entities` argument specifies all of the jobs and crawlers to include in the workflow. To see how to construct an `Entities` object, see the sample project later in this section.
+ Returns the `Workflow` object.

For definitions of the `Job`, `Crawler`, and `Workflow` classes, see [AWS Glue blueprint classes reference](developing-blueprints-code-classes.md).

The layout function must accept the following input arguments.


| Argument | Description | 
| --- | --- | 
| `user_params` | Python dictionary of blueprint parameter names and values. For more information, see [Specifying blueprint parameters](developing-blueprints-code-parameters.md). | 
| `system_params` | Python dictionary containing two properties: `region` and `accountId`. | 

Here is a sample layout generator script in a file named `Layout.py`:

```
from awsglue.blueprint.workflow import *
from awsglue.blueprint.job import *
from awsglue.blueprint.crawler import *


def generate_layout(user_params, system_params):
    # user_params holds the blueprint parameter values supplied when the workflow is created.

    # First job: a Spark ETL job.
    etl_job = Job(Name="{}_etl_job".format(user_params['WorkflowName']),
                  Command={
                      "Name": "glueetl",
                      "ScriptLocation": user_params['ScriptLocation'],
                      "PythonVersion": "2"
                  },
                  Role=user_params['PassRole'])

    # Second job: a Python shell job that starts only after etl_job succeeds.
    post_process_job = Job(Name="{}_post_process".format(user_params['WorkflowName']),
                           Command={
                               "Name": "pythonshell",
                               "ScriptLocation": user_params['ScriptLocation'],
                               "PythonVersion": "2"
                           },
                           Role=user_params['PassRole'],
                           DependsOn={
                               etl_job: "SUCCEEDED"
                           },
                           WaitForDependencies="AND")

    # The workflow object returned to AWS Glue lists every entity that it contains.
    sample_workflow = Workflow(Name=user_params['WorkflowName'],
                               Entities=Entities(Jobs=[etl_job, post_process_job]))
    return sample_workflow
```

The sample script imports the required blueprint libraries and includes a `generate_layout` function that generates a workflow with two jobs. This is a very simple script. A more complex script could employ additional logic and parameters to generate a workflow with many jobs and crawlers, or even a variable number of jobs and crawlers.
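
As a sketch of the "variable number of jobs" case, the following layout function creates one job per entry in a collection parameter and chains the jobs so that each one runs after the previous one succeeds. The `ScriptLocations` parameter name is hypothetical. The small `Job`, `Entities`, and `Workflow` classes are local stand-ins that only record their arguments, so that the example can run outside AWS Glue; in a real layout script, these classes would come from the `awsglue.blueprint` imports used in the sample script.

```python
# Local stand-ins (an assumption for this sketch) that only record keyword arguments.
# In a real layout script, use the classes imported from awsglue.blueprint instead.
class _Recorder:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class Job(_Recorder):
    pass


class Entities(_Recorder):
    pass


class Workflow(_Recorder):
    pass


def generate_layout(user_params, system_params):
    jobs = []
    previous = None
    # "ScriptLocations" is a hypothetical collection parameter (one script per job).
    for index, script in enumerate(user_params["ScriptLocations"]):
        job_args = {
            "Name": "{}_step_{}".format(user_params["WorkflowName"], index),
            "Command": {"Name": "glueetl", "ScriptLocation": script},
            "Role": user_params["PassRole"],
        }
        if previous is not None:
            # The dictionary key is the object reference, not the job name.
            job_args["DependsOn"] = {previous: "SUCCEEDED"}
            job_args["WaitForDependencies"] = "AND"
        previous = Job(**job_args)
        jobs.append(previous)
    return Workflow(Name=user_params["WorkflowName"], Entities=Entities(Jobs=jobs))
```

The first job has no `DependsOn` argument, so it is the only job that starts directly from the workflow's starting trigger.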

## Using the DependsOn argument
<a name="developing-blueprints-code-layout-depends-on"></a>

The `DependsOn` argument is a dictionary representation of a dependency that this entity has on other entities within the workflow. It has the following form. 

```
DependsOn = {dependency1 : state, dependency2 : state, ...}
```

The keys in this dictionary represent the object reference, not the name, of the entity, while the values are strings that correspond to the state to watch for. AWS Glue infers the proper triggers. For the valid states, see [Condition Structure](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-trigger.html#aws-glue-api-jobs-trigger-Condition).

For example, a job might depend on the successful completion of a crawler. If you define a crawler object named `crawler2` as follows:

```
crawler2 = Crawler(Name="my_crawler", ...)
```

Then an object depending on `crawler2` would include a constructor argument such as: 

```
DependsOn = {crawler2 : "SUCCEEDED"}
```

Within a `Job` constructor call, the argument appears as follows:

```
job1 = Job(Name="Job1", ..., DependsOn = {crawler2 : "SUCCEEDED", ...})
```

If `DependsOn` is omitted for an entity, that entity depends on the workflow start trigger.

## Using the WaitForDependencies argument
<a name="developing-blueprints-code-layout-wait-for-dependencies"></a>

The `WaitForDependencies` argument defines whether a job or crawler entity should wait until *all* entities on which it depends complete or until *any* completes.

The allowable values are "`AND`" (wait for all dependencies) and "`ANY`" (wait for any one dependency).
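
One way to reason about `DependsOn` together with `WaitForDependencies` is as a predicate over the states that the dependencies have reached. The following sketch is only a mental model and not how AWS Glue implements its triggers. The `is_ready` function and the `observed` dictionary are hypothetical names, and plain strings stand in for the entity object references.

```python
def is_ready(depends_on, wait_for_dependencies, observed):
    """Return True when the watched states satisfy the AND/ANY rule.

    depends_on: mapping of dependency -> state to watch for.
    observed:   mapping of dependency -> state reached so far.
    """
    if not depends_on:
        # No DependsOn: the entity depends on the workflow start trigger.
        return True
    matches = [observed.get(dep) == state for dep, state in depends_on.items()]
    if wait_for_dependencies == "AND":
        return all(matches)
    if wait_for_dependencies == "ANY":
        return any(matches)
    raise ValueError('WaitForDependencies must be "AND" or "ANY"')


depends_on = {"crawler2": "SUCCEEDED", "job1": "SUCCEEDED"}
observed = {"crawler2": "SUCCEEDED"}  # job1 has not finished yet

print(is_ready(depends_on, "AND", observed))  # False
print(is_ready(depends_on, "ANY", observed))  # True
```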

## Using the OnSchedule argument
<a name="developing-blueprints-code-layout-on-schedule"></a>

The `OnSchedule` argument for the `Workflow` class constructor is a `cron` expression that defines the schedule for the workflow's starting trigger.

If this argument is specified, AWS Glue creates a schedule trigger with the corresponding schedule. If it isn't specified, the starting trigger for the workflow is an on-demand trigger.
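
As an illustration, a layout script might derive the `OnSchedule` value from a blueprint parameter. This sketch assumes the six-field `cron(Minutes Hours Day-of-month Month Day-of-week Year)` form that AWS Glue uses for time-based schedules. The `Frequency` parameter values and the `schedule_for` helper are hypothetical.

```python
def schedule_for(frequency):
    """Map a hypothetical Frequency parameter to a cron expression.

    Returns None when no schedule is wanted (on-demand starting trigger).
    """
    schedules = {
        "Hourly": "cron(0 * * * ? *)",  # minute 0 of every hour
        "Daily": "cron(0 0 * * ? *)",   # 00:00 every day
    }
    return schedules.get(frequency)


workflow_args = {"Name": "demo_workflow"}
on_schedule = schedule_for("Daily")
if on_schedule is not None:
    # Pass OnSchedule only when a schedule is wanted.
    workflow_args["OnSchedule"] = on_schedule

# In a layout function, these would be passed on as:
#   Workflow(Entities=..., **workflow_args)
print(workflow_args)
```

Building the keyword arguments conditionally keeps a single code path for both scheduled and on-demand workflows.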

# Creating the configuration file
<a name="developing-blueprints-code-config"></a>

The blueprint configuration file is a required file that defines the script entry point for generating the workflow, and the parameters that the blueprint accepts. The file must be named `blueprint.cfg`.

Here is a sample configuration file.

```
{
    "layoutGenerator": "DemoBlueprintProject.Layout.generate_layout",
    "parameterSpec" : {
           "WorkflowName" : {
                "type": "String",
                "collection": false
           },
           "WorkerType" : {
                "type": "String",
                "collection": false,
                "allowedValues": ["G.1X", "G.2X"],
                "defaultValue": "G.1X"
           },
           "Dpu" : {
                "type" : "Integer",
                "allowedValues" : [2, 4, 6],
                "defaultValue" : 2
           },
           "DynamoDBTableName": {
                "type": "String",
                "collection" : false
           },
           "ScriptLocation" : {
                "type": "String",
                "collection": false
           }
    }
}
```

The `layoutGenerator` property specifies the fully qualified name of the function in the script that generates the layout.

The `parameterSpec` property specifies the parameters that this blueprint accepts. For more information, see [Specifying blueprint parameters](developing-blueprints-code-parameters.md).

**Important**  
Your configuration file must include the workflow name as a blueprint parameter, or you must generate a unique workflow name in your layout script.
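
To make the `layoutGenerator` format concrete, the following sketch checks a configuration document locally, for example before packaging a blueprint. The `check_config` helper is hypothetical and is not part of AWS Glue. It only confirms that `layoutGenerator` is present, splits the fully qualified name into a module path and a function name, and lists the declared parameter names.

```python
import json


def check_config(text):
    """Parse blueprint.cfg content and return (module_path, function_name, parameter_names)."""
    config = json.loads(text)
    if "layoutGenerator" not in config:
        raise ValueError("blueprint.cfg is missing 'layoutGenerator'")
    # Everything before the last dot is the module path; the rest is the function name.
    module_path, _, function_name = config["layoutGenerator"].rpartition(".")
    if not module_path or not function_name:
        raise ValueError("layoutGenerator must look like package.module.function")
    return module_path, function_name, sorted(config.get("parameterSpec", {}))


sample = """
{
    "layoutGenerator": "DemoBlueprintProject.Layout.generate_layout",
    "parameterSpec": {
        "WorkflowName": {"type": "String", "collection": false}
    }
}
"""

print(check_config(sample))
# ('DemoBlueprintProject.Layout', 'generate_layout', ['WorkflowName'])
```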

# Specifying blueprint parameters
<a name="developing-blueprints-code-parameters"></a>

The configuration file contains blueprint parameter specifications in a `parameterSpec` JSON object. `parameterSpec` contains one or more parameter objects.

```
"parameterSpec": {
    "<parameter_name>": {
        "type": "<parameter-type>",
        "collection": true|false,
        "description": "<parameter-description>",
        "defaultValue": "<default value for the parameter if value not specified>",
        "allowedValues": "<list of allowed values>"
    },
    "<parameter_name>": {
        ...
    }
}
```

The following are the rules for coding each parameter object:
+ The parameter name and `type` are mandatory. All other properties are optional.
+ If you specify the `defaultValue` property, the parameter is optional. Otherwise, the parameter is mandatory, and the data analyst who creates a workflow from the blueprint must provide a value for it.
+ If you set the `collection` property to `true`, the parameter can take a collection of values. Collections can be of any data type.
+ If you specify `allowedValues`, the AWS Glue console displays a dropdown list of values for the data analyst to choose from when creating a workflow from the blueprint.
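
The rules above can be sketched as a small resolver that applies them to the values that a user supplies. This is an illustrative local model and not the validation that AWS Glue performs. The `resolve_params` helper is hypothetical, and checking values against `type` is omitted for brevity.

```python
def resolve_params(parameter_spec, supplied):
    """Apply defaults, then enforce the mandatory and allowedValues rules."""
    resolved = {}
    for name, spec in parameter_spec.items():
        if name in supplied:
            value = supplied[name]
        elif "defaultValue" in spec:
            # A parameter with a default value is optional.
            value = spec["defaultValue"]
        else:
            raise ValueError("Missing mandatory parameter: {}".format(name))

        # A collection parameter carries several values; check each one.
        values = value if spec.get("collection", False) else [value]
        allowed = spec.get("allowedValues")
        if allowed is not None:
            for item in values:
                if item not in allowed:
                    raise ValueError("{!r} is not allowed for {}".format(item, name))
        resolved[name] = value
    return resolved


spec = {
    "WorkflowName": {"type": "String"},
    "Dpu": {"type": "Integer", "allowedValues": [2, 4, 6], "defaultValue": 2},
}

print(resolve_params(spec, {"WorkflowName": "demo"}))
# {'WorkflowName': 'demo', 'Dpu': 2}
```

Calling `resolve_params(spec, {})` raises an error because `WorkflowName` has no default value, and supplying `"Dpu": 3` raises an error because `3` is not in `allowedValues`.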

The following are the permitted values for `type`:


| Parameter data type | Notes | 
| --- | --- | 
| String | - | 
| Integer | - | 
| Double | - | 
| Boolean | Possible values are true and false. Generates a check box on the Create a workflow from <blueprint> page on the AWS Glue console. | 
| S3Uri | Complete Amazon S3 path, beginning with s3://. Generates a text field and Browse button on the Create a workflow from <blueprint> page. | 
| S3Bucket | Amazon S3 bucket name only. Generates a bucket picker on the Create a workflow from <blueprint> page. | 
| IAMRoleArn | Amazon Resource Name (ARN) of an AWS Identity and Access Management (IAM) role. Generates a role picker on the Create a workflow from <blueprint> page. | 
| IAMRoleName | Name of an IAM role. Generates a role picker on the Create a workflow from <blueprint> page. | 