

# Migrating big data frameworks with AWS Schema Conversion Tool
<a name="CHAP-migrating-big-data"></a>

You can use the AWS Schema Conversion Tool (AWS SCT) to migrate big data frameworks to the AWS Cloud.

Currently, AWS SCT supports the migration of Hadoop clusters to Amazon EMR and Amazon S3. This migration process includes Hive and HDFS services.

Also, you can use AWS SCT to automate the conversion of your Apache Oozie orchestration workflows to AWS Step Functions.

**Topics**
+ [Migrating Hadoop workloads to Amazon EMR with AWS Schema Conversion Tool](big-data-hadoop.md)
+ [Converting Oozie workflows to AWS Step Functions with AWS Schema Conversion Tool](big-data-oozie.md)

# Migrating Hadoop workloads to Amazon EMR with AWS Schema Conversion Tool
<a name="big-data-hadoop"></a>

To migrate Apache Hadoop clusters, make sure that you use AWS SCT version 1.0.670 or higher. Also, familiarize yourself with the command line interface (CLI) of AWS SCT. For more information, see [CLI Reference for AWS Schema Conversion Tool](CHAP_Reference.md).

**Topics**
+ [Migration overview](#big-data-hadoop-migration-overview)
+ [Step 1: Connect to your Hadoop clusters](#big-data-hadoop-connect-to-databases)
+ [Step 2: Set up the mapping rules](#big-data-hadoop-mapping-rules)
+ [Step 3: Create an assessment report](#big-data-hadoop-assessment-report)
+ [Step 4: Migrate your Apache Hadoop cluster to Amazon EMR with AWS SCT](#big-data-hadoop-migrate)
+ [Running your CLI script](#big-data-hadoop-run-migration)
+ [Managing your big data migration project](#big-data-hadoop-manage-project)

## Migration overview
<a name="big-data-hadoop-migration-overview"></a>

The following image shows the architecture diagram of the migration from Apache Hadoop to Amazon EMR.

![\[The architecture diagram of the Hadoop migration\]](http://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/images/hadoop-migration-architecture-diagram.png)


AWS SCT migrates data and metadata from your source Hadoop cluster to an Amazon S3 bucket. Next, AWS SCT uses your source Hive metadata to create database objects in the target Amazon EMR Hive service. Optionally, you can configure Hive to use the AWS Glue Data Catalog as its metastore. In this case, AWS SCT migrates your source Hive metadata to the AWS Glue Data Catalog.

Then, you can use AWS SCT to migrate the data from an Amazon S3 bucket to your target Amazon EMR HDFS service. Alternatively, you can leave the data in your Amazon S3 bucket and use it as a data repository for your Hadoop workloads.

To start the Hadoop migration, create and run your AWS SCT CLI script. This script includes the complete set of commands to run the migration. You can download and edit a template of the Hadoop migration script. For more information, see [Getting CLI scenarios](CHAP_Reference.md#CHAP_Reference.Scenario).

Make sure that your script includes the following steps so that you can run your migration from Apache Hadoop to Amazon S3 and Amazon EMR.

## Step 1: Connect to your Hadoop clusters
<a name="big-data-hadoop-connect-to-databases"></a>

To start the migration of your Apache Hadoop cluster, create a new AWS SCT project. Next, connect to your source and target clusters. Make sure that you create and provision your target AWS resources before you start the migration.

In this step, you use the following AWS SCT CLI commands.
+ `CreateProject` – to create a new AWS SCT project.
+ `AddSourceCluster` – to connect to the source Hadoop cluster in your AWS SCT project.
+ `AddSourceClusterHive` – to connect to the source Hive service in your project.
+ `AddSourceClusterHDFS` – to connect to the source HDFS service in your project.
+ `AddTargetCluster` – to connect to the target Amazon EMR cluster in your project.
+ `AddTargetClusterS3` – to add the Amazon S3 bucket to your project.
+ `AddTargetClusterHive` – to connect to the target Hive service in your project.
+ `AddTargetClusterHDFS` – to connect to the target HDFS service in your project.

For examples of using these AWS SCT CLI commands, see [Connecting to Apache Hadoop](CHAP_Source.Hadoop.md).

When you run the command that connects to a source or target cluster, AWS SCT tries to establish the connection to this cluster. If the connection attempt fails, then AWS SCT stops running the commands from your CLI script and displays an error message.
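
Taken together, the first commands of this step might look like the following sketch. The connection parameters mirror the `ConnectSourceCluster` example later in this section; the `directory` parameter for `CreateProject` and all host, user, and password values shown here are placeholder assumptions.

```
CreateProject
    -name: 'hadoop_emr'
    -directory: 'c:\sct'
/
AddSourceCluster
    -name: 'HADOOP_SOURCE'
    -vendor: 'HADOOP'
    -host: 'hadoop_address'
    -port: '22'
    -user: 'hadoop_user'
    -password: 'hadoop_password'
    -useSSL: 'true'
    -privateKeyPath: 'c:\path\name.pem'
    -passPhrase: 'hadoop_passphrase'
/
```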

## Step 2: Set up the mapping rules
<a name="big-data-hadoop-mapping-rules"></a>

After you connect to your source and target clusters, set up the mapping rules. A mapping rule defines the migration target for a source cluster. Make sure that you set up mapping rules for all source clusters that you added in your AWS SCT project. For more information about mapping rules, see [Mapping data types in the AWS Schema Conversion Tool](CHAP_Mapping.md).

In this step, you use the `AddServerMapping` command. This command uses two parameters, which define the source and target clusters. You can use the `AddServerMapping` command with the explicit path to your database objects or with object names. For the first option, you include the type of the object and its name. For the second option, you include only the object names.
+ `sourceTreePath` – the explicit path to your source database objects.

  `targetTreePath` – the explicit path to your target database objects.
+ `sourceNamePath` – the path that includes only the names of your source objects.

  `targetNamePath` – the path that includes only the names of your target objects.

The following code example creates a mapping rule using explicit paths for the source `testdb` Hive database and the target EMR cluster.

```
AddServerMapping
	-sourceTreePath: 'Clusters.HADOOP_SOURCE.HIVE_SOURCE.Databases.testdb'
	-targetTreePath: 'Clusters.HADOOP_TARGET.HIVE_TARGET'
/
```

You can run this example and the following examples in Windows. To run the CLI commands in Linux, update the file paths appropriately for your operating system.

The following code example creates a mapping rule using the paths that include only the object names.

```
AddServerMapping
	-sourceNamePath: 'HADOOP_SOURCE.HIVE_SOURCE.testdb'
	-targetNamePath: 'HADOOP_TARGET.HIVE_TARGET'
/
```

You can choose Amazon EMR or Amazon S3 as a target for your source object. For each source object, you can choose only one target in a single AWS SCT project. To change the migration target for a source object, delete the existing mapping rule and then create a new mapping rule. To delete a mapping rule, use the `DeleteServerMapping` command. This command uses one of the two following parameters.
+ `sourceTreePath` – the explicit path to your source database objects.
+ `sourceNamePath` – the path that includes only the names of your source objects.
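
For example, the following sketch deletes the mapping rule that was created earlier for the source `testdb` Hive database, using the explicit path option.

```
DeleteServerMapping
	-sourceTreePath: 'Clusters.HADOOP_SOURCE.HIVE_SOURCE.Databases.testdb'
/
```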

For more information about the `AddServerMapping` and `DeleteServerMapping` commands, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Step 3: Create an assessment report
<a name="big-data-hadoop-assessment-report"></a>

Before you start the migration, we recommend that you create an assessment report. This report summarizes all of the migration tasks and details the action items that emerge during the migration. To make sure that your migration doesn't fail, view this report and address the action items before the migration. For more information, see [Assessment report](CHAP_AssessmentReport.md).

In this step, you use the `CreateMigrationReport` command. This command uses two parameters. The `treePath` parameter is mandatory, and the `forceMigrate` parameter is optional.
+ `treePath` – the explicit path to your source database objects for which you save a copy of the assessment report.
+ `forceMigrate` – when set to `true`, AWS SCT continues the migration even if your project includes an HDFS folder and Hive table that refer to the same object. The default value is `false`.
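
Based on the parameters above, a minimal invocation might look like the following sketch, which creates an assessment report for the source `testdb` Hive database and continues even if an HDFS folder and a Hive table refer to the same object. The database path is a placeholder.

```
CreateMigrationReport
	-treePath: 'Clusters.HADOOP_SOURCE.HIVE_SOURCE.Databases.testdb'
	-forceMigrate: 'true'
/
```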

You can then save a copy of the assessment report as a PDF file or as comma-separated value (CSV) files. To do so, use the `SaveReportPDF` or `SaveReportCSV` command.

The `SaveReportPDF` command saves a copy of your assessment report as a PDF file. This command uses four parameters. The `file` parameter is mandatory, other parameters are optional.
+ `file` – the path to the PDF file and its name.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to migrate.
+ `treePath` – the explicit path to your source database objects for which you save a copy of the assessment report.
+ `namePath` – the path that includes only the names of your target objects for which you save a copy of the assessment report.

The `SaveReportCSV` command saves your assessment report in three CSV files. This command uses four parameters. The `directory` parameter is mandatory, other parameters are optional.
+ `directory` – the path to the folder where AWS SCT saves the CSV files.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to migrate.
+ `treePath` – the explicit path to your source database objects for which you save a copy of the assessment report.
+ `namePath` – the path that includes only the names of your target objects for which you save a copy of the assessment report.

The following code example saves a copy of the assessment report in the `c:\sct\ar.pdf` file.

```
SaveReportPDF
	-file:'c:\sct\ar.pdf'
/
```

The following code example saves a copy of the assessment report as CSV files in the `c:\sct` folder.

```
SaveReportCSV
	-directory:'c:\sct'
/
```

For more information about the `SaveReportPDF` and `SaveReportCSV` commands, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Step 4: Migrate your Apache Hadoop cluster to Amazon EMR with AWS SCT
<a name="big-data-hadoop-migrate"></a>

After you configure your AWS SCT project, start the migration of your on-premises Apache Hadoop cluster to the AWS Cloud.

In this step, you use the `Migrate`, `MigrationStatus`, and `ResumeMigration` commands.

The `Migrate` command migrates your source objects to the target cluster. This command uses four parameters. Make sure that you specify the `filter` or `treePath` parameter. Other parameters are optional.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to migrate.
+ `treePath` – the explicit path to your source database objects to migrate.
+ `forceLoad` – when set to `true`, AWS SCT automatically loads database metadata trees during migration. The default value is `false`.
+ `forceMigrate` – when set to `true`, AWS SCT continues the migration even if your project includes an HDFS folder and Hive table that refer to the same object. The default value is `false`.

The `MigrationStatus` command returns information about the migration progress. To run this command, enter the name of your migration project for the `name` parameter. You specified this name in the `CreateProject` command.
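
For example, assuming that the project was created with the name `hadoop_emr`, the following sketch checks the migration progress.

```
MigrationStatus
	-name: 'hadoop_emr'
/
```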

The `ResumeMigration` command resumes the interrupted migration that you launched using the `Migrate` command. The `ResumeMigration` command doesn't use parameters. To resume the migration, you must connect to your source and target clusters. For more information, see [Managing your migration project](#big-data-hadoop-manage-project).

The following code example migrates data from your source HDFS service to Amazon EMR.

```
Migrate
	-treePath: 'Clusters.HADOOP_SOURCE.HDFS_SOURCE'
	-forceMigrate: 'true'
/
```

## Running your CLI script
<a name="big-data-hadoop-run-migration"></a>

After you finish editing your AWS SCT CLI script, save it as a file with the `.scts` extension. Now, you can run your script from the `app` folder of your AWS SCT installation path. To do so, use the following command.

```
RunSCTBatch.cmd --pathtoscts "C:\script_path\hadoop.scts"
```

In the preceding example, replace *script\_path* with the path to your file with the CLI script. For more information about running CLI scripts in AWS SCT, see [Script mode](CHAP_Reference.md#CHAP_Reference.ScriptMode).

## Managing your big data migration project
<a name="big-data-hadoop-manage-project"></a>

After you complete the migration, you can save and edit your AWS SCT project for future use.

To save your AWS SCT project, use the `SaveProject` command. This command doesn't use parameters.

The following code example saves your AWS SCT project.

```
SaveProject
/
```

To open your AWS SCT project, use the `OpenProject` command. This command uses one mandatory parameter. For the `file` parameter, enter the path to your AWS SCT project file and its name. You specified the project name in the `CreateProject` command. Make sure that you add the `.scts` extension to the name of your project file to run the `OpenProject` command.

The following code example opens the `hadoop_emr` project from the `c:\sct` folder.

```
OpenProject
	-file: 'c:\sct\hadoop_emr.scts'
/
```

After you open your AWS SCT project, you don't need to add the source and target clusters because you have already added them to your project. To start working with your source and target clusters, connect to them. To do so, use the `ConnectSourceCluster` and `ConnectTargetCluster` commands. These commands use the same parameters as the `AddSourceCluster` and `AddTargetCluster` commands. You can edit your CLI script and replace the names of these commands, leaving the list of parameters unchanged.

The following code example connects to the source Hadoop cluster.

```
ConnectSourceCluster
    -name: 'HADOOP_SOURCE'
    -vendor: 'HADOOP'
    -host: 'hadoop_address'
    -port: '22'
    -user: 'hadoop_user'
    -password: 'hadoop_password'
    -useSSL: 'true'
    -privateKeyPath: 'c:\path\name.pem'
    -passPhrase: 'hadoop_passphrase'
/
```

The following code example connects to the target Amazon EMR cluster.

```
ConnectTargetCluster
	-name: 'HADOOP_TARGET'
	-vendor: 'AMAZON_EMR'
	-host: 'ec2-44-44-55-66.eu-west-1.EXAMPLE.amazonaws.com'
	-port: '22'
	-user: 'emr_user'
	-password: 'emr_password'
	-useSSL: 'true'
	-privateKeyPath: 'c:\path\name.pem'
	-passPhrase: '1234567890abcdef0!'
	-s3Name: 'S3_TARGET'
	-accessKey: 'AKIAIOSFODNN7EXAMPLE'
	-secretKey: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
	-region: 'eu-west-1'
	-s3Path: 'doc-example-bucket/example-folder'
/
```

In the preceding example, replace *hadoop\_address* with the IP address of your Hadoop cluster. If needed, configure the value of the port variable. Next, replace *hadoop\_user* and *hadoop\_password* with the name of your Hadoop user and the password for this user. For *path\name*, enter the name and path to the PEM file for your source Hadoop cluster. For more information about adding your source and target clusters, see [Connecting to Apache Hadoop databases with the AWS Schema Conversion Tool](CHAP_Source.Hadoop.md).

After you connect to your source and target Hadoop clusters, you must connect to your Hive and HDFS services, as well as to your Amazon S3 bucket. To do so, you use the `ConnectSourceClusterHive`, `ConnectSourceClusterHdfs`, `ConnectTargetClusterHive`, `ConnectTargetClusterHdfs`, and `ConnectTargetClusterS3` commands. These commands use the same parameters as the commands that you used to add Hive and HDFS services, and the Amazon S3 bucket to your project. Edit the CLI script to replace the `Add` prefix with `Connect` in the command names.

# Converting Oozie workflows to AWS Step Functions with AWS Schema Conversion Tool
<a name="big-data-oozie"></a>

To convert Apache Oozie workflows, make sure that you use AWS SCT version 1.0.671 or higher. Also, familiarize yourself with the command line interface (CLI) of AWS SCT. For more information, see [CLI Reference for AWS Schema Conversion Tool](CHAP_Reference.md).

**Topics**
+ [Conversion overview](#big-data-oozie-overview)
+ [Step 1: Connect to your source and target services](#big-data-oozie-connect-to-databases)
+ [Step 2: Set up the mapping rules](#big-data-oozie-mapping-rules)
+ [Step 3: Configure parameters](#big-data-oozie-configure-parameters)
+ [Step 4: Create an assessment report](#big-data-oozie-assessment-report)
+ [Step 5: Convert your Apache Oozie workflows to AWS Step Functions with AWS SCT](#big-data-oozie-migrate)
+ [Running your CLI script](#big-data-oozie-run-migration)
+ [Apache Oozie nodes that AWS SCT can convert to AWS Step Functions](#big-data-oozie-supported-nodes)

## Conversion overview
<a name="big-data-oozie-overview"></a>

Your Apache Oozie source code includes action nodes, control flow nodes, and job properties. Action nodes define the jobs that you run in your Apache Oozie workflow. When you use Apache Oozie to orchestrate your Apache Hadoop cluster, an action node includes a Hadoop job. Control flow nodes provide a mechanism to control the workflow path. Control flow nodes include `start`, `end`, `decision`, `fork`, and `join` nodes.

AWS SCT converts your source action nodes and control flow nodes to AWS Step Functions. In AWS Step Functions, you define your workflows in the Amazon States Language (ASL). AWS SCT uses ASL to define your state machine, which is a collection of states that can do work, determine which states to transition to next, stop with an error, and so on. Next, AWS SCT uploads the JSON files with state machine definitions. Then, AWS SCT can use your AWS Identity and Access Management (IAM) role to configure your state machines in AWS Step Functions. For more information, see [What is AWS Step Functions?](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) in the *AWS Step Functions Developer Guide*.
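
To illustrate what an ASL definition looks like, the following is a minimal state machine with a single `Pass` state. This example shows the ASL format itself; it is not output produced by AWS SCT.

```
{
  "Comment": "A minimal Amazon States Language definition",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Pass",
      "Result": "Hello from Step Functions",
      "End": true
    }
  }
}
```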

Also, AWS SCT creates an extension pack with AWS Lambda functions that emulate the source functions that AWS Step Functions doesn't support. For more information, see [Using extension packs with AWS Schema Conversion Tool](CHAP_ExtensionPack.md).

AWS SCT migrates your source job properties to AWS Systems Manager. To store parameter names and values, AWS SCT uses Parameter Store, a capability of AWS Systems Manager. For more information, see [What is AWS Systems Manager?](https://docs.aws.amazon.com/systems-manager/latest/userguide/what-is-systems-manager.html) in the *AWS Systems Manager User Guide*.

You can use AWS SCT to automatically update the names and values of your parameters. Because of the architecture differences between Apache Oozie and AWS Step Functions, you might need to configure your parameters. AWS SCT can find a specified parameter name or value in your source files and replace it with a new value. For more information, see [Step 3: Configure parameters](#big-data-oozie-configure-parameters).

The following image shows the architecture diagram of the Apache Oozie conversion to AWS Step Functions.

![\[The architecture diagram of the Apache Oozie conversion to AWS Step Functions.\]](http://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/images/aws-sct-oozie-conversion-architecture-diagram.png)


To start the conversion, create and run your AWS SCT CLI script. This script includes the complete set of commands to run the conversion. You can download and edit a template of the Apache Oozie conversion script. For more information, see [Getting CLI scenarios](CHAP_Reference.md#CHAP_Reference.Scenario).

Make sure that your script includes the following steps.

## Step 1: Connect to your source and target services
<a name="big-data-oozie-connect-to-databases"></a>

To start the conversion of your Apache Oozie cluster, create a new AWS SCT project. Next, connect to your source and target services. Make sure that you create and provision your target AWS resources before you start the migration. For more information, see [Prerequisites for using Apache Oozie as a source](CHAP_Source.Oozie.md#CHAP_Source.Oozie.Prerequisites).

In this step, you use the following AWS SCT CLI commands.
+ `CreateProject` – to create a new AWS SCT project.
+ `AddSource` – to add your source Apache Oozie files in your AWS SCT project.
+ `ConnectSource` – to connect to Apache Oozie as a source.
+ `AddTarget` – to add AWS Step Functions as a migration target in your project.
+ `ConnectTarget` – to connect to AWS Step Functions.

For examples of using these AWS SCT CLI commands, see [Connecting to Apache Oozie](CHAP_Source.Oozie.md).

When you run the `ConnectSource` or `ConnectTarget` commands, AWS SCT tries to establish the connection to your services. If the connection attempt fails, then AWS SCT stops running the commands from your CLI script and displays an error message.

## Step 2: Set up the mapping rules
<a name="big-data-oozie-mapping-rules"></a>

After you connect to your source and target services, set up the mapping rules. A mapping rule defines the migration target for your source Apache Oozie workflows and parameters. For more information about mapping rules, see [Mapping data types in the AWS Schema Conversion Tool](CHAP_Mapping.md).

To define source and target objects for conversion, use the `AddServerMapping` command. This command uses two parameters: `sourceTreePath` and `targetTreePath`. The values of these parameters include an explicit path to your source and target objects. For Apache Oozie to AWS Step Functions conversion, these parameters must start with `ETL`.

The following code example creates a mapping rule for `OOZIE` and `AWS_STEP_FUNCTIONS` objects. You added these objects to your AWS SCT project using `AddSource` and `AddTarget` commands in the previous step.

```
AddServerMapping
    -sourceTreePath: 'ETL.APACHE_OOZIE'
    -targetTreePath: 'ETL.AWS_STEP_FUNCTIONS'
/
```

For more information about the `AddServerMapping` command, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Step 3: Configure parameters
<a name="big-data-oozie-configure-parameters"></a>

If your source Apache Oozie workflows use parameters, you might need to change their values after the conversion to AWS Step Functions. Also, you might need to add new parameters to use with your AWS Step Functions.

In this step, you use the `AddParameterMapping` and `AddTargetParameter` commands.

To replace the parameter values in your source files, use the `AddParameterMapping` command. AWS SCT scans your source files, finds the parameters by name or value, and changes their values. You can run a single command to scan all your source files. You define the scope of files to scan using one of the first three parameters from the following list. This command uses up to six parameters.
+ `filterName` – the name of the filter for your source objects. You can create a filter using the `CreateFilter` command.
+ `treePath` – the explicit path to your source objects.
+ `namePath` – the explicit path to a specific source object.
+ `sourceParameterName` – the name of your source parameter.
+ `sourceValue` – the value of your source parameter.
+ `targetValue` – the value of your target parameter.

The following code example replaces all parameters where the value is equal to `c:\oozie\hive.py` with the `s3://bucket-oozie/hive.py` value.

```
AddParameterMapping
	-treePath: 'ETL.OOZIE.Applications'
	-sourceValue: 'c:\oozie\hive.py'
	-targetValue: 's3://bucket-oozie/hive.py'
/
```

The following code example replaces all parameters where the name is equal to `nameNode` with the `hdfs://ip-111-222-33-44.eu-west-1.compute.internal:8020` value.

```
AddParameterMapping
    -treePath: 'ETL.OOZIE_SOURCE.Applications'
    -sourceParameter: 'nameNode'
    -targetValue: 'hdfs://ip-111-222-33-44.eu-west-1.compute.internal:8020'
/
```

The following code example replaces all parameters where the name is equal to `nameNode` and the value is equal to `hdfs://ip-55.eu-west-1.compute.internal:8020` with the value from the `targetValue` parameter.

```
AddParameterMapping
    -treePath: 'ETL.OOZIE_SOURCE.Applications'
    -sourceParameter: 'nameNode'
    -sourceValue: 'hdfs://ip-55-66-77-88.eu-west-1.compute.internal:8020'
    -targetValue: 'hdfs://ip-111-222-33-44.eu-west-1.compute.internal:8020'
/
```

To add a new parameter in your target files in addition to an existing parameter from your source files, use the `AddTargetParameter` command. This command uses the same set of parameters as the `AddParameterMapping` command.

The following code example adds the `clusterId` target parameter instead of the `nameNode` parameter.

```
AddTargetParameter
    -treePath: 'ETL.OOZIE_SOURCE.Applications'
    -sourceParameter: 'nameNode'
    -sourceValue: 'hdfs://ip-55-66-77-88.eu-west-1.compute.internal:8020'
    -targetParameter: 'clusterId'
    -targetValue: '1234567890abcdef0'
/
```

For more information about the `AddServerMapping`, `AddParameterMapping`, `AddTargetParameter`, and `CreateFilter` commands, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Step 4: Create an assessment report
<a name="big-data-oozie-assessment-report"></a>

Before you start the conversion, we recommend that you create an assessment report. This report summarizes all of the migration tasks and details the action items that emerge during the migration. To make sure that your migration doesn't fail, view this report and address the action items before the migration. For more information, see [Assessment report](CHAP_AssessmentReport.md).

In this step, you use the `CreateReport` command. This command uses two parameters. The first parameter describes the source objects for which AWS SCT creates an assessment report. To do so, use one of the following parameters: `filterName`, `treePath`, or `namePath`. This parameter is mandatory. The second parameter is an optional Boolean parameter, `forceLoad`. If you set this parameter to `true`, then AWS SCT automatically loads all child objects for the source object that you specify in the `CreateReport` command.

The following code example creates an assessment report for the `Applications` node of your source Oozie files.

```
CreateReport
    -treePath: 'ETL.APACHE_OOZIE.Applications'
/
```

You can then save a copy of the assessment report as a PDF file or as comma-separated value (CSV) files. To do so, use the `SaveReportPDF` or `SaveReportCSV` command.

The `SaveReportPDF` command saves a copy of your assessment report as a PDF file. This command uses four parameters. The `file` parameter is mandatory, other parameters are optional.
+ `file` – the path to the PDF file and its name.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to migrate.
+ `treePath` – the explicit path to your source database objects for which you save a copy of the assessment report.
+ `namePath` – the path that includes only the names of your target objects for which you save a copy of the assessment report.

The `SaveReportCSV` command saves your assessment report in CSV files. This command uses four parameters. The `directory` parameter is mandatory, other parameters are optional.
+ `directory` – the path to the folder where AWS SCT saves the CSV files.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to migrate.
+ `treePath` – the explicit path to your source database objects for which you save a copy of the assessment report.
+ `namePath` – the path that includes only the names of your target objects for which you save a copy of the assessment report.

The following code example saves a copy of the assessment report in the `c:\sct\ar.pdf` file.

```
SaveReportPDF
	-file:'c:\sct\ar.pdf'
/
```

The following code example saves a copy of the assessment report as CSV files in the `c:\sct` folder.

```
SaveReportCSV
	-directory:'c:\sct'
/
```

For more information about the `CreateReport`, `SaveReportPDF` and `SaveReportCSV` commands, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Step 5: Convert your Apache Oozie workflows to AWS Step Functions with AWS SCT
<a name="big-data-oozie-migrate"></a>

After you configure your AWS SCT project, convert your source code and apply it to the AWS Cloud.

In this step, you use the `Convert`, `SaveOnS3`, `ConfigureStateMachine`, and `ApplyToTarget` commands.

The `Convert` command converts your source objects to the target format. This command uses four parameters. Make sure that you specify the `filter`, `namePath`, or `treePath` parameter. Other parameters are optional.
+ `filter` – the name of the filter that you created before to define the scope of your source objects to convert.
+ `namePath` – the explicit path to a specific source object.
+ `treePath` – the explicit path to your source objects to convert.
+ `forceLoad` – when set to `true`, AWS SCT automatically loads database metadata trees during migration. The default value is `false`.

The following code example converts files from the `Applications` folder in your source Oozie files.

```
Convert
    -treePath: 'ETL.APACHE_OOZIE.Applications'
/
```

The `SaveOnS3` command uploads the state machine definitions to your Amazon S3 bucket. This command uses the `treePath` parameter. To run this command, use the target folder with state machine definitions as the value of this parameter.

The following code example uploads the `State machine definitions` folder of your `AWS_STEP_FUNCTIONS` target object to the Amazon S3 bucket. AWS SCT uses the Amazon S3 bucket that you stored in the AWS service profile in the [Prerequisites](CHAP_Source.Oozie.md#CHAP_Source.Oozie.Prerequisites) step.

```
SaveOnS3
    -treePath: 'ETL.AWS_STEP_FUNCTIONS.State machine definitions'
/
```

The `ConfigureStateMachine` command configures state machines. This command uses up to four parameters. Make sure that you define the target scope using one of the first three parameters from the following list.
+ `filterName` – the name of the filter for your target objects. You can create a filter using the `CreateFilter` command.
+ `treePath` – the explicit path to your target objects.
+ `namePath` – the explicit path to a specific target object.
+ `iamRole` – the Amazon Resource Name (ARN) of the IAM role that provides access to your state machines. This parameter is required.

The following code example configures state machines defined in `AWS_STEP_FUNCTIONS` using the *role\_name* IAM role.

```
ConfigureStateMachine
    -treePath: 'ETL.AWS_STEP_FUNCTIONS.State machine definitions'
    -role: 'arn:aws:iam::555555555555:role/role_name'
/
```

The `ApplyToTarget` command applies your converted code to the target server. To run this command, use one of the following parameters: `filterName`, `treePath`, or `namePath` to define the target objects to apply.

The following code example applies the `app_wp` state machine to AWS Step Functions.

```
ApplyToTarget
    -treePath: 'ETL.AWS_STEP_FUNCTIONS.State machines.app_wp'
/
```

To make sure that your converted code produces the same results as your source code, you can use the AWS SCT extension pack. This is a set of AWS Lambda functions that emulate the Apache Oozie functions that AWS Step Functions doesn't support. To install this extension pack, you can use the `CreateLambdaExtPack` command.

This command uses up to five parameters. Make sure that you use **Oozie2SF** for `extPackId`. In this case, AWS SCT creates an extension pack for source Apache Oozie functions.
+ `extPackId` – the unique identifier for a set of Lambda functions. This parameter is required.
+ `tempDirectory` – the path where AWS SCT can store temporary files. This parameter is required.
+ `awsProfile` – the name of your AWS profile.
+ `lambdaExecRoles` – the list of Amazon Resource Names (ARNs) of the execution roles to use for Lambda functions.
+ `createInvokeRoleFlag` – the Boolean flag that indicates whether to create an execution role for AWS Step Functions.

To install and use the extension pack, make sure that you provide the required permissions. For more information, see [Permissions for using AWS Lambda functions in the extension pack](CHAP_Source.Oozie.md#CHAP_Source.Oozie.TargetPrerequisites).
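
Using the parameters listed above, an invocation might look like the following sketch. The temporary directory, profile name, and role ARN are placeholder values.

```
CreateLambdaExtPack
    -extPackId: 'Oozie2SF'
    -tempDirectory: 'c:\sct\temp'
    -awsProfile: 'default'
    -lambdaExecRoles: 'arn:aws:iam::555555555555:role/lambda_role_name'
    -createInvokeRoleFlag: 'true'
/
```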

For more information about the `Convert`, `SaveOnS3`, `ConfigureStateMachine`, `ApplyToTarget`, and `CreateLambdaExtPack` commands, see the [AWS Schema Conversion Tool CLI Reference](https://s3.amazonaws.com/publicsctdownload/AWS+SCT+CLI+Reference.pdf).

## Running your CLI script
<a name="big-data-oozie-run-migration"></a>

After you finish editing your AWS SCT CLI script, save it as a file with the `.scts` extension. Now, you can run your script from the `app` folder of your AWS SCT installation path. To do so, use the following command.

```
RunSCTBatch.cmd --pathtoscts "C:\script_path\oozie.scts"
```

In the preceding example, replace *script\_path* with the path to your file with the CLI script. For more information about running CLI scripts in AWS SCT, see [Script mode](CHAP_Reference.md#CHAP_Reference.ScriptMode).

## Apache Oozie nodes that AWS SCT can convert to AWS Step Functions
<a name="big-data-oozie-supported-nodes"></a>

You can use AWS SCT to convert Apache Oozie action nodes and control flow nodes to AWS Step Functions.

Supported action nodes include the following:
+ Hive action
+ Hive2 action
+ Spark action
+ MapReduce Streaming action
+ Java action
+ DistCp action
+ Pig action
+ Sqoop action
+ FS action
+ Shell action

Supported control flow nodes include the following:
+ Start action
+ End action
+ Kill action
+ Decision action
+ Fork action
+ Join action