

# Migrate workloads from AWS Data Pipeline to Amazon MWAA
<a name="migrating-pipeline-workloads"></a>

AWS launched the AWS Data Pipeline service in 2012. At that time, customers wanted a service that let them use a variety of compute options to move data between different data sources. As data transfer needs have changed over time, so have the solutions to those needs. You can now choose the solution that most closely meets your business requirements. You can migrate your workloads to any of the following AWS services:
+ Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to manage workflow orchestration for Apache Airflow.
+ Use Step Functions to orchestrate workflows between multiple AWS services.
+ Use AWS Glue to run and orchestrate Apache Spark applications.

The option you choose depends on your current workload on AWS Data Pipeline. This topic explains how to migrate from AWS Data Pipeline to Amazon MWAA.

**Topics**
+ [Choosing Amazon MWAA](#migrating-pipeline-workloads-mwaa)
+ [Architecture and concept mapping](#migrating-pipeline-workloads-concept-mapping)
+ [Example implementations](#migrating-pipeline-workloads-examples)
+ [Pricing comparison](#migrating-pipeline-workloads-price-comparison)
+ [Related resources](#migrating-pipeline-workloads-resources)

## Choosing Amazon MWAA
<a name="migrating-pipeline-workloads-mwaa"></a>

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that lets you set up and operate end-to-end data pipelines in the cloud at scale. [Apache Airflow](https://airflow.apache.org/) is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as *workflows*. With Amazon MWAA, you can use Apache Airflow and the Python programming language to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Amazon MWAA automatically scales its workflow capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to your data.

The following highlights some of the benefits of migrating from AWS Data Pipeline to Amazon MWAA:
+ **Enhanced scalability and performance** – Amazon MWAA provides a flexible and scalable framework for defining and executing workflows. You can handle large and complex workflows, and take advantage of features such as dynamic task scheduling, data-driven workflows, and parallelism.
+ **Improved monitoring and logging** – Amazon MWAA integrates with Amazon CloudWatch for monitoring and logging of your workflows. Amazon MWAA automatically sends system metrics and logs to CloudWatch, so you can track the progress and performance of your workflows in real time and identify any issues that arise.
+ **Better integrations with AWS services and third-party software** – Amazon MWAA integrates with a variety of other AWS services, such as Amazon S3, AWS Glue, and Amazon Redshift, as well as third-party software such as [dbt](https://www.getdbt.com/), [Snowflake](https://www.snowflake.com/en/), and [Databricks](https://www.databricks.com/). This lets you process and transfer data across different environments and services.
+ **Open-source data pipeline tool** – Amazon MWAA leverages the same open-source Apache Airflow product you are familiar with. Apache Airflow is a purpose-built tool designed to handle all aspects of data pipeline management, including ingestion, processing, transferring, integrity testing, quality checks, and ensuring data lineage.
+ **Modern and flexible architecture** – Amazon MWAA leverages containerization and cloud-native, serverless technologies. This allows for more flexibility and portability, as well as easier deployment and management of your workflow environments.

## Architecture and concept mapping
<a name="migrating-pipeline-workloads-concept-mapping"></a>

AWS Data Pipeline and Amazon MWAA have different architectures and components, which can affect the migration process and the way workflows are defined and executed. This section provides an overview of the architecture and components of both services, and highlights some of the key differences.

Both AWS Data Pipeline and Amazon MWAA are fully managed services. When you migrate your workloads to Amazon MWAA, you might need to learn new concepts to model your existing workflows using Apache Airflow. However, you will not need to manage infrastructure, patch workers, or manage operating system updates.

 The following table associates key concepts in AWS Data Pipeline with those in Amazon MWAA. Use this information as a starting point to design a migration plan. 

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/mwaa/latest/migrationguide/migrating-pipeline-workloads.html)

## Example implementations
<a name="migrating-pipeline-workloads-examples"></a>

In many cases, you can reuse the resources you currently orchestrate with AWS Data Pipeline after you migrate to Amazon MWAA. The following list contains example implementations using Amazon MWAA for the most common AWS Data Pipeline use cases.
+ [Running an Amazon EMR job](https://catalog.us-east-1.prod.workshops.aws/workshops/795e88bb-17e2-498f-82d1-2104f4824168/en-US/workshop-2-2-2/m1-processing/emr) (AWS workshop)
+ [Creating a custom plugin for Apache Hive and Hadoop](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-hive.html) (*Amazon MWAA User Guide*)
+ [Copying data from S3 to Redshift](https://catalog.us-east-1.prod.workshops.aws/workshops/795e88bb-17e2-498f-82d1-2104f4824168/en-US/workshop-2-2-2/m1-processing/redshift) (AWS workshop)
+ [Executing a shell script on a remote Amazon EC2 instance](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-ssh.html) (*Amazon MWAA User Guide*)
+ [Orchestrating hybrid (on-prem) workflows](https://dev.to/aws/orchestrating-hybrid-workflows-using-amazon-managed-workflows-for-apache-airflow-mwaa-2boc) (Blog post)

 For additional tutorials and examples, refer to the following: 
+ [Amazon MWAA tutorials](https://docs.aws.amazon.com/mwaa/latest/userguide/tutorials.html)
+ [Amazon MWAA code examples](https://docs.aws.amazon.com/mwaa/latest/userguide/sample-code.html)

## Pricing comparison
<a name="migrating-pipeline-workloads-price-comparison"></a>

Pricing for AWS Data Pipeline is based on the number of pipelines, as well as how much you use each pipeline. Activities that you run more than once a day (high frequency) cost \$1 per month per activity. Activities that you run once a day or less (low frequency) cost \$0.60 per month per activity. Inactive pipelines are priced at \$1 per pipeline per month. For more information, refer to the [AWS Data Pipeline pricing](https://aws.amazon.com/datapipeline/pricing/) page.

Pricing for Amazon MWAA is based on how long your managed Apache Airflow environment exists, plus any additional auto scaling required to provide more worker or scheduler capacity. You pay for your Amazon MWAA environment usage on an hourly basis (billed at one-second resolution), with fees that vary depending on the size of the environment. Amazon MWAA automatically scales the number of workers based on your environment configuration, and AWS calculates the cost of additional workers separately. For more information on the hourly cost of using various Amazon MWAA environment sizes, refer to the [Amazon MWAA pricing](https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/) page.
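
The two pricing models described above can be compared with a rough estimate. The following sketch assumes simple linear pricing: a per-activity and per-pipeline monthly charge for AWS Data Pipeline, and an hourly environment charge plus separately billed additional worker hours for Amazon MWAA. The function names are hypothetical, and every rate and usage figure in the example is a placeholder rather than a quoted price; substitute the current values from the pricing pages for your Region and environment size.

```python
def data_pipeline_monthly_cost(
    high_freq_activities,
    low_freq_activities,
    inactive_pipelines,
    *,
    high_freq_rate,
    low_freq_rate,
    inactive_rate,
):
    """Estimate a monthly AWS Data Pipeline bill in USD.

    Rates are per activity (or per inactive pipeline) per month.
    """
    total = (
        high_freq_activities * high_freq_rate
        + low_freq_activities * low_freq_rate
        + inactive_pipelines * inactive_rate
    )
    return round(total, 2)


def mwaa_monthly_cost(
    environment_hours,
    additional_worker_hours,
    *,
    environment_hourly_rate,
    worker_hourly_rate,
):
    """Estimate a monthly Amazon MWAA bill in USD.

    The environment is billed for every hour it exists; workers added
    by auto scaling are billed separately for the hours they run.
    """
    total = (
        environment_hours * environment_hourly_rate
        + additional_worker_hours * worker_hourly_rate
    )
    return round(total, 2)


# Placeholder inputs for illustration only; these are not actual prices.
data_pipeline_estimate = data_pipeline_monthly_cost(
    high_freq_activities=20,
    low_freq_activities=10,
    inactive_pipelines=2,
    high_freq_rate=1.00,
    low_freq_rate=0.60,
    inactive_rate=1.00,
)
mwaa_estimate = mwaa_monthly_cost(
    environment_hours=730,        # roughly one month of continuous uptime
    additional_worker_hours=100,
    environment_hourly_rate=0.50,
    worker_hourly_rate=0.06,
)

print(f"AWS Data Pipeline estimate: ${data_pipeline_estimate:.2f}")
print(f"Amazon MWAA estimate: ${mwaa_estimate:.2f}")
```

Because an Amazon MWAA environment is billed for as long as it exists, regardless of how many workflows it runs, the comparison depends heavily on how many pipelines you consolidate into a single environment.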

## Related resources
<a name="migrating-pipeline-workloads-resources"></a>

 For more information and best practices for using Amazon MWAA, refer to the following resources: 
+ [The Amazon MWAA API reference](https://docs.aws.amazon.com/mwaa/latest/API/Welcome.html)
+ [Monitoring dashboards and alarms on Amazon MWAA](https://docs.aws.amazon.com/mwaa/latest/userguide/monitoring-dashboard.html)
+ [Performance tuning for Apache Airflow on Amazon MWAA](https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html)