

# Provisioned Workflows

Amazon SageMaker Unified Studio supports provisioned workflows powered by [Amazon MWAA](https://docs.aws.amazon.com/mwaa/latest/userguide/what-is-mwaa.html). With provisioned workflows, you can create, schedule, and monitor workflows using Apache Airflow and Python without managing the underlying infrastructure for scalability, availability, and security.

Provisioned workflows in Amazon SageMaker Unified Studio provide the following capabilities:
+ Automatic scaling of Apache Airflow workers to meet workflow demands, up to the maximum limits you define.
+ Python DAG support with AWS and custom operators for orchestrating notebooks, querybooks, and data processing jobs.
+ Workflow monitoring through Apache Airflow logs and metrics in Amazon CloudWatch.
+ Built-in access to AWS services through open source Apache Airflow operators.
+ Direct access to the Apache Airflow web interface for workflow management and monitoring.

**Note**  
Provisioned workflows are available in Amazon SageMaker Unified Studio projects created with the **All capabilities** project profile.

Provisioned workflows run in a shared environment that all project members can access. To share your workflows with other users, commit the file defining your workflow and sync the workflow with the shared environment.

# Workflow environments in Amazon SageMaker Unified Studio

Use a shared workflow environment to share workflows with other project members. Workflow environments must be created by project owners. To update or delete a workflow environment, you must be an owner of the project that the workflow environment is in. After a workflow environment has been created by a project owner, any project member can sync their files to share them in the environment.

Only one workflow environment can exist in a project at a time.
+ [Create a workflow environment](#create-workflow-environment)
+ [Update a workflow environment](#update-workflow-environment)
+ [Delete a workflow environment](#delete-workflow-environment)

## Create a workflow environment


To create a workflow environment, you must be an owner of the project that you want to create a workflow environment in.

To create a workflow environment, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project that was created with the **All capabilities** project profile. To do this, use the center menu at the top of the landing page and choose **Browse all projects**, then choose the name of the project that you want to navigate to.

1. In the center menu, choose **Compute**. This takes you to the Compute page.

1. On the **Workflow environments** tab, confirm that there are no workflow environments in the project yet. Then choose **Create**.

1. In the **Create workflow environment** window, review the parameters of the workflow environment. These are determined by your admin. If you want any of these parameters to change, contact your admin.

1. Choose **Create workflow environment**.

**Note**  
Workflow environment creation takes several minutes to complete.

## Update a workflow environment


To update a workflow environment, you must be an owner of the project that you want to update a workflow environment in.

To update a workflow environment, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that contains the workflow environment that you want to update. To do this, use the center menu at the top of the landing page and choose **Browse all projects**, then choose the name of the project that you want to navigate to.

1. In the center menu, choose **Compute**. This takes you to the Compute page.

1. On the **Workflow environments** tab, expand the **Actions** menu and choose **Update**.

1. Choose **Update workflow environment**.

**Note**  
Updating a workflow environment takes several minutes to complete.

## Delete a workflow environment


To delete a workflow environment, you must be an owner of the project that contains the workflow environment that you want to delete.

To delete a workflow environment, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that contains the workflow environment that you want to delete. To do this, use the center menu at the top of the landing page and choose **Browse all projects**, then choose the name of the project that you want to navigate to.

1. In the center menu, choose **Compute**. This takes you to the Compute page.

1. On the **Workflow environments** tab, expand the **Actions** menu and choose **Delete**.

1. Confirm the action by typing `confirm`, then choose **Delete workflow environment**.

**Note**  
Deleting a workflow environment takes several minutes to complete.

# Using visual workflows in Amazon SageMaker Unified Studio

With Amazon SageMaker Unified Studio visual workflows, you can create and orchestrate data processing workflows using an intuitive drag-and-drop interface without writing code. Visual workflows enable you to connect notebooks, queries, and data processing jobs in a graphical format, and create and manage schedules.
+ [Create a visual workflow in Amazon SageMaker Unified Studio](#provisioned-using-visual-workflows)
+ [View visual workflow details](#provisioned-view-visual-workflow-details)
+ [Run a visual workflow](#provisioned-run-visual-workflows)
+ [Edit visual workflows](#provisioned-edit-visual-workflows)
+ [View visual workflow code](#provisioned-view-visual-workflows)
+ [Clone and delete visual workflows](#provisioned-clone-delete-visual-workflows)

## Create a visual workflow in Amazon SageMaker Unified Studio

Use visual workflows to orchestrate data processing jobs, notebooks, and querybooks in your project repositories. With visual workflows, you can define a collection of tasks organized as a directed acyclic graph (DAG) that can run on a user-defined schedule.

### Prerequisites

+ Amazon SageMaker Unified Studio project created with the **All capabilities** project profile
+ Access to the **Workflows** page in your project

### Environment status


Visual workflows run in the project's shared workflow environment. Only a project owner can create, update, or delete the workflow environment. For more information, see [Create a workflow environment](#create-workflow-environment).


The shared environment can have one of the following statuses:
+ Active
+ Missing
+ Loading
+ Creating
+ Failed
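Creating or updating a workflow environment takes several minutes. If you script any checks around this, you can use a polling loop like the following sketch to wait until the environment reaches a final status. The sketch assumes a hypothetical `get_status` callable that returns one of the status names above, because this guide doesn't describe an API for reading the status. It also assumes that **Creating** and **Loading** are in-progress statuses and that **Active**, **Failed**, and **Missing** are final. Both groupings are assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed grouping of the shared environment statuses (illustrative only).
IN_PROGRESS = {"Creating", "Loading"}
FINAL = {"Active", "Failed", "Missing"}


def wait_for_environment(
    get_status: Callable[[], str],  # hypothetical: returns the current status name
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the environment reaches a final status and return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in FINAL:
            return status
        if status not in IN_PROGRESS:
            raise ValueError(f"Unexpected environment status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Environment still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)


# Usage with a fake status source that becomes Active on the third check.
fake_statuses = iter(["Creating", "Creating", "Active"])
print(wait_for_environment(lambda: next(fake_statuses), sleep=lambda _: None))  # Active
```

The `sleep` parameter lets you exercise the loop without waiting. In a real script, you would keep the default.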

### Create a workflow


To create a workflow, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials.

1. Navigate to a project that was created with the **All capabilities** project profile. You can do this by using the center menu at the top of the page and choosing **Browse all projects**, then choosing the name of the project that you want to navigate to.

1. In the **Build** menu, choose **Workflows**. This takes you to the Workflows page.  
![\[Screenshot of the Workflows page in the Build menu\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/visual-workflows/ScreenshotVisualWorkflow1.png)

1. Choose **Create new workflow**, or expand the **Create new workflow** dropdown menu and choose **Create in visual builder**. This takes you to the **Visual canvas workflow**.  
![\[Screenshot of the Create new workflow button and dropdown menu\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/visual-workflows/ScreenshotVisualWorkflow2.png)

1. Enter a name for your workflow.

1. Choose a task from one of the three tabs: **Data processing job**, **Querybook**, or **Notebook**. The selected task appears in the canvas. Configure the task by giving it a name and editing the prepopulated fields.  
![\[Screenshot of the visual canvas with a task selected and configuration fields\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/visual-workflows/ScreenshotVisualWorkflow3.png)

1. Choose the **Add task** icon (+) to add more tasks. You can drag tasks to arrange them on the canvas.

1. Complete the workflow by connecting the tasks. To connect two tasks, choose the **Add task** icon (+) of one task and connect it to the **Add task** icon (+) of the other task. The arrows represent the execution order and data flow.  
![\[Screenshot of the visual canvas showing connected workflow tasks with arrows\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/visual-workflows/ScreenshotVisualWorkflow4.png)

1. After you create your workflow, you can configure its settings. Choose the **Settings** icon.

   1. On the **Workflow settings** tab, you can:
      + Edit the workflow name, if the workflow hasn't been saved to the project yet.
      + Enter an optional description for the workflow.
      + Turn on **Run on schedule** and set the schedule status to **Active** or **Paused**.
      + Choose an option from the **Schedule** dropdown menu or enter a cron expression to set the schedule, and then set the **Start date and time in UTC** and **End date and time in UTC** fields.

      After you configure the settings, choose **Apply**.

   1. On the **Default parameters** tab, choose **Add parameter**, enter a name and a default value for the parameter, and then choose **Apply**.

   1. (Optional) On the **Tags** tab, choose **Add tag**, enter a name for the Airflow tag, and then choose **Apply**. You can use Airflow tags to filter workflows.

1. Choose **Save to project** to save the current workflow to the project. If the workflow has validation errors, the notifications icon next to the **Settings** icon shows the number of errors. You must fix all errors before you can save the workflow to the project.  
![\[Screenshot of the Save to project button with validation error notifications\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/visual-workflows/ScreenshotVisualWorkflow5.png)
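Connecting tasks on the canvas builds a directed acyclic graph (DAG). An arrow from one task to another means that the second task waits for the first. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later) to show how a set of connections determines which tasks become ready to run together. It also shows why connections that loop back on themselves form a cycle and aren't a valid DAG. The task names are hypothetical. The sketch only illustrates dependency semantics and doesn't reflect how Amazon SageMaker Unified Studio or Airflow schedules tasks.

```python
from graphlib import CycleError, TopologicalSorter


def run_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each task's upstream tasks are all in earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the connections form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical canvas: each task maps to the set of tasks with arrows pointing into it.
workflow = {
    "process-data": set(),
    "query-sales": {"process-data"},
    "query-inventory": {"process-data"},
    "build-report": {"query-sales", "query-inventory"},
}
print(run_batches(workflow))
# [['process-data'], ['query-inventory', 'query-sales'], ['build-report']]

# An arrow from build-report back to process-data would create a cycle.
try:
    run_batches({**workflow, "process-data": {"build-report"}})
except CycleError:
    print("Not a DAG: the connections contain a cycle")
```

Tasks in the same batch have no arrows between them, so they can become eligible to run at the same time.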

## View visual workflow details


After you create a visual workflow, it appears in a list on the Workflows page in Amazon SageMaker Unified Studio. On the Workflows page, you can see each workflow you created with the name you provided. Note that it might take up to 60 seconds for the workflow to appear in the list.

To view details about workflow runs and parameters, select the name of a workflow from the list on the Workflows page in Amazon SageMaker Unified Studio.
+ Choose **View Runs** to view the results of running the workflow. You can filter to show successful runs. This page shows information about the workflow run triggers, durations, and timeframes. There is also an **Actions** column where you can choose to stop a workflow if it is still running. There is a limit of 1000 rows on the **Runs** tab for a workflow.
+ To view more details about a run, choose the name of a run. This takes you to the run details panel with information about the tasks and parameters in the workflow. You can view which tasks were successfully completed. For workflows that run Python notebooks and not querybooks, you can view the output in the **Notebook output** tab. This can be useful for viewing tasks in more detail and troubleshooting if needed.
+ The **Default parameters** tab shows the default parameters outlined in the workflow code. To modify the parameters, choose the **Settings** icon in the workflow canvas and open the **Default parameters** tab. For more information about parameters, see [Params](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html) in the Apache Airflow documentation.
+ The **Definition** tab shows the code used for the workflow.
+ The **Tags** tab shows optional tags that are defined for the workflow. These are Airflow tags, not AWS tags. For more information, see [Add tags to DAGs and use it for filtering in the UI](https://airflow.apache.org/docs/apache-airflow/stable/howto/add-dag-tags.html) in the Apache Airflow documentation.


## Run a visual workflow


To run a visual workflow, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials.

1. Navigate to the project that was created with the **All capabilities** project profile. To do this, use the center menu at the top of the landing page and choose **Browse all projects**, then choose the name of the project that you want to navigate to.

1. In the **Build** menu, choose **Workflows**. This takes you to the Workflows page.

1. Choose the name of the workflow to navigate to the workflow canvas.

1. Expand the **Run** menu, then choose one of the following options:
   + Run with default parameters. This option starts running the workflow using the parameters already defined in the DAG file. To review these parameters, see the **Default parameters** tab.
   + Run with custom parameters. This option opens a window where you can change the inputs for the parameters defined in the DAG file. Enter the variables you want to use, and then choose **Start run** to start running the workflow.
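When you choose **Run with custom parameters**, you change the values of parameters that the DAG file already defines, and only for that run. Parameters that you don't change keep their defaults. The following plain-Python sketch models that merge. The function name, the example parameter names (`region`, `sample_rows`), and the rule that undeclared parameter names are rejected are all hypothetical. For how Airflow itself handles run parameters, see the Params documentation linked earlier.

```python
from typing import Any, Mapping, Optional


def resolve_run_params(
    defaults: Mapping[str, Any],
    custom: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Return the parameter values for a single run (hypothetical helper)."""
    custom = dict(custom or {})
    unknown = sorted(set(custom) - set(defaults))
    if unknown:
        # Assumption: a run can only change parameters that the DAG file defines.
        raise KeyError(f"Parameters not defined in the DAG file: {unknown}")
    # Custom values win for this run; the defaults mapping itself is not modified.
    return {**defaults, **custom}


defaults = {"region": "us-east-1", "sample_rows": 100}
print(resolve_run_params(defaults))                        # Run with default parameters
print(resolve_run_params(defaults, {"sample_rows": 500}))  # Run with custom parameters
# {'region': 'us-east-1', 'sample_rows': 100}
# {'region': 'us-east-1', 'sample_rows': 500}
```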

The workflow run then appears on the side panel. The workflow runs until it is complete or until you choose to stop it.

Running a workflow runs its tasks in the order defined by the workflow to orchestrate your Amazon SageMaker Unified Studio artifacts. To view multiple runs for a workflow, navigate to the Workflows page and choose the name of the workflow in the workflows list.

If you want to see more runs, you can view them using the Airflow UI. Navigate to the Workflows page, choose the three dots in the Action column for a workflow, then choose **Open Airflow UI**. This page displays charts and graphics about the workflow.

**Note**  
To open the Airflow UI, your browser should allow cross-site cookie sharing. If you receive an error message, check the cookie settings in your browser.

## Edit visual workflows


To edit a visual workflow, choose its name on the Workflows page to open it in the canvas, modify its tasks and connections, and then choose **Save to project**.

## View visual workflow code


To view the code for a visual workflow, navigate to the workflow details page by selecting a workflow from the Workflows page list. Then expand the **Actions** dropdown menu and choose **View code**.

## Clone and delete visual workflows


You can also clone and delete a visual workflow. Navigate to the workflow details page by selecting a workflow from the Workflows page list. Then choose the **Actions** dropdown menu and:
+ Choose **Clone workflow** to create a copy of your workflow.
+ Choose **Delete workflow** to delete the workflow.

**Note**  
Clone and delete workflow options are only available for visual workflows.

# Code workflows in Amazon SageMaker Unified Studio

Use workflows to orchestrate notebooks, querybooks, and more in your project repositories. With workflows, you can define a collection of tasks organized as a directed acyclic graph (DAG) that can run on a user-defined schedule.
+ [Create a code workflow in Amazon SageMaker Unified Studio](#create-workflow)
+ [View code workflow details](#workflows-review-details)
+ [Run a code workflow](#workflows-run)
+ [Share a code workflow with other project members in an Amazon SageMaker Unified Studio workflow environment](#sync-workflow-environment)

## Create a code workflow in Amazon SageMaker Unified Studio

### Create a code workflow


To create a code workflow, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project that was created with the **All capabilities** project profile. You can do this by using the center menu at the top of the page and choosing **Browse all projects**, then choosing the name of the project that you want to navigate to.

1. In the **Build** menu, choose **Workflows**. This takes you to the Workflows page.

1. Choose **Create workflow in editor**. This takes you to the Code page and opens a new Python file in the `workflows/dags` folder of the JupyterLab file browser. The file is prepopulated with a workflow definition template.

1. Update the file as desired to create your workflow.

   1. Update `WORKFLOW_SCHEDULE` to set when the workflow runs, for example `'@daily'` or a cron expression such as `'0 8 * * 1-5'`.

   1. Update `NOTEBOOK_PATH` to point to the querybook or JupyterLab notebook that you want to run. For example, `'src/querybook.sqlnb'`.

   1. Update `dag_id` with a unique ID that you can recognize later.

   1. Add tags and parameters, if desired. For more information, see [Params](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html) in the Apache Airflow documentation.
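Before you save the file, you might want to sanity-check the values you edited. The following sketch checks `dag_id` against the character rule that Airflow applies to DAG IDs: alphanumeric characters, dashes, dots, and underscores, up to 250 characters. It also checks that `WORKFLOW_SCHEDULE` is either a common Airflow preset such as `'@daily'` or a five-field cron string. The helper name is hypothetical, and the function is only a pre-check. Airflow performs its own validation when it parses the DAG file.

```python
import re

DAG_ID_PATTERN = re.compile(r"^[\w.-]+$")  # same character rule Airflow uses for keys
MAX_DAG_ID_LENGTH = 250
PRESETS = {"@once", "@hourly", "@daily", "@weekly", "@monthly", "@quarterly", "@yearly"}


def check_workflow_settings(dag_id: str, schedule: str) -> list[str]:
    """Return a list of problems found; an empty list means none (hypothetical pre-check)."""
    problems = []
    if len(dag_id) > MAX_DAG_ID_LENGTH:
        problems.append(f"dag_id is longer than {MAX_DAG_ID_LENGTH} characters")
    if not DAG_ID_PATTERN.match(dag_id):
        problems.append("dag_id may contain only letters, digits, '_', '-', and '.'")
    if schedule not in PRESETS and len(schedule.split()) != 5:
        problems.append("schedule must be an Airflow preset or a five-field cron string")
    return problems


print(check_workflow_settings("air-quality-process-and-query", "@monthly"))  # []
print(check_workflow_settings("air quality", "every day"))  # two problems reported
```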

When you create a workflow, you are modifying the directed acyclic graph (DAG) within the Python file. A DAG defines a collection of tasks with their dependencies and relationships to show how they should run.

A DAG consists of the following:
+  A [DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html) definition. The DAG ID will also be the name of the workflow. 
+ [Operators](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html) that define the [tasks](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html) to run and how to run them.
+ [Operator relationships](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html) that describe the order in which to run the tasks.
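Operator relationships are usually written with Airflow's bitshift operators, as in `process_data() >> query_data()` in the sample code workflow. The expression `a >> b` makes `b` run after `a`, and a list on either side fans out to, or fans in from, several tasks. The following plain-Python sketch imitates that behavior with a minimal, hypothetical `Task` class to show how such a chain records upstream and downstream dependencies. It isn't Airflow's implementation, and the task IDs are illustrative.

```python
from __future__ import annotations


class Task:
    """Minimal stand-in for an Airflow task, for illustration only."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()
        self.downstream: set[str] = set()

    def _link(self, other: Task) -> None:
        # Record the dependency edge self -> other on both tasks.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # Handles `task >> task` and `task >> [task, task]`.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self._link(target)
        return other  # returning the right side lets `a >> b >> c` keep chaining

    def __rrshift__(self, other):
        # Handles `[task, task] >> task`: a list has no >> for Task, so Python calls this.
        for source in other:
            source._link(self)
        return self


extract, load_a, load_b, report = (Task(n) for n in ["extract", "load_a", "load_b", "report"])
extract >> [load_a, load_b] >> report

print(sorted(extract.downstream))  # ['load_a', 'load_b']
print(sorted(report.upstream))     # ['load_a', 'load_b']
```

Because `>>` returns its right-hand operand, a chain like `a >> b >> c` keeps linking tasks from left to right.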

For more information about DAGs, see [DAGs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#) in the Apache Airflow documentation. You can configure a DAG to run on a schedule or run it manually. 

You can include multiple DAGs to create multiple workflows. After you add the DAGs you want to use, save the file in the `workflows/dags` folder in JupyterLab. There might be a slight delay before the workflow appears on the Workflows page.

### Sample code workflow


The following example shows a sample code workflow definition.

```python
from airflow.decorators import dag
from airflow.utils.dates import days_ago
from workflows.airflow.providers.amazon.aws.operators.sagemaker_workflows \
    import NotebookOperator

###############################################################################
#
# Enter your desired schedule as WORKFLOW_SCHEDULE. Some options include:
#
# '@daily' (daily at midnight)
# '@hourly' (every hour, at the top of the hour)
# '30 */3 * * *' (a CRON string, run at minute 30 past every 3rd hour)
# '0 8 * * 1-5' (a CRON string, run every weekday at 8am)
#
###############################################################################

WORKFLOW_SCHEDULE = '@monthly'

###############################################################################
#
# Enter the paths to your artifacts. Example:
# 'src/example_notebook.ipynb'
#
###############################################################################

PROCESS_PATH = 'src/dataflows/airQualityToLakehouse.vetl'
QUERY_PATH = 'src/QueryBrooklynDataPutInS3.sqlnb'

default_args = {
    'owner': 'alexa',
}


@dag(
    dag_id='air-quality-process-and-query',  # also used as the workflow name
    default_args=default_args,
    schedule_interval=WORKFLOW_SCHEDULE,
    start_date=days_ago(2),
    is_paused_upon_creation=False,
    tags=['example-project', 'alexa'],
    catchup=False,  # don't create runs for past schedule intervals
)
def air_quality():
    def process_data():
        # Runs the data processing artifact at PROCESS_PATH.
        return NotebookOperator(
            task_id="process-data",
            input_config={'input_path': PROCESS_PATH, 'input_params': {}},
            output_config={'output_formats': ['NOTEBOOK']},
            wait_for_completion=True,
            poll_interval=5,
        )

    def query_data():
        # Runs the querybook at QUERY_PATH.
        return NotebookOperator(
            task_id="query-data",
            input_config={'input_path': QUERY_PATH, 'input_params': {}},
            output_config={'output_formats': ['NOTEBOOK']},
            wait_for_completion=True,
            poll_interval=5,
        )

    # query-data starts only after process-data completes successfully.
    process_data() >> query_data()


air_quality = air_quality()
```
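The schedule comments in the sample describe each option in words. To check how a five-field cron string such as `'0 8 * * 1-5'` behaves, you can use the following simplified sketch, which tests whether a given time (treated as UTC) matches a schedule. It expands the Airflow presets `@hourly` through `@yearly` to the cron strings that Airflow documents for them. It supports `*`, single values, ranges, comma-separated lists, and `/n` steps, and it uses 0 to 6 (Sunday to Saturday) for the day of the week. When both day fields are restricted, it applies the standard cron rule that either one may match. This is a hypothetical helper for building intuition, not the parser that Airflow uses. It doesn't handle names such as `MON` or `JAN`.

```python
from datetime import datetime

PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@quarterly": "0 0 1 */3 *",
    "@yearly": "0 0 1 1 *",
}
# (minimum, maximum) for minute, hour, day of month, month, day of week (0 = Sunday)
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_values(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as '*/3', '1-5', or '0,30' into its allowed values."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
            if step:  # 'a/n' means from a up to the maximum, every n
                end = high
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def matches(schedule: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) is one of the schedule's run times."""
    fields = PRESETS.get(schedule, schedule).split()
    minute, hour, dom, month, dow = (
        field_values(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS)
    )
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    day_ok = when.day in dom and cron_weekday in dow
    if fields[2] != "*" and fields[4] != "*":
        # Standard cron rule: if both day fields are restricted, either may match.
        day_ok = when.day in dom or cron_weekday in dow
    return when.minute in minute and when.hour in hour and when.month in month and day_ok


print(matches("0 8 * * 1-5", datetime(2024, 6, 3, 8, 0)))    # Monday 08:00 -> True
print(matches("0 8 * * 1-5", datetime(2024, 6, 1, 8, 0)))    # Saturday -> False
print(matches("30 */3 * * *", datetime(2024, 6, 1, 9, 30)))  # True
print(matches("@monthly", datetime(2024, 6, 1, 0, 0)))       # True
```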

## View code workflow details


After you create a code workflow, you must share it with the project before it appears on the Workflows page. For more information, see [Share a code workflow with other project members in an Amazon SageMaker Unified Studio workflow environment](#sync-workflow-environment). On the Workflows page, all project members can see each shared workflow by the name defined using the DAG ID.

To view details about workflow runs and parameters, select the name of a workflow from the list on the Workflows page in Amazon SageMaker Unified Studio.
+ On the **Runs** tab, you can view the results of running the workflow. You can filter to show successful runs. This page shows information about the workflow run triggers, durations, and timeframes. There is also an **Actions** column where you can choose to stop a workflow if it is still running. There is a limit of 1000 rows on the **Runs** tab for a workflow.

  To view more details about a run, choose the name of a run. This takes you to the run details page, with information about the tasks and parameters in the workflow. You can view which tasks were successfully completed on the **Task log** tab. For workflows that run Python notebooks and not querybooks, you can view the output in the **Notebook output** tab. This can be useful for viewing tasks in more detail and troubleshooting if needed.
+ The **Default parameters** tab shows the default parameters outlined in the workflow code. To modify the parameters, choose **Edit code** and edit parameters. For more information about parameters, see [Params](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html) in the Apache Airflow documentation.
+ The **Definition** tab shows the code used for the workflow. This matches the code you wrote on the Code page. Choose **Edit code** to navigate back to the Code page and make changes.
+ The **Tags** tab shows optional tags that are defined in the workflow definition file. These are Airflow tags, not AWS tags. For more information, see [Add tags to DAGs and use it for filtering in the UI](https://airflow.apache.org/docs/apache-airflow/stable/howto/add-dag-tags.html) in the Apache Airflow documentation.

## Run a code workflow


To run a code workflow, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project that was created with the **All capabilities** project profile. To do this, use the center menu at the top of the landing page and choose **Browse all projects**, then choose the name of the project that you want to navigate to.

1. In the **Build** menu, choose **Workflows**. This takes you to the Workflows page.

1. Choose the name of a workflow to navigate to the workflow details page. The workflow runs in the shared environment, so all project members can view and monitor the run.

1. Expand the **Run** menu, then choose one of the following options:
   + Run with default parameters. This option starts running the workflow using the parameters already defined in the DAG file. To review these parameters, see the **Default parameters** tab.
   + Run with custom parameters. This option opens a window where you can change the inputs for the parameters defined in the DAG file. Enter the variables you want to use, and then choose **Start run** to start running the workflow.

The workflow run then appears on the **Runs** tab of the workflow details page. The workflow runs until it is complete or until you choose to stop it.

Running a workflow runs its tasks in the order defined by the workflow to orchestrate your Amazon SageMaker Unified Studio artifacts. To view multiple runs for a workflow, navigate to the Workflows page and choose the name of the workflow in the workflows list.

If you want to see more runs, you can view them using the Airflow UI. Navigate to the Workflows page, choose the three dots in the Action column for a workflow, then choose **Open Airflow UI**. This page displays charts and graphics about the workflow.

**Note**  
To open the Airflow UI, your browser should allow cross-site cookie sharing. If you receive an error message, check the cookie settings in your browser.

## Share a code workflow with other project members in an Amazon SageMaker Unified Studio workflow environment

**For Git storage:** After a workflow environment has been created by a project owner, any project member can sync their files to share them in the environment. After you sync your files, all project members can view the workflows you have added in the workflow environment. Files that are not synced can only be viewed by the project member that created them.

**For S3 storage:** After a project owner creates a workflow environment, the workflow DAG files that you save in JupyterLab are automatically synced to the project. After the files are synced, all project members can view the workflows you have added in the workflow environment. Files that are not synced can only be viewed by the project member that created them.

If your project uses Git storage, complete the following steps to share your workflows with other project members in a workflow environment:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project that was created with the **All capabilities** project profile. You can do this by using the center menu at the top of the page and choosing **Browse all projects**, then choosing the name of the project that you want to navigate to.

1. In the **Build** menu, choose **JupyterLab**.

1. Locate the workflow you want to share in the `workflows/dags` folder.

1. Choose the **Git** icon in the left navigation.

1. Choose the **+** icon next to the files you want to commit.

1. Enter a brief summary of the commit in the **Summary** text entry field.

1. (Optional) Enter a longer description of the commit in the **Description** text entry field.

1. Choose **Commit**.

1. Choose the **Push committed changes** icon to push your commits to the remote repository.

1. In the **Build** menu, choose **Workflows**. This takes you to the Workflows page.

1. On the **Shared environment** tab, choose **Sync files from project**.

1. Choose **Confirm**.