

# Pipelines actions

You can use either the Amazon SageMaker Pipelines Python SDK or the drag-and-drop visual designer in Amazon SageMaker Studio to author, view, edit, execute, and monitor your ML workflows.

The following screenshot shows the visual designer that you can use to create and manage your Amazon SageMaker Pipelines.

![Screenshot of the visual drag-and-drop interface for Pipelines in Studio.](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


After your pipeline is deployed, you can view the directed acyclic graph (DAG) for your pipeline and manage your executions using Amazon SageMaker Studio. Using SageMaker Studio, you can get information about your current and historical pipelines, compare executions, see the DAG for your executions, get metadata information, and more. To learn how to view pipelines from Studio, see [View the details of a pipeline](pipelines-studio-list.md).

**Topics**
+ [Define a pipeline](define-pipeline.md)
+ [Edit a pipeline](edit-pipeline-before-execution.md)
+ [Run a pipeline](run-pipeline.md)
+ [Stop a pipeline](pipelines-studio-stop.md)
+ [View the details of a pipeline](pipelines-studio-list.md)
+ [View the details of a pipeline run](pipelines-studio-view-execution.md)
+ [Download a pipeline definition file](pipelines-studio-download.md)
+ [Access experiment data from a pipeline](pipelines-studio-experiments.md)
+ [Track the lineage of a pipeline](pipelines-lineage-tracking.md)

# Define a pipeline

To orchestrate your workflows with Amazon SageMaker Pipelines, you must generate a directed acyclic graph (DAG) in the form of a JSON pipeline definition. The DAG specifies the different steps involved in your ML process, such as data preprocessing, model training, model evaluation, and model deployment, as well as the dependencies and flow of data between these steps. The following topic shows you how to generate a pipeline definition.
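To give a sense of what the SDK and the visual designer produce, the following is an illustrative, heavily pared-down sketch of the general shape of a JSON pipeline definition. The schema version, field names, and step names here are assumptions for illustration; the real document generated for your pipeline contains many more fields.

```python
import json

# Hypothetical, minimal pipeline-definition skeleton for illustration only.
definition = {
    "Version": "2020-12-01",
    "Parameters": [
        {"Name": "ProcessingInstanceCount", "Type": "Integer", "DefaultValue": 1}
    ],
    "Steps": [
        {"Name": "AbaloneProcess", "Type": "Processing", "Arguments": {}},
        {
            "Name": "AbaloneTrain",
            "Type": "Training",
            "Arguments": {},
            # The DAG is implied by dependencies between steps:
            # AbaloneTrain runs after AbaloneProcess completes.
            "DependsOn": ["AbaloneProcess"],
        },
    ],
}

print(json.dumps(definition, indent=2))
```

The key idea is that the definition is just a serializable document: steps plus the dependencies that form the DAG, with parameters whose values can be overridden per execution.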

You can generate your JSON pipeline definition using either the SageMaker Python SDK or the visual drag-and-drop Pipeline Designer feature in Amazon SageMaker Studio. The following image is a representation of the pipeline DAG that you create in this tutorial:

![Screenshot of the visual drag-and-drop interface for Pipelines in Studio.](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


The pipeline that you define in the following sections solves a regression problem to determine the age of an abalone based on its physical measurements. For a runnable Jupyter notebook that includes the content in this tutorial, see [Orchestrating Jobs with Amazon SageMaker Model Building Pipelines](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html).

**Note**  
You can reference the model location as a property of the training step, as shown in the end-to-end example [CustomerChurn pipeline](https://github.com/aws-samples/customer-churn-sagemaker-pipelines-sample/blob/main/pipelines/customerchurn/pipeline.py) on GitHub.


## Define a pipeline (Pipeline Designer)


The following walkthrough guides you through the steps to create a barebones pipeline using the drag-and-drop Pipeline Designer. If you need to pause or end your pipeline editing session in the visual designer at any time, choose **Export**. This downloads the current definition of your pipeline to your local environment. Later, when you want to resume editing, you can import the same JSON definition file into the visual designer.

### Create a Processing step


To create a data processing job step, do the following:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Process data** and drag it to the canvas.

1. In the canvas, choose the **Process data** step you added.

1. To add an input dataset, choose **Add** under **Data (input)** in the right sidebar and select a dataset.

1. To add a location to save output datasets, choose **Add** under **Data (output)** in the right sidebar and navigate to the destination.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep).

### Create a Training step


To set up a model training step, do the following:

1. In the left sidebar, choose **Train model** and drag it to the canvas.

1. In the canvas, choose the **Train model** step you added.

1. To add an input dataset, choose **Add** under **Data (input)** in the right sidebar and select a dataset.

1. To choose a location to save your model artifacts, enter an Amazon S3 URI in the **Location (S3 URI)** field, or choose **Browse S3** to navigate to the destination location.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep).

1. Click and drag the cursor from the **Process data** step you added in the previous section to the **Train model** step to create an edge connecting the two steps.

### Create a model package with a Register model step


To create a model package with a model registration step, do the following:

1. In the left sidebar, choose **Register model** and drag it to the canvas.

1. In the canvas, choose the **Register model** step you added.

1. To select a model to register, choose **Add** under **Model (input)**.

1. Choose **Create a model group** to add your model to a new model group.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [sagemaker.workflow.step\_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel).

1. Click and drag the cursor from the **Train model** step you added in the previous section to the **Register model** step to create an edge connecting the two steps.

### Deploy the model to an endpoint with a Deploy model (endpoint) step


To deploy your model using a model deployment step, do the following:

1. In the left sidebar, choose **Deploy model (endpoint)** and drag it to the canvas.

1. In the canvas, choose the **Deploy model (endpoint)** step you added.

1. To choose a model to deploy, choose **Add** under **Model (input)**.

1. Choose the **Create endpoint** radio button to create a new endpoint.

1. Enter a **Name** and **Description** for your endpoint.

1. Click and drag the cursor from the **Register model** step you added in the previous section to the **Deploy model (endpoint)** step to create an edge connecting the two steps.

1. Complete the remaining fields in the right sidebar.

### Define the Pipeline parameters


You can configure a set of Pipeline parameters whose values can be updated for every execution. To define the pipeline parameters and set the default values, click on the gear icon at the bottom of the visual designer.

### Save Pipeline


After you enter all of the required information for your pipeline, choose **Save** at the bottom of the visual designer. This validates your pipeline for potential runtime errors and notifies you if any are found. The **Save** operation does not succeed until you address all errors flagged by the automated validation checks. If you want to resume editing at a later point, you can save your in-progress pipeline as a JSON definition in your local environment: choose **Export** at the bottom of the visual designer to download the definition file. Later, to resume updating your pipeline, choose **Import** and upload that JSON definition file.

## Define a pipeline (SageMaker Python SDK)


### Prerequisites


 To run the following tutorial, complete the following: 
+ Set up your notebook instance as outlined in [Create a notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html). This gives your role permissions to read and write to Amazon S3, and to create training, batch transform, and processing jobs in SageMaker AI.
+ Grant your notebook permissions to get and pass its own role as shown in [Modifying a role permissions policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy). Add the following JSON snippet to attach this policy to your role. Replace `<your-role-arn>` with the ARN used to create your notebook instance.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "iam:GetRole",
                  "iam:PassRole"
              ],
              "Resource": "<your-role-arn>"
          }
      ]
  }
  ```

------
+ Trust the SageMaker AI service principal by following the steps in [Modifying a role trust policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-cli.html#roles-managingrole_edit-trust-policy-cli). Add the following statement fragment to the trust relationship of your role:

  ```
  {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
          "Service": "sagemaker.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
  ```

#### Set up your environment


Create a new SageMaker AI session using the following code block. This returns the role ARN for the session. This role ARN should be the execution role ARN that you set up as a prerequisite. 

```
import boto3
import sagemaker
import sagemaker.session
from sagemaker.workflow.pipeline_context import PipelineSession

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()

pipeline_session = PipelineSession()

model_package_group_name = "AbaloneModelPackageGroupName"
```

### Create a pipeline


**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

Run the following steps from your SageMaker AI notebook instance to create a pipeline that includes steps for:
+ preprocessing
+ training
+ evaluation
+ conditional evaluation
+ model registration

**Note**  
You can use [ExecutionVariables](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) and the [Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) function to specify your output location. `ExecutionVariables` are resolved at runtime. For example, `ExecutionVariables.PIPELINE_EXECUTION_ID` resolves to the ID of the current execution, which you can use as a unique identifier across different runs.

#### Step 1: Download the dataset


This notebook uses the UCI Machine Learning Abalone Dataset. The dataset contains the following features: 
+ `length` – The longest shell measurement of the abalone.
+ `diameter` – The diameter of the abalone perpendicular to its length.
+ `height` – The height of the abalone with meat in the shell.
+ `whole_weight` – The weight of the whole abalone.
+ `shucked_weight` – The weight of the meat removed from the abalone.
+ `viscera_weight` – The weight of the abalone viscera after bleeding.
+ `shell_weight` – The weight of the abalone shell after meat removal and drying.
+ `sex` – The sex of the abalone. One of 'M', 'F', or 'I', where 'I' is an infant abalone.
+ `rings` – The number of rings in the abalone shell.

The number of rings in the abalone shell is a good approximation of its age using the formula `age = rings + 1.5`. However, obtaining this number is time-consuming: you must cut the shell through the cone, stain the section, and count the rings under a microscope. The other physical measurements are easier to obtain. This notebook uses the dataset to build a predictive model of the `rings` variable using the other physical measurements.
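The rings-to-age conversion above is simple enough to state directly in code. A one-line sketch of the formula (a helper written for illustration, not part of the tutorial's scripts):

```python
def rings_to_age(rings):
    """Approximate abalone age in years from the ring count (age = rings + 1.5)."""
    return rings + 1.5

print(rings_to_age(10))  # a 10-ring abalone is roughly 11.5 years old
```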

**To download the dataset**

1. Download the dataset into your account's default Amazon S3 bucket.

   ```
   !mkdir -p data
   local_path = "data/abalone-dataset.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset.csv",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   input_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(input_data_uri)
   ```

1. Download a second dataset for batch transformation after your model is created.

   ```
   local_path = "data/abalone-dataset-batch.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset-batch",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   batch_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(batch_data_uri)
   ```

#### Step 2: Define pipeline parameters


 This code block defines the following parameters for your pipeline: 
+  `processing_instance_count` – The instance count of the processing job. 
+  `input_data` – The Amazon S3 location of the input data. 
+  `batch_data` – The Amazon S3 location of the input data for batch transformation. 
+  `model_approval_status` – The approval status to register the trained model with for CI/CD. For more information, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

```
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
model_approval_status = ParameterString(
    name="ModelApprovalStatus",
    default_value="PendingManualApproval"
)
input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)
batch_data = ParameterString(
    name="BatchData",
    default_value=batch_data_uri,
)
```

#### Step 3: Define a processing step for feature engineering


This section shows how to create a processing step to prepare the data from the dataset for training.

**To create a processing step**

1.  Create a directory for the processing script.

   ```
   !mkdir -p abalone
   ```

1. Create a file in the `abalone` directory named `preprocessing.py` with the following content. This preprocessing script is passed to the processing step to run on the input data. The training step then uses the preprocessed training features and labels to train a model, and the evaluation step uses the trained model and preprocessed test features and labels to evaluate the model. The script uses `scikit-learn` to do the following:
   +  Fill in missing `sex` categorical data and encode it so it's suitable for training. 
   +  Scale and normalize all numerical fields except for `rings` and `sex`. 
   +  Split the data into training, test, and validation datasets. 

   ```
   %%writefile abalone/preprocessing.py
   import argparse
   import os
   import requests
   import tempfile
   import numpy as np
   import pandas as pd
   
   
   from sklearn.compose import ColumnTransformer
   from sklearn.impute import SimpleImputer
   from sklearn.pipeline import Pipeline
   from sklearn.preprocessing import StandardScaler, OneHotEncoder
   
   
   # Because this is a headerless CSV file, specify the column names here.
   feature_columns_names = [
       "sex",
       "length",
       "diameter",
       "height",
       "whole_weight",
       "shucked_weight",
       "viscera_weight",
       "shell_weight",
   ]
   label_column = "rings"
   
   feature_columns_dtype = {
       "sex": str,
       "length": np.float64,
       "diameter": np.float64,
       "height": np.float64,
       "whole_weight": np.float64,
       "shucked_weight": np.float64,
       "viscera_weight": np.float64,
       "shell_weight": np.float64
   }
   label_column_dtype = {"rings": np.float64}
   
   
   def merge_two_dicts(x, y):
       z = x.copy()
       z.update(y)
       return z
   
   
   if __name__ == "__main__":
       base_dir = "/opt/ml/processing"
   
       df = pd.read_csv(
           f"{base_dir}/input/abalone-dataset.csv",
           header=None, 
           names=feature_columns_names + [label_column],
           dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype)
       )
       numeric_features = list(feature_columns_names)
       numeric_features.remove("sex")
       numeric_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="median")),
               ("scaler", StandardScaler())
           ]
       )
   
       categorical_features = ["sex"]
       categorical_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
               ("onehot", OneHotEncoder(handle_unknown="ignore"))
           ]
       )
   
       preprocess = ColumnTransformer(
           transformers=[
               ("num", numeric_transformer, numeric_features),
               ("cat", categorical_transformer, categorical_features)
           ]
       )
       
       y = df.pop("rings")
       X_pre = preprocess.fit_transform(df)
       y_pre = y.to_numpy().reshape(len(y), 1)
       
       X = np.concatenate((y_pre, X_pre), axis=1)
       
       np.random.shuffle(X)
       train, validation, test = np.split(X, [int(.7*len(X)), int(.85*len(X))])
   
       
       pd.DataFrame(train).to_csv(f"{base_dir}/train/train.csv", header=False, index=False)
       pd.DataFrame(validation).to_csv(f"{base_dir}/validation/validation.csv", header=False, index=False)
       pd.DataFrame(test).to_csv(f"{base_dir}/test/test.csv", header=False, index=False)
   ```

1.  Create an instance of an `SKLearnProcessor` to pass in to the processing step. 

   ```
   from sagemaker.sklearn.processing import SKLearnProcessor
   
   
   framework_version = "0.23-1"
   
   sklearn_processor = SKLearnProcessor(
       framework_version=framework_version,
       instance_type="ml.m5.xlarge",
       instance_count=processing_instance_count,
       base_job_name="sklearn-abalone-process",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. Create a processing step. This step takes in the `SKLearnProcessor`, the input and output channels, and the `preprocessing.py` script that you created. This is very similar to a processor instance's `run` method in the SageMaker AI Python SDK. The `input_data` parameter passed into `ProcessingStep` is the input data of the step itself. This input data is used by the processor instance when it runs. 

    Note the `"train"`, `"validation"`, and `"test"` named channels specified in the output configuration for the processing job. Step `Properties` such as these can be used in subsequent steps and resolve to their values at runtime.

   ```
   from sagemaker.processing import ProcessingInput, ProcessingOutput
   from sagemaker.workflow.steps import ProcessingStep
      
   
   processor_args = sklearn_processor.run(
       inputs=[
         ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
       ],
       outputs=[
           ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
           ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
           ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
       ],
       code="abalone/preprocessing.py",
   ) 
   
   step_process = ProcessingStep(
       name="AbaloneProcess",
       step_args=processor_args
   )
   ```
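As an aside, the `np.split` call in `preprocessing.py` produces an approximately 70/15/15 train/validation/test split by cutting the shuffled array at the cumulative indices `0.7*len(X)` and `0.85*len(X)`. A minimal standalone sketch of that behavior on dummy data:

```python
import numpy as np

X = np.arange(100)  # stand-in for 100 preprocessed rows

# Split at cumulative indices 70 and 85, as preprocessing.py does.
train, validation, test = np.split(X, [int(.7 * len(X)), int(.85 * len(X))])

print(len(train), len(validation), len(test))  # 70 15 15
```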

#### Step 4: Define a training step


This section shows how to use the SageMaker AI [XGBoost Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) to train a model on the training data output from the processing step.

**To define a training step**

1.  Specify the model path where you want to save the models from training. 

   ```
   model_path = f"s3://{default_bucket}/AbaloneTrain"
   ```

1. Configure an estimator for the XGBoost algorithm and the input dataset. The training instance type is passed into the estimator. A typical training script:
   + loads data from the input channels
   + configures training with hyperparameters
   + trains a model
   + saves a model to `model_dir` so that it can be hosted later

   SageMaker AI uploads the model to Amazon S3 in the form of a `model.tar.gz` at the end of the training job.

   ```
   from sagemaker.estimator import Estimator
   
   
   image_uri = sagemaker.image_uris.retrieve(
       framework="xgboost",
       region=region,
       version="1.0-1",
       py_version="py3",
       instance_type="ml.m5.xlarge"
   )
   xgb_train = Estimator(
       image_uri=image_uri,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=model_path,
       sagemaker_session=pipeline_session,
       role=role,
   )
   xgb_train.set_hyperparameters(
       objective="reg:linear",
       num_round=50,
       max_depth=5,
       eta=0.2,
       gamma=4,
       min_child_weight=6,
       subsample=0.7,
       silent=0
   )
   ```

1. Create a `TrainingStep` using the estimator instance and properties of the `ProcessingStep`. Pass in the `S3Uri` of the `"train"` and `"validation"` output channels to the `TrainingStep`.

   ```
   from sagemaker.inputs import TrainingInput
   from sagemaker.workflow.steps import TrainingStep
   
   
   train_args = xgb_train.fit(
       inputs={
           "train": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "train"
               ].S3Output.S3Uri,
               content_type="text/csv"
           ),
           "validation": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "validation"
               ].S3Output.S3Uri,
               content_type="text/csv"
           )
       },
   )
   
   step_train = TrainingStep(
       name="AbaloneTrain",
       step_args=train_args
   )
   ```

#### Step 5: Define a processing step for model evaluation


This section shows how to create a processing step to evaluate the accuracy of the model. The result of this model evaluation is used in the condition step to determine which run path to take.

**To define a processing step for model evaluation**

1. Create a file in the `abalone` directory named `evaluation.py`. This script is used in a processing step to perform model evaluation. It takes a trained model and the test dataset as input, then produces a JSON file containing regression evaluation metrics.

   ```
   %%writefile abalone/evaluation.py
   import json
   import pathlib
   import pickle
   import tarfile
   import joblib
   import numpy as np
   import pandas as pd
   import xgboost
   
   
   from sklearn.metrics import mean_squared_error
   
   
   if __name__ == "__main__":
        model_path = "/opt/ml/processing/model/model.tar.gz"
       with tarfile.open(model_path) as tar:
           tar.extractall(path=".")
       
       model = pickle.load(open("xgboost-model", "rb"))
   
       test_path = "/opt/ml/processing/test/test.csv"
       df = pd.read_csv(test_path, header=None)
       
       y_test = df.iloc[:, 0].to_numpy()
       df.drop(df.columns[0], axis=1, inplace=True)
       
       X_test = xgboost.DMatrix(df.values)
       
       predictions = model.predict(X_test)
   
       mse = mean_squared_error(y_test, predictions)
       std = np.std(y_test - predictions)
       report_dict = {
           "regression_metrics": {
               "mse": {
                   "value": mse,
                   "standard_deviation": std
               },
           },
       }
   
       output_dir = "/opt/ml/processing/evaluation"
       pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
       
       evaluation_path = f"{output_dir}/evaluation.json"
       with open(evaluation_path, "w") as f:
           f.write(json.dumps(report_dict))
   ```

1.  Create an instance of a `ScriptProcessor` that is used to create a `ProcessingStep`. 

   ```
   from sagemaker.processing import ScriptProcessor
   
   
   script_eval = ScriptProcessor(
       image_uri=image_uri,
       command=["python3"],
       instance_type="ml.m5.xlarge",
       instance_count=1,
       base_job_name="script-abalone-eval",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1.  Create a `ProcessingStep` using the processor instance, the input and output channels, and the  `evaluation.py` script. Pass in:
   + the `S3ModelArtifacts` property from the `step_train` training step
   + the `S3Uri` of the `"test"` output channel of the `step_process` processing step

   This is very similar to a processor instance's `run` method in the SageMaker AI Python SDK.  

   ```
   from sagemaker.workflow.properties import PropertyFile
   
   
   evaluation_report = PropertyFile(
       name="EvaluationReport",
       output_name="evaluation",
       path="evaluation.json"
   )
   
   eval_args = script_eval.run(
        inputs=[
           ProcessingInput(
               source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
               destination="/opt/ml/processing/model"
           ),
           ProcessingInput(
               source=step_process.properties.ProcessingOutputConfig.Outputs[
                   "test"
               ].S3Output.S3Uri,
               destination="/opt/ml/processing/test"
           )
       ],
       outputs=[
           ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
       ],
       code="abalone/evaluation.py",
   )
   
   step_eval = ProcessingStep(
       name="AbaloneEval",
       step_args=eval_args,
       property_files=[evaluation_report],
   )
   ```
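The `mse` and `standard_deviation` values that `evaluation.py` writes to `evaluation.json` can be reproduced on toy data with plain NumPy: `sklearn.metrics.mean_squared_error` equals the mean of the squared residuals, and the script takes `np.std` of the residuals. A small sketch with made-up numbers (not from the tutorial):

```python
import json
import numpy as np

# Toy ground-truth ring counts and model predictions, for illustration only.
y_test = np.array([10.0, 8.0, 12.0])
predictions = np.array([9.0, 8.0, 13.0])

mse = float(np.mean((y_test - predictions) ** 2))  # mean squared error
std = float(np.std(y_test - predictions))          # spread of the residuals

# Same report shape that evaluation.py produces.
report_dict = {
    "regression_metrics": {
        "mse": {"value": mse, "standard_deviation": std},
    },
}
print(json.dumps(report_dict))
```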

#### Step 6: Define a CreateModelStep for batch transformation


**Important**  
We recommend using [Model step](build-and-manage-steps-types.md#step-type-model) to create models as of v2.90.0 of the SageMaker Python SDK. `CreateModelStep` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

This section shows how to create a SageMaker AI model from the output of the training step. This model is used for batch transformation on a new dataset. This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a CreateModelStep for batch transformation**

1.  Create a SageMaker AI model. Pass in the `S3ModelArtifacts` property from the `step_train` training step.

   ```
   from sagemaker.model import Model
   
   
   model = Model(
       image_uri=image_uri,
       model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. Define the model input for your SageMaker AI model.

   ```
   from sagemaker.inputs import CreateModelInput
   
   
   inputs = CreateModelInput(
       instance_type="ml.m5.large",
       accelerator_type="ml.eia1.medium",
   )
   ```

1. Create your `CreateModelStep` using the `CreateModelInput` and SageMaker AI model instance you defined.

   ```
   from sagemaker.workflow.steps import CreateModelStep
   
   
   step_create_model = CreateModelStep(
       name="AbaloneCreateModel",
       model=model,
       inputs=inputs,
   )
   ```

#### Step 7: Define a TransformStep to perform batch transformation


This section shows how to create a `TransformStep` to perform batch transformation on a dataset after the model is trained. This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a TransformStep to perform batch transformation**

1. Create a transformer instance with the appropriate compute instance type, instance count, and desired output Amazon S3 bucket URI. Pass in the `ModelName` property from the `step_create_model` `CreateModel` step. 

   ```
   from sagemaker.transformer import Transformer
   
   
   transformer = Transformer(
       model_name=step_create_model.properties.ModelName,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=f"s3://{default_bucket}/AbaloneTransform"
   )
   ```

1. Create a `TransformStep` using the transformer instance you defined and the `batch_data` pipeline parameter.

   ```
   from sagemaker.inputs import TransformInput
   from sagemaker.workflow.steps import TransformStep
   
   
   step_transform = TransformStep(
       name="AbaloneTransform",
       transformer=transformer,
       inputs=TransformInput(data=batch_data)
   )
   ```

#### Step 8: Define a RegisterModel step to create a model package


**Important**  
We recommend using [Model step](build-and-manage-steps-types.md#step-type-model) to register models as of v2.90.0 of the SageMaker Python SDK. `RegisterModel` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

This section shows how to create an instance of `RegisterModel`. The result of running `RegisterModel` in a pipeline is a model package. A model package is a reusable abstraction of model artifacts that packages all of the ingredients necessary for inference. It consists of an inference specification that defines the inference image to use, along with an optional model weights location. A model package group is a collection of model packages. You can use a `ModelPackageGroup` for Pipelines to add a new version and model package to the group for every pipeline run. For more information about the Model Registry, see [Model Registration Deployment with Model Registry](model-registry.md).

This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a RegisterModel step to create a model package**
+ Construct a `RegisterModel` step using the estimator instance you used for the training step. Pass in the `S3ModelArtifacts` property from the `step_train` training step and specify a `ModelPackageGroup`. Pipelines creates this `ModelPackageGroup` for you.

  ```
  from sagemaker.model_metrics import MetricsSource, ModelMetrics 
  from sagemaker.workflow.step_collections import RegisterModel
  
  
  model_metrics = ModelMetrics(
      model_statistics=MetricsSource(
          s3_uri="{}/evaluation.json".format(
              step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
          ),
          content_type="application/json"
      )
  )
  step_register = RegisterModel(
      name="AbaloneRegisterModel",
      estimator=xgb_train,
      model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
      content_types=["text/csv"],
      response_types=["text/csv"],
      inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
      transform_instances=["ml.m5.xlarge"],
      model_package_group_name=model_package_group_name,
      approval_status=model_approval_status,
      model_metrics=model_metrics
  )
  ```

#### Step 9: Define a condition step to verify model accuracy


A `ConditionStep` allows Pipelines to support conditional running in your pipeline DAG based on the condition of step properties. In this case, you only want to register a model package if the accuracy of that model exceeds the required value. The accuracy of the model is determined by the model evaluation step. If the accuracy exceeds the required value, the pipeline also creates a SageMaker AI Model and runs batch transformation on a dataset. This section shows how to define the Condition step.

**To define a condition step to verify model accuracy**

1.  Define a `ConditionLessThanOrEqualTo` condition using the mean squared error (MSE) value found in the output of the model evaluation processing step, `step_eval`. Get this output using the property file you indexed in the processing step and the respective JSONPath of the MSE value, `"mse"`.

   ```
   from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
   from sagemaker.workflow.condition_step import ConditionStep
   from sagemaker.workflow.functions import JsonGet
   
   
   cond_lte = ConditionLessThanOrEqualTo(
       left=JsonGet(
           step_name=step_eval.name,
           property_file=evaluation_report,
           json_path="regression_metrics.mse.value"
       ),
       right=6.0
   )
   ```

1.  Construct a `ConditionStep`. Pass in the `ConditionLessThanOrEqualTo` condition, then set the model package registration, model creation, and batch transformation steps as the next steps if the condition passes. 

   ```
   step_cond = ConditionStep(
       name="AbaloneMSECond",
       conditions=[cond_lte],
       if_steps=[step_register, step_create_model, step_transform],
       else_steps=[], 
   )
   ```
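For reference, `JsonGet` with `json_path="regression_metrics.mse.value"` performs a dotted-path lookup against the evaluation report written by the evaluation step. The following plain-Python sketch mirrors that lookup; the report contents here are illustrative, not actual pipeline output.

```
# Illustrative contents of the evaluation.json report produced by the
# evaluation step; the numbers are made up for this example.
report = {
    "regression_metrics": {
        "mse": {"value": 4.95, "standard_deviation": 2.2}
    }
}

# Resolve the dotted JSONPath "regression_metrics.mse.value" by hand.
value = report
for key in "regression_metrics.mse.value".split("."):
    value = value[key]

# The ConditionLessThanOrEqualTo check: the if_steps run only if MSE <= 6.0.
condition_passes = value <= 6.0
print(value, condition_passes)
```

With this report, the condition passes and the registration, model creation, and transform steps run.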

#### Step 10: Create a pipeline


Now that you’ve created all of the steps, combine them into a pipeline.

**To create a pipeline**

1.  Define the following for your pipeline: `name`, `parameters`, and `steps`. Names must be unique within an `(account, region)` pair.
**Note**  
A step can only appear once in either the pipeline's step list or the if/else step lists of the condition step. It cannot appear in both. 

   ```
   from sagemaker.workflow.pipeline import Pipeline
   
   
   pipeline_name = "AbalonePipeline"
   pipeline = Pipeline(
       name=pipeline_name,
       parameters=[
           processing_instance_count,
           model_approval_status,
           input_data,
           batch_data,
       ],
       steps=[step_process, step_train, step_eval, step_cond],
   )
   ```

1.  (Optional) Examine the JSON pipeline definition to ensure that it's well-formed.

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

 This pipeline definition is ready to submit to SageMaker AI. In the next tutorial, you submit this pipeline to SageMaker AI and start a run. 

## Define a pipeline (JSON)


You can also use [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_pipeline) or [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html) to create a pipeline. Creating a pipeline requires a pipeline definition, which is a JSON object that defines each step of the pipeline. The SageMaker SDK offers a simple way to construct the pipeline definition, which you can use with any of the APIs previously mentioned to create the pipeline itself. Without the SDK, you must write the raw JSON definition by hand and forgo the error checks that the SageMaker Python SDK provides. To see the schema for the pipeline JSON definition, see [SageMaker AI Pipeline Definition JSON Schema](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/). The following code sample shows an example of a SageMaker AI pipeline definition JSON object:

```
{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ProcessingInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},
  {'Name': 'TrainingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ModelApprovalStatus',
   'Type': 'String',
   'DefaultValue': 'PendingManualApproval'},
  {'Name': 'ProcessedData',
   'Type': 'String',
   'DefaultValue': 'S3_URL'},
  {'Name': 'InputDataUrl',
   'Type': 'String',
   'DefaultValue': 'S3_URL'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
  'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Name': 'ReadTrainDataFromFS',
   'Type': 'Processing',
   'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.4xlarge',
      'InstanceCount': 2,
      'VolumeSizeInGB': 30}},
    'AppSpecification': {'ImageUri': 'IMAGE_URI',
     'ContainerArguments': [....]},
    'RoleArn': 'ROLE',
     'ProcessingInputs': [...],
    'ProcessingOutputConfig': {'Outputs': [.....]},
    'StoppingCondition': {'MaxRuntimeInSeconds': 86400}},
   'CacheConfig': {'Enabled': True, 'ExpireAfter': '30d'}},
   ...
   ...
   ...
  ]}
```
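If you author the definition JSON by hand rather than with the SDK, a local sanity check catches malformed documents before you call the `create_pipeline` API. The following sketch validates a minimal, hypothetical definition with the standard library; the pipeline name and role ARN in the commented boto3 call are placeholders.

```
import json

# A minimal, hypothetical pipeline definition authored by hand.
definition = json.dumps({
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [
        {"Name": "InputDataUrl", "Type": "String", "DefaultValue": "S3_URL"}
    ],
    "Steps": [],
})

# Basic sanity checks before submitting the definition to SageMaker.
parsed = json.loads(definition)
assert parsed["Version"] == "2020-12-01"
assert isinstance(parsed["Steps"], list)

# With boto3 installed and AWS credentials configured, you would then call:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_pipeline(
#     PipelineName="MyPipeline",
#     PipelineDefinition=definition,
#     RoleArn="ROLE",
# )
print(sorted(parsed))
```

The same definition string also works as the `PipelineDefinition` property of an `AWS::SageMaker::Pipeline` CloudFormation resource.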

 **Next step:** [Run a pipeline](run-pipeline.md) 

# Edit a pipeline


To make changes to a pipeline before running it, do the following:

1. Open SageMaker Studio by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane of Studio, select **Pipelines**.

1. Select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. Select the name of a pipeline execution.

1. Choose **Edit** to open the Pipeline Designer.

1. Update the edges between steps or the step configuration as required, and then choose **Save**. 

   Saving a pipeline after editing automatically generates a new version number.

1. Choose **Run**.

# Run a pipeline


After defining the steps of your pipeline as a directed acyclic graph (DAG), you can run your pipeline, which executes the steps defined in your DAG. The following walkthroughs show you how to run an Amazon SageMaker AI pipeline using either the drag-and-drop visual editor in Amazon SageMaker Studio or the Amazon SageMaker Python SDK.

## Run a pipeline (Pipeline designer)


To start a new execution of your pipeline, do the following:

------
#### [ Studio ]

1. Open SageMaker Studio by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Choose a pipeline name to open the pipeline details view.

1. Choose **Visual Editor** on the top right.

1. To start an execution from the latest version, choose **Executions**.

1. To start an execution from a specific version, follow these steps:
   + Choose the version icon in the bottom toolbar to open the version panel.
   + Choose the pipeline version you want to execute.
   + Hover over the version item to reveal the three-dot menu, and then choose **Execute**.
   + (Optional) To view a previous version of the pipeline, choose **Preview** from the three-dot menu in the version panel. You can also edit the version by choosing **Edit** in the notification bar.

**Note**  
If your pipeline fails, the status banner will show a **Failed** status. After troubleshooting the failed step, choose **Retry** on the status banner to resume running the pipeline from that step.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. From the **Executions** or **Graph** tab in the execution list, choose **Create execution**.

1. Enter or update the following required information:
   + **Name** – Must be unique to your account in the AWS Region.
   + **ProcessingInstanceCount** – The number of instances to use for processing.
   + **ModelApprovalStatus** – The approval status to assign to the trained model, such as `PendingManualApproval`.
   + **InputDataUrl** – The Amazon S3 URI of the input data.

1. Choose **Start**.

Once your pipeline is running, you can view the details of the execution by choosing **View details** on the status banner.

To stop the run, choose **Stop** on the status banner. To resume the execution from where it was stopped, choose **Resume** on the status banner.

**Note**  
If your pipeline fails, the status banner will show a **Failed** status. After troubleshooting the failed step, choose **Retry** on the status banner to resume running the pipeline from that step.

------

## Run a pipeline (SageMaker Python SDK)


After you’ve created a pipeline definition using the SageMaker AI Python SDK, you can submit it to SageMaker AI to start your execution. The following tutorial shows how to submit a pipeline, start an execution, examine the results of that execution, and delete your pipeline. 

**Topics**
+ [

### Prerequisites
](#run-pipeline-prereq)
+ [

### Step 1: Start the Pipeline
](#run-pipeline-submit)
+ [

### Step 2: Examine a Pipeline Execution
](#run-pipeline-examine)
+ [

### Step 3: Override Default Parameters for a Pipeline Execution
](#run-pipeline-parametrized)
+ [

### Step 4: Stop and Delete a Pipeline Execution
](#run-pipeline-delete)

### Prerequisites


This tutorial requires the following: 
+  A SageMaker notebook instance.  
+  A Pipelines pipeline definition. This tutorial assumes you're using the pipeline definition created by completing the [Define a pipeline](define-pipeline.md) tutorial. 

### Step 1: Start the Pipeline


First, you need to start the pipeline. 

**To start the pipeline**

1. Examine the JSON pipeline definition to ensure that it's well-formed.

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

1. Submit the pipeline definition to the Pipelines service to create a pipeline if it doesn't exist, or update the pipeline if it does. The role passed in is used by Pipelines to create all of the jobs defined in the steps. 

   ```
   pipeline.upsert(role_arn=role)
   ```

1. Start a pipeline execution.

   ```
   execution = pipeline.start()
   ```

### Step 2: Examine a Pipeline Execution


Next, you need to examine the pipeline execution. 

**To examine a pipeline execution**

1.  Describe the pipeline execution status to ensure that it has been created and started successfully.

   ```
   execution.describe()
   ```

1. Wait for the execution to finish. 

   ```
   execution.wait()
   ```

1. List the execution steps and their status.

   ```
   execution.list_steps()
   ```

   Your output should look like the following:

   ```
   [{'StepName': 'AbaloneTransform',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 870000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 45, 50, 492000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:transform-job/pipelines-cfvy1tjuxdq8-abalonetransform-ptyjoef3jy'}}},
    {'StepName': 'AbaloneRegisterModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 929000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 28, 15000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model-package/abalonemodelpackagegroupname/1'}}},
    {'StepName': 'AbaloneCreateModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 895000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 708000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model/pipelines-cfvy1tjuxdq8-abalonecreatemodel-jl94rai0ra'}}},
    {'StepName': 'AbaloneMSECond',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 25, 558000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 329000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Condition': {'Outcome': 'True'}}},
    {'StepName': 'AbaloneEval',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 767000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 18, 80000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneeval-zfraozhmny'}}},
    {'StepName': 'AbaloneTrain',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 34, 55, 867000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 34000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:training-job/pipelines-cfvy1tjuxdq8-abalonetrain-tavd6f3wdf'}}},
    {'StepName': 'AbaloneProcess',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 30, 27, 160000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 34, 48, 390000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneprocess-mgqyfdujcj'}}}]
   ```

1. After your pipeline execution is complete, download the resulting `evaluation.json` file from Amazon S3 to examine the report.

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```

### Step 3: Override Default Parameters for a Pipeline Execution


You can run additional executions of the pipeline by specifying different pipeline parameters to override the defaults.

**To override default parameters**

1. Create the pipeline execution. This starts another pipeline execution with the model approval status override set to "Approved". This means that the model package version generated by the `RegisterModel` step is automatically ready for deployment through CI/CD pipelines, such as with SageMaker Projects. For more information, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

   ```
   execution = pipeline.start(
       parameters=dict(
           ModelApprovalStatus="Approved",
       )
   )
   ```

1. Wait for the execution to finish. 

   ```
   execution.wait()
   ```

1. List the execution steps and their status.

   ```
   execution.list_steps()
   ```

1. After your pipeline execution is complete, download the resulting `evaluation.json` file from Amazon S3 to examine the report.

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```

### Step 4: Stop and Delete a Pipeline Execution


When you're finished with your pipeline, you can stop any ongoing executions and delete the pipeline.

**To stop and delete a pipeline execution**

1. Stop the pipeline execution.

   ```
   execution.stop()
   ```

1. Delete the pipeline.

   ```
   pipeline.delete()
   ```

# Stop a pipeline


You can stop a pipeline run in the Amazon SageMaker Studio console.

To stop a pipeline execution in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. Choose the **Executions** tab.

1. Select the execution to stop.

1. Choose **Stop**. To resume the execution from where it was stopped, choose **Resume**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. To stop a pipeline run, choose **View details** on the status banner of the pipeline, and then choose **Stop**. To resume the execution from where it was stopped, choose **Resume**.

------

# View the details of a pipeline


You can view the details of a SageMaker AI pipeline to understand its parameters, the dependencies of its steps, or monitor its progress and status. This can help you troubleshoot or optimize your workflow. You can access the details of a given pipeline using the Amazon SageMaker Studio console and explore its execution history, definition, parameters, and metadata.

Alternatively, if your pipeline is associated with a SageMaker AI Project, you can access the pipeline details from the project's details page. For more information, see [View Project Resources](sagemaker-projects-resources.md).

To view the details of a SageMaker AI pipeline, complete the following steps based on whether you use Studio or Studio Classic.

**Note**  
Model repacking happens when the pipeline needs to include a custom script in the compressed model file (model.tar.gz) to be uploaded to Amazon S3 and used to deploy a model to a SageMaker AI endpoint. When a SageMaker AI pipeline trains a model and registers it to the model registry, it introduces a repack step *if* the trained model output from the training job needs to include a custom inference script. The repack step uncompresses the model, adds a new script, and recompresses the model. Running the pipeline adds the repack step as a training job.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline.

1. Choose one of the following tabs to view pipeline details:
   + **Executions** – Details about the executions.
   + **Graph** – The pipeline graph, including all steps.
   + **Parameters** – The run parameters and metrics related to the pipeline.
   + **Information** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline. The pipeline details tab opens and displays a list of pipeline executions. You can start an execution or choose one of the other tabs for more information about the pipeline. Use the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) to choose which columns to display.

1. From the pipeline details page, choose one of the following tabs to view details about the pipeline:
   + **Executions** – Details about the executions. You can create an execution from this tab or the **Graph** tab.
   + **Graph** – The DAG for the pipeline.
   + **Parameters** – Includes the model approval status.
   + **Settings** – The metadata associated with the pipeline. You can download the pipeline definition file and edit the pipeline name and description from this tab.

------

# View the details of a pipeline run


You can review the details of a particular SageMaker AI pipeline run. This can help you:
+ Identify and resolve problems that may have occurred during the run, such as failed steps or unexpected errors.
+ Compare the results of different pipeline executions to understand how changes in input data or parameters impact the overall workflow.
+ Identify bottlenecks and opportunities for optimization.

To view the details of a pipeline run, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. Select the name of a pipeline execution to view. The pipeline graph for that execution appears.

1. Choose any of the pipeline steps in the graph to see step settings in the right sidebar.

1. Choose one of the following tabs to view more pipeline details:
   + **Definition** — The pipeline graph, including all steps.
   + **Parameters** – Includes the model approval status.
   + **Details** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name. The pipeline's **Executions** page opens.

1. In the **Executions** page, select an execution name to view details about the execution. The execution details tab opens and displays a graph of the steps in the pipeline.

1. To search for a step by name, type characters that match a step name in the search field. Use the resizing icons on the lower-right side of the graph to zoom in and out of the graph, fit the graph to screen, and expand the graph to full screen. To focus on a specific part of the graph, you can select a blank area of the graph and drag the graph to center on that area.   
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/execution-graph-w-input.png)

1. Choose one of the pipeline steps in the graph to see details about the step. In the preceding screenshot, a training step is chosen and displays the following tabs:
   + **Input** – The training inputs. If an input source is from Amazon Simple Storage Service (Amazon S3), choose the link to view the file in the Amazon S3 console.
   + **Output** – The training outputs, such as metrics, charts, files, and evaluation outcome. The graphs are produced using the [Tracker](https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html#smexperiments.tracker.Tracker.log_precision_recall) APIs.
   + **Logs** – The Amazon CloudWatch logs produced by the step.
   + **Info** – The parameters and metadata associated with the step.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/execution-graph-info.png)

------

# Download a pipeline definition file


You can download the definition file for your SageMaker AI pipeline directly from the Amazon SageMaker Studio UI. You can use this pipeline definition file for:
+ Backup and restoration: Use the downloaded file to create a backup of your pipeline configuration, which you can restore in case of infrastructure failures or accidental changes.
+ Version control: Store the pipeline definition file in a source control system to track changes to the pipeline and revert to previous versions if needed.
+ Programmatic interactions: Use the pipeline definition file as input to the SageMaker SDK or AWS CLI.
+ Integration with automation processes: Integrate the pipeline definition into your CI/CD workflows or other automation processes.
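As a sketch of the version-control and programmatic uses above, the following checks that a downloaded definition file is well-formed JSON before you commit or re-submit it. The file path is a placeholder, and the file is created inline only so the example is self-contained.

```
import json
from pathlib import Path

# Stand-in for the definition file downloaded from Studio; created inline
# so this sketch runs on its own.
definition_path = Path("pipeline-definition.json")
definition_path.write_text(json.dumps({"Version": "2020-12-01", "Steps": []}))

# Confirm the file parses before committing it to source control.
definition = json.loads(definition_path.read_text())
assert "Version" in definition

# To restore a pipeline from this file with boto3 (credentials required):
# import boto3
# boto3.client("sagemaker").update_pipeline(
#     PipelineName="MyPipeline",
#     PipelineDefinition=definition_path.read_text(),
# )
print(definition["Version"])
```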

To download the definition file of a pipeline, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name. The **Executions** page opens and displays a list of pipeline executions.

1. Stay on the **Executions** page or choose the **Graph**, **Information**, or **Parameters** page to the left of the pipeline executions table. You can download the pipeline definition from any of these pages.

1. At the top right of the page, choose the vertical ellipsis and choose **Download pipeline definition (JSON)**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. Choose the **Settings** tab.

1. Choose **Download pipeline definition file**.

------

# Access experiment data from a pipeline

**Note**  
SageMaker Experiments is a feature provided in Studio Classic only.

When you create a pipeline and specify [pipeline\_experiment\_config](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.pipeline_experiment_config), Pipelines creates the following SageMaker Experiments entities by default if they don't exist:
+ An experiment for the pipeline
+ A run group for every execution of the pipeline
+ A run for each SageMaker AI job created in a pipeline step

For information about how experiments are integrated with pipelines, see [Amazon SageMaker Experiments Integration](pipelines-experiments.md). For more information about SageMaker Experiments, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).

You can get to the list of runs associated with a pipeline from either the pipeline executions list or the experiments list.

**To view the runs list from the pipeline executions list**

1. To view the pipeline executions list, follow the first five steps in the *Studio Classic* tab of [View the details of a pipeline](pipelines-studio-list.md).

1. On the top right of the screen, choose the **Filter** icon (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png)).

1. Choose **Experiment**. If experiment integration wasn't deactivated when the pipeline was created, the experiment name is displayed in the executions list. 
**Note**  
Experiments integration was introduced in v2.41.0 of the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). Pipelines created with an earlier version of the SDK aren't integrated with experiments by default.

1. Select the experiment of your choice to view run groups and runs related to that experiment.

**To view the runs list from the experiments list**

1. In the left sidebar of Studio Classic, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Experiments** from the menu.

1. Use the search bar or the **Filter** icon (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png)) to filter the list to experiments created by a pipeline.

1. Open an experiment name and view a list of runs created by the pipeline.

# Track the lineage of a pipeline

In this tutorial, you use Amazon SageMaker Studio to track the lineage of an Amazon SageMaker AI ML Pipeline.

The pipeline was created by the [Orchestrating Jobs with Amazon SageMaker Model Building Pipelines](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html) notebook in the [Amazon SageMaker example GitHub repository](https://github.com/awslabs/amazon-sagemaker-examples). For detailed information on how the pipeline was created, see [Define a pipeline](define-pipeline.md).

Lineage tracking in Studio is centered around a directed acyclic graph (DAG). The DAG represents the steps in a pipeline. From the DAG you can track the lineage from any step to any other step. The following diagram displays the steps in the pipeline. These steps appear as a DAG in Studio.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-steps.png)
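
The lineage DAG mirrors the step dependencies declared in the pipeline definition JSON. As a rough illustration, the following sketch reconstructs one valid execution order from a minimal, hypothetical definition fragment. The step names and `DependsOn` fields here are assumptions for the example; real definitions carry many more fields per step, and real pipelines also encode implicit data dependencies through property references, which this sketch ignores.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Minimal, hypothetical pipeline definition fragment.
definition = {
    "Steps": [
        {"Name": "AbaloneProcess"},
        {"Name": "AbaloneTrain", "DependsOn": ["AbaloneProcess"]},
        {"Name": "AbaloneEval", "DependsOn": ["AbaloneTrain"]},
        {"Name": "AbaloneRegisterModel", "DependsOn": ["AbaloneEval"]},
    ]
}

# Build a step-name -> predecessors mapping and sort it topologically,
# which yields one valid execution order for the DAG.
graph = {s["Name"]: set(s.get("DependsOn", [])) for s in definition["Steps"]}
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Because each step in this fragment has a single predecessor, the order is fully determined; in a wider DAG, `static_order` returns one of several valid orderings.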


To track the lineage of a pipeline in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

**To track the lineage of a pipeline**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. In the **Name** column, select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. In the **Name** column of the **Executions** table, select the name of a pipeline execution to view.

1. At the top right of the **Executions** page, choose the vertical ellipsis and choose **Download pipeline definition (JSON)**. You can view the file to see how the pipeline graph was defined. 

1. Choose **Edit** to open the Pipeline Designer.

1. Use the resizing and zoom controls at the top right corner of the canvas to zoom in and out of the graph, fit the graph to screen, or expand the graph to full screen.

1. To view your training, validation, and test datasets, complete the following steps:

   1. Choose the Processing step in your pipeline graph.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the Amazon S3 paths to the training, validation, and test datasets.

1. To view your model artifacts, complete the following steps:

   1. Choose the Training step in your pipeline graph.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the Amazon S3 paths to the model artifact.

1. To find the model package ARN, complete the following steps:

   1. Choose the Register model step.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the ARN of the model package.
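
The definition you download in the steps above can also be retrieved programmatically with the boto3 `describe_pipeline_definition_for_execution` call, which returns the definition as a JSON string. The following sketch injects the client as a parameter so the helper can be exercised without live AWS credentials; the execution ARN and the stub client's response are placeholders for this example.

```python
import json

def get_pipeline_definition(sm_client, execution_arn):
    """Fetch and parse the definition JSON for a pipeline execution.

    sm_client is a boto3 SageMaker client, for example
    boto3.client("sagemaker"); injecting it keeps the helper testable.
    """
    resp = sm_client.describe_pipeline_definition_for_execution(
        PipelineExecutionArn=execution_arn
    )
    # The service returns the definition as a JSON string.
    return json.loads(resp["PipelineDefinition"])

class _StubClient:
    """Stand-in client used here only to demonstrate the call shape."""
    def describe_pipeline_definition_for_execution(self, PipelineExecutionArn):
        return {"PipelineDefinition": '{"Version": "2020-12-01", "Steps": []}'}

# Placeholder ARN; in practice, copy it from the execution details page.
definition = get_pipeline_definition(
    _StubClient(), "arn:aws:sagemaker:region:account:pipeline-execution/example"
)
print(definition["Version"])
```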

------
#### [ Studio Classic ]

**To track the lineage of a pipeline**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left sidebar of Studio Classic, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. In the menu, select **Pipelines**.

1. Use the **Search** box to filter the pipelines list.

1. Choose the `AbalonePipeline` pipeline to view the execution list and other details about the pipeline.

1. Choose the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) in the right sidebar to open the **TABLE PROPERTIES** pane, where you can choose which properties to view.

1. Choose the **Settings** tab and then choose **Download pipeline definition file**. You can view the file to see how the pipeline graph was defined.

1. On the **Execution** tab, select the first row in the execution list to view its execution graph and other details about the execution. Note that the graph matches the diagram displayed at the beginning of the tutorial.

   Use the resizing icons on the lower-right side of the graph to zoom in and out of the graph, fit the graph to screen, or expand the graph to full screen. To focus on a specific part of the graph, you can select a blank area of the graph and drag the graph to center on that area. The inset on the lower-right side of the graph displays your location in the graph.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-execution-graph.png)

1. On the **Graph** tab, choose the `AbaloneProcess` step to view details about the step.

1. Find the Amazon S3 paths to the training, validation, and test datasets in the **Output** tab, under **Files**.
**Note**  
To get the full paths, right-click the path and then choose **Copy cell contents**.

   ```
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/train
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/validation
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/test
   ```

1. Choose the `AbaloneTrain` step.

1. Find the Amazon S3 path to the model artifact in the **Output** tab, under **Files**:

   ```
   s3://sagemaker-eu-west-1-acct-id/AbaloneTrain/pipelines-6locnsqz4bfu-AbaloneTrain-NtfEpI0Ahu/output/model.tar.gz
   ```

1. Choose the `AbaloneRegisterModel` step.

1. Find the ARN of the model package in the **Output** tab, under **Files**:

   ```
   arn:aws:sagemaker:eu-west-1:acct-id:model-package/abalonemodelpackagegroupname/2
   ```
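
If you script against the outputs recorded above, both the S3 URIs and the model package ARN decompose with standard-library parsing. The following sketch reuses the example values from this tutorial (the `acct-id` account placeholder is kept as-is):

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

def parse_model_package_arn(arn):
    """Break a model package ARN into its named fields."""
    # Layout: arn:partition:service:region:account:resource
    _, partition, service, region, account, resource = arn.split(":", 5)
    # The resource part looks like model-package/<group-name>/<version>
    _, group_name, version = resource.split("/")
    return {"region": region, "account": account,
            "group_name": group_name, "version": int(version)}

bucket, key = split_s3_uri(
    "s3://sagemaker-eu-west-1-acct-id/"
    "sklearn-abalone-process-2020-12-05-17-28-28-509/output/train"
)
fields = parse_model_package_arn(
    "arn:aws:sagemaker:eu-west-1:acct-id:"
    "model-package/abalonemodelpackagegroupname/2"
)
print(bucket, key)
print(fields)
```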

------