

# SageMaker Notebook Jobs
<a name="notebook-auto-run"></a>

You can use Amazon SageMaker AI to interactively build, train, and deploy machine learning models from your Jupyter notebook in any JupyterLab environment. However, there are various scenarios in which you might want to run your notebook as a noninteractive, scheduled job. For example, you might want to create regular audit reports that analyze all training jobs run over a certain time frame and assess the business value of deploying those models into production. Or you might want to scale up a feature engineering job after testing the data transformation logic on a small subset of data. Other common use cases include:
+ Scheduling jobs for model drift monitoring
+ Exploring the parameter space for better models

In these scenarios, you can use SageMaker Notebook Jobs to create a noninteractive job (which SageMaker AI runs as an underlying training job) to either run on demand or on a schedule. SageMaker Notebook Jobs provides an intuitive user interface so you can schedule your jobs right from JupyterLab by choosing the Notebook Jobs widget (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in your notebook. You can also schedule your jobs using the SageMaker AI Python SDK, which offers the flexibility of scheduling multiple notebook jobs in a pipeline workflow. You can run multiple notebooks in parallel, and parameterize cells in your notebooks to customize the input parameters.
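Parameterized cells follow the papermill-style convention of a cell tagged `parameters`: defaults you define there are overridden by the values supplied when the job is created. A minimal sketch (the variable names below are illustrative, not part of the API):

```python
# Cell tagged "parameters" in the input notebook: these defaults are
# overridden by the values passed in when the notebook job is created.
start_date = "2023-01-01"
sample_fraction = 0.1

# The rest of the notebook uses the injected values as ordinary variables.
rows_to_read = int(10_000 * sample_fraction)
print(rows_to_read)  # prints 1000
```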

This feature uses Amazon EventBridge, SageMaker Training, and SageMaker Pipelines, and is available for use in your Jupyter notebook in any of the following environments:
+ Studio, Studio Lab, Studio Classic, or Notebook Instances
+ Local setup, such as your local machine, where you run JupyterLab

**Prerequisites**

To schedule a notebook job, make sure you meet the following criteria:
+ Ensure your Jupyter notebook and any initialization or startup scripts are self-contained with respect to code and software packages. Otherwise, your noninteractive job may incur errors.
+ Review [Constraints and considerations](notebook-auto-run-constraints.md) to make sure you properly configured your Jupyter notebook, network settings, and container settings.
+ Ensure your notebook can access needed external resources, such as Amazon EMR clusters.
+ If you are setting up Notebook Jobs in a local Jupyter notebook, complete the installation. For instructions, see [Installation guide](scheduled-notebook-installation.md). 
+ If you connect to an Amazon EMR cluster in your notebook and want to parameterize your Amazon EMR connection command, you must apply a workaround using environment variables to pass parameters. For details, see [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md).
+ If you connect to an Amazon EMR cluster using Kerberos, LDAP, or HTTP Basic Auth authentication, you must use the AWS Secrets Manager to pass your security credentials to your Amazon EMR connection command. For details, see [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md).
+ (optional) If you want the UI to preload a script to run upon notebook startup, your admin must install it with a Lifecycle Configuration (LCC). For information about how to use an LCC script, see [Customize a Notebook Instance Using a Lifecycle Configuration Script](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html).

# Installation guide
<a name="scheduled-notebook-installation"></a>

The following sections describe what you need to install to use Notebook Jobs in your JupyterLab environment.

**For Amazon SageMaker Studio and Amazon SageMaker Studio Lab**

If your notebook is in Amazon SageMaker Studio or Amazon SageMaker Studio Lab, you don’t need to perform additional installation—SageMaker Notebook Jobs is built into the platform. To set up required permissions for Studio, see [Set up policies and permissions for Studio](scheduled-notebook-policies-studio.md).

**For local Jupyter notebooks**

If you want to use SageMaker Notebook Jobs for your local JupyterLab environment, you need to perform additional installation.

To install SageMaker Notebook Jobs, complete the following steps:

1. Install Python 3. For details, see [Installing Python 3 and Python Packages](https://www.codecademy.com/article/install-python3).

1. Install JupyterLab version 4 or higher. For details, see [JupyterLab SDK documentation](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html).

1. Install the AWS CLI. For details, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

1. Install two sets of permissions. The IAM user needs permissions to submit jobs to SageMaker AI, and once submitted, the notebook job itself assumes an IAM role that needs permissions to access resources depending on the job tasks.

   1. If you haven’t yet created an IAM user, see [Creating an IAM user in your AWS account](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html).

   1. If you haven’t yet created your notebook job role, see [Creating a role to delegate permissions to an IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html).

   1. Attach the necessary permissions and trust policy to your user and role. For step-by-step instructions and permission details, see [Install policies and permissions for local Jupyter environments](scheduled-notebook-policies-other.md).

1. Generate AWS credentials for your newly created IAM user and save them in the credentials file (`~/.aws/credentials`) of your JupyterLab environment. You can do this with the CLI command `aws configure`. For instructions, see section *Set and view configuration settings using commands* in [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).

1. (optional) By default, the scheduler extension uses a pre-built SageMaker AI Docker image with Python 3. If you want to run your notebook in a custom container or Docker image, you need to create an Amazon Elastic Container Registry (Amazon ECR) image, and any non-default kernel used in the notebook must be installed in that container. For information about how to push a Docker image to an Amazon ECR repository, see [Pushing a Docker Image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html).

1. Add the JupyterLab extension for SageMaker Notebook Jobs. You can add it to your JupyterLab environment with the command: `pip install amazon_sagemaker_jupyter_scheduler`. You may need to restart your Jupyter server with the command: `sudo systemctl restart jupyter-server`.

1. Start JupyterLab with the command: `jupyter lab`.

1. Verify that the Notebook Jobs widget (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) appears in your Jupyter notebook taskbar.
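After the credentials step above, `aws configure` leaves files like the following in your JupyterLab environment. The key values shown are the AWS documentation placeholders, not real credentials; substitute the access key pair generated for your IAM user:

```ini
# ~/.aws/credentials — written by `aws configure`
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFXsnMDENG/bPxRfiCYEXAMPLEKEY

# ~/.aws/config — also written by `aws configure`
[default]
region = us-west-2
output = json
```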

# Set up policies and permissions for Studio
<a name="scheduled-notebook-policies-studio"></a>

You need to set up the proper policies and permissions before you schedule your first notebook run. The following sections provide instructions for setting up:
+ Job execution role trust relationships
+ Additional IAM permissions attached to the job execution role
+ (optional) The AWS KMS permission policy to use a custom KMS key

**Important**  
If your AWS account belongs to an organization with service control policies (SCP) in place, your effective permissions are the logical intersection between what is allowed by the SCPs and what is allowed by your IAM role and user policies. For example, if your organization’s SCP specifies that you can only access resources in `us-east-1` and `us-west-1`, and your policies only allow you to access resources in `us-west-1` and `us-west-2`, then ultimately you can only access resources in `us-west-1`. If you want to exercise all the permissions allowed in your role and user policies, your organization’s SCPs should grant the same set of permissions as your own IAM user and role policies. For details about how to determine your allowed requests, see [Determining whether a request is allowed or denied within an account](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html#policy-eval-denyallow).

**Trust relationships**

To modify the trust relationships, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name. 

1. Choose the **Trust relationships** tab.

1. Choose **Edit trust policy**.

1. Copy and paste the following policy:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "sagemaker.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "events.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

1. Choose **Update Policy**.
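If you prefer the AWS CLI to the console, the same trust policy can be applied with `aws iam update-assume-role-policy`. One way to sanity-check the document before applying it is to build it programmatically; this is a sketch, and the role name in the comment is a placeholder:

```python
import json

# Build the trust policy shown above so it can be validated, written to a
# file, and then applied with, for example:
#   aws iam update-assume-role-policy --role-name <your-role> \
#       --policy-document file://trust-policy.json
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }
        for service in ("sagemaker.amazonaws.com", "events.amazonaws.com")
    ],
}

document = json.dumps(trust_policy, indent=4)

# Sanity check: both service principals must be able to assume the role.
principals = {s["Principal"]["Service"] for s in trust_policy["Statement"]}
assert principals == {"sagemaker.amazonaws.com", "events.amazonaws.com"}
```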

## Additional IAM permissions
<a name="scheduled-notebook-policies-add"></a>

You might need to include additional IAM permissions in the following situations:
+ Your Studio execution and notebook job roles differ
+ You need to access Amazon S3 resources through an S3 VPC endpoint
+ You want to use a custom KMS key to encrypt your input and output Amazon S3 buckets

The following discussion provides the policies you need for each case.

### Permissions needed if your Studio execution and notebook job roles differ
<a name="scheduled-notebook-policies-add-diffrole"></a>

The following JSON snippet is an example policy that you should add to the Studio execution and notebook job roles if you don’t use the Studio execution role as the notebook job role. Review and modify this policy if you need to further restrict privileges.

------
#### [ JSON ]

****  

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Action":"iam:PassRole",
         "Resource":"arn:aws:iam::*:role/*",
         "Condition":{
            "StringLike":{
               "iam:PassedToService":[
                  "sagemaker.amazonaws.com",
                  "events.amazonaws.com"
               ]
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "events:TagResource",
            "events:DeleteRule",
            "events:PutTargets",
            "events:DescribeRule",
            "events:PutRule",
            "events:RemoveTargets",
            "events:DisableRule",
            "events:EnableRule"
         ],
         "Resource":"*",
         "Condition":{
            "StringEquals":{
               "aws:ResourceTag/sagemaker:is-scheduling-notebook-job":"true"
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "s3:CreateBucket",
            "s3:PutBucketVersioning",
            "s3:PutEncryptionConfiguration"
         ],
         "Resource":"arn:aws:s3:::sagemaker-automated-execution-*"
      },
      {
            "Sid": "S3DriverAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::sagemakerheadlessexecution-*"
            ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:ListTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:user-profile/*",
            "arn:aws:sagemaker:*:*:space/*",
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:AddTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcs",
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetAuthorizationToken",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:GetEncryptionConfiguration",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:GetObject",
            "sagemaker:DescribeApp",
            "sagemaker:DescribeDomain",
            "sagemaker:DescribeUserProfile",
            "sagemaker:DescribeSpace",
            "sagemaker:DescribeStudioLifecycleConfig",
            "sagemaker:DescribeImageVersion",
            "sagemaker:DescribeAppImageConfig",
            "sagemaker:CreateTrainingJob",
            "sagemaker:DescribeTrainingJob",
            "sagemaker:StopTrainingJob",
            "sagemaker:Search",
            "sagemaker:CreatePipeline",
            "sagemaker:DescribePipeline",
            "sagemaker:DeletePipeline",
            "sagemaker:StartPipelineExecution"
         ],
         "Resource":"*"
      }
   ]
}
```

------

### Permissions needed to access Amazon S3 resources through an S3 VPC endpoint
<a name="scheduled-notebook-policies-add-vpc"></a>

If you run SageMaker Studio in private VPC mode and access Amazon S3 through an S3 VPC endpoint, you can add permissions to the VPC endpoint policy to control which S3 resources are accessible through the endpoint. Add the following permissions to your VPC endpoint policy. You can modify the policy if you need to further restrict access—for example, you can provide a narrower specification for the `Principal` field.

```
{
    "Sid": "S3DriverAccess",
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
    ],
    "Resource": "arn:aws:s3:::sagemakerheadlessexecution-*"
}
```

For details about how to set up an S3 VPC endpoint policy, see [Edit the VPC endpoint policy](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#edit-vpc-endpoint-policy-s3).

### Permissions needed to use a custom KMS key (optional)
<a name="scheduled-notebook-policies-add-kms"></a>

By default, the input and output Amazon S3 buckets are encrypted using server-side encryption, but you can specify a custom KMS key to encrypt your data in the output Amazon S3 bucket and the storage volume attached to the notebook job.

If you want to use a custom KMS key, attach the following policy and supply your own KMS key ARN.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

------

# Install policies and permissions for local Jupyter environments
<a name="scheduled-notebook-policies-other"></a>

You need to set up two sets of permissions to schedule notebook jobs in a local Jupyter environment: the IAM user needs permissions to submit jobs to SageMaker AI, and the IAM role that the notebook job itself assumes needs permissions to access resources, depending on the job tasks.

The following diagram shows this permission structure. The IAM user submits the notebook job to SageMaker AI; once submitted, the job assumes an IAM role with the permissions its tasks require.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/notebook-jobs-permissions.png)


The following sections help you install necessary policies and permissions for both the IAM user and the job execution role.

## IAM user permissions
<a name="scheduled-notebook-policies-other-user"></a>

**Permissions to submit jobs to SageMaker AI**

To add permissions to submit jobs, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Users** in the left panel.

1. Find the IAM user for your notebook job and choose the user name.

1. Choose **Add Permissions**, and choose **Create inline policy** from the dropdown menu.

1. Choose the **JSON** tab.

1. Copy and paste the following policy:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "EventBridgeSchedule",
               "Effect": "Allow",
               "Action": [
                   "events:TagResource",
                   "events:DeleteRule",
                   "events:PutTargets",
                   "events:DescribeRule",
                   "events:EnableRule",
                   "events:PutRule",
                   "events:RemoveTargets",
                   "events:DisableRule"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "IAMPassrole",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": [
                           "sagemaker.amazonaws.com",
                           "events.amazonaws.com"
                       ]
                   }
               }
           },
           {
               "Sid": "IAMListRoles",
               "Effect": "Allow",
               "Action": "iam:ListRoles",
               "Resource": "*"
           },
           {
               "Sid": "S3ArtifactsAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:PutEncryptionConfiguration",
                   "s3:CreateBucket",
                   "s3:PutBucketVersioning",
                   "s3:ListBucket",
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:GetEncryptionConfiguration",
                   "s3:DeleteObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemaker-automated-execution-*"
               ]
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:DescribeTrainingJob",
                   "sagemaker:StopTrainingJob",
                   "sagemaker:DescribePipeline",
                   "sagemaker:CreateTrainingJob",
                   "sagemaker:DeletePipeline",
                   "sagemaker:CreatePipeline"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "AllowSearch",
               "Effect": "Allow",
               "Action": "sagemaker:Search",
               "Resource": "*"
           },
           {
               "Sid": "SagemakerTags",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:ListTags",
                   "sagemaker:AddTags"
               ],
               "Resource": [
                   "arn:aws:sagemaker:*:*:pipeline/*",
                   "arn:aws:sagemaker:*:*:space/*",
                   "arn:aws:sagemaker:*:*:training-job/*",
                   "arn:aws:sagemaker:*:*:user-profile/*"
               ]
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchGetImage"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

**AWS KMS permission policy (optional)**

By default, the input and output Amazon S3 buckets are encrypted using server-side encryption, but you can specify a custom KMS key to encrypt your data in the output Amazon S3 bucket and the storage volume attached to the notebook job.

If you want to use a custom KMS key, repeat the previous instructions to attach the following policy, supplying your own KMS key ARN.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

------

## Job execution role permissions
<a name="scheduled-notebook-policies-other-job"></a>

**Trust relationships**

To modify the job execution role trust relationships, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name.

1. Choose the **Trust relationships** tab.

1. Choose **Edit trust policy**.

1. Copy and paste the following policy:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "sagemaker.amazonaws.com",
                       "events.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

**Additional permissions**

Once submitted, the notebook job needs permissions to access resources. The following instructions show you how to add a minimal set of permissions. If needed, add more permissions based on your notebook job needs. To add permissions to your job execution role, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name.

1. Choose **Add Permissions**, and choose **Create inline policy** from the dropdown menu.

1. Choose the **JSON** tab.

1. Copy and paste the following policy:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "PassroleForJobCreation",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": "sagemaker.amazonaws.com"
                   }
               }
           },
           {
               "Sid": "S3ForStoringArtifacts",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:GetBucketLocation"
               ],
               "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:StartPipelineExecution",
                   "sagemaker:CreateTrainingJob"
               ],
               "Resource": "*"
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetDownloadUrlForLayer",
                   "ecr:BatchGetImage",
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchCheckLayerAvailability"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Add permissions to other resources your notebook job accesses.

1. Choose **Review policy**.

1. Enter a name for your policy.

1. Choose **Create policy**.

# Where you can create a notebook job
<a name="create-notebook-auto-run"></a>

You have multiple options for creating a notebook job.

You can create a job in your JupyterLab notebook in the Studio UI, or you can programmatically create a job with the SageMaker Python SDK:
+ If you create your notebook job in the Studio UI, you supply details about the image and kernel, security configurations, and any custom variables or scripts, and your job is scheduled. For details about how to schedule your job using SageMaker Notebook Jobs, see [Create a notebook job in Studio](create-notebook-auto-run-studio.md).
+ To create a notebook job with the SageMaker Python SDK, you create a pipeline with a Notebook Job step and initiate an on-demand run, or optionally use the pipeline scheduling feature to schedule future runs. The SageMaker SDK gives you the flexibility to customize your pipeline—you can expand it into a workflow with multiple notebook job steps. Because you create both a Notebook Job step and a pipeline, you can track your pipeline execution status in the SageMaker Notebook Jobs dashboard and also view your pipeline graph in Studio. For details about how to schedule your job with the SageMaker Python SDK and links to example notebooks, see [Create notebook job with SageMaker AI Python SDK example](create-notebook-auto-run-sdk.md).

# Create notebook job with SageMaker AI Python SDK example
<a name="create-notebook-auto-run-sdk"></a>

To run a standalone notebook using the SageMaker Python SDK, you create a Notebook Job step, attach it to a pipeline, and use the utilities provided by Pipelines to run your job on demand or optionally schedule one or more future jobs. The following sections describe the basic steps to create an on-demand or scheduled notebook job and track the run. In addition, refer to the following discussion if you need to pass parameters to your notebook job or connect to Amazon EMR in your notebook—additional preparation of your Jupyter notebook is required in these cases. You can also apply defaults for a subset of the arguments of `NotebookJobStep` so you don’t have to specify them every time you create a Notebook Job step.

To view sample notebooks that demonstrate how to schedule notebook jobs with the SageMaker AI Python SDK, see [notebook job sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step).

**Topics**
+ [Steps to create a notebook job](#create-notebook-auto-run-overall)
+ [View your notebook jobs in the Studio UI dashboard](#create-notebook-auto-run-dash)
+ [View your pipeline graph in Studio](#create-notebook-auto-run-graph)
+ [Passing parameters to your notebook](#create-notebook-auto-run-passparam)
+ [Connecting to an Amazon EMR cluster in your input notebook](#create-notebook-auto-run-emr)
+ [Set up default options](#create-notebook-auto-run-intdefaults)

## Steps to create a notebook job
<a name="create-notebook-auto-run-overall"></a>

You can either create a notebook job that runs immediately or on a schedule. The following instructions describe both methods.

**To schedule a notebook job, complete the following basic steps:**

1. Create a `NotebookJobStep` instance. For details about `NotebookJobStep` parameters, see [sagemaker.workflow.steps.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep). At minimum, provide the following arguments, as shown in the following code snippet:
**Important**  
If you schedule your notebook job using the SageMaker Python SDK, you can only specify certain images to run your notebook job. For more information, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk).

   ```
   notebook_job_step = NotebookJobStep(
       input_notebook=input-notebook,
       image_uri=image-uri,
       kernel_name=kernel-name
   )
   ```

1. Create a pipeline with your `NotebookJobStep` as a single step, as shown in the following snippet:

   ```
   pipeline = Pipeline(
       name=pipeline-name,
       steps=[notebook_job_step],
       sagemaker_session=sagemaker-session,
   )
   ```

1. Run the pipeline on demand or optionally schedule future pipeline runs. To initiate an immediate run, use the following command:

   ```
   execution = pipeline.start(
       parameters={...}
   )
   ```

   Optionally, you can schedule a single future pipeline run or multiple runs at a predetermined interval. You specify your schedule in `PipelineSchedule` and then pass the schedule object to your pipeline with `put_triggers`. For more information about pipeline scheduling, see [Schedule a pipeline with the SageMaker Python SDK](pipeline-eventbridge.md#build-and-manage-scheduling).

   The following example schedules your pipeline to run once on December 25, 2023 at 10:31:32 UTC.

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       at=datetime(year=2023, month=12, day=25, hour=10, minute=31, second=32)
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

   The following example schedules your pipeline to run at 10:15am UTC on the last Friday of each month during the years 2022 to 2023. For details about cron-based scheduling, see [Cron-based schedules](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html#cron-based).

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       cron="15 10 ? * 6L 2022-2023"
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

1. (Optional) View your notebook jobs in the SageMaker Notebook Jobs dashboard. The values you supply for the `tags` argument of your Notebook Job step control how the Studio UI captures and displays the job. For more information, see [View your notebook jobs in the Studio UI dashboard](#create-notebook-auto-run-dash).

## View your notebook jobs in the Studio UI dashboard
<a name="create-notebook-auto-run-dash"></a>

The notebook jobs you create as pipeline steps appear in the Studio Notebook Job dashboard if you specify certain tags.

**Note**  
Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you create your notebook job with the SageMaker Python SDK, you don’t see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md). 

You can control which team members can view your notebook jobs with the following tags:
+ To display the notebook job to all user profiles or [spaces](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html) in a domain, add the domain tag with your domain name. An example is shown as follows:
  + key: `sagemaker:domain-name`, value: `d-abcdefghij5k`
+ To display the notebook job to a certain user profile in a domain, add both the user profile and the domain tags. An example of a user profile tag is shown as follows:
  + key: `sagemaker:user-profile-name`, value: `studio-user`
+ To display the notebook job to a [space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html), add both the space and the domain tags. An example of a space tag is shown as follows:
  + key: `sagemaker:shared-space-name`, value: `my-space-name`
+ If you do not attach any domain, user profile, or space tags, the Studio UI does not show the notebook job created by the pipeline step. In this case, you can view the underlying training job in the training job console, or you can view the status in the [list of pipeline executions](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-studio-view-execution.html).
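When you create the step with the SageMaker Python SDK, these visibility tags go into the `tags` argument of `NotebookJobStep` as key/value pairs. The following is a minimal sketch; the domain and user profile values are placeholders:

```
# Tags that make a pipeline-created notebook job visible in the Studio
# dashboard for one user profile. The domain and user profile values
# below are placeholders; replace them with your own.
tags = [
    {"Key": "sagemaker:domain-name", "Value": "d-abcdefghij5k"},
    {"Key": "sagemaker:user-profile-name", "Value": "studio-user"},
]

# Showing a job to a user profile requires both the domain tag and the
# user profile tag.
keys = {tag["Key"] for tag in tags}
assert {"sagemaker:domain-name", "sagemaker:user-profile-name"} <= keys
```

Pass the list through the step's `tags` argument, for example `NotebookJobStep(..., tags=tags)`.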

Once you set up the necessary tags to view your jobs in the dashboard, see [View notebook jobs](view-notebook-jobs.md) for instructions about how to view your jobs and download outputs.

## View your pipeline graph in Studio
<a name="create-notebook-auto-run-graph"></a>

Since your notebook job step is part of a pipeline, you can view the pipeline graph (DAG) in Studio. In the pipeline graph, you can view the status of the pipeline run and track lineage. For details, see [View the details of a pipeline run](pipelines-studio-view-execution.md).

## Passing parameters to your notebook
<a name="create-notebook-auto-run-passparam"></a>

If you want to pass parameters to your notebook job (using the `parameters` argument of `NotebookJobStep`), you need to prepare your input notebook to receive the parameters. 

The Papermill-based notebook job executor searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after this cell. For details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md). 

Once you have performed this step, pass your parameters to your `NotebookJobStep`, as shown in the following example:

```
notebook_job_parameters = {
    "company": "Amazon"
}

notebook_job_step = NotebookJobStep(
    image_uri=image-uri,
    kernel_name=kernel-name,
    role=role-name,
    input_notebook=input-notebook,
    parameters=notebook_job_parameters,
    ...
)
```

## Connecting to an Amazon EMR cluster in your input notebook
<a name="create-notebook-auto-run-emr"></a>

If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to further modify your Jupyter notebook. See [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md) if you need to perform any of the following tasks in your notebook:
+ **Pass parameters into your Amazon EMR connection command.** Studio uses Papermill to run notebooks. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to how Papermill passes information to SparkMagic.
+ **Pass user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters.** You must pass user credentials through AWS Secrets Manager.

## Set up default options
<a name="create-notebook-auto-run-intdefaults"></a>

The SageMaker SDK gives you the option to set defaults for a subset of parameters so you don’t have to specify these parameters every time you create a `NotebookJobStep` instance. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, and `security_group_ids`. Use the SageMaker AI config file to set the defaults for the step. For information about the SageMaker AI configuration file, see [Configuring and using defaults with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk).

To set up the notebook job defaults, apply your new defaults to the notebook job section of the config file as shown in the following snippet:

```
SageMaker:
  PythonSDK:
    Modules:
      NotebookJob:
        RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
        S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
        S3KmsKeyId: 's3kmskeyid'
        VolumeKmsKeyId: 'volumekmskeyid1'
        VpcConfig:
          SecurityGroupIds:
            - 'sg123'
          Subnets:
            - 'subnet-1234'
```
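If you keep the config file outside the SDK's default search locations, you can point the SDK at it with the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable. The path in this sketch is a placeholder:

```
# Point the SageMaker Python SDK at a custom config file.
# The path below is a placeholder; substitute your own location.
export SAGEMAKER_USER_CONFIG_OVERRIDE=/home/user/my-sagemaker-config.yaml
```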

# Create a notebook job in Studio
<a name="create-notebook-auto-run-studio"></a>

**Note**  
The notebook scheduler is built on the Amazon EventBridge, SageMaker Training, and Pipelines services. If your notebook jobs fail, you might see errors related to these services.

The following sections describe how to create a notebook job in the Studio UI.

SageMaker Notebook Jobs gives you the tools to create and manage your noninteractive notebook jobs using the Notebook Jobs widget. You can create jobs, view the jobs you created, and pause, stop, or resume existing jobs. You can also modify notebook schedules.

When you create your scheduled notebook job with the widget, the scheduler tries to infer a selection of default options and automatically populates the form to help you get started quickly. If you are using Studio, you can submit an on-demand job without setting any options, or submit a scheduled notebook job definition by supplying just the schedule information. However, you can customize other fields if your scheduled job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension lets you specify your own defaults (for a subset of options) so you don't have to manually insert the same values every time.

When you create a notebook job, you can include additional files such as datasets, images, and local scripts. To do so, choose **Run job with input folder**. The notebook job then has access to all files under the input file's folder. While the notebook job runs, the directory's file structure remains unchanged.
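For example, given a hypothetical input folder laid out as follows, the job can read `helpers.py` and `data/train.csv` at the same relative paths that work interactively:

```
project/                     <-- input folder
├── input_notebook.ipynb     <-- input file
├── helpers.py
└── data/
    └── train.csv
```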

To schedule a notebook job, complete the following steps.

1. Open the **Create Job** form.

   In local JupyterLab environments, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the taskbar. If you don't see the icon, follow the instructions in [Installation guide](scheduled-notebook-installation.md) to install it.

   In Studio, open the form in one of two ways:
   + Using the **File Browser**

     1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

     1. Choose **Create Notebook Job**.
   + Within the Studio notebook
     + Inside the Studio notebook you want to run as a scheduled job, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar.

1. Complete the popup form. The form displays the following fields:
   + **Job name**: A descriptive name you specify for your job.
   + **Input file**: The name of the notebook which you are scheduling to run in noninteractive mode.
   + **Compute type**: The type of Amazon EC2 instance on which you want to run your notebook.
   + **Parameters**: Custom parameters you can optionally specify as inputs to your notebook. To use this feature, you can optionally tag a specific cell in your Jupyter notebook with the **parameters** tag to control where your parameters are applied. For more details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md).
   + (Optional) **Run job with input folder**: If selected, the scheduled job has access to all files in the same folder as the **Input file**.
   + **Additional Options**: You can specify additional customizations for your job. For example, you can specify an image or kernel, input and output folders, job retry and timeout options, encryption details, and custom initialization scripts. For the complete listing of customizations you can apply, see [Available options](create-notebook-auto-execution-advanced.md).

1. Schedule your job. You can run your notebook on demand or on a fixed schedule.
   + To run the notebook on demand, complete the following steps:
     + Select **Run Now**.
     + Choose **Create**.
     + The **Notebook Jobs** tab appears. Choose **Reload** to load your job into the dashboard.
   + To run the notebook on a fixed schedule, complete the following steps:
     + Choose **Run on a schedule**.
     + Choose the **Interval** dropdown list and select an interval. The intervals range from every minute to monthly. You can also select **Custom schedule**.
     + Based on the interval you choose, additional fields appear to help you further specify your desired run day and time. For example, if you select **Day** for a daily run, an additional field appears for you to specify the desired time. Note that any time you specify is in UTC format. Note also that if you choose a small interval, such as one minute, your jobs overlap if the previous job is not complete when the next job starts.

       If you select a custom schedule, use cron syntax in the expression box to specify your exact run date and time. The cron syntax is a space-separated list of fields, each of which represents a unit of time, from seconds to years. For help with cron syntax, choose **Get help with cron syntax** under the expression box.
     + Choose **Create**.
     + The **Notebook Job Definitions** tab appears. Choose **Reload** to load your job definition into the dashboard.

# Set up default options for local notebooks
<a name="create-notebook-auto-execution-advanced-default"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

You can set up default options when you create a notebook job. This can save you time if you plan to create multiple notebook jobs with different options than the provided defaults. The following provides information on how to set up the default options for local notebooks.

If you have to manually type (or paste in) custom values in the **Create Job** form, you can store new default values and the scheduler extension inserts your new values every time you create a new job definition. This feature is available for the following options:
+ **Role ARN**
+ **S3 Input Folder**
+ **S3 Output Folder**
+ **Output encryption KMS key** (if you turn on **Configure Job Encryption**)
+ **Job instance volume encryption KMS key** (if you turn on **Configure Job Encryption**)

This feature saves you time if you insert different values than the provided defaults and continue to use those values for future job runs. Your chosen user settings are stored on the machine that runs your JupyterLab server and are retrieved with the help of native APIs. If you provide new default values for one or more but not all five options, the existing defaults are used for the options you don’t customize.

The following instructions show you how to preview the existing default values, set new default values, and reset your default values for your notebook jobs.

**To preview existing default values for your notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

1. Choose **Create Notebook Job**.

1. Choose **Additional options** to expand the tab of notebook job settings. You can view the default settings here. 

**To set new default values for your future notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can update the default settings directly in this UI page or by using the JSON editor.
   + In the UI you can insert new values for **Role ARN**, **S3 Input Folder**, **S3 Output Folder**, **Output encryption KMS key**, or **Job instance volume encryption KMS key**. If you change these values, you will see the new defaults for these fields while you create your next notebook job under **Additional options**.
   + (Optional) To update the user defaults using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

     1. To update your default values, copy and paste the JSON snippet from the **System Defaults** panel to the **User Preferences** panel, and update the fields.

     1. If you updated the default values, choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.

**If you previously changed the user-defined default values and now want to reset them, complete the following steps:**

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can restore the defaults by directly using this UI page or using the JSON editor.
   + In the UI you can choose **Restore to Defaults** in the top right corner. Your defaults are restored to empty strings. You only see this option if you previously changed your default values.
   + (Optional) To restart the default settings using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

      1. To restore the default settings, copy the content from the **System Defaults** panel to the **User Preferences** panel.

     1. Choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.

# Notebook job workflows
<a name="create-notebook-auto-run-dag"></a>

Since a notebook job runs your custom code, you can create a pipeline that includes one or more notebook job steps. ML workflows often contain multiple steps, such as a processing step to preprocess data, a training step to build your model, and a model evaluation step, among others. One possible use of notebook jobs is to handle preprocessing—you might have a notebook that performs data transformation or ingestion, an EMR step that performs data cleaning, and another notebook job that performs featurization of your inputs before initiating a training step. A notebook job may require information from previous steps in the pipeline or from user-specified customization as parameters in the input notebook. For examples that show how to pass environment variables and parameters to your notebook and retrieve information from prior steps, see [Pass information to and from your notebook step](create-notebook-auto-run-dag-seq.md).

In another use case, one of your notebook jobs might call another notebook to perform some tasks during your notebook run—in this scenario you need to specify these sourced notebooks as dependencies with your notebook job step. For information about how to call another notebook, see [Invoke another notebook in your notebook job](create-notebook-auto-run-dag-call.md).

To view sample notebooks that demonstrate how to schedule notebook jobs with the SageMaker AI Python SDK, see [notebook job sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step).

# Pass information to and from your notebook step
<a name="create-notebook-auto-run-dag-seq"></a>

The following sections describe ways to pass information to your notebook as environment variables and parameters.

## Pass environment variables
<a name="create-notebook-auto-run-dag-seq-env-var"></a>

Pass environment variables as a dictionary to the `environment_variables` argument of your `NotebookJobStep`, as shown in the following example. Environment variable values must be strings:

```
environment_variables = {"RATE": "0.0001", "BATCH_SIZE": "1000"}

notebook_job_step = NotebookJobStep(
    ...
    environment_variables=environment_variables,
    ...
)
```

You can read the environment variables in the notebook using `os.getenv()`, as shown in the following example:

```
# inside your notebook
import os
print(f"ParentNotebook: RATE={os.getenv('RATE')}")
```
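Because environment variables reach the notebook process as strings, convert numeric settings back to numbers inside the notebook. The following is a plain-Python sketch reusing the `RATE` and `BATCH_SIZE` names from the example above:

```
# inside your notebook
import os

# os.getenv returns a string (or the default when the variable is unset)
rate = float(os.getenv("RATE", "0.0001"))
batch_size = int(os.getenv("BATCH_SIZE", "1000"))
print(rate, batch_size)
```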

## Pass parameters
<a name="create-notebook-auto-run-dag-seq-param"></a>

When you pass parameters to your `NotebookJobStep` instance, you might optionally want to tag a cell in your Jupyter notebook to indicate where to apply new parameters or parameter overrides. For instructions about how to tag a cell in your Jupyter notebook, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md).

You pass parameters through the `parameters` argument of the Notebook Job step, as shown in the following snippet:

```
notebook_job_parameters = {
    "company": "Amazon",
}

notebook_job_step = NotebookJobStep(
    ...
    parameters=notebook_job_parameters,
    ...
)
```

Inside your input notebook, your parameters are applied after the cell tagged with `parameters` or at the beginning of the notebook if you don’t have a tagged cell.

```
# this cell is in your input notebook and is tagged with 'parameters'
# your parameters and parameter overrides are applied after this cell
company='default'
```

```
# in this cell, your parameters are applied
# prints "company is Amazon"
print(f'company is {company}')
```
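Conceptually, the notebook job injects a new cell that overrides the defaults from the tagged cell, which you can picture as plain Python running top to bottom (a sketch of the effect, not of the Papermill mechanics):

```
# defaults from the cell tagged 'parameters'
company = 'default'

# injected cell: applies your parameters and overrides the defaults
company = 'Amazon'

print(f'company is {company}')  # prints "company is Amazon"
```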

## Retrieve information from a previous step
<a name="create-notebook-auto-run-dag-seq-interstep"></a>

The following discussion explains how to extract data from a previous step to pass to your Notebook Job step.

**Use `properties` attribute**

You can use the following properties with the previous step's `properties` attribute:
+ `ComputingJobName`—The training job name
+ `ComputingJobStatus`—The training job status
+ `NotebookJobInputLocation`—The input Amazon S3 location
+ `NotebookJobOutputLocationPrefix`—The Amazon S3 prefix for your training job outputs; the outputs are packaged at `{NotebookJobOutputLocationPrefix}/{training-job-name}/output/output.tar.gz`
+ `InputNotebookName`—The input notebook file name
+ `OutputNotebookName`—The output notebook file name (which may not exist in the training job output folder if the job fails)

The following code snippet shows how to extract parameters from the properties attribute.

```
notebook_job_step2 = NotebookJobStep(
    ....
    parameters={
        "step1_JobName": notebook_job_step1.properties.ComputingJobName,
        "step1_JobStatus": notebook_job_step1.properties.ComputingJobStatus,
        "step1_NotebookJobInput": notebook_job_step1.properties.NotebookJobInputLocation,
        "step1_NotebookJobOutput": notebook_job_step1.properties.NotebookJobOutputLocationPrefix,
    }
)
```

**Use JsonGet**

If you want to pass parameters other than the ones previously mentioned and the JSON outputs of your previous step reside in Amazon S3, use `JsonGet`. `JsonGet` is a general mechanism that can directly extract data from JSON files in Amazon S3.

To extract JSON files in Amazon S3 with `JsonGet`, complete the following steps:

1. Upload your JSON file to Amazon S3. If your data is already uploaded to Amazon S3, skip this step. The following example demonstrates uploading a JSON file to Amazon S3.

   ```
   import json
   from sagemaker.s3 import S3Uploader
   
   output = {
       "key1": "value1", 
       "key2": [0,5,10]
   }
               
   json_output = json.dumps(output)
   
   with open("notebook_job_params.json", "w") as file:
       file.write(json_output)
   
   S3Uploader.upload(
       local_path="notebook_job_params.json",
       desired_s3_uri="s3://path/to/bucket"
   )
   ```

1. Provide your S3 URI and the JSON path to the value you want to extract. In the following example, `JsonGet` returns an object representing index 2 of the value associated with key `key2` (`10`).

   ```
   NotebookJobStep(
       ....
       parameters={
           # the key job_key1 returns an object representing the value 10
           "job_key1": JsonGet(
               s3_uri=Join(on="/", values=["s3:/", ..]),
               json_path="key2[2]" # value to reference in that json file
           ), 
           "job_key2": "Amazon" 
       }
   )
   ```
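To check what a `json_path` expression selects before wiring it into a step, you can evaluate the same lookup locally with plain Python (a sketch of the lookup, not of the `JsonGet` implementation):

```
import json

# the same document uploaded in step 1
doc = json.loads('{"key1": "value1", "key2": [0, 5, 10]}')

# json_path "key2[2]" selects index 2 of the list under "key2"
value = doc["key2"][2]
print(value)  # prints 10
```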

# Invoke another notebook in your notebook job
<a name="create-notebook-auto-run-dag-call"></a>

You can set up a pipeline in which one notebook job calls another notebook. The following example sets up a pipeline with a Notebook Job step whose notebook calls two other notebooks. The input notebook contains the following lines:

```
%run 'subfolder/notebook_to_call_in_subfolder.ipynb'
%run 'notebook_to_call.ipynb'
```

Pass these notebooks into your `NotebookJobStep` instances with `additional_dependencies`, as shown in the following snippet. Note that the notebook paths in `additional_dependencies` are relative to the root location. For information about how SageMaker AI uploads your dependent files and folders to Amazon S3 so you can correctly provide paths to your dependencies, see the description for `additional_dependencies` in [NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep).

```
input_notebook = "inputs/input_notebook.ipynb"
simple_notebook_path = "inputs/notebook_to_call.ipynb"
folder_with_sub_notebook = "inputs/subfolder"

notebook_job_step = NotebookJobStep(
    image_uri=image-uri,
    kernel_name=kernel-name,
    role=role-name,
    input_notebook=input_notebook,
    additional_dependencies=[simple_notebook_path, folder_with_sub_notebook],
    tags=tags,
)
```

# Available options
<a name="create-notebook-auto-execution-advanced"></a>

The following table displays all available options you can use to customize your notebook job, whether you run your Notebook Job in Studio, a local Jupyter environment, or using the SageMaker Python SDK. The table includes the type of custom option, a description, additional guidelines about how to use the option, a field name for the option in Studio (if available) and the parameter name for the notebook job step in the SageMaker Python SDK (if available).

For some options, you can also preset custom default values so you don’t have to specify them every time you set up a notebook job. For Studio, these options are **Role**, **Input folder**, **Output folder**, and **KMS Key ID**, and are specified in the following table. If you preset custom defaults for these options, these fields are prepopulated in the **Create Job** form when you create your notebook job. For details about how to create custom defaults in Studio and local Jupyter environments, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md).

The SageMaker SDK also gives you the option to set intelligent defaults so that you don’t have to specify these parameters when you create a `NotebookJobStep`. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, `security_group_ids`, and are specified in the following table. For information about how to set intelligent defaults, see [Set up default options](create-notebook-auto-run-sdk.md#create-notebook-auto-run-intdefaults).


| Custom option | Description | Studio-specific guideline | Local Jupyter environment guideline | SageMaker Python SDK guideline | 
| --- | --- | --- | --- | --- | 
| Job name | Your job name as it should appear in the Notebook Jobs dashboard. | Field Job name. | Same as Studio. | Parameter notebook\$1job\$1name. Defaults to None. | 
| Image | The container image used to run the notebook noninteractively on the chosen compute type. | Field Image. This field defaults to your notebook’s current image. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. This image can be a custom, [bring-your-own image](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html) or an available Amazon SageMaker image. For a list of available SageMaker images supported by the notebook scheduler, see [Amazon SageMaker Images Available for Use With Studio Classic Notebooks](notebooks-available-images.md). | Field Image. This field requires an ECR URI of a Docker image that can run the provided notebook on the selected compute type. By default, the scheduler extension uses a pre-built SageMaker AI Docker image—base Python 2.0. This is the official Python 3.8 image from DockerHub with boto3, AWS CLI, and the Python 3 kernel. You can also provide any ECR URI that meets the notebook custom image specification. For details, see [Custom SageMaker Image Specifications for Amazon SageMaker Studio Classic](studio-byoi-specs.md). This image should have all the kernels and libraries needed for the notebook run. | Required. Parameter image\$1uri. URI location of a Docker image on ECR. You can use specific SageMaker Distribution Images or custom image based on those images, or your own image pre-installed with notebook job dependencies that meets additional requirements. For details, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk). | 
| Instance type | The EC2 instance type to use to run the notebook job. The notebook job uses a SageMaker Training Job as a computing layer, so the specified instance type should be a SageMaker Training supported instance type. | Field Compute type. Defaults to ml.m5.large. | Same as Studio. | Parameter instance\$1type. Defaults to ml.m5.large. | 
| Kernel | The Jupyter kernel used to run the notebook job. | Field Kernel. This field defaults to your notebook’s current kernel. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. | Field Kernel. This kernel should be present in the image and follow the Jupyter kernel specs. This field defaults to the Python3 kernel found in the base Python 2.0 SageMaker image. Change this field to a custom value if needed. | Required. Parameter kernel\$1name. This kernel should be present in the image and follow the Jupyter kernel specs. To see the kernel identifiers for your image, see (LINK). | 
| SageMaker AI session | The underlying SageMaker AI session to which SageMaker AI service calls are delegated. | N/A | N/A | Parameter sagemaker\$1session. If unspecified, one is created using a default configuration chain. | 
| Role ARN | The role’s Amazon Resource Name (ARN) used with the notebook job. | Field Role ARN. This field defaults to the Studio execution role. Change this field to a custom value if needed.  If Studio cannot infer this value, the **Role ARN** field is blank. In this case, insert the ARN you want to use.  | Field Role ARN. This field defaults to any role prefixed with SagemakerJupyterScheduler. If you have multiple roles with the prefix, the extension chooses one. Change this field to a custom value if needed. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter role. Defaults to the SageMaker AI default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks. Otherwise, it throws a ValueError. Allows intelligent defaults. | 
| Input notebook | The name of the notebook which you are scheduling to run. | Required. Field Input file. | Same as Studio. | Required.Parameter input\$1notebook. | 
| Input folder | The folder containing your inputs. The job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. | Field Input folder. If you don’t provide a folder, the scheduler creates a default Amazon S3 bucket for your inputs. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The input folder is placed inside the location specified by parameter s3\$1root\$1uri. | 
| Output folder | The folder containing your outputs. The job outputs, including the output notebook and logs, are put in this folder. | Field Output folder. If you don’t specify a folder, the scheduler creates a default Amazon S3 bucket for your outputs. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The output folder is placed inside the location specified by parameter s3\$1root\$1uri. | 
| Parameters | A dictionary of variables and values to pass to your notebook job. | Field Parameters. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) to accept parameters. | Same as Studio. | Parameter parameters. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) to accept parameters. | 
| Additional (file or folder) dependencies | The list of file or folder dependencies that the notebook job uploads to an S3 staging folder. | Not supported. | Not supported. | Parameter additional\_dependencies. The notebook job uploads these dependencies to an S3 staging folder so they can be consumed during execution. | 
| S3 root URI | The Amazon S3 location used as the root for the job's input and output folders. This S3 bucket must be in the same AWS account that you're using to run your notebook job. | N/A. Use Input folder and Output folder. | Same as Studio. | Parameter s3\_root\_uri. Defaults to a default S3 bucket. Allows intelligent defaults. | 
| Environment variables | Any existing environment variables that you want to override, or new environment variables that you want to introduce and use in your notebook. | Field Environment variables. | Same as Studio. | Parameter environment\_variables. Defaults to None. | 
| Tags | A list of tags attached to the job. | N/A | N/A | Parameter tags. Defaults to None. Your tags control how the Studio UI captures and displays the job created by the pipeline. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash). | 
| Start-up script | A script preloaded in the notebook startup menu that you can choose to run before you run the notebook. | Field Start-up script. Select a Lifecycle Configuration (LCC) script that runs on the image at start-up. A start-up script runs in a shell outside of the Studio environment. Therefore, this script cannot depend on the Studio local storage, environment variables, or app metadata (in `/opt/ml/metadata`). Also, if you use a start-up script and an initialization script, the start-up script runs first.   | Not supported. | Not supported. | 
| Initialization script | A path to a local script to run when your notebook starts up. | Field Initialization script. Enter the EFS file path where a local script or a Lifecycle Configuration (LCC) script is located. Unlike a start-up script, an initialization script is sourced from the same shell as the notebook job. If you use both a start-up script and an initialization script, the start-up script runs first. | Field Initialization script. Enter the local file path where a local script or a Lifecycle Configuration (LCC) script is located.  | Parameter initialization\_script. Defaults to None. | 
| Max retry attempts | The number of times Studio tries to rerun a failed job run. | Field Max retry attempts. Defaults to 1. | Same as Studio. | Parameter max\_retry\_attempts. Defaults to 1. | 
| Max run time (in seconds) | The maximum length of time, in seconds, that a notebook job can run before it is stopped. If you configure both Max run time and Max retry attempts, the run time applies to each retry. If a job does not complete in this time, its status is set to Failed. | Field Max run time (in seconds). Defaults to 172800 seconds (2 days). | Same as Studio. | Parameter max\_runtime\_in\_seconds. Defaults to 172800 seconds (2 days). | 
| Retry policies | A list of retry policies, which govern actions to take in case of failure. | Not supported. | Not supported. | Parameter retry\_policies. Defaults to None. | 
| Add Step or StepCollection dependencies | A list of Step or StepCollection names or instances on which the job depends. | Not supported. | Not supported. | Parameter depends\_on. Defaults to None. Use this to define explicit dependencies between steps in your pipeline graph. | 
| Volume size | The size in GB of the storage volume for storing input and output data during training. | Not supported. | Not supported. | Parameter volume\_size. Defaults to 30 GB. | 
| Encrypt traffic between containers | A flag that specifies whether traffic between training containers is encrypted for the training job. | N/A. Enabled by default. | N/A. Enabled by default. | Parameter encrypt\_inter\_container\_traffic. Defaults to True. | 
| Configure job encryption | An indicator that you want to encrypt your notebook job outputs, job instance volume, or both. | Field Configure job encryption. Check this box to choose encryption. If left unchecked, the job outputs are encrypted with the account's default KMS key and the job instance volume is not encrypted. | Same as Studio. | Not supported. | 
| Output encryption KMS key | A KMS key to use if you want to customize the encryption key used for your notebook job outputs. This field is only applicable if you checked Configure job encryption. | Field Output encryption KMS key. If you do not specify this field, your notebook job outputs are encrypted with SSE-KMS using the default Amazon S3 KMS key. Also, if you create the Amazon S3 bucket yourself and use encryption, your encryption method is preserved. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter s3\_kms\_key. Defaults to None. Allows intelligent defaults. | 
| Job instance volume encryption KMS key | A KMS key to use if you want to encrypt your job instance volume. This field is only applicable if you checked Configure job encryption. | Field Job instance volume encryption KMS key. | Field Job instance volume encryption KMS key. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter volume\_kms\_key. Defaults to None. Allows intelligent defaults. | 
| Use a Virtual Private Cloud to run this job (for VPC users) | An indicator that you want to run this job in a Virtual Private Cloud (VPC). For better security, it is recommended that you use a private VPC. | Field Use a Virtual Private Cloud to run this job. Check this box if you want to use a VPC. At minimum, create the following VPC endpoints to enable your notebook job to privately connect to those AWS resources: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/create-notebook-auto-execution-advanced.html) If you choose to use a VPC, you need to specify at least one private subnet and at least one security group in the following options. If you don’t use any private subnets, you need to consider other configuration options. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Same as Studio. | N/A | 
| Subnet(s) (for VPC users) | Your subnets. This field must contain at least one and at most five, and all the subnets you provide should be private. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Field Subnet(s). This field defaults to the subnets associated with the Studio domain, but you can change this field if needed. | Field Subnet(s). The scheduler cannot detect your subnets, so you need to enter any subnets you configured for your VPC. | Parameter subnets. Defaults to None. Allows intelligent defaults. | 
| Security group(s) (for VPC users) | Your security groups. This field must contain at least one and at most 15. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Field Security groups. This field defaults to the security groups associated with the domain VPC, but you can change this field if needed. | Field Security groups. The scheduler cannot detect your security groups, so you need to enter any security groups you configured for your VPC. | Parameter security\_group\_ids. Defaults to None. Allows intelligent defaults. | 
| Name | The name of the notebook job step. | N/A | N/A | Parameter name. If unspecified, it is derived from the notebook file name. | 
| Display name | Your job name as it should appear in your list of pipeline executions. | N/A | N/A | Parameter display\_name. Defaults to None. | 
| Description | A description of your job. | N/A | N/A | Parameter description. | 

# Parameterize your notebook
<a name="notebook-auto-run-troubleshoot-override"></a>

To pass new parameters or parameter overrides to your scheduled notebook job, you can optionally modify your Jupyter notebook to control where the new parameter values are applied. When you pass a parameter, the notebook job executor uses the methodology enforced by Papermill: it searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after that cell. If you don’t have any cells tagged with `parameters`, the parameters are applied at the beginning of the notebook. If you have more than one cell tagged with `parameters`, the parameters are applied after the first cell tagged with `parameters`.

To tag a cell in your notebook with the `parameters` tag, complete the following steps:

1. Select the cell to parameterize.

1. Choose the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) in the right sidebar.

1. Type **parameters** in the **Add Tag** box.

1. Choose the **+** sign.

1. The `parameters` tag appears under **Cell Tags** with a check mark, which means the tag is applied to the cell.
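As an illustration, the injection behavior described above can be sketched in plain Python. This is a simplified model of Papermill's approach, not its actual implementation; the cell dictionaries loosely mirror the Jupyter notebook JSON format:

```python
# Simplified illustration (not Papermill's real code) of injecting a
# parameters cell after the first cell tagged "parameters".
def inject_parameters(cells, parameters):
    """Return a new cell list with an injected-parameters cell.

    cells: list of dicts shaped like Jupyter notebook cells.
    parameters: dict of names/values to pass to the notebook job.
    """
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
    }
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            # Apply overrides immediately after the first tagged cell.
            return cells[: i + 1] + [injected] + cells[i + 1 :]
    # No tagged cell: apply parameters at the beginning of the notebook.
    return [injected] + cells

cells = [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "rate = 0.1"},
    {"cell_type": "code", "metadata": {}, "source": "print(rate)"},
]
new_cells = inject_parameters(cells, {"rate": 0.5})
```

Because the injected cell runs after the tagged defaults, the override value (`rate = 0.5`) wins over the default (`rate = 0.1`) for the rest of the notebook.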

# Connect to an Amazon EMR cluster from your notebook
<a name="scheduled-notebook-connect-emr"></a>

If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to perform additional setup. In particular, the following discussion addresses two issues:
+ **Passing parameters into your Amazon EMR connection command**. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to differences in how Papermill passes parameters and how SparkMagic receives parameters. The workaround to address this limitation is to pass parameters as environment variables. For more details about the issue and workaround, see [Pass parameters to your EMR connection command](#scheduled-notebook-connect-emr-pass-param).
+ **Passing user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters**. In interactive mode, Studio asks for credentials in a popup form where you can enter your sign-in credentials. In your noninteractive scheduled notebook, you have to pass them through the AWS Secrets Manager. For more details about how to use the AWS Secrets Manager in your scheduled notebook jobs, see [Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster](#scheduled-notebook-connect-emr-credentials).

## Pass parameters to your EMR connection command
<a name="scheduled-notebook-connect-emr-pass-param"></a>

If you are using images with the SparkMagic PySpark and Spark kernels and want to parameterize your EMR connection command, provide your parameters in the **Environment variables** field instead of the **Parameters** field in the Create Job form (in the **Additional Options** dropdown menu). Make sure your EMR connection command in the Jupyter notebook reads these parameters as environment variables. For example, suppose you pass `cluster_id` as an environment variable when you create your job. Your EMR connection command should look like the following:

```
%%local
import os
```

```
%sm_analytics emr connect --cluster-id {os.getenv('cluster_id')} --auth-type None
```

You need this workaround because of requirements imposed by SparkMagic and Papermill. For background context, the SparkMagic kernel expects the `%%local` magic command to accompany any local variables you define. However, Papermill does not pass the `%%local` magic command with your overrides. To work around this Papermill limitation, supply your parameters as environment variables in the **Environment variables** field.
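As a minimal sketch of the workaround, the notebook reads the parameter from the environment rather than relying on Papermill's variable injection. The `cluster_id` name and value here are illustrative; in a real job, the scheduler sets the variable for you:

```python
import os

# In a real notebook job, "cluster_id" is set by the scheduler from the
# Environment variables field; it is assigned directly here for illustration.
os.environ["cluster_id"] = "j-EXAMPLE12345"

# Build the connection command the way the notebook cell would.
cluster_id = os.getenv("cluster_id")
connect_cmd = f"%sm_analytics emr connect --cluster-id {cluster_id} --auth-type None"
```

Because `os.getenv` resolves at cell run time, the same notebook can connect to different clusters across job runs without any edits.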

## Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster
<a name="scheduled-notebook-connect-emr-credentials"></a>

To establish a secure connection to an Amazon EMR cluster that uses Kerberos, LDAP, or HTTP Basic Auth authentication, use AWS Secrets Manager to pass user credentials to your connection command. For information about how to create a Secrets Manager secret, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html). Your secret must contain your username and password. You pass the secret with the `--secret` argument, as shown in the following example:

```
%sm_analytics emr connect --cluster-id j_abcde12345 --auth-type Kerberos --secret aws_secret_id_123
```
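The secret value itself is a JSON document that holds the sign-in credentials. The following is a hypothetical example; the key names and values shown are illustrative:

```
{
    "username": "emr-user",
    "password": "EXAMPLE-PASSWORD"
}
```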

Your administrator can set up a flexible access policy using an attribute-based-access-control (ABAC) method, which assigns access based on special tags. You can set up flexible access to create a single secret for all users in the account or a secret for each user. The following code samples demonstrate these scenarios:

**Create a single secret for all users in the account**

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```

------

**Create a different secret for each user**

You can create a different secret for each user using the `aws:PrincipalTag` condition key, as shown in the following example:

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/user-identity": "${aws:PrincipalTag/user-identity}"
                }
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```

------

# Notebook jobs details in Amazon SageMaker Studio
<a name="track-jobs-jobdefs"></a>

SageMaker Notebook Jobs dashboards help organize the job definitions that you schedule, and also keep track of the actual jobs that run from your job definitions. There are two important concepts to understand when scheduling notebook jobs: *job definitions* and *job runs*. Job definitions are schedules you set to run specific notebooks. For example, you can create a job definition that runs notebook XYZ.ipynb every Wednesday. This job definition launches the actual job runs which occur this coming Wednesday, next Wednesday, the Wednesday after that, and so on. 

**Note**  
The SageMaker Python SDK notebook job step does not create job definitions. However, you can view your jobs in the Notebook Jobs dashboard. Both jobs and job definitions are available if you schedule your job in a JupyterLab environment.

The interface provides two main tabs that help you track your existing job definitions and job runs:
+ **Notebook Jobs** tab: This tab displays a list of all your job runs from your on-demand jobs and job definitions. From this tab, you can directly access the details for a single job run. For example, you can view a single job run that occurred two Wednesdays ago.
+ **Notebook Job Definitions** tab: This tab displays a list of all your job definitions. From this tab, you can directly access the details for a single job definition. For example, you can view the schedule you created to run XYZ.ipynb every Wednesday.

For details about the **Notebook Jobs** tab, see [View notebook jobs](view-notebook-jobs.md).

For details about the **Notebook Job Definitions** tab, see [View notebook job definitions](view-def-detail-notebook-auto-run.md).

# View notebook jobs
<a name="view-notebook-jobs"></a>

**Note**  
You can automatically view your notebook jobs if you scheduled your notebook job from the Studio UI. If you used the SageMaker Python SDK to schedule your notebook job, you need to supply additional tags when you create the notebook job step. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

The following topic gives information about the **Notebook Jobs** tab and how to view the details of a single notebook job. The **Notebook Jobs** tab, which you access by choosing the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar, shows a history of your on-demand jobs and all the jobs that run from the job definitions you created. This tab opens after you create an on-demand job, or you can open it at any time to see a history of past and current jobs. If you select the **Job name** for any job, you can view details for that job on its **Job Detail** page. For more information about the **Job Detail** page, see the following section [View a single job](#view-jobs-detail-notebook-auto-run).

The **Notebook Jobs** tab includes the following information for each job:
+ **Output files**: Displays the availability of output files. This column can contain one of the following:
  + A download icon (![\[Cloud icon with downward arrow, representing download or cloud storage functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/File_download.png)): The output notebook and log are available for download; choose this button to download them. Note that a failed job can still generate output files if the failure occurred after the files were created. In this case, it is helpful to view the output notebook to identify the failure point.
  + Links to the **Notebook** and **Output log**: The notebook and output log have been downloaded. Choose the links to view their contents.
  + (blank): The job was stopped by the user, or a failure occurred in the job run before it could generate output files. For example, network failures could prevent the job from starting.

  The output notebook is the result of running all cells in the notebook, and also incorporates any new or overriding parameters or environment variables you included. The output log captures the details of the job run to help you troubleshoot failed jobs.
+ **Created at**: The time the on-demand job or scheduled job was created.
+ **Status**: The current status of the job, which is one of the following values:
  + **In progress**: The job is running
  + **Failed**: The job failed from configuration or notebook logic errors
  + **Stopped**: The job was stopped by the user
  + **Completed**: The job completed
+ **Actions**: This column provides shortcuts to help you stop or remove any job directly in the interface.

## View a single job
<a name="view-jobs-detail-notebook-auto-run"></a>

From the **Notebook Jobs** tab, you can select a job name to view the **Job Detail** page for a specific job. The **Job Detail** page includes all the details you provided in the **Create Job** form. Use this page to confirm the settings you specified when you created the job definition. 

In addition, you can access shortcuts to help you perform the following actions in the page itself:
+ **Delete Job**: Remove the job from the **Notebook Jobs** tab.
+ **Stop Job**: Stop your running job.

# View notebook job definitions
<a name="view-def-detail-notebook-auto-run"></a>

**Note**  
If you scheduled your notebook job with the SageMaker Python SDK, skip this section. Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you created your notebook job with the SageMaker Python SDK, you won’t see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md). 

When you create a job definition, you create a schedule for a job. The **Notebook Job Definitions** tab lists these schedules, as well as information about specific notebook job definitions. For example, you might create a job definition that runs a specific notebook every minute. Once this job definition is active, you see a new job every minute in the **Notebook Jobs** tab. The following page gives information about the **Notebook Job Definitions** tab, as well as how to view a notebook job definition.

The **Notebook Job Definitions** tab displays a dashboard with all your job definitions and includes the input notebook, the creation time, the schedule, and the status for each job definition. The value in the **Status** column is one of the following values:
+ **Paused**: You paused the job definition. Studio does not initiate any jobs until you resume the definition.
+ **Active**: The schedule is on and Studio can run the notebook according to the schedule you specified.

In addition, the **Actions** column provides shortcuts to help you perform the following tasks directly in the interface:
+ **Pause**: Pauses the job definition. Studio won’t create any jobs until you resume the definition.
+ **Delete**: Removes the job definition from the **Notebook Job Definitions** tab.
+ **Resume**: Continues a paused job definition so that it can start jobs.

If you created a job definition but it doesn’t initiate jobs, see [Job definition doesn’t create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).

## View a single job definition
<a name="view-job-definition-detail-page"></a>

If you select a job definition name in the **Notebook Job Definitions** tab, you see the **Job Definition** page where you can view specific details for a job definition. Use this page to confirm the settings you specified when you created the job definition. If you don’t see any jobs created from your job definition, see [Job definition doesn’t create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).

This page also contains a section listing the jobs that run from this job definition. Viewing your jobs on the **Job Definition** page may be a more productive way to organize them than the **Notebook Jobs** tab, which combines all jobs from all your job definitions.

In addition, this page provides shortcuts for the following actions:
+ **Pause/Resume**: Pause your job definition, or resume a paused definition. Note that if a job is currently running for this definition, Studio does not stop it.
+ **Run**: Run a single on-demand job from this job definition. This option also lets you specify different input parameters to your notebook before starting the job.
+ **Edit Job Definition**: Change the schedule of your job definition. You can select a different time interval, or you can opt for a custom schedule using cron syntax.
+ **Delete Job Definition**: Remove the job definition from the **Notebook Job Definitions** tab. Note that if a job is currently running for this definition, Studio does not stop it.
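For example, a custom schedule uses EventBridge's six-field cron syntax (minutes, hours, day of month, month, day of week, year). The following hypothetical expression runs the notebook every Wednesday at 12:00 UTC:

```
cron(0 12 ? * WED *)
```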

# Troubleshooting guide
<a name="notebook-auto-run-troubleshoot"></a>

Refer to this troubleshooting guide to help you debug failures you might experience when your scheduled notebook job runs.

## Job definition doesn’t create jobs
<a name="notebook-auto-run-troubleshoot-no-jobs"></a>

If your job definition does not initiate any jobs, the notebook or training job may not be displayed in the **Jobs** section on the left navigation bar in Amazon SageMaker Studio. If this is the case, you can find error messages in the **Pipelines** section on the left navigation bar in Studio. Each notebook or training job definition belongs to an execution pipeline. The following are common causes for failing to initiate notebook jobs.

**Missing permissions**
+ The role assigned to the job definition does not have a trust relationship with Amazon EventBridge. That is, EventBridge cannot assume the role.
+ The role assigned to the job definition does not have permission to call `sagemaker:StartPipelineExecution`.
+ The role assigned to the job definition does not have permission to call `sagemaker:CreateTrainingJob`.

**EventBridge quota exceeded**

If you see a `Put*` error such as the following example, you exceeded an EventBridge quota. To resolve this, you can clean up unused EventBridge rules, or ask AWS Support to increase your quota.

```
An error occurred (LimitExceededException) when calling the PutRule operation: 
The requested resource exceeds the maximum number allowed
```

For more information about EventBridge quotas, see [Amazon EventBridge quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).

**Pipeline quota limit exceeded**

If you see an error such as the following example, you exceeded the number of pipelines that you can run. To resolve this, you can clean up unused pipelines in your account, or ask AWS Support to increase your quota.

```
ResourceLimitExceeded: The account-level service limit 
'Maximum number of pipelines allowed per account' is XXX Pipelines, 
with current utilization of XXX Pipelines and a request delta of 1 Pipelines.
```

For more information about pipeline quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

**Training job limit exceeded**

If you see an error such as the following example, you exceeded the number of training jobs that you can run. To resolve this, reduce the number of training jobs in your account, or ask AWS Support to increase your quota.

```
ResourceLimitExceeded: The account-level service limit 
'ml.m5.2xlarge for training job usage' is 0 Instances, with current 
utilization of 0 Instances and a request delta of 1 Instances. 
Please contact AWS support to request an increase for this limit.
```

For more information about training job quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

## Auto visualizations disabled in SparkMagic notebooks
<a name="notebook-auto-run-troubleshoot-visualization"></a>

If your notebook uses the SparkMagic PySpark kernel and you run the notebook as a Notebook Job, you may see that your auto visualizations are disabled in the output. Turning on auto visualization causes the kernel to hang, so the notebook job executor currently disables auto visualizations as a workaround.

# Constraints and considerations
<a name="notebook-auto-run-constraints"></a>

Review the following constraints to ensure your notebook jobs complete successfully. Studio uses Papermill to run notebooks. You might need to update Jupyter notebooks to align to Papermill's requirements. There are also restrictions on the content of LCC scripts and important details to understand regarding VPC configuration.

## JupyterLab version
<a name="notebook-auto-run-constraints-jpt"></a>

JupyterLab version 4.0 is supported.

## Installation of packages that require kernel restart
<a name="notebook-auto-run-constraints-pmill-pkg"></a>

Papermill does not support calling `pip install` to install packages that require a kernel restart. In this situation, use `pip install` in an initialization script. For a package installation that does not require a kernel restart, you can still include `pip install` in the notebook.

## Kernel and language names registered with Jupyter
<a name="notebook-auto-run-constraints-pmill-names"></a>

Papermill registers a translator for specific kernels and languages. If you bring your own image (BYOI), use a standard kernel name as shown in the following snippet:

```
papermill_translators.register("python", PythonTranslator)
papermill_translators.register("R", RTranslator)
papermill_translators.register("scala", ScalaTranslator)
papermill_translators.register("julia", JuliaTranslator)
papermill_translators.register("matlab", MatlabTranslator)
papermill_translators.register(".net-csharp", CSharpTranslator)
papermill_translators.register(".net-fsharp", FSharpTranslator)
papermill_translators.register(".net-powershell", PowershellTranslator)
papermill_translators.register("pysparkkernel", PythonTranslator)
papermill_translators.register("sparkkernel", ScalaTranslator)
papermill_translators.register("sparkrkernel", RTranslator)
papermill_translators.register("bash", BashTranslator)
```

## Parameters and environment variable limits
<a name="notebook-auto-run-constraints-var-limits"></a>

**Parameters and environment variable limits.** When you create your notebook job, it receives the parameters and environment variables you specify. You can pass up to 100 parameters. Each parameter name can be up to 256 characters long, and the associated value can be up to 2500 characters long. If you pass environment variables, you can pass up to 28 variables. The variable name and associated value can each be up to 512 characters long. If you need more than 28 environment variables, set additional environment variables in an initialization script, which has no limit on the number of environment variables you can use.
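These limits can be checked up front, before a job is submitted. The following is a minimal sketch; the helper function is hypothetical, and the limit values are taken from this section:

```python
# Validate notebook job parameters and environment variables against the
# documented limits: at most 100 parameters (names up to 256 characters,
# values up to 2500) and at most 28 environment variables (names and
# values up to 512 characters each).
def check_job_inputs(parameters, environment_variables):
    errors = []
    if len(parameters) > 100:
        errors.append("too many parameters (limit 100)")
    for name, value in parameters.items():
        if len(name) > 256:
            errors.append(f"parameter name too long: {name[:20]}...")
        if len(str(value)) > 2500:
            errors.append(f"parameter value too long for: {name}")
    if len(environment_variables) > 28:
        errors.append("too many environment variables (limit 28)")
    for name, value in environment_variables.items():
        if len(name) > 512 or len(str(value)) > 512:
            errors.append(f"environment variable too long: {name}")
    return errors

# An empty list means the inputs are within the documented limits.
problems = check_job_inputs({"rate": 0.5}, {"cluster_id": "j-EXAMPLE12345"})
```

Running such a check locally surfaces limit violations before the job is created, rather than at run time.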

## Viewing jobs and job definitions
<a name="notebook-auto-run-constraints-view-job"></a>

**Viewing jobs and job definitions.** If you schedule your notebook job in the Studio UI in the JupyterLab notebook, you can [view your notebook jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/view-notebook-jobs.html) and your [notebook job definitions](https://docs.aws.amazon.com/sagemaker/latest/dg/view-def-detail-notebook-auto-run.html) in the Studio UI. If you scheduled your notebook job with the SageMaker Python SDK, you can view only your jobs, because the SageMaker Python SDK notebook job step does not create job definitions. To view your jobs, you also need to supply additional tags to your notebook job step instance. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

## Image
<a name="notebook-auto-run-constraints-image"></a>

You need to manage image constraints depending on whether you run notebook jobs in Studio or the SageMaker Python SDK notebook job step in a pipeline.

### Image constraints for SageMaker AI Notebook Jobs (Studio)
<a name="notebook-auto-run-constraints-image-studio"></a>

**Image and kernel support.** The driver that launches your notebook job assumes the following:
+ A base Python runtime environment is installed in the Studio or bring-your-own (BYO) images and is the default in the shell.
+ The base Python runtime environment includes the Jupyter client with kernelspecs properly configured.
+ The base Python runtime environment includes the `pip` function so the notebook job can install system dependencies.
+ For images with multiple environments, your initialization script should switch to the proper kernel-specific environment before installing notebook-specific packages. You should switch back to the default Python runtime environment, if different from the kernel runtime environment, after configuring the kernel Python runtime environment.
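For the multiple-environment case in the last item above, an initialization script might follow this pattern. This is a sketch only: the `conda` tooling and the environment name `pytorch_env` are assumptions about your image, and the installed package is a placeholder for your notebook-specific dependencies.

```shell
#!/bin/bash
# Sketch of an initialization script for a multi-environment image.
# Assumes conda and a kernel environment named "pytorch_env" (illustrative).

configure_kernel_env() {
  if ! command -v conda >/dev/null 2>&1; then
    # Single-environment image: nothing to switch; rely on the base runtime.
    echo "conda not found; skipping kernel environment setup"
    return 0
  fi
  # Switch to the kernel-specific environment before installing packages.
  conda activate pytorch_env 2>/dev/null || source activate pytorch_env
  pip install --quiet scikit-learn   # notebook-specific dependency (example)
  # Switch back to the default Python runtime environment.
  conda deactivate 2>/dev/null || source activate base
  return 0
}

configure_kernel_env
```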

The driver that launches your notebook job is a bash script, and Bash v4 must be available at `/bin/bash`.

**Root privileges on bring-your-own images (BYOI).** You must have root privileges on your own Studio images, either as the root user or through `sudo` access. If you are not the root user but access root privileges through `sudo`, use **1000/100** as the `UID/GID`.

### Image constraints for SageMaker AI Python SDK notebook jobs
<a name="notebook-auto-run-constraints-image-sdk"></a>

The notebook job step supports the following images:
+ SageMaker Distribution Images listed in [Amazon SageMaker Images Available for Use With Studio Classic Notebooks](notebooks-available-images.md).
+ A custom image based on the SageMaker Distribution images in the previous list. Use a [SageMaker Distribution image](https://github.com/aws/sagemaker-distribution) as a base.
+ A custom image (BYOI) pre-installed with notebook job dependencies (for example, [sagemaker-headless-execution-driver](https://pypi.org/project/sagemaker-headless-execution-driver/)). Your image must meet the following requirements:
  + The image is pre-installed with notebook job dependencies.
  + A base Python runtime environment is installed and is the default in the shell environment.
  + The base Python runtime environment includes the Jupyter client with kernelspecs properly configured.
  + You have root privileges, either as the root user or through `sudo` access. If you are not the root user but access root privileges through `sudo`, use **1000/100** as the `UID/GID`.
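The BYOI requirements above can be met with a short Dockerfile. The following is a sketch, not a definitive recipe: the base image tag is an assumption (pin the SageMaker Distribution version you actually use), and you should verify the user and group IDs against your base image.

```dockerfile
# Sketch of a custom image for SDK notebook jobs.
# The base tag below is an assumption -- pin the version you actually use.
FROM public.ecr.aws/sagemaker/sagemaker-distribution:latest-cpu

USER root
# Pre-install the headless execution driver the notebook job step requires.
RUN pip install --no-cache-dir sagemaker-headless-execution-driver

# Run as UID/GID 1000/100 with sudo access, per the BYOI requirements.
USER 1000:100
```

Starting from a SageMaker Distribution base keeps the Jupyter client and kernelspecs properly configured, so the Dockerfile only needs to layer on the execution driver and your own dependencies.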

## VPC subnets used during job creation
<a name="notebook-auto-run-constraints-vpc"></a>

If you use a VPC, Studio uses your private subnets to create your job. Specify 1–5 private subnets and 1–15 security groups.

If you use a VPC with private subnets, you must choose one of the following options to ensure the notebook job can connect to dependent services or resources:
+ If the job needs access to an AWS service that supports interface VPC endpoints, create an endpoint to connect to the service. For a list of services that support interface endpoints, see [AWS services that integrate with AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html). For information about creating an interface VPC endpoint, see [Access an AWS service using an interface VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html). At a minimum, you must provide an Amazon S3 gateway VPC endpoint.
+ If a notebook job needs access to an AWS service that doesn't support interface VPC endpoints or to a resource outside of AWS, create a NAT gateway and configure your security groups to allow outbound connections. For information about setting up a NAT gateway for your VPC, see *VPC with public and private subnets (NAT)* in the [Amazon Virtual Private Cloud User Guide](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).
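Because the subnet and security group ranges above are easy to get wrong when building job configurations programmatically, a small pre-flight check can help. This helper is illustrative (not part of any SDK); the limits are the 1–5 subnets and 1–15 security groups documented in this section:

```python
# Pre-flight check of VPC settings against the documented ranges for
# notebook jobs: 1-5 private subnets and 1-15 security groups.
# The function name is illustrative, not part of the SageMaker SDK.

def validate_vpc_config(subnets: list, security_groups: list) -> bool:
    """Return True if the subnet and security group counts are in range."""
    return 1 <= len(subnets) <= 5 and 1 <= len(security_groups) <= 15
```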

## Service limits
<a name="notebook-auto-run-constraints-service-limit"></a>

Because the notebook job scheduler is built on the Pipelines, SageMaker Training, and Amazon EventBridge services, your notebook jobs are subject to their service-specific quotas. If you exceed these quotas, you may see error messages related to these services. For example, there are limits on how many pipelines you can run at one time and how many rules you can set up for a single event bus. For more information about SageMaker AI quotas, see [Amazon SageMaker AI Endpoints and Quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html). For more information about EventBridge quotas, see [Amazon EventBridge Quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).

# Pricing for SageMaker Notebook Jobs
<a name="notebook-auto-run-pricing"></a>

When you schedule notebook jobs, your Jupyter notebooks run on SageMaker training instances. After you select an **Image** and **Kernel** in your **Create Job** form, the form provides a list of available compute types. You are charged for the compute type you choose, based on the combined duration of use for all notebook jobs that run from the job definition. If you don't specify a compute type, SageMaker AI assigns a default Amazon EC2 instance type of `ml.m5.large`. For a breakdown of SageMaker AI pricing by compute type, see [Amazon SageMaker AI Pricing](https://aws.amazon.com/sagemaker/pricing).