

# Amazon EMR Notebooks overview
<a name="emr-managed-notebooks"></a>

**Note**  
EMR Notebooks are available as EMR Studio Workspaces in the console. The **Create Workspace** button in the console lets you create new notebooks. To access or create Workspaces, EMR Notebooks users need additional IAM role permissions. For more information, see [Amazon EMR Notebooks are Amazon EMR Studio Workspaces in the console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-migration.html) and [Amazon EMR console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html).

You can use Amazon EMR Notebooks along with Amazon EMR clusters running [Apache Spark](https://aws.amazon.com/emr/features/spark/) to create and open [Jupyter](https://jupyter.org) Notebook and JupyterLab interfaces within the Amazon EMR console. An EMR notebook is a "serverless" notebook that you can use to run queries and code. Unlike a traditional notebook, the contents of an EMR notebook — the equations, queries, models, code, and narrative text within notebook cells — run in a client. The commands are executed using a kernel on the EMR cluster. Notebook contents are also saved to Amazon S3 separately from cluster data for durability and flexible re-use.

You can start a cluster, attach an EMR notebook for analysis, and then terminate the cluster. You can also close a notebook attached to one running cluster and switch to another. Multiple users can attach notebooks to the same cluster simultaneously and share notebook files in Amazon S3 with each other. These features let you run clusters on-demand to save cost, and reduce the time spent re-configuring notebooks for different clusters and datasets.

You can also execute an EMR notebook programmatically using the Amazon EMR API, without the need to interact with Amazon EMR console ("headless execution"). You need to include a cell in the EMR notebook that has a parameters tag. That cell allows a script to pass new input values to the notebook. Parameterized notebooks can be re-used with different sets of input values. There's no need to make copies of the same notebook to edit and execute with new input values. Amazon EMR creates and saves the output notebook on S3 for each run of the parameterized notebook. For EMR notebook API code samples, see [Sample programmatic commands for EMR Notebooks](emr-managed-notebooks-headless.md).

**Important**  
The EMR Notebooks capability supports clusters that use Amazon EMR releases 5.18.0 and higher. We recommend that you use EMR Notebooks with clusters that use the latest version of Amazon EMR, or at least release 5.30.0, 5.32.0 or later, or 6.2.0 or later. With these releases, Jupyter kernels run on the attached cluster rather than on a Jupyter instance. This improves performance and enhances your ability to customize kernels and libraries. For more information, see [Differences in capabilities by cluster release version](emr-managed-notebooks-considerations.md#considerations-cluster-version).

Applicable charges for Amazon S3 storage and for Amazon EMR clusters apply.

# Amazon EMR Notebooks are available as Amazon EMR Studio Workspaces in the console
<a name="emr-managed-notebooks-migration"></a>

## Making the transition from EMR Notebooks to Workspaces
<a name="emr-notebooks-workspaces-transition"></a>

In the [new Amazon EMR console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html), we've merged EMR Notebooks with Amazon EMR Studio Workspaces into a single experience. When you use an EMR Studio, you can create and configure different Workspaces to organize and run notebooks. If you had Amazon EMR notebooks in the old console, they're available as EMR Studio Workspaces in the new console.

Amazon EMR created these new EMR Studio Workspaces for you. The number of Studios that we created corresponds to the number of distinct VPCs that you use from EMR Notebooks. For example, if you connect to EMR clusters in two different VPCs from EMR Notebooks, then we created two new EMR Studios. Your notebooks are distributed among the new Studios. 

**Important**  
We turned off the option to create new notebooks in the old Amazon EMR console. Instead, use **Create Workspace** in the new Amazon EMR console.

For more information on Amazon EMR Studio Workspaces, see [Learn EMR Studio workspaces](emr-studio-configure-workspace.md). For a conceptual overview of EMR Studio, see [Workspaces](how-emr-studio-works.md#emr-studio-workspaces) on the [How Amazon EMR Studio works](how-emr-studio-works.md) page.

## What do you need to do?
<a name="emr-notebooks-workspaces-prepare"></a>

While you can still use your existing notebooks in the old console, we recommend that you instead use Amazon EMR Studio Workspaces in the console. You must configure additional role permissions to turn on the [capabilities in EMR Studio that aren’t available in EMR Notebooks](#emr-notebooks-workspaces-enhancements). 

**Note**  
At a minimum, to view existing EMR Notebooks as EMR Studio Workspaces and to create new Workspaces, users must have `elasticmapreduce:ListStudios` and `elasticmapreduce:CreateStudioPresignedUrl` permissions on their roles. To access all of the EMR Studio features, see [Enabling EMR Studio features for EMR Notebooks users](#emr-notebooks-workspaces-enable) for the complete list of added permissions that EMR Notebooks users will need.

## Enhanced capabilities in EMR Studio beyond EMR Notebooks
<a name="emr-notebooks-workspaces-enhancements"></a>

With Amazon EMR Studio, you can set up and use the following capabilities that aren't available with EMR Notebooks:
+ [Browse and attach to EMR clusters from within JupyterLab](emr-studio-create-use-clusters.md)
+ [Browse and attach to Amazon EMR on EKS virtual clusters from within JupyterLab](emr-studio-create-use-clusters.md)
+ [Connect to Git repos from within JupyterLab](emr-studio-git-repo.md)
+ [Collaborate with other members of your team to write and run notebook code](emr-studio-workspace-collaboration.md)
+ [Browse data with SQL Explorer](emr-studio-sql-explorer.md)
+ [Provision EMR clusters with Service Catalog](emr-studio-cluster-templates.md)

For a complete list of capabilities with Amazon EMR Studio, see [Key features of EMR Studio](emr-studio.md#emr-studio-key-features). 

## Enabling EMR Studio features for EMR Notebooks users
<a name="emr-notebooks-workspaces-enable"></a>

The new EMR Studios that we created as part of this merge use the existing `EMR_Notebooks_DefaultRole` IAM role as the EMR Studio service role.

Users who transition to EMR Studio from EMR Notebooks and want to use the additional capabilities of EMR Studio require several new role permissions. Add the following permissions to the roles of your EMR Notebooks users who plan to use EMR Studio.

**Note**  
At a minimum, to view existing EMR Notebooks as EMR Studio Workspaces and to create new Workspaces, users must have `elasticmapreduce:ListStudios` and `elasticmapreduce:CreateStudioPresignedUrl` permissions on their roles. To use all of the EMR Studio features, add all of the permissions listed below. Admin users also need permission to create and manage an EMR Studio. For more information, see [Administrator permissions to create and manage an EMR Studio](emr-studio-admin-permissions.md).

```
"elasticmapreduce:DescribeStudio", 
"elasticmapreduce:ListStudios",
"elasticmapreduce:CreateStudioPresignedUrl",
"elasticmapreduce:UpdateEditor", 
"elasticmapreduce:PutWorkspaceAccess", 
"elasticmapreduce:DeleteWorkspaceAccess", 
"elasticmapreduce:ListWorkspaceAccessIdentities",
"emr-containers:ListVirtualClusters", 
"emr-containers:DescribeVirtualCluster", 
"emr-containers:ListManagedEndpoints", 
"emr-containers:DescribeManagedEndpoint", 
"emr-containers:CreateAccessTokenForManagedEndpoint",
"emr-containers:ListJobRuns", 
"emr-containers:DescribeJobRun",
"servicecatalog:SearchProducts", 
"servicecatalog:DescribeProduct", 
"servicecatalog:DescribeProductView", 
"servicecatalog:DescribeProvisioningParameters", 
"servicecatalog:ProvisionProduct", 
"servicecatalog:UpdateProvisionedProduct", 
"servicecatalog:ListProvisioningArtifacts", 
"servicecatalog:DescribeRecord", 
"servicecatalog:ListLaunchPaths", 
"cloudformation:DescribeStackResources"
```

The following permissions are also required to use the collaboration capabilities in EMR Studio, but weren't required with EMR Notebooks.

```
"sso-directory:SearchUsers",
"iam:GetUser", 
"iam:GetRole", 
"iam:ListUsers", 
"iam:ListRoles", 
"sso:GetManagedApplicationInstance"
```
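The permission lists above are fragments of a policy's `Action` array. As a minimal sketch, the following Python snippet wraps the two minimum permissions from the note into a complete IAM policy document. The statement ID and the `"*"` resource are illustrative assumptions, not values from this guide. Scope the resource as narrowly as your setup allows.

```python
import json

# Minimum actions named in the note above for viewing existing notebooks
# as Workspaces and for creating new Workspaces.
MINIMUM_WORKSPACE_ACTIONS = [
    "elasticmapreduce:ListStudios",
    "elasticmapreduce:CreateStudioPresignedUrl",
]


def build_policy(actions, sid="AllowEmrStudioWorkspaceAccess", resource="*"):
    """Return an IAM policy document (as a dict) that allows the given actions.

    The default Sid and the "*" resource are placeholders for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicate, stable order
                "Resource": resource,
            }
        ],
    }


policy = build_policy(MINIMUM_WORKSPACE_ACTIONS)
print(json.dumps(policy, indent=2))
```

To grant the additional EMR Studio capabilities, pass a longer list of actions taken from the code blocks above to `build_policy`.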

# Requirements, differences in release versions, and security for EMR Notebooks
<a name="emr-managed-notebooks-considerations"></a>

Consider the following requirements, differences in release versions, security information, and other considerations when you create clusters and develop solutions using EMR Notebooks.

## Cluster requirements
<a name="considerations-limitations"></a>
+ **Enable Amazon EMR Block Public Access** – Inbound access to a cluster enables cluster users to execute notebook kernels. Ensure that only authorized users can access the cluster. We strongly recommend that you leave block public access enabled, and that you limit inbound SSH traffic to only trusted sources. For more information, see [Using Amazon EMR block public access](emr-block-public-access.md) and [Control network traffic with security groups for your Amazon EMR cluster](emr-security-groups.md).
+ **Use a Compatible Cluster** – A cluster attached to a notebook must meet the following requirements:
  + Only clusters created using Amazon EMR are supported. You can create a cluster independently within Amazon EMR and then attach an EMR notebook, or you can create a compatible cluster when you create an EMR notebook.
  + Only clusters created using Amazon EMR release version 5.18.0 and later are supported. See [Differences in capabilities by cluster release version](#considerations-cluster-version).
  + Clusters created using Amazon EC2 instances with AMD EPYC processors (for example, m5a.\* and r5a.\* instance types) are not supported.
  + EMR Notebooks works only with clusters created with `VisibleToAllUsers` set to `true`. `VisibleToAllUsers` is `true` by default.
  + The cluster must be launched within an EC2-VPC. Public and private subnets are supported. The EC2-Classic platform is not supported.
  + The cluster must be launched with Hadoop, Spark, and Livy installed. Other applications may be installed, but EMR Notebooks currently supports Spark clusters only.
**Important**  
For Amazon EMR release versions 5.32.0 and later, or 6.2.0 and later, your cluster must also be running the Jupyter Enterprise Gateway application in order to work with EMR Notebooks.
  + Clusters using Kerberos authentication are not supported.
  + Clusters integrated with AWS Lake Formation support the installation of notebook-scoped libraries only. Installing kernels and libraries on the cluster is not supported.
  + Clusters with multiple primary nodes are not supported.
  + Clusters using Amazon EC2 instances based on AWS Graviton2 are not supported.
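Some of the requirements above can be checked before you attach a notebook. The following Python function is a minimal sketch that covers only the release version, installed applications, visibility, and Kerberos checks. It assumes a dictionary shaped like the `Cluster` element that the Amazon EMR `DescribeCluster` API returns, and it assumes that the Jupyter Enterprise Gateway application is reported under the name `JupyterEnterpriseGateway`. Processor type, VPC placement, and the number of primary nodes are not checked.

```python
REQUIRED_APPS = {"Hadoop", "Spark", "Livy"}


def parse_release(label):
    """Turn a release label such as 'emr-5.32.0' into a tuple like (5, 32, 0)."""
    version = label.split("-", 1)[-1]
    return tuple(int(part) for part in version.split("."))


def needs_enterprise_gateway(version):
    """True for 5.32.0 and later in the 5.x line, or for 6.2.0 and later."""
    if version[0] == 5:
        return version >= (5, 32, 0)
    return version >= (6, 2, 0)


def compatibility_problems(cluster):
    """Return a list of reasons why the cluster fails the checked requirements."""
    problems = []
    version = parse_release(cluster["ReleaseLabel"])
    if version < (5, 18, 0):
        problems.append("release is earlier than 5.18.0")

    installed = {app["Name"] for app in cluster.get("Applications", [])}
    missing = REQUIRED_APPS - installed
    if missing:
        problems.append("missing applications: " + ", ".join(sorted(missing)))
    if needs_enterprise_gateway(version) and "JupyterEnterpriseGateway" not in installed:
        problems.append("JupyterEnterpriseGateway is not installed")

    # VisibleToAllUsers is true by default, so a missing key counts as true.
    if not cluster.get("VisibleToAllUsers", True):
        problems.append("VisibleToAllUsers is false")
    # Assumption: a non-empty KerberosAttributes block means Kerberos is in use.
    if cluster.get("KerberosAttributes"):
        problems.append("Kerberos authentication is configured")
    return problems
```

An empty list means that the cluster passed these particular checks, not that it meets every requirement in the list above.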

## Differences in capabilities by cluster release version
<a name="considerations-cluster-version"></a>

We strongly recommend that you use EMR Notebooks with clusters created using Amazon EMR release versions 5.30.0, 5.32.0 or later, or 6.2.0 or later. With these versions, EMR Notebooks runs kernels on the attached Amazon EMR cluster. Kernels and libraries can be installed directly on the cluster primary node. Using EMR Notebooks with these cluster versions has the following benefits:
+ **Improved performance** – Notebook kernels run on clusters with EC2 instance types that you select. Earlier versions run kernels on a specialized instance that cannot be resized, accessed, or customized. 
+ **Ability to add and customize kernels** – You can connect to the cluster to install kernel packages using `conda` and `pip`. In addition, `pip` installation is supported using terminal commands within notebook cells. In earlier versions, only pre-installed kernels were available (Python, PySpark, Spark, and SparkR). For more information, see [Installing kernels and Python libraries on a cluster primary node](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel).
+ **Ability to install Python libraries** – You can [install Python libraries on the cluster primary node](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel) using `conda` and `pip`. We recommend using `conda`. With earlier versions, only [notebook-scoped libraries](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-custom-libraries-limitations) for PySpark are supported.


**Supported EMR Notebooks features by cluster release**  

| Cluster release version | Notebook-scoped libraries for PySpark | Kernel installation on cluster | Python library installation on primary node | 
| --- | --- | --- | --- | 
|  Earlier than 5.18.0  |  EMR Notebooks not supported  |  EMR Notebooks not supported  |  EMR Notebooks not supported  | 
|  5.18.0–5.25.0  |  No  |  No  |  No  | 
|  5.26.0–5.29.0  |  [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-custom-libraries-limitations)  |  No  |  No  | 
|  5.30.0  |  [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-custom-libraries-limitations)  |  [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel)  |  [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel)  | 
|  6.0.0  |  No  |  No  |  No  | 
| 5.32.0 and later, and 6.2.0 and later | [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-custom-libraries-limitations) | [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel) | [Yes](emr-managed-notebooks-installing-libraries-and-kernels.md#emr-managed-notebooks-cluster-kernel) | 
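The table above can be expressed as a small lookup, which may be convenient in scripts that decide how to install libraries. The following Python sketch follows the table rows literally. The function name and the dictionary keys are illustrative, and releases that the table does not list (as well as releases earlier than 5.18.0) return `None`.

```python
FEATURES = (
    "notebook_scoped_libraries",
    "kernel_installation_on_cluster",
    "python_library_installation_on_primary_node",
)


def _row(scoped, kernels, libraries):
    """Build one table row as a dict keyed by feature name."""
    return dict(zip(FEATURES, (scoped, kernels, libraries)))


def notebook_features(release):
    """Look up the table row for a release version such as '5.30.0'.

    Returns None when EMR Notebooks is not supported for the release or
    when the release does not appear in the table.
    """
    v = tuple(int(part) for part in release.split("."))
    if (5, 18, 0) <= v <= (5, 25, 0) or v == (6, 0, 0):
        return _row(False, False, False)
    if (5, 26, 0) <= v <= (5, 29, 0):
        return _row(True, False, False)
    if v == (5, 30, 0) or (5, 32, 0) <= v < (6, 0, 0) or v >= (6, 2, 0):
        return _row(True, True, True)
    return None
```

For example, `notebook_features("5.28.0")` reports that only notebook-scoped libraries are available, while `notebook_features("6.2.0")` reports all three features.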

## Limits for concurrently attached EMR Notebooks
<a name="emr-managed-notebooks-cluster-limits"></a>

When you create a cluster that supports notebooks, consider the EC2 Instance type of the cluster primary node. The memory constraints of this EC2 Instance determine the number of notebooks that can be ready simultaneously to run code and queries on the cluster.


| Primary node EC2 instance type | Number of EMR Notebooks | 
| --- | --- | 
|  \*.medium  |  2  | 
|  \*.large  |  4  | 
|  \*.xlarge  |  8  | 
|  \*.2xlarge  |  16  | 
|  \*.4xlarge  |  24  | 
|  \*.8xlarge  |  24  | 
|  \*.16xlarge  |  24  | 
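As a minimal sketch, the limits in the table above can be looked up from an instance type name in Python. The function name is illustrative, and instance sizes that the table does not list return `None` rather than a guessed value.

```python
# Limits from the table above, keyed by the size suffix of the instance type.
NOTEBOOK_LIMITS = {
    "medium": 2,
    "large": 4,
    "xlarge": 8,
    "2xlarge": 16,
    "4xlarge": 24,
    "8xlarge": 24,
    "16xlarge": 24,
}


def max_attached_notebooks(instance_type):
    """Return the notebook limit for a primary node type such as 'm5.xlarge'."""
    size = instance_type.rsplit(".", 1)[-1]
    return NOTEBOOK_LIMITS.get(size)
```

For example, `max_attached_notebooks("m5.xlarge")` returns `8`.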

## Jupyter Notebook and Python versions
<a name="considerations-versions"></a>

EMR Notebooks runs [Jupyter Notebook version 6.0.2](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#release-6-0-2) and Python 3.6.5 regardless of the Amazon EMR release version of the attached cluster.

## Security-related considerations
<a name="considerations-notebooks-security"></a>

**Using encrypted S3 locations**  
If you specify an encrypted location in Amazon S3 to store notebook files, you must set up the [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md) as a key user. The default service role is `EMR_Notebooks_DefaultRole`. If you are using an AWS KMS key for encryption, see [Using key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html#key-policy-users-crypto) in the AWS Key Management Service Developer Guide and the [support article for adding key users](https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-access-default-encryption/).

**Using cookies with hosting domains**  
To augment the security for the off-console applications that you might use with Amazon EMR, the application hosting domains are registered in the Public Suffix List (PSL). Examples of these hosting domains include the following: `emrstudio-prod.us-east-1.amazonaws.com`, `emrnotebooks-prod.us-east-1.amazonaws.com`, `emrappui-prod.us-east-1.amazonaws.com`. For further security, if you ever need to set sensitive cookies in the default domain name, we recommend that you use cookies with a `__Host-` prefix. This helps to defend your domain against cross-site request forgery (CSRF) attempts. For more information, see the [Set-Cookie](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes) page in the *Mozilla Developer Network*. 
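Browsers accept a `__Host-` cookie only when it is set with the `Secure` attribute, a `Path` of `/`, and no `Domain` attribute. The following Python sketch builds such a `Set-Cookie` header value with the standard library. The cookie name and value are placeholders, and the `HttpOnly` and `SameSite` attributes are extra hardening that this sketch assumes you want.

```python
from http.cookies import SimpleCookie


def host_prefixed_cookie(name, value):
    """Return a Set-Cookie header value for a cookie with the __Host- prefix."""
    key = f"__Host-{name}"
    cookie = SimpleCookie()
    cookie[key] = value
    cookie[key]["path"] = "/"        # required for the __Host- prefix
    cookie[key]["secure"] = True     # required for the __Host- prefix
    cookie[key]["httponly"] = True   # optional hardening
    cookie[key]["samesite"] = "Strict"  # optional hardening
    # No Domain attribute is set, which the __Host- prefix also requires.
    return cookie[key].OutputString()


print(host_prefixed_cookie("session", "abc123"))
```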

# Create a Notebook in EMR Studio
<a name="emr-managed-notebooks-create"></a>

You create an EMR notebook using the old Amazon EMR console. Creating notebooks using the AWS CLI or the Amazon EMR API is not supported.

**To create an EMR notebook**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/elasticmapreduce/](https://console.aws.amazon.com/elasticmapreduce/).

1. Choose **Notebooks**, **Create notebook**.

1. Enter a **Notebook name** and an optional **Notebook description**.

1. If you have an active cluster to which you want to attach the notebook, leave the default **Choose an existing cluster** selected, click **Choose**, select a cluster from the list, and then click **Choose cluster**. For information about cluster requirements for EMR Notebooks, see [Requirements, differences in release versions, and security for EMR Notebooks](emr-managed-notebooks-considerations.md).

   **—or—**

   Choose **Create a cluster**, enter a **Cluster name** and choose options according to the following guidelines. The cluster is created in the default VPC for the account using On-Demand instances.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-create.html)

1. For **Security groups**, choose **Use default security groups**. Alternatively, choose **Choose security groups** and select custom security groups that are available in the VPC of the cluster. You select one for the primary instance and another for the notebook client instance. For more information, see [Specifying EC2 security groups for EMR Notebooks](emr-managed-notebooks-security-groups.md).

1. For **AWS Service Role**, leave the default or choose a custom role from the list. The client instance for the notebook uses this role. For more information, see [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md).

1. For **Notebook location**, choose the location in Amazon S3 where the notebook file is saved, or specify your own location. If the bucket and folder don't exist, Amazon EMR creates them.

   Amazon EMR creates a folder with the **Notebook ID** as folder name, and saves the notebook to a file named `NotebookName.ipynb`. For example, if you specify the Amazon S3 location `s3://amzn-s3-demo-bucket/MyNotebooks` for a notebook named `MyFirstEMRManagedNotebook`, the notebook file is saved to `s3://amzn-s3-demo-bucket/MyNotebooks/NotebookID/MyFirstEMRManagedNotebook.ipynb`.

   If you specify an encrypted location in Amazon S3, you must set up the [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md) as a key user. The default service role is `EMR_Notebooks_DefaultRole`. If you are using an AWS KMS key for encryption, see [Using key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html#key-policy-users-crypto) in the AWS Key Management Service Developer Guide and the [support article for adding key users](https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-access-default-encryption/).

1. Optionally, if you have added a Git-based repository to Amazon EMR that you want to associate with this notebook, choose **Git repository**, select **Choose repository** and then select a repository from the list. For more information, see [Associating Git-based repositories with EMR Notebooks](emr-git-repo.md).

1. Optionally, choose **Tags**, and then add any additional key-value tags for the notebook.
**Important**  
A default tag with the **Key** string set to `creatorUserID` and the value set to your IAM user ID is applied for access purposes. We recommend that you do not change or remove this tag because it can be used to control access. For more information, see [Use cluster and Notebook tags with IAM policies for access control](security_iam_service-with-iam.md#emr-tag-based-access).

1. Choose **Create Notebook**.
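The **Notebook location** step in the procedure above describes where the notebook file is saved. As a minimal sketch, the following Python helper composes that path from its parts. The function name and the notebook ID in the example are hypothetical.

```python
def notebook_file_uri(location, notebook_id, notebook_name):
    """Compose the S3 URI of a notebook file: location/NotebookID/NotebookName.ipynb."""
    return f"{location.rstrip('/')}/{notebook_id}/{notebook_name}.ipynb"


# 'e-EXAMPLEID' is a placeholder; Amazon EMR assigns the real notebook ID.
uri = notebook_file_uri(
    "s3://amzn-s3-demo-bucket/MyNotebooks", "e-EXAMPLEID", "MyFirstEMRManagedNotebook"
)
print(uri)
```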

# Working with EMR Notebooks
<a name="emr-managed-notebooks-working-with"></a>

After you create an EMR notebook, the notebook takes a short time to start. The **Status** in the **Notebooks** list shows **Starting**. You can open a notebook when its status is **Ready**. It might take a bit longer for a notebook to be **Ready** if you created a cluster along with it.

**Tip**  
Refresh your browser or choose the refresh icon above the notebooks list to refresh notebook status.

## Understanding Notebook status
<a name="emr-managed-notebooks-status"></a>

An EMR notebook can have the following for **Status** in the **Notebooks** list.


| Status | Meaning | 
| --- | --- | 
|  Ready  |  You can open the notebook using the notebook editor. While a notebook has a **Ready** status, you can stop or delete it. To change clusters, you must stop the notebook first. If a notebook in the **Ready** status is idle for a long period of time, it is stopped automatically.  | 
|  Starting  |  The notebook is being created and attached to the cluster. While a notebook is starting, you cannot open the notebook editor, stop it, delete it, or change clusters.  | 
|  Pending  |  The notebook has been created, and is waiting for integration with the cluster to complete. The cluster may still be provisioning resources or responding to other requests. You can open the notebook editor with the notebook in *local mode*. Any code that relies on cluster processes does not execute and fails.  | 
|  Stopping  |  The notebook is shutting down, or the cluster that the notebook is attached to is terminating. While a notebook is stopping, you can't open the notebook editor, stop it, delete it, or change clusters.  | 
|  Stopped  |  The notebook has shut down. You can start the notebook on the same cluster, as long as the cluster is still running. You can also change clusters or delete the notebook.  | 
|  Deleting  |  The notebook is being removed from the list of available notebooks. The notebook file, `NotebookName.ipynb`, remains in Amazon S3 and continues to accrue applicable storage charges.  | 

## Working with the Notebook editor
<a name="emr-managed-notebooks-editor"></a>

An advantage of using an EMR notebook is that you can launch the notebook in Jupyter or JupyterLab directly from the console.

With EMR Notebooks, the notebook editor you access from the Amazon EMR console is the familiar open-source Jupyter Notebook editor or JupyterLab. Because the notebook editor is launched within the Amazon EMR console, it's more efficient to configure access than it is with a notebook hosted on an Amazon EMR cluster. You don't need to configure a user's client to have web access through SSH, security group rules, and proxy configurations. If a user has sufficient permissions, they can simply open the notebook editor within the Amazon EMR console.

Only one user can have an EMR notebook open at a time from within Amazon EMR. If another user tries to open an EMR notebook that is already open, an error occurs.

**Important**  
Amazon EMR creates a unique pre-signed URL for each notebook editor session, which is valid only for a short time. We recommend that you do not share the notebook editor URL. Doing this creates a security risk because recipients of the URL adopt your permissions to edit the notebook and run notebook code for the lifetime of the URL. If others need access to a notebook, grant permissions to their users through permissions policies and ensure that the service role for EMR Notebooks has access to the Amazon S3 location. For more information, see [EMR notebooks security and access control](emr-managed-notebooks-security.md) and [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md).

**To open the notebook editor for an EMR notebook**

1. Select a notebook with a **Status** of **Ready** or **Pending** from the **Notebooks** list.

1. Choose **Open in JupyterLab** or **Open in Jupyter**.

   A new browser tab opens to the JupyterLab or Jupyter Notebook editor.

1. From the **Kernel** menu, choose **Change kernel** and then select the kernel for your programming language.

   You are now ready to write and run code from within the notebook editor.

### Saving the contents of a Notebook
<a name="emr-managed-notebooks-saving"></a>

When you work in the notebook editor, the contents of notebook cells and their output are periodically saved to the notebook file in Amazon S3 automatically. A notebook that has no changes since the last save shows **(autosaved)** next to the notebook name in the editor. If changes have not yet been saved, **unsaved changes** appears.

You can save a notebook manually. From the **File** menu, choose **Save and Checkpoint** or press CTRL+S. This creates a file named `NotebookName.ipynb` in a **checkpoints** folder within the notebook folder in Amazon S3. For example, `s3://amzn-s3-demo-bucket/MyNotebookFolder/NotebookID/checkpoints/NotebookName.ipynb`. Only the most recent checkpoint file is saved in this location.

## Changing clusters
<a name="emr-managed-notebooks-changing-clusters"></a>

You can change the cluster that an EMR notebook is attached to without changing the contents of the notebook itself. You can change clusters for only those notebooks that have a **Stopped** status.

**To change the cluster of an EMR notebook**

1. If the notebook that you want to change is running, select it from the **Notebooks** list and choose **Stop**.

1. When the notebook status is **Stopped**, select the notebook from the **Notebooks** list, and then choose **View details**.

1. Choose **Change cluster**.

1. If you have an active cluster running Hadoop, Spark, and Livy to which you want to attach the notebook, leave the default, and select a cluster from the list. Only clusters that meet the requirements are listed.

   —or—

   Choose **Create a cluster** and then choose the cluster options. For more information, see [Cluster requirements](emr-managed-notebooks-considerations.md#considerations-limitations).

1. Choose an option for **Security groups**, and then choose **Change cluster and start notebook**.

## Deleting Notebooks and Notebook files
<a name="emr-managed-notebooks-deleting"></a>

When you delete an EMR notebook using the Amazon EMR console, you delete the notebook from the list of available notebooks. However, notebook files remain in Amazon S3 and continue to accrue storage charges.

**To delete a notebook and remove associated files**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/elasticmapreduce/](https://console.aws.amazon.com/elasticmapreduce/).

1. Choose **Notebooks**, select your notebook from the list, and then choose **View details**.

1. Choose the folder icon next to **Notebook location** and copy the **URL**, which is in the pattern `s3://MyNotebookLocationPath/NotebookID/`.

1. Choose **Delete**.

   The notebook is removed from the list, and notebook details can no longer be viewed.

1. Follow the instructions for [How do I delete folders from an S3 bucket?](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-folders.html) in the Amazon Simple Storage Service User Guide. Navigate to the bucket and folder from step 3.

   —or—

   If you have the AWS CLI installed, open a command prompt and run the following command. Replace the Amazon S3 location with the location that you copied in step 3. Make sure that the AWS CLI is configured with the access keys of a user with permissions to delete the Amazon S3 location. For more information, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) in the *AWS Command Line Interface User Guide*.

   ```
   aws s3 rm s3://MyNotebookLocationPath/NotebookID/ --recursive
   ```
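If you prefer to script the cleanup, the following Python sketch removes every object under the notebook folder with boto3. It assumes that boto3 is installed and that credentials with permission to delete from the location are configured. The function names are illustrative. The prefix is forced to end with `/` so that folders whose names merely start with the same characters are left alone.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path' into (bucket, prefix); the prefix ends with '/'."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def delete_notebook_folder(uri):
    """Delete all objects under the notebook folder, including checkpoints."""
    import boto3  # imported here so that split_s3_uri works without boto3

    bucket, prefix = split_s3_uri(uri)
    if not prefix:
        raise ValueError("refusing to delete the contents of an entire bucket")
    boto3.resource("s3").Bucket(bucket).objects.filter(Prefix=prefix).delete()


# Example call (placeholder location copied from the notebook details):
# delete_notebook_folder("s3://MyNotebookLocationPath/NotebookID/")
```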

## Sharing Notebook files
<a name="emr-managed-notebooks-file-sharing"></a>

Each EMR notebook is saved to Amazon S3 as a file named `NotebookName.ipynb`. As long as a notebook file is compatible with the same version of Jupyter Notebook that EMR Notebooks is based on, you can open the notebook as an EMR notebook.

The easiest way to open a notebook file from another user is to save the \*.ipynb file to your local file system, and then use the upload feature in the Jupyter or JupyterLab editor.

You can use this process to use EMR notebooks shared by others, notebooks shared in the Jupyter community, or to restore a notebook that was deleted from the console when you still have the notebook file.

**To use a different notebook file as the basis for an EMR notebook**

1. Before proceeding, close the notebook editor for any notebooks that you will work with, and then stop the notebook if it's an EMR notebook.

1. Create an EMR notebook and enter a name for it. The name that you enter for the notebook will be the name of the file you need to replace. The new file name must match this file name exactly.

1. Make a note of the location in Amazon S3 that you choose for the notebook. The file that you replace is in a folder with a path and file name like the following pattern: `s3://MyNotebookLocation/NotebookID/MyNotebookName.ipynb`.

1. Stop the notebook.

1. Replace the old notebook file in the Amazon S3 location with the new one, using exactly the same name.

   The following AWS CLI command copies a local file named `SharedNotebook.ipynb` over the notebook file of an EMR notebook that has the name **MyNotebook** and the ID `e-12A3BCDEFJHIJKLMNO45PQRST`, and that was created with `amzn-s3-demo-bucket/MyNotebooksFolder` specified as its location in Amazon S3. For information about using the Amazon S3 console to copy and replace files, see [Uploading, downloading, and managing objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-download-objects.html) in the *Amazon Simple Storage Service User Guide*.

   ```
   aws s3 cp SharedNotebook.ipynb s3://amzn-s3-demo-bucket/MyNotebooksFolder/e-12A3BCDEFJHIJKLMNO45PQRST/MyNotebook.ipynb
   ```
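
The same replacement can be scripted with the SDK for Python (Boto3). The following is a minimal sketch rather than an official sample: the helper names `notebook_object_key` and `replace_notebook_file` are hypothetical, and it assumes that Boto3 is installed and that AWS credentials are configured. It builds the object key from the path pattern described in the procedure and then overwrites the object.

```python
def notebook_object_key(folder: str, notebook_id: str, notebook_name: str) -> str:
    """Build the S3 object key for an EMR notebook file.

    Follows the pattern MyNotebookLocation/NotebookID/MyNotebookName.ipynb,
    without the s3://bucket/ prefix. Helper name is hypothetical.
    """
    parts = [folder.strip("/"), notebook_id, f"{notebook_name}.ipynb"]
    # Skip an empty folder so the key never starts with a slash.
    return "/".join(part for part in parts if part)


def replace_notebook_file(local_path, bucket, folder, notebook_id, notebook_name):
    """Overwrite a stopped notebook's file in Amazon S3 with a local .ipynb file."""
    import boto3  # imported here so the key helper works without Boto3 installed

    key = notebook_object_key(folder, notebook_id, notebook_name)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For the values used in this procedure, you would call `replace_notebook_file("SharedNotebook.ipynb", "amzn-s3-demo-bucket", "MyNotebooksFolder", "e-12A3BCDEFJHIJKLMNO45PQRST", "MyNotebook")` after you stop the notebook.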

# Sample programmatic commands for EMR Notebooks
<a name="emr-managed-notebooks-headless"></a>

## Overview
<a name="emr-managed-notebooks-headless-overview"></a>

You can execute EMR notebooks with execution APIs from a script or from the command line. Because you can start, stop, list, and describe EMR notebook executions outside of the AWS console, you can control an EMR notebook programmatically. You can pass different parameter values to a notebook with a parameterized notebook cell. This eliminates the need to create a copy of the notebook for each new set of parameter values. For more information, see [Amazon EMR API actions](https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html).

You can schedule or batch EMR notebook executions with Amazon CloudWatch events and AWS Lambda. For more information, see [Using AWS Lambda with Amazon CloudWatch Events](https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents.html).
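
As an illustration of scheduling, the following sketch shows what an AWS Lambda handler that starts a notebook execution might look like. It is a sketch under assumptions: the event keys (`editor_id`, `relative_path`, `cluster_id`, `params`, `service_role`) are hypothetical and would come from the constant JSON input that you define on the scheduled rule, and the helper name `build_execution_request` is not part of any AWS API.

```python
import json


def build_execution_request(event: dict) -> dict:
    """Translate a scheduled event payload into StartNotebookExecution arguments.

    The event keys used here are hypothetical; define them in the rule's input.
    """
    return {
        "EditorId": event["editor_id"],
        "RelativePath": event["relative_path"],
        "ExecutionEngine": {"Id": event["cluster_id"]},
        "ServiceRole": event.get("service_role", "EMR_Notebooks_DefaultRole"),
        # NotebookParams is passed as a JSON-formatted string, not a dict.
        "NotebookParams": json.dumps(event.get("params", {})),
    }


def lambda_handler(event, context):
    """Lambda entry point: start one notebook execution per invocation."""
    import boto3  # imported here so the request builder can be tested locally

    emr = boto3.client("emr")
    response = emr.start_notebook_execution(**build_execution_request(event))
    return {"NotebookExecutionId": response["NotebookExecutionId"]}
```

Keeping the request construction in a separate function makes it possible to check the payload without calling AWS. The Lambda execution role would need the permissions described in the next section.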

## Role permissions for programmatic execution
<a name="emr-managed-notebooks-headless-permissions"></a>

To use programmatic execution with EMR Notebooks, you must configure user permissions with the following policies:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowExecutionActions",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:StartNotebookExecution",
        "elasticmapreduce:DescribeNotebookExecution",
        "elasticmapreduce:ListNotebookExecutions"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AllowPassingServiceRole",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/EMR_Notebooks_DefaultRole"
      ]
    }
  ]
}
```

------

When you programmatically execute EMR notebooks on an Amazon EMR on EKS cluster, you must also add the following permissions:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRetrievingManagedEndpointCredentials",
      "Effect": "Allow",
      "Action": [
        "emr-containers:GetManagedEndpointSessionCredentials"
      ],
      "Resource": [
        "arn:aws:emr-containers:*:123456789012:/virtualclusters/virtual-cluster-id/endpoints/managed-endpoint-id"
      ],
      "Condition": {
        "StringEquals": {
          "emr-containers:ExecutionRoleArn": [
            "arn:aws:iam::123456789012:role/emr-on-eks-execution-role"
          ]
        }
      }
    },
    {
      "Sid": "AllowDescribingManagedEndpoint",
      "Effect": "Allow",
      "Action": [
        "emr-containers:DescribeManagedEndpoint"
      ],
      "Resource": [
        "arn:aws:emr-containers:*:123456789012:/virtualclusters/virtual-cluster-id/endpoints/managed-endpoint-id"
      ]
    }
  ]
}
```

------

## Limitations with programmatic execution
<a name="emr-managed-notebooks-headless-limit"></a>
+ A maximum of 100 concurrent executions are supported per AWS Region per account.
+ An execution is terminated if it runs for more than 30 days.
+ Programmatic execution of notebooks isn't supported with Amazon EMR Serverless interactive applications.
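
To stay under the concurrency limit, a script can count the executions that are still in progress before it starts a new one. The following is a minimal sketch, not an official sample. It assumes that executions in any non-terminal status count toward the limit, which this guide does not state explicitly, and the function names are hypothetical. The `emr` argument is a Boto3 EMR client, such as the result of `boto3.client("emr")`.

```python
# Statuses treated as "in progress" here. Whether the service counts all of
# them toward the limit is an assumption of this sketch.
ACTIVE_STATUSES = (
    "START_PENDING", "STARTING", "RUNNING",
    "FINISHING", "FAILING", "STOP_PENDING", "STOPPING",
)
MAX_CONCURRENT_EXECUTIONS = 100  # per AWS Region per account


def count_active_executions(emr) -> int:
    """Count notebook executions in any in-progress status."""
    paginator = emr.get_paginator("list_notebook_executions")
    total = 0
    for status in ACTIVE_STATUSES:
        # The Status filter accepts one value per request, so query each in turn.
        for page in paginator.paginate(Status=status):
            total += len(page["NotebookExecutions"])
    return total


def has_capacity(emr, limit: int = MAX_CONCURRENT_EXECUTIONS) -> bool:
    """Return True if another execution can start without reaching the limit."""
    return count_active_executions(emr) < limit
```

Because other callers in the same account can start executions between the check and the start request, treat the result as advisory and still handle errors from `start_notebook_execution`.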

## Examples of programmatic EMR notebook execution
<a name="emr-managed-notebooks-headless-examples"></a>

The following sections provide several examples of programmatic EMR notebook execution with the AWS CLI, Boto3 SDK (Python), and Ruby:
+ [Notebook CLI command samples in EMR Studio](emr-managed-notebooks-headless-cli.md)
+ [Python samples for an EMR notebook](emr-managed-notebooks-headless-python.md)
+ [Ruby samples for an EMR notebook](emr-managed-notebooks-headless-ruby.md)

You can also run parameterized notebooks as part of scheduled workflows with an orchestration tool such as Apache Airflow or Amazon Managed Workflows for Apache Airflow (MWAA). For more information, see [Orchestrating analytics jobs on EMR Notebooks using MWAA](https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-on-amazon-emr-notebooks-using-amazon-mwaa/) in the *AWS Big Data Blog*.

# Notebook CLI command samples in EMR Studio
<a name="emr-managed-notebooks-headless-cli"></a>

This topic shows CLI command samples for an EMR notebook. The example uses the demo notebook from the EMR Notebooks console. To locate the notebook, use the file path relative to the home directory. In this example, there are two notebook files that you can run: `demo_pyspark.ipynb` and `my_folder/python3.ipynb`. 

The relative path for file `demo_pyspark.ipynb` is `demo_pyspark.ipynb`, shown below.

![\[Jupyter notebook interface showing a file explorer and code editor with PySpark content.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/notebook_exe_folder_structure_1.png)


The relative path for `python3.ipynb` is `my_folder/python3.ipynb`, shown below.

![\[File explorer showing python3.ipynb in my_folder, and Jupyter notebook interface with code.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/notebook_exe_folder_structure_2.png)


For information about the Amazon EMR API `NotebookExecution` actions, see [Amazon EMR API actions](https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html).

## Run a notebook
<a name="emr-managed-notebooks-api-actions"></a>

You can use the AWS CLI to run your notebook with the `start-notebook-execution` action, as the following examples demonstrate. 

**Example – Executing an EMR notebook in an EMR Studio Workspace with an Amazon EMR (running on Amazon EC2) cluster**  

```
aws emr --region us-east-1 \
start-notebook-execution \
--editor-id e-ABCDEFG123456 \
--notebook-params '{"input_param":"my-value", "good_superhero":["superman", "batman"]}' \
--relative-path test.ipynb \
--notebook-execution-name my-execution \
--execution-engine '{"Id" : "j-1234ABCD123"}' \
--service-role EMR_Notebooks_DefaultRole 
 
{
    "NotebookExecutionId": "ex-ABCDEFGHIJ1234ABCD"
}
```

**Example – Executing an EMR notebook in an EMR Studio Workspace with an Amazon EMR on EKS cluster**  

```
aws emr start-notebook-execution \
    --region us-east-1 \
    --service-role EMR_Notebooks_DefaultRole \
    --environment-variables '{"KERNEL_EXTRA_SPARK_OPTS": "--conf spark.executor.instances=1", "KERNEL_LAUNCH_TIMEOUT": "350"}' \
    --output-notebook-format HTML \
    --execution-engine Id=arn:aws:emr-containers:us-west-2:account-id:/virtualclusters/ABCDEFG/endpoints/ABCDEF,Type=EMR_ON_EKS,ExecutionRoleArn=arn:aws:iam::account-id:role/execution-role \
    --editor-id e-ABCDEFG \
    --relative-path EMRonEKS-spark_python.ipynb
```

**Example – Executing an EMR notebook specifying its Amazon S3 location**  

```
aws emr start-notebook-execution \
    --region us-east-1 \
    --notebook-execution-name my-execution-on-emr-on-eks-cluster \
    --service-role EMR_Notebooks_DefaultRole \
    --environment-variables '{"KERNEL_EXTRA_SPARK_OPTS": "--conf spark.executor.instances=1", "KERNEL_LAUNCH_TIMEOUT": "350"}' \
    --output-notebook-format HTML \
    --execution-engine Id=arn:aws:emr-containers:us-west-2:account-id:/virtualclusters/ABCDEF/endpoints/ABCDEF,Type=EMR_ON_EKS,ExecutionRoleArn=arn:aws:iam::account-id:role/execution-role \
    --notebook-s3-location '{"Bucket": "amzn-s3-demo-bucket","Key": "s3-prefix-to-notebook-location/EMRonEKS-spark_python.ipynb"}' \
    --output-notebook-s3-location '{"Bucket": "amzn-s3-demo-bucket","Key": "s3-prefix-for-storing-output-notebook"}'
```
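
For comparison, the following sketch builds the same kind of request with the SDK for Python (Boto3). It assumes that the Boto3 parameter names mirror the CLI options shown above, so check the Boto3 reference for your SDK version before relying on it. The function names are hypothetical, and the `emr` argument is a Boto3 EMR client.

```python
def build_eks_execution_request(endpoint_arn, execution_role_arn, bucket,
                                notebook_key, output_key_prefix,
                                service_role="EMR_Notebooks_DefaultRole"):
    """Build StartNotebookExecution arguments for an Amazon EMR on EKS endpoint."""
    return {
        "NotebookExecutionName": "my-execution-on-emr-on-eks-cluster",
        "ServiceRole": service_role,
        "ExecutionEngine": {
            "Id": endpoint_arn,
            "Type": "EMR_ON_EKS",
            "ExecutionRoleArn": execution_role_arn,
        },
        # Input notebook and output location are given as bucket/key pairs.
        "NotebookS3Location": {"Bucket": bucket, "Key": notebook_key},
        "OutputNotebookS3Location": {"Bucket": bucket, "Key": output_key_prefix},
        "OutputNotebookFormat": "HTML",
        "EnvironmentVariables": {
            "KERNEL_EXTRA_SPARK_OPTS": "--conf spark.executor.instances=1",
            "KERNEL_LAUNCH_TIMEOUT": "350",
        },
    }


def start_execution(emr, request: dict) -> str:
    """Start the execution and return its ID."""
    return emr.start_notebook_execution(**request)["NotebookExecutionId"]
```

You would pass the managed endpoint ARN and execution role ARN for your own virtual cluster in place of the placeholders used in the CLI example.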

## Notebook output
<a name="emr-managed-notebooks-headless-cli-output"></a>

Here's the output from a sample notebook. Cell 3 shows the newly injected parameter values.

![\[Jupyter notebook cells showing Python code and output for parameter injection and manipulation.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/HelloWorld_notebook.png)
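
For reference, a parameterized notebook might contain a cell like the following, tagged `parameters`. This is a sketch under assumptions: the variable names mirror the `--notebook-params` keys used earlier in this topic, and the default values are hypothetical. At execution time, the values passed in `--notebook-params` are injected and take the place of these defaults.

```python
# Cell tagged "parameters": defaults used when the notebook runs interactively.
input_param = "default-value"      # hypothetical default
good_superhero = ["wonder woman"]  # hypothetical default

# A later cell can use the variables without knowing where the values came from.
summary = f"{input_param}: {', '.join(good_superhero)}"
print(summary)
```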


## Describe a notebook
<a name="emr-managed-notebooks-headless-cli-describe"></a>

You can use the `describe-notebook-execution` action to access information about a specific notebook execution.

```
aws emr --region us-east-1 \
describe-notebook-execution --notebook-execution-id ex-IZWZZVR9DKQ9WQ7VZWXJZR29UGHTE
 
{
    "NotebookExecution": {
        "NotebookExecutionId": "ex-IZWZZVR9DKQ9WQ7VZWXJZR29UGHTE",
        "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
        "ExecutionEngine": {
            "Id": "j-2QMOV6JAX1TS2",
            "Type": "EMR",
            "MasterInstanceSecurityGroupId": "sg-05ce12e58cd4f715e"
        },
        "NotebookExecutionName": "my-execution",
        "NotebookParams": "{\"input_param\":\"my-value\", \"good_superhero\":[\"superman\", \"batman\"]}",
        "Status": "FINISHED",
        "StartTime": 1593490857.009,
        "Arn": "arn:aws:elasticmapreduce:us-east-1:123456789012:notebook-execution/ex-IZWZZVR9DKQ9WQ7VZWXJZR29UGHTE",
        "LastStateChangeReason": "Execution is finished for cluster j-2QMOV6JAX1TS2.",
        "NotebookInstanceSecurityGroupId": "sg-0683b0a39966d4a6a",
        "Tags": []
    }
}
```
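
A script often needs to wait until an execution completes before it reads the output notebook. The following sketch polls `describe_notebook_execution` until the status is `FINISHED`, `FAILED`, or `STOPPED`. The function name, polling interval, and timeout are illustrative choices rather than recommended values, and the `emr` argument is a Boto3 EMR client.

```python
import time

# Statuses after which an execution no longer changes state.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "STOPPED"}


def wait_for_execution(emr, execution_id, poll_seconds=30,
                       timeout_seconds=3600, sleep=time.sleep):
    """Poll an execution until it reaches a terminal status and return its details."""
    waited = 0
    while True:
        execution = emr.describe_notebook_execution(
            NotebookExecutionId=execution_id
        )["NotebookExecution"]
        if execution["Status"] in TERMINAL_STATUSES:
            return execution
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{execution_id} is still {execution['Status']} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so that the wait can be replaced in tests. In normal use you can omit it.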

## Stop a notebook
<a name="emr-managed-notebooks-headless-cli-stop"></a>

If your notebook is running an execution that you'd like to stop, you can do so with the `stop-notebook-execution` command.

```
# stop a running execution
aws emr --region us-east-1 \
stop-notebook-execution --notebook-execution-id ex-IZWZX78UVPAATC8LHJR129B1RBN4T
 
 
# describe it
aws emr --region us-east-1 \
describe-notebook-execution --notebook-execution-id ex-IZWZX78UVPAATC8LHJR129B1RBN4T
 
{
    "NotebookExecution": {
        "NotebookExecutionId": "ex-IZWZX78UVPAATC8LHJR129B1RBN4T",
        "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
        "ExecutionEngine": {
            "Id": "j-2QMOV6JAX1TS2",
            "Type": "EMR"
        },
        "NotebookExecutionName": "my-execution",
        "NotebookParams": "{\"input_param\":\"my-value\", \"good_superhero\":[\"superman\", \"batman\"]}",
        "Status": "STOPPED",
        "StartTime": 1593490876.241,
        "Arn": "arn:aws:elasticmapreduce:us-east-1:123456789012:editor-execution/ex-IZWZX78UVPAATC8LHJR129B1RBN4T",
        "LastStateChangeReason": "Execution is stopped for cluster j-2QMOV6JAX1TS2. Internal error",
        "Tags": []
    }
}
```

## List the executions for a notebook by start time
<a name="emr-managed-notebooks-headless-cli-list"></a>

You can pass a `--from` parameter to `list-notebook-executions` to list your notebook's executions by start time.

```
# filter by start time 
aws emr --region us-east-1 \
list-notebook-executions --from 1593400000.000
 
{
    "NotebookExecutions": [
        {
            "NotebookExecutionId": "ex-IZWZX78UVPAATC8LHJR129B1RBN4T",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "STOPPED",
            "StartTime": 1593490876.241
        },
        {
            "NotebookExecutionId": "ex-IZWZZVR9DKQ9WQ7VZWXJZR29UGHTE",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "RUNNING",
            "StartTime": 1593490857.009
        },
        {
            "NotebookExecutionId": "ex-IZWZYRS0M14L5V95WZ9OQ399SKMNW",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "STOPPED",
            "StartTime": 1593490292.995
        },
        {
            "NotebookExecutionId": "ex-IZX009ZK83IVY5E33VH8MDMELVK8K",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "FINISHED",
            "StartTime": 1593489834.765
        },
        {
            "NotebookExecutionId": "ex-IZWZXOZF88JWDF9J09GJ91R57VI0N",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "FAILED",
            "StartTime": 1593488934.688
        }
    ]
}
```

## List the executions for a notebook by start time and status
<a name="emr-managed-notebooks-headless-cli-list"></a>

The `list-notebook-executions` command can also take a `--status` parameter to filter results.

```
# filter by start time and status 
aws emr --region us-east-1 \
list-notebook-executions --from 1593400000.000 --status FINISHED
{
    "NotebookExecutions": [
        {
            "NotebookExecutionId": "ex-IZWZZVR9DKQ9WQ7VZWXJZR29UGHTE",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "FINISHED",
            "StartTime": 1593490857.009
        },
        {
            "NotebookExecutionId": "ex-IZX009ZK83IVY5E33VH8MDMELVK8K",
            "EditorId": "e-BKTM2DIHXBEDRU44ANWRKIU8N",
            "NotebookExecutionName": "my-execution",
            "Status": "FINISHED",
            "StartTime": 1593489834.765
        }
    ]
}
```
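
The `--from` value in these examples is a Unix timestamp in seconds. The following sketch shows one way to apply the same filters with the SDK for Python (Boto3), which accepts a `datetime` for `From`. The function names are hypothetical, and the `emr` argument is a Boto3 EMR client.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: float) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def list_executions(emr, from_epoch: float, status=None) -> list:
    """List notebook executions that started after from_epoch, optionally by status."""
    kwargs = {"From": epoch_to_utc(from_epoch)}
    if status:
        kwargs["Status"] = status
    executions = []
    # The paginator follows the Marker value across pages of results.
    paginator = emr.get_paginator("list_notebook_executions")
    for page in paginator.paginate(**kwargs):
        executions.extend(page["NotebookExecutions"])
    return executions
```

For example, `list_executions(emr, 1593400000.000, status="FINISHED")` corresponds to the second CLI command above.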

# Python samples for an EMR notebook
<a name="emr-managed-notebooks-headless-python"></a>

This topic contains a sample SDK for Python (Boto3) file called `demo.py` that demonstrates the notebook execution APIs.

For information about the Amazon EMR API `NotebookExecution` actions, see [Amazon EMR API actions](https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html).

```
import time

import boto3

# Create an Amazon EMR client in the Region where the notebook and cluster reside.
emr = boto3.client(
    'emr',
    region_name='us-west-1'
)

# Start an execution of the notebook on the specified cluster.
start_resp = emr.start_notebook_execution(
    EditorId='e-40AC8ZO6EGGCPJ4DLO48KGGGI',
    RelativePath='boto3_demo.ipynb',
    ExecutionEngine={'Id': 'j-1HYZS6JQKV11Q'},
    ServiceRole='EMR_Notebooks_DefaultRole'
)

execution_id = start_resp["NotebookExecutionId"]
print(execution_id)
print("\n")

# Describe the execution that was just started.
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)

print(describe_response)
print("\n")

# List the existing notebook executions.
list_response = emr.list_notebook_executions()
print("Existing notebook executions:\n")
for execution in list_response['NotebookExecutions']:
    print(execution)
    print("\n")

print("Sleeping for 5 sec...")
time.sleep(5)

# Stop the execution, and then describe it again to see the status change.
print("Stop execution " + execution_id)
emr.stop_notebook_execution(NotebookExecutionId=execution_id)
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
print(describe_response)
print("\n")
```

Here's the output from running `demo.py`.

```
ex-IZX56YJDW1D29Q1PHR32WABU2SAPK
     
{'NotebookExecution': {'NotebookExecutionId': 'ex-IZX56YJDW1D29Q1PHR32WABU2SAPK', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'ExecutionEngine': {'Id': 'j-1HYZS6JQKV11Q', 'Type': 'EMR'}, 'NotebookExecutionName': '', 'Status': 'STARTING', 'StartTime': datetime.datetime(2020, 8, 19, 0, 49, 19, 418000, tzinfo=tzlocal()), 'Arn': 'arn:aws:elasticmapreduce:us-west-1:123456789012:notebook-execution/ex-IZX56YJDW1D29Q1PHR32WABU2SAPK', 'LastStateChangeReason': 'Execution is starting for cluster j-1HYZS6JQKV11Q.', 'Tags': []}, 'ResponseMetadata': {'RequestId': '70f12c5f-1dda-45b7-adf6-964987d373b7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '70f12c5f-1dda-45b7-adf6-964987d373b7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '448', 'date': 'Wed, 19 Aug 2020 00:49:22 GMT'}, 'RetryAttempts': 0}}
     
Existing notebook executions:
     
{'NotebookExecutionId': 'ex-IZX56YJDW1D29Q1PHR32WABU2SAPK', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'STARTING', 'StartTime': datetime.datetime(2020, 8, 19, 0, 49, 19, 418000, tzinfo=tzlocal())}
     
     
{'NotebookExecutionId': 'ex-IZX5ABS5PR1E5AHMFYEMX3JJIORRB', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'RUNNING', 'StartTime': datetime.datetime(2020, 8, 19, 0, 48, 36, 373000, tzinfo=tzlocal())}
     
     
{'NotebookExecutionId': 'ex-IZX5GLVXIU1HNI8BWVW057F6MF4VE', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'FINISHED', 'StartTime': datetime.datetime(2020, 8, 19, 0, 45, 14, 646000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2020, 8, 19, 0, 46, 26, 543000, tzinfo=tzlocal())}
     
     
{'NotebookExecutionId': 'ex-IZX5CV8YDUO8JAIWMXN2VH32RUIT1', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'FINISHED', 'StartTime': datetime.datetime(2020, 8, 19, 0, 43, 5, 807000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2020, 8, 19, 0, 44, 31, 632000, tzinfo=tzlocal())}
     
     
{'NotebookExecutionId': 'ex-IZX5AS0PPW55CEDEURZ9NSOWSUJZ6', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'FINISHED', 'StartTime': datetime.datetime(2020, 8, 19, 0, 42, 29, 265000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2020, 8, 19, 0, 43, 48, 320000, tzinfo=tzlocal())}
     
     
{'NotebookExecutionId': 'ex-IZX57YF5Q53BKWLR4I5QZ14HJ7DRS', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'NotebookExecutionName': '', 'Status': 'FINISHED', 'StartTime': datetime.datetime(2020, 8, 19, 0, 38, 37, 81000, tzinfo=tzlocal()), 'EndTime': datetime.datetime(2020, 8, 19, 0, 40, 39, 646000, tzinfo=tzlocal())}
     
Sleeping for 5 sec...
Stop execution ex-IZX56YJDW1D29Q1PHR32WABU2SAPK
{'NotebookExecution': {'NotebookExecutionId': 'ex-IZX56YJDW1D29Q1PHR32WABU2SAPK', 'EditorId': 'e-40AC8ZO6EGGCPJ4DLO48KGGGI', 'ExecutionEngine': {'Id': 'j-1HYZS6JQKV11Q', 'Type': 'EMR'}, 'NotebookExecutionName': '', 'Status': 'STOPPING', 'StartTime': datetime.datetime(2020, 8, 19, 0, 49, 19, 418000, tzinfo=tzlocal()), 'Arn': 'arn:aws:elasticmapreduce:us-west-1:123456789012:notebook-execution/ex-IZX56YJDW1D29Q1PHR32WABU2SAPK', 'LastStateChangeReason': 'Execution is being stopped for cluster j-1HYZS6JQKV11Q.', 'Tags': []}, 'ResponseMetadata': {'RequestId': '2a77ef73-c1c6-467c-a1d1-7204ab2f6a53', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '2a77ef73-c1c6-467c-a1d1-7204ab2f6a53', 'content-type': 'application/x-amz-json-1.1', 'content-length': '453', 'date': 'Wed, 19 Aug 2020 00:49:30 GMT'}, 'RetryAttempts': 0}}
```

# Ruby samples for an EMR notebook
<a name="emr-managed-notebooks-headless-ruby"></a>

This topic contains a Ruby sample that demonstrates notebook functionality.

The following Ruby code samples demonstrate using the notebook execution API.

```
require 'aws-sdk-emr'

# Prepare an Amazon EMR client.
emr = Aws::EMR::Client.new(
  region: 'us-east-1',
  access_key_id: 'AKIA...JKPKA',
  secret_access_key: 'rLMeu...vU0OLrAC1',
)
```

## Starting notebook execution and getting the execution id
<a name="emr-managed-notebooks-headless-ruby-startretrieve"></a>

In this example, the EMR notebook file is stored in Amazon S3 at `s3://amzn-s3-demo-bucket/notebooks/e-EA8VGAA429FEQTC8HC9ZHWISK/test.ipynb`, where `e-EA8VGAA429FEQTC8HC9ZHWISK` is the editor ID and `test.ipynb` is the relative path.

For information about the Amazon EMR API `NotebookExecution` actions, see [Amazon EMR API actions.](https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html)

```
start_resp = emr.start_notebook_execution({
    editor_id: "e-EA8VGAA429FEQTC8HC9ZHWISK",
    relative_path: "test.ipynb",
    
    execution_engine: {id: "j-3U82I95AMALGE"},
    
    service_role: "EMR_Notebooks_DefaultRole",
})


notebook_execution_id = start_resp.notebook_execution_id
```

## Describing notebook execution and printing the details
<a name="emr-managed-notebooks-headless-ruby-describeprint"></a>

```
describe_resp = emr.describe_notebook_execution({
    notebook_execution_id: notebook_execution_id
})
puts describe_resp.notebook_execution
```

The preceding commands produce output similar to the following.

```
{
:notebook_execution_id=>"ex-IZX3VTVZWVWPP27KUB90BZ7V9IEDG", 
:editor_id=>"e-EA8VGAA429FEQTC8HC9ZHWISK",
:execution_engine=>{:id=>"j-3U82I95AMALGE", :type=>"EMR", :master_instance_security_group_id=>nil}, 
:notebook_execution_name=>"", 
:notebook_params=>nil, 
:status=>"STARTING", 
:start_time=>2020-07-23 15:07:07 -0700, 
:end_time=>nil, 
:arn=>"arn:aws:elasticmapreduce:us-east-1:123456789012:notebook-execution/ex-IZX3VTVZWVWPP27KUB90BZ7V9IEDG", 
:output_notebook_uri=>nil, 
:last_state_change_reason=>"Execution is starting for cluster j-3U82I95AMALGE.", 
:notebook_instance_security_group_id=>nil, 
:tags=>[]
}
```

## Notebook filters
<a name="emr-managed-notebooks-headless-ruby-filters"></a>

```
"EditorId": "e-XXXX",           [Optional]
"From" : "1593400000.000",    [Optional]
"To" :
```

## Stopping notebook execution
<a name="emr-managed-notebooks-headless-ruby-stop"></a>

```
stop_resp = emr.stop_notebook_execution({
    notebook_execution_id: notebook_execution_id
})
```

# Enabling user impersonation to monitor Spark user and job activity
<a name="emr-managed-notebooks-spark-monitor"></a>

EMR Notebooks allows you to configure user impersonation on a Spark cluster. This feature helps you track job activity initiated from within the notebook editor. In addition, EMR Notebooks has a built-in Jupyter Notebook widget to view Spark job details alongside query output in the notebook editor. The widget is available by default and requires no special configuration. However, to view the history servers, your client must be configured to view Amazon EMR web interfaces that are hosted on the primary node.

## Setting up Spark user impersonation
<a name="emr-managed-notebooks-user-impersonation"></a>

By default, Spark jobs that users submit using the notebook editor appear to originate from an indistinct `livy` user identity. You can configure user impersonation for the cluster so that these jobs are instead associated with the user identity that ran the code. HDFS user directories on the primary node are created for each user identity that runs code in the notebook. For example, if user `NbUser1` runs code from the notebook editor, you can connect to the primary node and see that `hadoop fs -ls /user` shows the directory `/user/user_NbUser1`.

You enable this feature by setting properties in the `core-site` and `livy-conf` configuration classifications. This feature is not available by default when you have Amazon EMR create a cluster along with a notebook. For more information about using configuration classifications to customize applications, see [Configuring applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html) in the *Amazon EMR Release Guide*.

Use the following configuration classifications and values to enable user impersonation for EMR Notebooks:

```
[
    {
        "Classification": "core-site",
        "Properties": {
          "hadoop.proxyuser.livy.groups": "*",
          "hadoop.proxyuser.livy.hosts": "*"
        }
    },
    {
        "Classification": "livy-conf",
        "Properties": {
          "livy.impersonation.enabled": "true"
        }
    }
]
```
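
If you create clusters from a script, you can generate the same classifications in code. The following sketch builds the list shown above and writes it to a JSON file. The function names and the file path are hypothetical. The proxy-user arguments default to the `*` wildcard values from the preceding example.

```python
import json


def impersonation_configurations(proxy_groups="*", proxy_hosts="*"):
    """Return the configuration classifications that enable Livy user impersonation."""
    return [
        {
            "Classification": "core-site",
            "Properties": {
                "hadoop.proxyuser.livy.groups": proxy_groups,
                "hadoop.proxyuser.livy.hosts": proxy_hosts,
            },
        },
        {
            "Classification": "livy-conf",
            # Property values are strings, so "true" is quoted.
            "Properties": {"livy.impersonation.enabled": "true"},
        },
    ]


def write_configurations(path):
    """Write the classifications to a JSON file for use when creating a cluster."""
    with open(path, "w", encoding="utf-8") as config_file:
        json.dump(impersonation_configurations(), config_file, indent=4)
```

You can pass the returned list as the `Configurations` argument of the Boto3 `run_job_flow` call, or reference the written file with the `--configurations` option of `aws emr create-cluster`.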

## Using the Spark job monitoring widget
<a name="emr-managed-notebooks-monitoring-widget"></a>

When you run code in the notebook editor that executes Spark jobs on the EMR cluster, the output includes a Jupyter Notebook widget for Spark job monitoring. The widget provides job details and useful links to the Spark history server page and the Hadoop job history page, along with convenient links to job logs in Amazon S3 for any failed jobs.

To view history server pages on the cluster primary node, you must set up an SSH client and proxy as appropriate. For more information, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md). To view logs in Amazon S3, cluster logging must be enabled, which is the default for new clusters. For more information, see [View log files archived to Amazon S3](emr-manage-view-web-log-files.md#emr-manage-view-web-log-files-s3).

The following is an example of the Spark job monitoring widget.

![\[Spark job monitoring widget showing job progress.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/spark_monitoring_job_progress.png)


# EMR notebooks security and access control
<a name="emr-managed-notebooks-security"></a>

Several features are available to help you tailor the security posture of EMR Notebooks. This helps ensure that only authorized users have access to an EMR notebook, can work with notebooks, and use the notebook editor to execute code on the cluster. These features work along with the security features available for Amazon EMR and Amazon EMR clusters. For more information, see [Security in Amazon EMR](emr-security.md).
+ You can use AWS Identity and Access Management policy statements together with notebook tags to limit access. For more information, see [How Amazon EMR works with IAM](security_iam_service-with-iam.md) and [Example identity-based policy statements for EMR Notebooks](emr-fine-grained-cluster-access.md#emr-managed-notebooks-tags-examples).
+ Amazon EC2 security groups act as virtual firewalls that control network traffic between the cluster's primary instance and the notebook editor. You can use defaults or customize these security groups. For more information, see [Specifying EC2 security groups for EMR Notebooks](emr-managed-notebooks-security-groups.md).
+ You specify an AWS service role that determines what permissions an EMR notebook has when interacting with other AWS services. For more information, see [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md).

# Installing and using kernels and libraries in EMR Studio
<a name="emr-managed-notebooks-installing-libraries-and-kernels"></a>

Each EMR notebook comes with a set of pre-installed libraries and kernels. You can install additional libraries and kernels in an EMR cluster if the cluster has access to the repository where the kernels and libraries are located. For example, for clusters in private subnets, you might need to configure network address translation (NAT) and provide a path for the cluster to access the public PyPI repository to install a library. For more information about configuring external access for different network configurations, see [Scenarios and examples](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenarios.html) in the *Amazon VPC User Guide*.

<a name="emr-managed-notebooks-serverless"></a>

EMR Serverless applications come with the following pre-installed libraries for Python and PySpark: 
+ **Python libraries** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy
+ **PySpark libraries** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy

## Installing kernels and Python libraries on a cluster primary node
<a name="emr-managed-notebooks-cluster-kernel"></a>

With Amazon EMR release version 5.30.0 and later, excluding 6.0.0, you can install additional Python libraries and kernels on the primary node of the cluster. After installation, these kernels and libraries are available to any user running an EMR notebook attached to the cluster. Python libraries installed this way are available only to processes running on the primary node. The libraries are not installed on core or task nodes and are not available to executors running on those nodes.

**Note**  
For Amazon EMR versions 5.30.1, 5.31.0, and 6.1.0, you must take additional steps in order to install kernels and libraries on the primary node of a cluster. To enable the feature, do the following:

1. Make sure that the permissions policy attached to the service role for EMR Notebooks allows the following action:

   `elasticmapreduce:ListSteps`

   For more information, see [Service role for EMR Notebooks](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-service-role.html).

1. Use the AWS CLI to run a step on the cluster that sets up EMR Notebooks as shown in the following example. You must use the step name `EMRNotebooksSetup`. Replace *us-east-1* with the Region in which your cluster resides. For more information, see [Adding steps to a cluster using the AWS CLI](https://docs.aws.amazon.com/emr/latest/ManagementGuide/add-step-cli.html).

   ```
   aws emr add-steps --cluster-id MyClusterID --steps Type=CUSTOM_JAR,Name=EMRNotebooksSetup,ActionOnFailure=CONTINUE,Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://awssupportdatasvcs.com/bootstrap-actions/EMRNotebooksSetup/emr-notebooks-setup.sh"]
   ```

You can install kernels and libraries using `pip` or `conda` in the `/emr/notebook-env/bin` directory on the primary node. 

**Example – Installing Python libraries**  
From the Python3 kernel, run the `%pip` magic as a command from within a notebook cell to install Python libraries.  

```
%pip install pmdarima
```
You may need to restart the kernel to use updated packages. You can also use the [`%%sh`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-sh) cell magic to invoke `pip`.  

```
%%sh
/emr/notebook-env/bin/pip install -U matplotlib
/emr/notebook-env/bin/pip install -U pmdarima
```
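After an installation, it can help to confirm from a notebook cell that the kernel sees the package. The following Python code is a minimal sketch that uses only the standard library. The helper name `package_status` is hypothetical, and the check assumes that the import name matches the distribution name unless you pass `dist_name`.

```python
import importlib.util
from importlib import metadata


def package_status(name, dist_name=None):
    """Report whether top-level module `name` is importable and its installed version."""
    if importlib.util.find_spec(name) is None:
        return {"importable": False, "version": None}
    try:
        # The distribution name can differ from the import name.
        version = metadata.version(dist_name or name)
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": True, "version": version}


# pmdarima is the library installed in the preceding examples.
print(package_status("pmdarima"))
```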
When using a PySpark kernel, you can either install libraries on the cluster using `pip` commands or use notebook-scoped libraries from within a PySpark notebook.   
To run `pip` commands on the cluster from the terminal, first connect to the primary node using SSH, and then run commands such as the following.  

```
sudo pip3 install -U matplotlib
sudo pip3 install -U pmdarima
```
Alternatively, you can use notebook-scoped libraries. With notebook-scoped libraries, your library installation is limited to the scope of your session and occurs on all Spark executors. For more information, see [Using Notebook Scoped Libraries](#emr-managed-notebooks-custom-libraries-limitations).   
If you want to package multiple Python libraries within a PySpark kernel, you can also create an isolated Python virtual environment. For examples, see [Using Virtualenv](https://spark.apache.org/docs/latest/api/python/tutorial/python_packaging.html#using-virtualenv).   
To create a Python virtual environment in a session, use the Spark property `spark.yarn.dist.archives` from the `%%configure` magic command in the first cell in a notebook, as the following example demonstrates.  

```
%%configure -f
{
   "conf": {
   "spark.yarn.appMasterEnv.PYSPARK_PYTHON":"./environment/bin/python",
   "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON":"./environment/bin/python",
   "spark.yarn.dist.archives":"s3://amzn-s3-demo-bucket/prefix/my_pyspark_venv.tar.gz#environment",
   "spark.submit.deployMode":"cluster"
   }
}
```
You can similarly create a Spark executor environment.  

```
%%configure -f
{
   "conf": {
   "spark.yarn.appMasterEnv.PYSPARK_PYTHON":"./environment/bin/python",
   "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON":"./environment/bin/python",
   "spark.executorEnv.PYSPARK_PYTHON":"./environment/bin/python",
   "spark.yarn.dist.archives":"s3://amzn-s3-demo-bucket/prefix/my_pyspark_venv.tar.gz#environment",
   "spark.submit.deployMode":"cluster"
   }
}
```
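If you maintain several packaged environments, you can generate the `%%configure` body instead of editing it by hand. The following Python code is a sketch under stated assumptions. The helper name `build_venv_conf` is hypothetical, and the property names and the `#environment` alias are taken from the preceding examples.

```python
import json


def build_venv_conf(archive_uri, alias="environment", include_executors=True):
    """Build the JSON body for %%configure from a packaged virtual environment."""
    python_path = f"./{alias}/bin/python"
    conf = {
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON": python_path,
        # The part after '#' is the directory name the archive is unpacked under.
        "spark.yarn.dist.archives": f"{archive_uri}#{alias}",
        "spark.submit.deployMode": "cluster",
    }
    if include_executors:
        conf["spark.executorEnv.PYSPARK_PYTHON"] = python_path
    return {"conf": conf}


# Print text that you can paste into the first cell of a notebook.
print("%%configure -f")
print(json.dumps(
    build_venv_conf("s3://amzn-s3-demo-bucket/prefix/my_pyspark_venv.tar.gz"),
    indent=3,
))
```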
You can also use `conda` to install Python libraries. You don't need sudo access to use `conda`. You must connect to the primary node with SSH, and then run `conda` from the terminal. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md). 

**Example – Installing kernels**  
The following example demonstrates installing the Kotlin kernel using a terminal command while connected to the primary node of a cluster:  

```
sudo /emr/notebook-env/bin/conda install kotlin-jupyter-kernel -c jetbrains
```
These instructions do not install kernel dependencies. If your kernel has third-party dependencies, you may need to take additional setup steps before you can use the kernel with your notebook.

## Considerations and limitations with notebook-scoped libraries
<a name="emr-managed-notebooks-custom-libraries-limitations"></a>

When you use notebook-scoped libraries, consider the following:
+ Notebook-scoped libraries are available for clusters that you create with Amazon EMR releases 5.26.0 and higher.
+ Notebook-scoped libraries are intended to be used only with the PySpark kernel.
+ Any user can install additional notebook-scoped libraries from within a notebook cell. These libraries are only available to that notebook user during a single notebook session. If other users need the same libraries, or the same user needs the same libraries in a different session, the library must be re-installed.
+ You can uninstall only the libraries that were installed with the `install_pypi_package` API. You cannot uninstall any libraries that were pre-installed on the cluster.
+ If the same libraries with different versions are installed on the cluster and as notebook-scoped libraries, the notebook-scoped library version overrides the cluster library version.

## Working with Notebook-scoped libraries
<a name="emr-managed-notebooks-work-with-libraries"></a>

To install libraries, your Amazon EMR cluster must have access to the PyPI repository where the libraries are located.

The following examples demonstrate simple commands to list, install, and uninstall libraries from within a notebook cell using the PySpark kernel and APIs. For additional examples, see the [Install Python libraries on a running cluster with EMR Notebooks](https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/) post on the AWS Big Data Blog.

**Example – Listing current libraries**  
The following command lists the Python packages available for the current Spark notebook session. This lists libraries installed on the cluster and notebook-scoped libraries.  

```
sc.list_packages()
```

**Example – Installing the Celery library**  
The following command installs the [Celery](https://pypi.org/project/celery/) library as a notebook-scoped library.  

```
sc.install_pypi_package("celery")
```
After installing the library, the following command confirms that the library is available on the Spark driver and executors.  

```
import celery
sc.range(1,10000,1,100).map(lambda x: celery.__version__).collect()
```

**Example – Installing the Arrow library, specifying the version and repository**  
The following command installs the [Arrow](https://pypi.org/project/arrow/) library as a notebook-scoped library, with a specification of the library version and repository URL.  

```
sc.install_pypi_package("arrow==0.14.0", "https://pypi.org/simple")
```

**Example – Uninstalling a library**  
The following command uninstalls the Arrow library, removing it as a notebook-scoped library from the current session.  

```
sc.uninstall_package("arrow")
```

# Associating Git-based repositories with EMR Notebooks
<a name="emr-git-repo"></a>

You can associate Git-based repositories with your Amazon EMR notebooks to save your notebooks in a version controlled environment. You can associate up to three repositories with a notebook. The following Git-based services are supported:
+ [AWS CodeCommit](https://aws.amazon.com/codecommit)
+ [GitHub](https://www.github.com)
+ [Bitbucket](https://bitbucket.org/)
+ [GitLab](https://about.gitlab.com/)


Associating Git-based repositories with your notebook has the following benefits.
+ **Version control** – You can record code changes in a version-control system so that you can review the history of your changes and selectively reverse them.
+ **Collaboration** – Colleagues working in different notebooks can share code through remote Git-based repositories. Notebooks can clone or merge code from remote repositories and push changes back to those remote repositories.
+ **Code reuse** – Many Jupyter notebooks that demonstrate data analysis or machine learning techniques are available in publicly hosted repositories, such as GitHub. You can associate your notebooks with a repository to reuse the Jupyter notebooks contained in a repository.

To use Git-based repositories with EMR Notebooks, you add the repositories as resources in the Amazon EMR console, associate credentials for repositories that require authentication, and link them with your notebooks. You can view a list of repositories that are stored in your account and details about each repository in the Amazon EMR console. You can associate an existing Git-based repository with a notebook when you create it. 

**Topics**
+ [Prerequisites and considerations when integrating an EMR notebook with a repository](emr-managed-notebooks-git-considerations.md)
+ [Add a Git-based repository to Amazon EMR](emr-git-repo-add.md)
+ [Update or delete a Git-based repository from an EMR Studio Workspace](emr-git-repo-delete.md)
+ [Link or unlink a Git-based repository in EMR Studio](emr-git-repo-link.md)
+ [Create a new Notebook with an associated Git repository in EMR Studio](emr-git-repo-create-notebook.md)
+ [Use Git repositories in an EMR Studio Notebook](emr-git-repo-open.md)

# Prerequisites and considerations when integrating an EMR notebook with a repository
<a name="emr-managed-notebooks-git-considerations"></a>

Consider the following best practices regarding commits, permissions, and hosting when planning to integrate a Git-based repository with EMR Notebooks.

## AWS CodeCommit
<a name="code-commit-considerations"></a>

If you use a CodeCommit repository, you must use Git credentials and HTTPS with CodeCommit. SSH keys and HTTPS with the AWS CLI credential helper are not supported. CodeCommit does not support personal access tokens (PATs). For more information, see [Using IAM with CodeCommit: Git credentials, SSH keys, and AWS access keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_ssh-keys.html) in the *IAM User Guide* and [Setup for HTTPS users using Git credentials](https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html) in the *AWS CodeCommit User Guide*.

## Access and permission considerations
<a name="access-considerations"></a>

Before associating a repository with your notebook, make sure that your cluster, IAM role for EMR Notebooks, and security groups have the correct settings and permissions. You can also configure Git-based repositories that you host in a private network by following the instructions in [Configure a privately-hosted Git repository for EMR Notebooks](#emr-managed-notebooks-private-git-repo).
+ **Cluster internet access** – The network interface that is launched has only a private IP address. This means that the cluster that your notebook connects to must be in a private subnet with a network address translation (NAT) gateway or must be able to access the internet through a virtual private gateway. For more information, see [Amazon VPC options](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-clusters-in-a-vpc.html).

  The security groups for your notebook must include an outbound rule that allows the notebook to route traffic to the internet from the cluster. We recommend that you create your own security groups. For more information, see [Specifying EC2 security groups for EMR Notebooks](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-security-groups.html).
**Important**  
If the network interface is launched into a public subnet, it won't be able to communicate with the internet through an internet gateway (IGW).
+ **Permissions for AWS Secrets Manager** – If you use Secrets Manager to store secrets that you use to access a repository, the [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md) must have a permissions policy attached that allows the `secretsmanager:GetSecretValue` action.
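
The Secrets Manager permission described above can be expressed as a policy document. The following Python code is a minimal sketch that builds such a document. The helper name `secrets_policy` and the example ARN are placeholders, and scoping `Resource` to specific secrets is an assumption about how you might choose to restrict access.

```python
import json


def secrets_policy(secret_arns):
    """Build a policy document that allows reading the given secrets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": list(secret_arns),
            }
        ],
    }


# Placeholder ARN, not a real secret.
example_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-git-secret"
print(json.dumps(secrets_policy([example_arn]), indent=2))
```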

## Configure a privately-hosted Git repository for EMR Notebooks
<a name="emr-managed-notebooks-private-git-repo"></a>

Use the following instructions to configure privately-hosted repositories for EMR Notebooks. You must provide a configuration file with information about your DNS and Git servers. Amazon EMR uses this information to configure EMR notebooks that can route traffic to your privately-hosted repositories.

**Prerequisites**

Before you configure a privately-hosted Git repository for EMR Notebooks, you must have the following:
+ An Amazon S3 location where files for your EMR notebook will be saved.

**To configure one or more privately-hosted Git repositories for EMR Notebooks**

1. Create a configuration file using the provided template. Include the following values for each Git server that you want to specify in your configuration:
   + **`DnsServerIpV4`** - The IPv4 address of your DNS server. If you provide values for both `DnsServerIpV4` and `GitServerIpV4List`, the value for `DnsServerIpV4` takes precedence and will be used to resolve your `GitServerDnsName`.
**Note**  
To use privately-hosted Git repositories, your DNS server must allow inbound access from EMR Notebooks. We strongly recommend that you secure your DNS server against other, unauthorized access.
   + **`GitServerDnsName`** - The DNS name of your Git server. For example `"git.example.com"`.
   + **`GitServerIpV4List`** - A list of IPv4 addresses that belong to your Git server(s).

   ```
   [
       {
           "Type": "PrivatelyHostedGitConfig",
           "Value": [
               {
                   "DnsServerIpV4": "<10.24.34.xxx>",
                   "GitServerDnsName": "<enterprise.git.com>",
                   "GitServerIpV4List": [
                       "<xxx.xxx.xxx.xxx>",
                       "<xxx.xxx.xxx.xxx>"
                   ]
               },
               {
                   "DnsServerIpV4": "<10.24.34.xxx>",
                   "GitServerDnsName": "<git.example.com>",
                   "GitServerIpV4List": [
                       "<xxx.xxx.xxx.xxx>",
                       "<xxx.xxx.xxx.xxx>"
                   ]
               }
           ]
       }
   ]
   ```

1. Save your configuration file as `configuration.json`.

1. Upload the configuration file into your designated Amazon S3 storage location in a folder called `life-cycle-configuration`. For example, if your default S3 location is `s3://amzn-s3-demo-bucket/notebooks`, your configuration file should be located at `s3://amzn-s3-demo-bucket/notebooks/life-cycle-configuration/configuration.json`.
**Important**  
We strongly recommend that you restrict access to your `life-cycle-configuration` folder to only your EMR Notebooks administrators, and to the service role for EMR Notebooks. You should also secure `configuration.json` against unauthorized access. For instructions, see [Controlling access to a bucket with user policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/walkthrough1.html) or [Security Best Practices for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).

   For upload instructions, see [Creating a folder](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html#create-folder) and [Uploading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) in the *Amazon Simple Storage Service User Guide*.
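
The procedure above can be sketched in Python to reduce the chance of a malformed file. This is an illustration under stated assumptions, not an official utility. The helper names and the IP addresses are hypothetical, and the code only builds the JSON and the expected S3 location. It does not upload anything.

```python
import ipaddress
import json


def git_server_entry(dns_server_ip, git_dns_name, git_server_ips):
    """Build one entry of the "Value" list, validating every IPv4 address."""
    for ip in [dns_server_ip, *git_server_ips]:
        ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return {
        "DnsServerIpV4": dns_server_ip,
        "GitServerDnsName": git_dns_name,
        "GitServerIpV4List": list(git_server_ips),
    }


def build_configuration(entries):
    """Wrap the entries in the PrivatelyHostedGitConfig structure from the template."""
    return [{"Type": "PrivatelyHostedGitConfig", "Value": list(entries)}]


def configuration_uri(notebook_s3_location):
    """Return where configuration.json belongs under the notebook S3 location."""
    return notebook_s3_location.rstrip("/") + "/life-cycle-configuration/configuration.json"


config = build_configuration([
    # Hypothetical addresses for illustration only.
    git_server_entry("10.24.34.10", "git.example.com", ["10.24.34.20", "10.24.34.21"]),
])
print(json.dumps(config, indent=4))
print(configuration_uri("s3://amzn-s3-demo-bucket/notebooks"))
```

Validating the addresses before you upload the file surfaces typos locally, where they are easier to spot.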

# Add a Git-based repository to Amazon EMR
<a name="emr-git-repo-add"></a>

Refer to the following sections for information on how to add a Git-based repository to an EMR notebook in the old console, or to an EMR Studio Workspace in the console.

------
#### [ Console ]

Because EMR Notebooks are EMR Studio Workspaces in the new console, you can follow the instructions in [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md) to associate up to three Git repositories with your Workspace.

Alternatively, you can use the JupyterLab Git extension. Choose the **Git** icon from the left sidebar of your JupyterLab notebook to access the extension. For information about the extension, see the [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git) GitHub repo.

To associate a Git repository with a Workspace, your Studio administrator must take steps to configure the Studio to allow Git repository linking. For more information, see [Establish access and permissions for Git-based repositories](emr-studio-enable-git.md).

------

# Update or delete a Git-based repository from an EMR Studio Workspace
<a name="emr-git-repo-delete"></a>

Refer to the following sections for information on how to delete a Git-based repository from an EMR notebook in the old console, or from an EMR Studio Workspace in the console.

------
#### [ Console ]

Because EMR Notebooks are EMR Studio Workspaces in the new console, you can refer to [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md) for more information on working with Git repositories in your Workspace. However, at this time, you can't delete Git repositories from Workspaces.

------

# Link or unlink a Git-based repository in EMR Studio
<a name="emr-git-repo-link"></a>

Use the following steps to link or unlink a Git-based repository to an EMR notebook in the old console, or to an EMR Studio Workspace in the console.

------
#### [ Console ]

Because EMR Notebooks are EMR Studio Workspaces in the new console, you can refer to [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md) for more information on working with Git repositories in your Workspace. However, at this time, you can't delete Git repositories from Workspaces.

------

## Understanding repository status
<a name="emr-managed-notebooks-repository-status"></a>

A Git repository can have any of the following statuses in the repository list. For more information about linking EMR notebooks with Git repositories, see [Link or unlink a Git-based repository in EMR Studio](#emr-git-repo-link).


| Status | Meaning | 
| --- | --- | 
|  Linking  |  The Git repository is being linked to the notebook. While the repository is **Linking**, you cannot stop the notebook.  | 
|  Linked  |  The Git repository is linked to the notebook. While the repository has a **Linked** status, it is connected to the remote repository.  | 
|  Link Failed  |  The Git repository failed to link to the notebook. You can try again to link it.  | 
|  Unlinking  |  The Git repository is being unlinked from the notebook. While the repository is **Unlinking**, you cannot stop the notebook. Unlinking a Git repository from a notebook only disconnects it from the remote repository; it doesn't delete any code from the notebook.  | 
|  Unlink Failed  |  The Git repository failed to unlink from the notebook. You can try again to unlink it.  | 

# Create a new Notebook with an associated Git repository in EMR Studio
<a name="emr-git-repo-create-notebook"></a>

**To create a notebook and associate it with Git repositories in the old Amazon EMR console**

1. Follow the instructions at [Create a Notebook in EMR Studio](emr-managed-notebooks-create.md).

1. For **Security group**, choose **Use your own security group**.
**Note**  
The security groups for your notebook must include an outbound rule to allow the notebook to route traffic to the internet via the cluster. We recommend that you create your own security groups. For more information, see [Specifying EC2 security groups for EMR Notebooks](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-security-groups.html).

1. For **Git repositories**, choose **Choose repository** to associate with the notebook.

   1. Choose a repository that is stored as a resource in your account, and then choose **Save**.

   1. To add a new repository as a resource in your account, choose **add a new repository**. Complete the **Add repository** workflow in a new window. 

# Use Git repositories in an EMR Studio Notebook
<a name="emr-git-repo-open"></a>

You can choose to **Open in JupyterLab** or **Open in Jupyter** when you open a notebook. 

If you choose to open the notebook in Jupyter, a list of expandable files and folders within the notebook is displayed. You can manually run Git commands like the following in a notebook cell. Replace `primary` with the name of the branch that you want to pull.

```
!git pull origin primary
```
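
As an alternative to the `!` shell escape, you can run Git from Python so that a cell can inspect the exit code and output. The following Python code is a minimal sketch that uses the standard `subprocess` module. The helper names `git_pull_command` and `run_git` are hypothetical, and the branch name is taken from the preceding example.

```python
import subprocess


def git_pull_command(branch, remote="origin"):
    """Return the argument list equivalent to `git pull <remote> <branch>`."""
    return ["git", "pull", remote, branch]


def run_git(args, repo_dir="."):
    """Run a Git command in `repo_dir` and return (exit code, combined output)."""
    result = subprocess.run(args, cwd=repo_dir, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# Show the command without running it. Call run_git(...) inside a repository folder.
print(git_pull_command("primary"))
```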

To open any of the additional repositories, navigate to other folders. 

If you choose to open the notebook with a JupyterLab interface, you can use the pre-installed JupyterLab Git extension. For information about the extension, see [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).