

# Notebooks


## Overview


Notebooks in Amazon SageMaker Unified Studio provide an interactive environment for data analysis, exploration, engineering, and machine learning workflows. You can run SQL, Python, and natural language queries to discover, transform, analyze, visualize, and share insights on data at scale.

Amazon SageMaker Unified Studio offers multiple coding experiences to meet different development preferences and use cases. JupyterLab IDE provides a traditional Jupyter notebook environment with extensive customization options and plugin support. Code Editor, based on [Code-OSS, Visual Studio Code - Open Source](https://github.com/microsoft/vscode#visual-studio-code---open-source-code---oss), helps you write, test, debug, and run your analytics and machine learning code. Code Editor extends and is fully integrated with Amazon SageMaker Unified Studio. The new notebook experience, documented in this guide, provides a streamlined, AI-enhanced interface optimized for data analysis workflows with built-in visualization capabilities and seamless integration with AWS data services.

Notebooks support multiple cell types, including Python code cells, SQL code cells, Markdown cells, table cells, and chart cells. Each notebook runs on a managed compute environment that you can configure based on your processing requirements. You can use Spark code to leverage [Amazon Athena for Apache Spark](https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark.html). Athena for Spark makes it easy to interactively run data analytics and exploration using Apache Spark without the need to plan for, configure, or manage resources. You can transition between local Python and remote Spark workloads from a single notebook.

The notebook interface integrates with AI assistance through SageMaker Data Agent, the AI agent that helps generate code, diagnose errors, and provide data analysis recommendations.

**Note**  
SageMaker notebooks are only available in IAM-based domains.

**Note**  
SageMaker notebooks do not support running in a VPC. For VPC support, you can use JupyterLab spaces.

## Key capabilities


1. Execute Python, Spark, and SQL code in interactive cells

1. Integrate with Amazon Athena for Apache Spark for distributed processing

1. Connect to multiple data sources, including Amazon Simple Storage Service (Amazon S3), Amazon S3 Tables, AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift. For the full list, see [supported data sources](https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/lakehouse-data-connection.html#lakehouse-data-connection-supported).

1. Work with Apache Iceberg REST catalogs located anywhere to read and write Iceberg tables using the Iceberg REST APIs in Python or SQL

1. Visualize data with interactive tables and charts

1. Use automatic code completion, formatting, and linting in the cell editor

1. Use AI assistance for code generation and error diagnosis

1. Manage compute environments with configurable instance types

1. Export notebooks in multiple formats, including Jupyter and Python files

1. Install and manage Python packages

## Roles and permissions


To use notebooks in Amazon SageMaker Unified Studio, you need:

1. Access to an Amazon SageMaker Unified Studio domain

1. Appropriate IAM permissions to access data sources

1. Project membership with notebook creation permissions

# Create and manage notebooks


## Overview


You can create new notebooks in Amazon SageMaker Unified Studio to start data analysis workflows. Notebooks are automatically saved as you work, and you can organize them within your project structure.

The notebook interface provides access to sample notebooks that demonstrate common data analysis patterns. You can copy these samples to create starting points for your own analysis.

## Procedure


1. Navigate to the Notebooks section in your Amazon SageMaker Unified Studio project.

1. Click Create notebook to start a new notebook.

1. Enter a name for your notebook or use the auto-generated name.

1. A Python cell is added by default. You can begin by adding Python code or adding cells to your notebook using the buttons (Python, SQL, Markdown, Table, Charts).

1. Your notebook saves automatically as you work.

To access sample notebooks:

1. In the Notebooks section, review the Build with sample data section.

1. Select a sample notebook that matches your use case.

1. Click on the sample to open it in read-only mode.

1. Copy the sample notebook to create your own editable version.

You can view all your notebooks in the notebooks list, which shows the name, ID, last updated information, and creation details for each notebook.

## Examples


Amazon SageMaker Unified Studio includes preloaded sample notebooks to help you get started and understand the available capabilities. To view the available samples, go to the overview or notebooks pages in Amazon SageMaker Unified Studio. In addition, the following cell code examples demonstrate the different cell types you can use in your notebooks:

Python cell example:

```
import pandas as pd
import numpy as np

# Create sample data
sample_data = {
    'product_id': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories'],
    'price': [999.99, 29.99, 79.99, 299.99, 149.99],
    'in_stock': [True, True, False, True, True]
}

df = pd.DataFrame(sample_data)
df
```

SQL cell example:

```
SELECT
    category,
    COUNT(*) as product_count,
    AVG(price) as avg_price,
    SUM(CASE WHEN in_stock THEN 1 ELSE 0 END) as in_stock_count
FROM df
GROUP BY category
ORDER BY avg_price DESC;
```

Markdown cell example:

```
# Data Analysis Report

## Overview
This notebook analyzes product inventory and pricing data.

## Key Findings
1. Electronics have higher average prices than accessories
2. Inventory levels are generally well-maintained
3. Price distribution shows clear category segmentation

## Next Steps
1. Analyze seasonal trends
2. Review pricing strategy
3. Optimize inventory levels
```

Spark cell example in Python:

```
from pyspark.sql import functions as F

# Create Spark DataFrame
spark_df = spark.createDataFrame(df)

# Perform aggregations using Spark Connect
result = (
    spark_df.groupBy("category")
    .agg(
        F.count("*").alias("product_count"),
        F.avg("price").alias("avg_price"),
        F.count(F.when(F.col("in_stock"), 1)).alias("in_stock_count"),
    )
    .orderBy(F.col("avg_price").desc())
)
result.show()
```

# Work with cells


## Overview


Notebooks support a variety of languages such as Python, SQL, and Markdown. Each cell is associated with a language, and the editor in the cell supports functionality such as auto code complete, formatting, and linting.

All the code in the notebook is executed on the notebook kernel, which is built on IPython. The notebook kernel runs on SageMaker notebook compute, which has a configurable form factor that includes different types of instances. When cells are executed, they may produce output that is shown below each cell. Notebooks support rich rendering of data frames (pandas or Spark), where the output is rendered in an interactive data table and charts.

## Procedure


To create and run Python code:

1. Choose **Python** to add a new Python cell.

1. Enter your Python code in the cell editor.

1. Choose the play icon or press **Shift+Enter** to run the cell.

1. View the results displayed below the cell.

All notebook code is executed in the notebook kernel, which runs on SageMaker compute. You can configure the form factor of this compute. The notebook runs an IPython kernel that can execute Python code. For larger scale data processing, the notebook's Python environment comes with Spark without requiring you to configure or manage any infrastructure. You can start writing Spark code to run interactive analytics and exploration on serverless, autoscalable Athena Spark.

To create and run SQL code:

1. Choose **SQL** to add a new SQL cell.

1. Select your data connection from the dropdown if prompted.

1. Enter your SQL query in the cell editor.

1. Choose the play icon or press **Shift+Enter** to run the cell.

1. View the query results in the interactive table below the cell.

SQL cells can query your existing Python data frames using DuckDB, or run SQL against Athena (SQL), Athena Spark, or any other connection to first-party and third-party engines like Amazon Redshift, Snowflake, and BigQuery. Add a connection from the [supported connections](https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/lakehouse-data-connection.html).

### Run multiple SQL statements in a single cell


You can write and run multiple SQL statements in a single SQL cell by separating each statement with a semicolon (`;`). When you run a cell that contains multiple statements, the notebook executes them sequentially and displays the results in a tabbed interface.

To run multiple SQL statements:

1. In a SQL cell, enter two or more SQL statements separated by semicolons. For example:

   ```
   SELECT COUNT(*) FROM customers WHERE region = 'US';
   SELECT COUNT(*) FROM customers WHERE region = 'EU';
   SELECT COUNT(*) FROM customers WHERE region = 'ASIA';
   ```

1. Choose the play icon or press **Shift+Enter** to run the cell.

1. The output area displays a row of tabs labeled **Result 1**, **Result 2**, **Result 3**, and so on — one tab for each statement.

1. Choose a tab to view the results for that statement.

Each tab displays a status indicator:
+ A success indicator (✓) when the statement completed successfully.
+ An error indicator (⚠) when the statement encountered an error. Choose the tab to view the error details.

### Reference multi-statement SQL results in Python


When you run multiple SQL statements in a single cell, the notebook creates a separate data frame variable for each statement result. The variables follow this naming convention:
+ `dataframe_name` — A list containing all results.
+ `dataframe_name_0` — The result of the first statement.
+ `dataframe_name_1` — The result of the second statement.
+ `dataframe_name_2` — The result of the third statement, and so on.

For example, if the cell's data frame variable is `df_cell_1`, the following variables are created:


| Variable | Description | 
| --- | --- | 
| `df_cell_1` | A list containing all statement results | 
| `df_cell_1_0` | Result of the first statement (Result 1 tab) | 
| `df_cell_1_1` | Result of the second statement (Result 2 tab) | 
| `df_cell_1_2` | Result of the third statement (Result 3 tab) | 

You can reference these variables in subsequent Python cells:

```
# Access the first statement's result
print(df_cell_1_0.head())

# Access by index from the list
print(df_cell_1[1].head())
```

You can rename the base data frame variable using the variable name editor in the cell's symbol bar. When you rename the base variable, all indexed variables update automatically. For example, renaming `df_cell_1` to `sales_data` updates the indexed variables to `sales_data_0`, `sales_data_1`, and so on.

**Note**  
Individual indexed variables (such as `sales_data_0`) cannot be renamed independently. Only the base data frame name can be changed.

### Visualize multi-statement results


Each result tab supports the full set of visualization options independently. You can:
+ Switch between table and chart views for each result tab separately.
+ Apply column filters, sorting, and pinning per result tab.
+ Create different chart types for different statement results.

Visualization settings for the first result tab are saved with the notebook. Settings for additional result tabs are maintained during your current session but reset when you reopen the notebook.

### Rename cells


You can assign custom names to cells to make them easier to identify and navigate, especially in large notebooks. By default, cells are labeled with sequential numbers (1, 2, 3, and so on).

To rename a cell:

1. Choose the cell number in the cell header. The number becomes an editable text field.

1. Enter a custom name for the cell.

1. Press **Enter** or choose outside the text field to save the name.

The custom name replaces the default number in the cell header. This is useful for:
+ Identifying the purpose of specific cells at a glance (for example, "Data cleanup" or "Model training").
+ Navigating large notebooks more efficiently.
+ Making notebooks easier for collaborators to understand.

### Keyboard shortcuts


Amazon SageMaker Unified Studio notebooks provide keyboard shortcuts that enable you to perform common actions and navigate within your notebooks. These shortcuts are familiar to developers who have used other Python or SQL notebook editors.

To view all available shortcuts, press **⌘+Shift+/** (Mac) or **Ctrl+Shift+/** (Windows/Linux).

Notebooks support two modes for keyboard shortcuts: edit mode and command mode. Shortcuts without modifier keys are generally used in command mode.

#### Edit mode


Edit mode is active when you are editing the content of a cell. In this mode, keyboard shortcuts require modifier keys to avoid interfering with normal text input.


| Action | Mac | Windows/Linux | 
| --- | --- | --- | 
| Run the current cell | ⌘+Enter | Ctrl+Enter | 
| Run cell and select next cell | Shift+Enter | Shift+Enter | 
| Toggle Gen AI prompting | Option+A | Alt+A | 

#### Command mode


To enter command mode, press **Esc**. A blue ring appears around the selected cell to indicate that command mode is active.

In command mode, the following actions are available in addition to the edit mode shortcuts:

**Navigation**


| Action | Shortcut | 
| --- | --- | 
| Select above cell | Up | 
| Select below cell | Down | 
| Edit cell content | Enter | 

**Cell creation**


| Action | Shortcut | 
| --- | --- | 
| Add cell above | A | 
| Add cell below | B | 
| Add Python cell | P | 
| Add SQL cell | Q | 
| Add Markdown cell | M | 
| Add table cell | T | 
| Add charts cell | C | 

**Cell operations**


| Action | Mac | Windows/Linux | 
| --- | --- | --- | 
| Copy cell | ⌘+C | Ctrl+C | 
| Paste cell | ⌘+V | Ctrl+V | 
| Duplicate cell | ⌘+D | Ctrl+D | 
| Run cell and insert cell below | Option+Enter | Alt+Enter | 
| Save notebook | ⌘+S | Ctrl+S | 

**Tip**  
When the intercell menu is open, you can use the typed cell shortcuts (**P**, **Q**, **M**, **T**, **C**) to add a cell of a specific type.

To add documentation:

1. Choose **Markdown** to add a Markdown cell.

1. Enter your documentation using Markdown syntax.

1. Choose the play icon or press **Shift+Enter** to render the formatted text.

To reference data between cells: Python variables created in one cell are available in subsequent cells. SQL query results can be referenced by variable name in Python cells. You can also use the variable explorer in the left navigation to see all available variables and their schemas.
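As a sketch of this flow, suppose a SQL cell produced a result data frame named `df_cell_1` (the name here is hypothetical; the actual name appears in the cell's symbol bar). A later Python cell can then reference it directly:

```python
import pandas as pd

# Simulated result of an earlier SQL cell, exposed as df_cell_1
df_cell_1 = pd.DataFrame({
    "category": ["Electronics", "Accessories"],
    "avg_price": [649.99, 86.66],
})

# Reference the SQL result by variable name in a Python cell
top_category = df_cell_1.sort_values("avg_price", ascending=False).iloc[0]["category"]
print(top_category)
```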

# Visualize and explore data


## Overview


Amazon SageMaker Unified Studio notebooks provide rich data visualization and exploration capabilities. Data frames automatically render as interactive tables, and you can create dedicated chart cells for custom visualizations.

On the left navigation, the data explorer provides access to your data catalog for discovering and accessing datasets. The variable explorer shows all active variables in your notebook session, including their data types and schemas.

## Procedure


To view data in interactive tables:

1. Execute a Python or SQL cell that returns a data frame. Compatible types include pandas, PyArrow, and PySpark data frames. Note: Interactive tables and charts load a maximum of 20,000 rows.

1. The results automatically display as an interactive table below the cell.

1. Use the table controls to filter, sort, and explore the data.

1. Click column headers to see data distribution visualizations.
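Because interactive tables and charts load at most 20,000 rows, it can help to truncate or sample larger data frames before displaying them. A minimal pandas sketch (the frame and limit variable are illustrative):

```python
import numpy as np
import pandas as pd

ROW_LIMIT = 20_000  # interactive table/chart row limit

big_df = pd.DataFrame({"value": np.arange(100_000)})

# Deterministic truncation: keep the first ROW_LIMIT rows
preview = big_df.head(ROW_LIMIT)

# Random sample: more representative of the full distribution
sample = big_df.sample(n=ROW_LIMIT, random_state=42)

print(len(preview), len(sample))
```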

To create custom charts:

1. Click the Charts button to add a chart cell.

1. Select the data frame you want to visualize from the dropdown.

1. Choose your chart type and configure the axes.

1. The chart renders automatically based on your selections.

To explore variables:

1. Open the variable explorer panel in the notebook interface.

1. View all active variables, their types, and memory usage.

1. Click on data frame variables to expand and see their schema.

1. Use variable names to reference data in new cells.

To access the data catalog:

1. Open the data explorer panel.

1. Navigate through your available data catalogs and databases.

1. Use the actions menu to read data directly into your notebook.

1. Generate code to access specific tables or datasets.

# Manage compute environments


## Overview


Each notebook runs on a managed compute environment that provides the processing power for code execution. You can start, stop, and configure compute environments based on your workload requirements.

Compute environments support different instance types and sizes. You can also configure automatic shutdown to manage costs when the notebook is idle.

## Procedure


To start a compute environment:

1. Open your notebook if the compute environment is not already running.

1. The compute environment starts automatically when you execute your first cell.

1. Monitor the status in the kernel footer at the bottom right of the notebook.

To stop a compute environment:

1. Click the kernel status indicator in the notebook footer.

1. Select Stop to shut down the compute environment.

1. Your notebook content is preserved, but variables and session state are lost.

To change the instance type and storage:

1. Stop the current compute environment if it's running.

1. Click the kernel status indicator and select Configure.

1. Choose a different instance type and storage from the available options.

1. Start the compute environment with the new configuration.

To manage packages on the kernel:

1. Open your notebook.

1. Go to Packages in the left navigation.

1. Add new packages or delete existing packages as required.

To configure idle shutdown:

1. Access the idle shutdown setting through the left navigation.

1. Set the idle timeout period for automatic shutdown. The default shutdown is 1 hour.

1. The environment will automatically stop after the specified idle time.

To view the Spark UI and driver logs:

1. Open your notebook.

1. In the kernel footer at the bottom of the notebook, choose **Spark UI** or **Driver Logs** to open them in a separate tab.

# Connect to data sources


## Overview


Amazon SageMaker Unified Studio notebooks can connect to multiple data sources including Amazon Simple Storage Service, AWS Glue Data Catalog, Amazon Athena, Amazon Redshift, and third-party sources. You can query data directly from these sources using SQL cells or Python code.

The notebook interface provides built-in connectors for AWS services and supports custom connections for external data sources. Data connections are configured at the project level and shared across notebooks.

## Prerequisites


1. Configured data connections in your Amazon SageMaker Unified Studio project

1. Appropriate IAM permissions to access data sources

1. Network connectivity to external data sources if applicable

## Procedure


To query data from Amazon Simple Storage Service:

1. Create a Python cell and use the AWS SDK to access S3 objects.

1. Use pandas or other libraries to read data from S3 into data frames.

1. Reference the data frame variables in subsequent cells.
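In a notebook cell, pandas can read directly from Amazon S3, for example `pd.read_csv("s3://your-bucket/path/orders.csv")` (the bucket and path are hypothetical, and reading `s3://` URLs requires the `s3fs` package). For a self-contained illustration, this sketch reads the same kind of CSV content from an in-memory buffer:

```python
import io

import pandas as pd

# In a notebook you would typically read directly from S3, e.g.:
#   df = pd.read_csv("s3://your-bucket/path/orders.csv")  # hypothetical path
# For a runnable illustration, read equivalent CSV content from memory:
csv_content = io.StringIO("order_id,amount\n1,10.5\n2,20.0\n")
df = pd.read_csv(csv_content)

# The resulting data frame can be referenced in subsequent cells
total = df["amount"].sum()
print(total)
```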

To query AWS Glue Data Catalog tables:

1. Create a SQL cell and select the Athena (SQL) connection.

1. Write SQL queries against your cataloged tables.

1. The queries execute using Amazon Athena.

1. Results display as interactive tables below the cell.

To connect to Amazon Redshift:

1. Create a SQL cell and select your Redshift connection from the dropdown.

1. Write SQL queries against your Redshift data warehouse.

1. Execute the queries to retrieve results into the notebook.

To use Amazon Athena for Apache Spark:

1. Create Python cells that use Spark DataFrames for large-scale data processing.

1. Create SQL cells and choose the Athena (Spark) Connection to write SQL queries against your Spark DataFrames. 

1. Reference Spark DataFrames by variable name to see rich table visualizations.

1. Access the Spark UI to monitor job progress and performance.

To work with third-party data sources:

1. Configure connections to external sources like Snowflake in your project settings. You can [view supported data sources here](https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/lakehouse-data-connection.html).

1. Use SQL cells with the appropriate connection to query external data.

1. Combine data from multiple sources using Python code to join datasets.
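Query results from different connections land as ordinary data frames, so you can join them with pandas. A minimal sketch with hypothetical Redshift and Athena results:

```python
import pandas as pd

# Hypothetical result of a SQL cell against a Redshift connection
redshift_orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "total": [120.0, 75.5, 240.0],
})

# Hypothetical result of a SQL cell against an Athena connection
athena_customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "region": ["US", "EU", "ASIA"],
})

# Join the two sources on the shared key in Python
combined = athena_customers.merge(redshift_orders, on="customer_id", how="inner")
print(combined)
```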

# Import and export notebooks


## Overview


Amazon SageMaker Unified Studio supports importing and exporting notebooks. You can import notebook files from your local machine to migrate work from other environments, and export notebooks in multiple formats for sharing and offline use.

## Import a notebook


You can import a notebook file from your local machine into your project. The import process converts the file to an Amazon SageMaker Unified Studio notebook and recreates cells, cell outputs, and metadata from the source file.

Supported import formats:
+ Jupyter notebook (`.ipynb`)
+ Amazon SageMaker Unified Studio native JSON (`.json`)
+ Python script (`.py`)

**To import a notebook**

1. In the navigation pane, choose **Notebooks**.

1. Choose the dropdown arrow next to **Create notebook**, and then choose **Import notebook**.

1. In the **Import notebook** dialog box, choose **Choose file**, or drag and drop a file. The file must be in `.ipynb`, `.json`, or `.py` format, with a maximum size of 50 MiB.

1. Choose **Import**.

After the import completes, the notebook opens and is ready to use. You can edit the name, description, and cells, and run the notebook.

What gets preserved during import:
+ All cell types: Document, Code, and Visualization
+ Cell outputs and execution history, when available in the source file
+ Amazon SageMaker Unified Studio metadata such as connection IDs and cell configurations, when available

The following table describes the import limits.


| Resource | Limit | 
| --- | --- | 
| Maximum file size | 50 MiB | 
| Maximum cells per notebook | 200 | 
| Maximum cell content size | 300 KB | 
| Maximum cell run results per cell | 100 | 
| Maximum cell run results size | 300 KB | 

**Note**  
The notebook cannot be modified while an import is in progress. Import is all-or-nothing — either the entire notebook imports successfully, or the import fails.

## Export a notebook


You can export a notebook to your local machine in multiple formats for sharing and offline use.

Supported export formats:
+ Jupyter notebook with requirements (`.zip`)
+ Jupyter notebook (`.ipynb`)
+ Python (`.py`)
+ Notebook (`.json`)

**To export a notebook**

1. Open the notebook you want to export.

1. In the notebook toolbar, choose the more options menu (three vertical dots).

1. Choose **Download**, and then select your desired format:
   + Jupyter notebook with requirements (`.zip`) — includes the notebook and a requirements file for reproducing the environment
   + Jupyter notebook (`.ipynb`) — for use with other Jupyter-compatible environments
   + Python (`.py`) — extracts code cells as a Python script
   + Notebook (`.json`) — Amazon SageMaker Unified Studio native format

The file downloads to your local machine.

**To download results and data**

1. Execute code that generates output files or processed datasets.

1. Use the download interface to save files to your local machine, or copy the results to the clipboard.

1. Select the output format (CSV, JSON, or PDF).
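As an alternative to the download interface, a Python cell can also write results to files directly. A minimal sketch with an illustrative data frame and file names:

```python
import pandas as pd

# Illustrative processed results
df = pd.DataFrame({
    "category": ["Electronics", "Accessories"],
    "product_count": [2, 3],
})

# Write results in formats that are easy to share
df.to_csv("results.csv", index=False)
df.to_json("results.json", orient="records")
```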