

# SQL extension features and usage
Features overview and usage

This section details the various features of the JupyterLab SQL extension in Studio, and provides instructions on how to use them. Before you can use the SQL extension to access and query data from your JupyterLab notebooks, an administrator must first configure the connection to your data sources. For information on how administrators can create connections to data sources, see [SQL extension data source connections](sagemaker-sql-extension-datasources-connection.md).

**Note**  
To use the SQL extension, your JupyterLab application must run on a [SageMaker AI distribution](https://github.com/aws/sagemaker-distribution/blob/main/README.md) image version 1.6 or higher. These SageMaker images have the extension pre-installed.

The extension provides two components to help you access, discover, query, and analyze data from pre-configured data sources.
+ Use the *user interface* of the SQL extension to discover and explore your data sources. The UI capabilities can be further divided into the following subcategories.
  + With the **data exploration** UI element, you can browse your data sources and explore their tables, columns, and metadata. For details on the data exploration features of the SQL extension, see [Browse data using SQL extension](sagemaker-sql-extension-features-data-discovery.md).
  + The **connection caching** element caches connections for quick access. For details on connection caching in the SQL extension, see [SQL extension connection caching](sagemaker-sql-extension-features-connection-caching.md).
+ Use the *SQL Editor and Executor* to write, edit, and run SQL queries against connected data sources.
  + With the **SQL editor** element, you can write, format, and validate SQL statements within the notebooks of your JupyterLab application in Studio. For details on the SQL editor features, see [SQL editor features of the JupyterLab SQL extension](sagemaker-sql-extension-features-editor.md).
  + With the **SQL execution** element, you can run your SQL queries and visualize their results from the notebooks of your JupyterLab application in Studio. For details on the SQL execution capabilities, see [SQL execution features of the JupyterLab SQL extension](sagemaker-sql-extension-features-sql-execution.md).

# Browse data using SQL extension
Browse data

To open the SQL extension user interface (UI), choose the SQL extension icon (![\[Purple circular icon with a clock symbol representing time or scheduling.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/sqlexplorer/sqlexplorer-icon.png)) in the navigation pane of your JupyterLab application in Studio. The left panel data discovery view expands and displays all pre-configured data store connections to Amazon Athena, Amazon Redshift, and Snowflake.

From there, you can:
+ Expand a specific connection to explore its databases, schemas, tables or views, and columns.
+ Search for a specific connection using the search box in the SQL extension UI. The search returns any databases, schemas, tables, or views that partially match the string you enter.

**Note**  
If Athena is already set up in your AWS account, you can enable a `default-athena-connection` in your JupyterLab application. This allows you to run Athena queries without needing to manually create the connection. To enable the default Athena connection:  
Check with your administrator that your execution role has the required permissions to access Athena and the AWS Glue catalog. For details on the permissions required, see [Configure an AWS Glue connection for Athena](sagemaker-sql-extension-datasources-glue-connection.md#sagemaker-sql-extension-athena-glue-connection-config)
In your JupyterLab application, navigate to the **Settings** menu in the top navigation bar and open the **Settings Editor** menu.
Choose **Data Discovery**.
Check the box for **Enable default Athena connection**.
You can update the default `primary` WorkGroup if needed.

To query a database, schema, or table in a JupyterLab notebook, from a given connection in the SQL extension pane:
+ Choose the three dots icon (![\[SQL extension three dots icon.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/sqlexplorer/sqlexplorer-3dots-icon.png)) on the right side of any database, schema, or table.
+ Select **Query in notebook** from the menu.

  This automatically populates a notebook cell in JupyterLab with the relevant `%%sm_sql` magic command to connect to the data source. It also adds a sample SQL statement to help you start querying right away. You can further refine the SQL query using the auto-complete and highlighting features of the extension. See [SQL editor features of the JupyterLab SQL extension](sagemaker-sql-extension-features-editor.md) for more information on using the SQL extension SQL editor.

At the table level, the three dots icon provides the additional option to choose to **Preview** a table's metadata.

The JupyterLab notebook cell content below shows an example of what is automatically generated when selecting the **Query in notebook** menu on a `redshift-connection` data source in the SQL extension pane.

```
%%sm_sql --metastore-id redshift-connection --metastore-type GLUE_CONNECTION

-- Query to list tables from schema 'dev.public'
SHOW TABLES
FROM
  SCHEMA "dev"."public"
```

Use the *less than* symbol (![\[Icon to clear the SQL extension search box.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/sqlexplorer/sqlexplorer-search-clear.png)) at the top of the SQL extension pane to clear the search box or return to the list of your connections.

**Note**  
The extension caches your exploration results for fast access. If the cached results are outdated or a connection is missing from your list, you can manually refresh the cache by choosing the **Refresh** button at the bottom of the SQL extension panel. For more information on connection caching, see [SQL extension connection caching](sagemaker-sql-extension-features-connection-caching.md).

# SQL editor features of the JupyterLab SQL extension
SQL editor

The SQL extension provides magic commands that enable the SQL editor functionalities within your JupyterLab notebook cells.

If you are a user of the SageMaker distribution image version 1.6, you must load the SQL extension magic library by running `%load_ext amazon_sagemaker_sql_magic` in a JupyterLab notebook. This turns on SQL editing features.

For users of SageMaker distribution image versions 1.7 and later, no action is needed, the SQL extension loads automatically.

Once the extension is loaded, add the `%%sm_sql` magic command at the beginning of a cell to activate the following capabilities of the SQL editor.
+ **Connection-selection dropdown**: Upon adding an `%%sm_sql` magic command to a cell, a dropdown menu appears at the top of the cell with your available data source connections. Select a connection to automatically fill in the parameters needed to query that data source. The following is an example of an `%%sm_sql` magic command string generated by selecting the connection named `connection-name`. 

  ```
  %%sm_sql --metastore-type GLUE_CONNECTION --metastore-id connection-name
  ```

  Use the SQL editor's features below to build your SQL queries, then run the query by running the cell. For more information on the SQL execution capabilities, see [SQL execution features of the JupyterLab SQL extension](sagemaker-sql-extension-features-sql-execution.md).
+ **Query result dropdown**: You can specify how to render query results by selecting a result type from the dropdown menu next to your connection-selection dropdown menu. Choose between the following two alternatives:
  + **Cell Output**: (default) This option displays the result of your query in the notebook cell output area.
  + **Pandas Dataframe**: This option populates a pandas DataFrame with the query results. An extra input box lets you name the DataFrame when you choose this option.
+ **SQL syntax highlight**: The cell automatically visually distinguishes SQL keywords, clauses, operators, and more by color and styling. This makes SQL code easier to read and understand. Keywords such as `SELECT`, `FROM`, `WHERE`, and built-in functions such as `SUM` and `COUNT`, or clauses such as `GROUP BY` and more are highlighted in a different color and bold style.
+ **SQL formatting**: You can apply consistent indents, capitalization, spacing, and line breaks to group or separate SQL statements and clauses in one of the following ways. This makes SQL code easier to read and understand.
  + Right-click on the SQL cell and choose **Format SQL**.
  + When the SQL cell is in focus, use the *ALT \$1 F* shortcut on Windows or *Option \$1 F* on MacOS.
+ **SQL auto-completion**: The extension provides automatic suggestions and completion of SQL keywords, functions, table names, column names, and more as you type. As you start typing an SQL keyword such as `SELECT` or `WHERE`, the extension displays a pop-up with suggestions to auto-complete the rest of the word. For example, when typing table or column names, it suggests matching table and column names defined in the database schema.
**Important**  
To enable SQL auto-completion in JupyterLab notebooks, users of the SageMaker AI distribution image version 1.6 must run the following `npm install -g vscode-jsonrpc sql-language-server` command in a terminal. After the installation completes, restart the JupyterLab server by running `restart-jupyter-server`.  
For users of SageMaker distribution image versions 1.7 and later, no action is needed.

  The cell offers two methods for auto-completing recognized SQL keywords:
  + Explicit invocation (recommended): Choose the **Tab** key to initiate the context-aware suggestion menu, then choose **Enter** to accept the suggested item.
  + Continuous hinting: The cell automatically suggests completions as you type.
**Note**  
Auto-completion is only triggered if the SQL keywords are in uppercase. For instance, entering `SEL` prompts for `SELECT`, but typing `sel` does not.
The first time you connect to a data source, SQL auto-completion indexes the data source's metadata. This indexing process may take some time to complete depending on the size of your databases.

# SQL execution features of the JupyterLab SQL extension
SQL execution

You can execute SQL queries against your connected data sources in the SQL extension of JupyterLab. The following sections explain the most common parameters for running SQL queries inside JupyterLab notebooks:
+ Create a simple connection in [Create a simple magic command connection string](sagemaker-sql-extension-features-sql-execution-create-connection.md).
+ Save your query results in a pandas DataFrame in [Save SQL query results in a pandas DataFrame](sagemaker-sql-extension-features-sql-execution-save-dataframe.md).
+ Override or add to connection properties defined by your administrator in [Override connection properties](sagemaker-sql-extension-features-sql-execution-override-connection.md).
+ [Use query parameters to provide dynamic values in SQL queries](sagemaker-sql-extension-features-sql-execution-query-parameters.md).

When you run a cell with the `%%sm_sql` magic command, the SQL extension engine executes the SQL query in the cell against the data source specified in the magic command parameters.

To see the details of the magic command parameters and supported formats, run `%%sm_sql?`.

**Important**  
To use Snowflake, users of the SageMaker distribution image version 1.6 must install the Snowflake Python dependency by running the following `micromamba install snowflake-connector-python -c conda-forge` command in a terminal of their JupyterLab application. Restart the JupyterLab server by running `restart-jupyter-server` in the terminal after the installation is complete.  
For SageMaker distribution image versions 1.7 and later, the Snowflake dependency is pre-installed. No action is needed.

# Create a simple magic command connection string
Create a simple connection

If your administrator has configured the connections to your data sources, follow these steps to easily create a connection string in a notebook cell:

1. Open a notebook cell that uses `%%sm_sql`.

1. Select a pre-configured connection to your desired data source from the connection dropdown menu above the cell.

1. This will automatically populate the parameters needed to query that data source.

Alternatively, you can specify connection properties inline in the cell.

Choosing a connection from the dropdown menu inserts the following two parameters into the default magic command string. The parameters contain the connection information your administrator configured.
+ `--metastore-id`: The name of the connection object that holds your connection parameters.
+ `--metastore-type`: The type of meta-store corresponding to `--metastore-id`. The SQL extension uses AWS Glue connections as a connection meta-store. This value is automatically set to `GLUE_CONNECTION`.

For example, the connection string to a pre-configured Amazon Athena data store looks like the following:

```
%%sm_sql --metastore-id athena-connection-name --metastore-type GLUE_CONNECTION 
```

# Save SQL query results in a pandas DataFrame
Save results in a DataFrame

You can store the results of your SQL query in a pandas DataFrame. The easiest way to output query results to a DataFrame is to use the [SQL editor features of the JupyterLab SQL extension](sagemaker-sql-extension-features-editor.md) query-result dropdown and choose the **Pandas dataframe** option.

Alternatively, you can add the parameter `--output '{"format": "DATAFRAME", "dataframe_name": "dataframe_name"}'` to your connection string.

For example, the following query extracts details of customers with the highest balance from the `Customer` table in Snowflake's `TPCH_SF1` database, using both pandas and SQL:
+ In this example, we extract all the data from the customer table and save then in a DataFrame named `all_customer_data`.

  ```
  %%sm_sql --output '{"format": "DATAFRAME", "dataframe_name": "all_customer_data"}' --metastore-id snowflake-connection-name --metastore-type GLUE_CONNECTION
  SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
  ```

  ```
  Saved results to all_customer_data
  ```
+ Next, we extract the details of the highest account balance from the DataFrame.

  ```
  all_customer_data.loc[all_customer_data['C_ACCTBAL'].idxmax()].values
  ```

  ```
  array([61453, 'Customer#000061453', 'RxNgWcyl5RZD4qOYnyT3', 15,
  '25-819-925-1077', Decimal('9999.99'), 'BUILDING','es. carefully regular requests among the blithely pending requests boost slyly alo'],
  dtype=object)
  ```

# Override connection properties
Override connection properties

Your administrator's predefined connection definitions may not have the exact parameters you need to connect to a specific data store. You can add or override parameters in the connection string by using the `--connection-properties` argument.

The arguments are applied in the following order of precedence:

1. Overridden connection properties provided as inline arguments.

1. Connection properties present in the AWS Secrets Manager.

1. Connection properties in the AWS Glue connection.

If the same connection property is present in all three (command line argument, Secrets Manager, and connection), the value provided in the command line argument takes precedence.

For more information on the available connection properties per data source, see the [Connection parameters](sagemaker-sql-extension-connection-properties.md).

The following example illustrates a connection property argument that sets the schema name for Amazon Athena.

```
%%sm_sql --connection-properties '{"schema_name": "athena-db-name"}' --metastore-id athena-connection-name --metastore-type GLUE_CONNECTION
```

# Use query parameters to provide dynamic values in SQL queries
Provide dynamic values in SQL queries

Query parameters can be used to provide dynamic values in SQL queries.

In the following example, we pass a query parameter to the `WHERE` clause of the query.

```
# How to use '--query-parameters' with ATHENA as a data store
%%sm_sql --metastore-id athena-connection-name --metastore-type GLUE_CONNECTION --query-parameters '{"parameters":{"name_var": "John Smith"}}'
SELECT * FROM my_db.my_schema.my_table WHERE name = (%(name_var)s);
```

# SQL extension connection caching
Connection caching

The SQL extension extension defaults to caching connections to prevent the creation of multiple connections for the same set of connection properties. The cached connections can be managed using the `%sm_sql_manage` magic command.

The following topics describe how to manage your cached connections.

**Topics**
+ [

# Create cached connections
](sagemaker-sql-extension-features-create-cached-connection.md)
+ [

# List cached connections
](sagemaker-sql-extension-features-list-cached-connection.md)
+ [

# Clear cached connections
](sagemaker-sql-extension-features-clear-cached-connection.md)
+ [

# Disable cached connections
](sagemaker-sql-extension-features-disable-cached-connection.md)

# Create cached connections


You can create cached connections by specifying a connection name in the `--connection-name` parameter of your connection string. This is particularly useful when multiple connection properties are overridden for a specific use case, and there's a need to reuse the same properties without retyping them.

For example, the code below saves an Athena connection with an overridden schema connection property using the name `--connection-name my_athena_conn_with_schema`, and then reuses it in another cell:

```
%%sm_sql --connection-name my_athena_conn_with_schema --connection-properties '{"schema_name": "sm-sql-private-beta-db"}' --metastore-id sm-sql-private-beta-athena-connection --metastore-type GLUE_CONNECTION 
SELECT * FROM "covid_table" LIMIT 2
```

```
%%sm_sql --connection-name my_athena_conn_with_schema
SELECT * FROM "covid_table" LIMIT 2
```

# List cached connections


You can list your cached connections by running the following command:

```
%sm_sql_manage --list-cached-connections
```

# Clear cached connections


To clear all cached connections, run the following command:

```
%sm_sql_manage --clear-cached-connections
```

# Disable cached connections


To disable connection caching, run the following command:

```
%sm_sql_manage --set-connection-reuse False
```