

# Getting started
<a name="getting-started"></a>

The following getting started topics apply to setting up SageMaker Unified Studio unified domains configured with AWS IAM Identity Center. For more details, see [Domains in Amazon SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/working-with-domains.html).

The information in this section helps you get started using Amazon SageMaker Unified Studio. If you are new to Amazon SageMaker Unified Studio, start by becoming familiar with the concepts and terminology presented in [Amazon SageMaker Unified Studio terminology and concepts](concepts.md).

To get started with Amazon SageMaker Unified Studio as a user, start by gaining access to Amazon SageMaker Unified Studio and creating a project. You can then add members to the project and use the sample JupyterLab notebook to begin building with a variety of tools and resources.

**Topics**
+ [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md)
+ [Create a project](getting-started-create-a-project.md)
+ [Run your first SQL query](gs-sql.md)
+ [Analyze and visualize data](gs-analyze.md)
+ [Build a data pipeline with visual ETL](gs-etl.md)
+ [Train an ML model](gs-ml.md)
+ [Get started with Amazon Bedrock in SageMaker Unified Studio](getting-started-use-amazon-bedrock-ide.md)
+ [Get started with the query editor in Amazon SageMaker Unified Studio](getting-started-querying.md)
+ [Get started adding on-demand Amazon EMR on EC2 instances](getting-started-emr-ec2-page.md)
+ [Use the sample notebook](getting-started-use-sample-notebook.md)
+ [Getting started with Amazon Q Developer generative AI chat and command line tools](qdeveloper-integration.md)

# Access Amazon SageMaker Unified Studio
<a name="getting-started-access-the-portal"></a>

For you to get started with Amazon SageMaker Unified Studio, your admin must create a domain in the Amazon SageMaker Unified Studio console and provide you with a URL. For more information, see the Amazon SageMaker Unified Studio Administrator Guide.

When you have the URL from your admin, you can sign in to Amazon SageMaker Unified Studio in one of the following ways:
+ By using your AWS IAM credentials. For more information, see [Sign up for an AWS account](#getting-started-sign-up).
+ If your admin has configured single sign-on (SSO) access, you can also sign in to Amazon SageMaker Unified Studio using SSO credentials that you configure with IAM Identity Center or through an identity provider. For more information, see [Configure SSO credentials with IAM Identity Center](#set-up-SSO-IDC).

**Note**  
 Amazon SageMaker Unified Studio supports the following browsers:   


| Browser | Version | 
| --- | --- | 
|  Microsoft Edge  |  Latest 3 major versions  | 
|  Google Chrome  |  Latest 3 major versions  | 
|  Apple Safari  |  Latest 3 major versions  | 

JupyterLab IDE requires third-party cookies to be allowed in your browser when you access your Amazon SageMaker Unified Studio domain. For more information, see [Invalid or expired auth token when accessing an IDE](troubleshooting-issues.md#invalid-auth-token-ide).

## Configure credentials
<a name="getting-started-configure-credentials"></a>

To sign in to Amazon SageMaker Unified Studio using AWS IAM user credentials or SSO credentials through IAM Identity Center, follow the instructions in the optional prerequisite sections below.

**Note**  
You only need one method to sign in to Amazon SageMaker Unified Studio. If you have already configured an AWS account or SSO credentials that work with the domain URL you received from your admin, you can skip the steps in this section.

**Topics**
+ [Sign up for an AWS account](#getting-started-sign-up)
+ [Configure SSO credentials with IAM Identity Center](#set-up-SSO-IDC)

### Sign up for an AWS account
<a name="getting-started-sign-up"></a>

If you do not have an AWS account, complete the following steps to create one.

1. Open [https://portal.aws.amazon.com/billing/signup](https://portal.aws.amazon.com/billing/signup).

1. Follow the online instructions.

When you sign up for an AWS account, an AWS account root user is created. The root user has access to all AWS services and resources in the account. As a security best practice, [assign administrative access to an administrative user](https://docs.aws.amazon.com/singlesignon/latest/userguide/getting-started.html), and use only the root user to perform [tasks that require root user access](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_root-user.html#root-user-tasks).

### Configure SSO credentials with IAM Identity Center
<a name="set-up-SSO-IDC"></a>

You can use SSO with IAM Identity Center or with an identity provider. To use SSO with IAM Identity Center, work with your admin to get added to their IAM Identity Center directory and set up your SSO credentials.

The process is as follows:

1. After your admin adds your user information to their IAM Identity Center directory, you receive an email with your username and configuration instructions for single sign-on (SSO). Use the link in the email to set your password for SSO.

1. Your admin creates configurations and adds you to a domain using the Amazon SageMaker Unified Studio console. They then copy the link to that domain from the Amazon SageMaker Unified Studio console and send it to you. Use the domain URL from your admin to navigate to Amazon SageMaker Unified Studio.

1. Sign in to Amazon SageMaker Unified Studio with the SSO username and password that you configured in step 1.

1. If your admin's IAM Identity Center is configured to require multi-factor authentication (MFA), set up and use an MFA device. Follow the instructions on the screen to register or use an MFA device as needed, or contact your admin for support. For more information about MFA device enforcement, see [Configure MFA device enforcement](https://docs.aws.amazon.com/singlesignon/latest/userguide/how-to-configure-mfa-device-enforcement.html) in the IAM Identity Center User Guide.

You can then view the Amazon SageMaker Unified Studio landing page, where you can create new projects and view projects that you have been added to.

# Create a project
<a name="getting-started-create-a-project"></a>

In Amazon SageMaker Unified Studio, projects enable a group of users to collaborate on various business use cases. Within projects, you can manage data assets in the Amazon SageMaker Unified Studio catalog, perform data analysis, organize workflows, develop machine learning models, build generative AI apps, and more. 

To create a project in Amazon SageMaker Unified Studio, you must first have access to Amazon SageMaker Unified Studio. A domain unit owner must also grant you access to create projects through an authorization policy. For more information, see [Domain units and authorization policies in Amazon SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/domain-units.html).

1. Navigate to the Amazon SageMaker Unified Studio landing page using the URL from your admin.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Choose **Create project**.

1. Enter a name for your project. You cannot change the project name after the project is created.

1. (Optional) Enter a description for your project. You can edit this later.

1. (Optional) If your domain has domain units configured, select a domain unit for your project. If no domain units have been created in your domain, your project is created in the root domain unit by default and no action is needed here.

1. Select the project profile that contains the resources you will need in your project.

   1. Select **All capabilities** to access all of the supported services and resources in a single project.

   1. Select **SQL analytics** to get started querying and analyzing SQL data.

   1. Select **Generative AI application development** to get started with generative AI.

1. Choose **Continue**.

1. (Optional) Customize parameters, if desired. For more information about customizing parameters, see [Step 2: Customize parameters](create-new-project.md#create-project-parameters).

1. Choose **Continue**.

1. Choose **Create project**.

You can then navigate to your project at any time from the Amazon SageMaker Unified Studio home page by choosing **Select a project** and **Browse all projects**, then choosing the name of your project. After you navigate to your project, you can begin adding data and compute resources and using tools.

# Run your first SQL query
<a name="gs-sql"></a>

**Time:** 5 minutes

**Prerequisites:** As a member of a SageMaker Unified Studio project, your IAM role needs the following managed policies:
+ [SageMakerStudioUserIAMConsolePolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMConsolePolicy.html) to sign in and access the project.
+ [SageMakerStudioUserIAMDefaultExecutionPolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMDefaultExecutionPolicy.html) to access data and resources within the project.

If you don't have access, contact your administrator. If you are the administrator who set up the project, you already have the required permissions.

**Outcome:** You query sample data using the built-in query editor, see results inline, and understand how to browse your data catalog.

## What you will do
<a name="gs-sql-what-you-will-do"></a>

In this tutorial, you will:
+ Open the query editor in your project
+ Browse available tables in the data catalog
+ Write and run a SQL query on sample data
+ View and explore the results

SageMaker Unified Studio includes a built-in query editor that lets you write SQL queries against data stored in your lakehouse. The data can be in Amazon S3, Amazon Redshift, or other connected sources. You don't need to set up a separate query tool or configure credentials. Everything is already connected through your project.

## Step 1: Open the query editor
<a name="gs-sql-step1"></a>

1. When you first sign in to SageMaker Unified Studio, you are in your default project. If you need to switch projects, use the project selector at the top of the page.

1. In the left navigation bar, choose **Query editor** under **Data analytics**.

![\[The SageMaker Unified Studio project overview page showing the Query editor option in the left navigation bar under Data analytics.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-query-editor-nav.png)


This opens a *querybook*, an interactive SQL notebook where you can write multiple queries, add notes in markdown, and visualize results in one place.

![\[The querybook interface showing the Data explorer panel on the left with Catalogs, Connections, and Buckets, and an empty SQL cell on the right.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-querybook.png)


**What is a querybook?**  
A querybook is like a notebook for SQL. Each cell contains a SQL query or markdown text. You can run cells individually or all at once, and results appear inline below each query.

## Step 2: Browse your data
<a name="gs-sql-step2"></a>

Before writing a query, review what data is available.

1. In the data explorer panel on the left side, expand **Catalogs**, then expand **AWSDataCatalog**.

1. Expand the **sagemaker\_sample\_db** database to see its tables. You should see a **churn** table.

1. Choose the **churn** table and review its columns and data types. This is sample data that was pre-configured in your project.

![\[The Data explorer panel showing the sagemaker_sample_db database expanded with the churn table and its columns including state, account_length, intl_plan, day_mins, and others.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-data-explorer.png)


## Step 3: Run a SQL query
<a name="gs-sql-step3"></a>

Now query the sample data. Copy the following SQL into a querybook cell:

```
SELECT
    state,
    COUNT(*) AS total_customers,
    ROUND(AVG(day_mins), 2) AS avg_day_mins,
    ROUND(AVG(eve_mins), 2) AS avg_eve_mins,
    ROUND(AVG(custserv_calls), 2) AS avg_service_calls
FROM sagemaker_sample_db.churn
GROUP BY state
ORDER BY avg_service_calls DESC
LIMIT 10;
```

This query analyzes customer usage patterns by state. For each state, it calculates the total number of customers, their average daytime and evening minutes, and how often they contact customer service. The results show the top 10 states with the highest average service calls.

1. Paste the query into the SQL cell.

1. Choose **Athena (SQL)** from the engine selector dropdown at the top of the querybook.

1. Choose the **Run** button (▶) next to the cell.

1. Results appear in a table directly below the cell.

**Which query engine is running this?**  
By default, your query runs on *Amazon Athena*, a serverless query engine that reads data directly from Amazon S3 without requiring you to load it into a database first. You can also switch to Amazon Redshift for data warehouse workloads using the engine selector in the querybook. You don't need to know the details of either engine to get started; just write standard SQL.

## Step 4: Explore your results
<a name="gs-sql-step4"></a>

After your query runs, the results table shows each state with its usage metrics. You can:
+ **Sort columns** by choosing column headers.
+ **Download results** as a CSV file using the download button.
+ **Add another query** by choosing the **+** button to add a new SQL cell.

![\[Query results showing a table with columns for state, total_customers, avg_day_mins, avg_eve_mins, and avg_service_calls.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-query-results.png)


Try a different query. For example, compare usage patterns for customers with and without an international plan:

```
SELECT
    intl_plan,
    COUNT(*) AS total_customers,
    ROUND(AVG(intl_mins), 2) AS avg_intl_mins,
    ROUND(AVG(intl_charge), 2) AS avg_intl_charge,
    ROUND(AVG(custserv_calls), 2) AS avg_service_calls
FROM sagemaker_sample_db.churn
GROUP BY intl_plan;
```

This shows whether customers with an international plan use more international minutes and how their support call patterns compare.

![\[Query results showing a table comparing customers with and without an international plan, including total_customers, avg_intl_mins, avg_intl_charge, and avg_service_calls.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-intl-plan-results.png)
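
Once this table is loaded into a notebook DataFrame (as the next tutorial shows), the same comparison is a short pandas `groupby`. The following is a minimal sketch with a few made-up rows standing in for the real churn table:

```
import pandas as pd

# Hypothetical rows standing in for the churn table
churn = pd.DataFrame({
    'intl_plan': ['yes', 'no', 'yes', 'no'],
    'intl_mins': [12.5, 3.0, 10.1, 4.2],
    'custserv_calls': [2, 5, 1, 6],
})

# pandas equivalent of the SQL GROUP BY intl_plan
summary = churn.groupby('intl_plan').agg(
    total_customers=('intl_plan', 'count'),
    avg_intl_mins=('intl_mins', 'mean'),
    avg_service_calls=('custserv_calls', 'mean'),
).round(2)
print(summary)
```

Each named aggregation maps directly to one column of the SQL result, so you can move between the two styles depending on which is more convenient.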


## Ask in natural language
<a name="gs-sql-try-it"></a>

Instead of writing SQL yourself, you can ask the **Data Agent** to generate a query for you.

1. In the querybook, choose the **Chat with AI** icon in the top navigation bar. The Data Agent panel opens on the right side.

1. In the **Ask a question** text box at the bottom of the panel, type what you want in plain English, for example: *"Show me the top 5 states by average daytime charges"*

1. The Data Agent generates the SQL for you. Review it, then run it. The Data Agent has access to your project's data catalog, so it can identify which tables and columns are available.

![\[The querybook with the Data Agent chat panel open on the right side, showing suggested prompts for natural language queries.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-data-agent.png)


After the Data Agent generates a query, choose **Accept** to insert it into a querybook cell, then choose **Run**.

![\[A generated SQL query inserted into a querybook cell, ready to run.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-run-query.png)


The results appear in a table below the cell.

![\[Query results from the Data Agent-generated SQL, showing a table with the top states by average daytime charges.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-sql/gs-sql-first-query-results.png)


**SQL in notebooks**  
You can also run SQL queries in a notebook using SQL cells. Notebooks let you combine SQL with Python code, visualizations, and markdown notes in a single document.

## What you learned
<a name="gs-sql-learned"></a>

In this tutorial, you:
+ Opened the query editor and browsed available data in the catalog
+ Wrote and ran a SQL query to analyze customer usage by state
+ Explored results and ran a second query to compare international plan usage
+ Used the Data Agent to generate SQL from natural language

# Analyze and visualize data
<a name="gs-analyze"></a>

**Time:** 10 minutes

**Prerequisites:** As a member of a SageMaker Unified Studio project, your IAM role needs the following managed policies:
+ [SageMakerStudioUserIAMConsolePolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMConsolePolicy.html) to sign in and access the project.
+ [SageMakerStudioUserIAMDefaultExecutionPolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMDefaultExecutionPolicy.html) to access data and resources within the project.

If you don't have access, contact your administrator. If you are the administrator who set up the project, you already have the required permissions. Completing "Run your first SQL query" is helpful, but not required.

**Outcome:** You load data into a notebook, calculate summary statistics with Python, analyze patterns across states, and create a visualization.

## What you will do
<a name="gs-analyze-what-you-will-do"></a>

In this tutorial, you will:
+ Open a notebook in your project
+ Load sample data into a DataFrame for analysis
+ Calculate summary statistics
+ Analyze patterns by grouping data
+ Create a chart to visualize the results

SageMaker Unified Studio notebooks give you a single environment for Python, SQL, and data visualization with serverless compute that scales automatically. The notebook connects directly to your project's data through the lakehouse, so the same tables you queried with SQL in the previous tutorial are available here too.

## Step 1: Open a notebook
<a name="gs-analyze-step1"></a>

1. When you first sign in to SageMaker Unified Studio, you are in your default project. If you need to switch projects, use the project selector at the top of the page.

1. On the project overview page, choose **Build in the notebook**. Alternatively, choose **Notebooks** in the left navigation pane and choose **Create notebook**. A new notebook opens with an empty Python cell.

![\[The SageMaker Unified Studio project overview page showing the Notebooks option in the left navigation bar.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-project-overview.png)


The notebook opens with a Python cell ready for input. You can also add SQL, Markdown, Table, and Charts cells using the options at the bottom of the notebook.

![\[An empty notebook showing a Python cell with options to add Python, SQL, Markdown, Table, and Charts cells.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-empty-notebook.png)


**What runs your code?**  
Notebooks run on serverless compute powered by Amazon Athena for Apache Spark by default. Your code runs on managed infrastructure that scales automatically, without you provisioning anything.

## Step 2: Load the data
<a name="gs-analyze-step2"></a>

The same `sagemaker_sample_db.churn` table you browsed in the data catalog is available directly from your notebook. Load it into a pandas DataFrame so you can analyze it with Python. Paste the following code into the first Python cell and run it:

```
import pandas as pd

# The notebook provides a preconfigured Spark session as `spark`
df = spark.sql("SELECT * FROM sagemaker_sample_db.churn").toPandas()
print(f"Rows: {len(df)}, Columns: {len(df.columns)}")
df.head()
```

The output shows the first few rows of the dataset, including columns for state, account length, call minutes, service calls, and churn status.

![\[Notebook cell output showing the loaded churn dataset with 10002 rows and 21 columns, displaying a table preview with columns for state, account_length, area_code, and others.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-load-data.png)


The dataset contains telecom customers with attributes including call minutes, service calls, charges, and whether the customer churned.

**Work with DataFrames using SQL**  
Once you create a DataFrame, you can also query it using SQL cells in the notebook. This means you can use Python for some steps and SQL for others, depending on which is more convenient for the task.

## Step 3: Clean the data and calculate statistics
<a name="gs-analyze-step3"></a>

Before analyzing the data, you need to handle a few issues. The first row contains duplicate header values, some numeric columns are stored as strings, and the churn column uses `"True."` and `"False."` instead of standard booleans. The following code removes the extra header row, converts the numeric columns to the correct data type, and maps the churn values to booleans. Add a new Python cell and paste it:

```
# Clean the data
df_clean = df.iloc[1:].copy()

# Convert numeric columns to float
numeric_cols = ['day_mins', 'eve_mins', 'custserv_calls']
for col in numeric_cols:
    df_clean[col] = pd.to_numeric(df_clean[col], errors='coerce')

# Convert churn to boolean for percentage calculation
df_clean['churn'] = df_clean['churn'].map({'True.': True, 'False.': False})

print(f"Total customers: {len(df_clean)}")
print(f"Avg daytime minutes: {df_clean['day_mins'].mean():.2f}")
print(f"Avg evening minutes: {df_clean['eve_mins'].mean():.2f}")
print(f"Avg service calls: {df_clean['custserv_calls'].mean():.2f}")
print(f"Churn rate: {df_clean['churn'].mean():.1%}")
```

This gives you a quick overview of the dataset: 10,001 customers with an average of 5.52 daytime minutes, 5.03 evening minutes, 5.53 service calls, and a 50% churn rate.

![\[Notebook cell output showing summary statistics: Total customers 10001, average daytime minutes 5.52, average evening minutes 5.03, average service calls 5.53, and churn rate 50.0%.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-summary-stats.png)


## Step 4: Analyze patterns by state
<a name="gs-analyze-step4"></a>

Group the data by state to find which states have the most customer service calls. Add a new Python cell:

```
top_states = (
    df_clean.groupby('state')
    .agg(total_customers=('state', 'count'),
         avg_day_mins=('day_mins', 'mean'),
         avg_eve_mins=('eve_mins', 'mean'),
         avg_service_calls=('custserv_calls', 'mean'))
    .round(2)
    .sort_values('avg_service_calls', ascending=False)
    .head(10)
)
top_states
```

The results show the top 10 states sorted by average service calls, along with customer counts and usage patterns. This is the same analysis from the SQL tutorial, now done with pandas.

![\[Notebook output showing a table of the top 10 states by average service calls, with columns for total_customers, avg_day_mins, avg_eve_mins, and avg_service_calls.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-top-states-table.png)


## Step 5: Create a visualization
<a name="gs-analyze-step5"></a>

Tables show the numbers, but a chart makes it easier to spot patterns at a glance. Create a grouped bar chart that compares average daytime and evening call minutes across the top 10 states. Add a new Python cell:

```
import matplotlib.pyplot as plt

usage = df_clean.groupby('state')[['day_mins', 'eve_mins']].mean()
usage.columns = ['Day', 'Evening']
top10 = usage.sort_values('Day', ascending=False).head(10)

top10.plot(kind='bar', figsize=(10, 5), color=['#0073bb', '#ff9900'])
plt.title('Average Call Minutes by Time of Day \u2014 Top 10 States')
plt.ylabel('Minutes')
plt.xlabel('State')
plt.xticks(rotation=0)
plt.legend(title='Time of Day')
plt.tight_layout()
plt.show()
```

The chart renders inline, directly below the cell. Each state shows two bars comparing daytime and evening call minutes.

![\[A grouped bar chart showing average daytime and evening call minutes for the top 10 states, with blue bars for daytime and orange bars for evening.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-chart-code.png)


**Create charts without code**  
Instead of writing plotting code, you can use the built-in **Charts** cell type. Choose **Charts** from the cell type options at the bottom of the notebook, then configure your chart visually:
+ For **Data frame**, select `top_states`.
+ For **Type**, choose **Bar chart**.
+ For **X-axis**, select **state**.
+ For **Y-axis**, select **total\_customers**.

The chart updates automatically as you change the configuration.

![\[The Charts cell configuration panel showing a bar chart with state on the X-axis and total_customers on the Y-axis, with the resulting bar chart displayed on the right.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-chart-nocode.png)


The chart shows that call patterns vary across states, which could inform regional support staffing or targeted retention campaigns. From here, you could extend this analysis by correlating call minutes with churn rates, segmenting customers by international plan usage, or building a predictive model to identify customers at risk of churning.
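
As one sketch of such an extension, you could bucket customers by service-call volume and compare churn rates across buckets. The rows and bucket edges below are made up for illustration; with the real data you would reuse `df_clean` from Step 3:

```
import pandas as pd

# Hypothetical rows standing in for df_clean from Step 3
df_clean = pd.DataFrame({
    'custserv_calls': [1, 2, 3, 6, 7, 8],
    'churn': [False, False, False, True, True, True],
})

# Bucket customers by how often they called support (edges chosen for illustration)
df_clean['call_bucket'] = pd.cut(
    df_clean['custserv_calls'],
    bins=[0, 3, 6, 100],
    labels=['low (1-3)', 'medium (4-6)', 'high (7+)'],
)

# Churn rate per bucket: mean of a boolean column is the fraction that churned
churn_by_bucket = df_clean.groupby('call_bucket', observed=True)['churn'].mean()
print(churn_by_bucket)
```

If churn rises with service calls in the real data, that pattern is a natural input feature for the churn model in the machine learning tutorial.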

## Use the Data Agent to generate code
<a name="gs-analyze-data-agent"></a>

Instead of writing code yourself, you can ask the **Data Agent** to generate it for you. The Data Agent can create transformations, aggregations, and visualizations from natural language descriptions.

1. In the notebook, choose the **Chat with AI** icon in the top navigation bar.

1. In the **Ask a question** text box at the bottom of the panel, type what you want in plain English, for example: *"Create a bar chart showing average daytime minutes by state for the top 10 states"*

![\[The Chat with AI icon in the notebook top navigation bar.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-data-agent-1.png)


![\[The Data Agent panel open with the Ask a question text box.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-data-agent-2.png)


![\[A natural language query entered in the Data Agent panel requesting a bar chart.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-data-agent-3.png)


The Data Agent generates the code for you. Review it, then run it.

![\[The Data Agent generating code to create the requested bar chart.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-data-agent-generated.png)


The chart renders inline in the notebook.

![\[A bar chart generated by the Data Agent showing average daytime minutes for the top 10 states.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-analyze/gs-analyze-data-agent-chart-result.png)


## What you learned
<a name="gs-analyze-learned"></a>

In this tutorial, you:
+ Created a notebook and loaded lakehouse data into a pandas DataFrame
+ Cleaned sample data and calculated summary statistics
+ Grouped data by state to analyze customer service patterns
+ Created a visualization with Python code and with the no-code Charts feature
+ Used the Data Agent to generate Python code from natural language

# Build a data pipeline with visual ETL
<a name="gs-etl"></a>

**Time:** 10 minutes

**Prerequisites:** As a member of a SageMaker Unified Studio project, your IAM role needs the following managed policies:
+ [SageMakerStudioUserIAMConsolePolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMConsolePolicy.html) to sign in and access the project.
+ [SageMakerStudioUserIAMDefaultExecutionPolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMDefaultExecutionPolicy.html) to access data and resources within the project.

If you don't have access, contact your administrator. If you are the administrator who set up the project, you already have the required permissions. Completing "Run your first SQL query" is helpful, but not required.

**Note**  
You can use your identity-based permissions to create a pipeline. However, you need an IAM role to run the pipeline on a schedule.

**Outcome:** You create a visual ETL job that reads sample data, filters and reshapes it, and writes clean output to Amazon S3 without writing any code.

## What you will do
<a name="gs-etl-what-you-will-do"></a>

In this tutorial, you will:
+ Create a Visual ETL job in your project
+ Add a data source node that reads from the AWS Glue Data Catalog
+ Apply filter and select transformations to reshape the data
+ Write the transformed output to Amazon S3
+ Run the job and verify the results

ETL (Extract, Transform, Load) is how you prepare raw data for analysis. SageMaker Unified Studio provides a visual ETL editor where you build data pipelines by dragging and connecting nodes on a canvas. No code required. Under the hood, it runs on AWS Glue, a serverless data integration service, but you don't need to know Glue to use it.

## Step 1: Create a visual ETL job
<a name="gs-etl-step1"></a>

1. Go to your project using the menu at the top of the page.

1. In the left navigation pane, choose **Visual ETL** under **Data analytics**.

1. Choose **Create visual job**.

![\[The left navigation pane showing the Visual ETL option under Data analytics.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-build-menu.png)


![\[The Visual ETL page with the Create job button.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-create-job.png)


The visual ETL canvas opens with an empty workspace.

![\[An empty visual ETL canvas ready for adding source, transform, and target nodes.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-empty-canvas.png)


The canvas is where you design your pipeline. You add nodes for data sources, transformations, and targets, then connect them to define the data flow.

**What is ETL?**  
ETL stands for Extract, Transform, Load. *Extract* reads data from a source. *Transform* cleans, filters, or reshapes it. *Load* writes the result to a destination. It's the standard pattern for preparing data before analysis or machine learning.
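
The pattern the canvas implements can be sketched in a few lines of pandas. This is an illustration only, with made-up rows, a sample filter, and a local output path; the visual job does the same thing with AWS Glue against your real data and an Amazon S3 target:

```
import os
import tempfile

import pandas as pd

# Extract: read rows from a source (made-up values standing in for a catalog table)
raw = pd.DataFrame({
    'state': ['WA', 'OR', 'CA'],
    'custserv_calls': [7, 2, 9],
})

# Transform: keep only high-contact customers (the same idea as a Filter node)
transformed = raw[raw['custserv_calls'] > 5]

# Load: write the result to a destination (an S3 path in the real pipeline)
out_path = os.path.join(tempfile.gettempdir(), 'churn_filtered.csv')
transformed.to_csv(out_path, index=False)
print(f"Wrote {len(transformed)} rows to {out_path}")
```

Each node you add on the canvas corresponds to one of these stages, and connecting nodes defines the order the stages run in.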

## Step 2: Add a data source
<a name="gs-etl-step2"></a>

1. On the canvas, choose the **Add Nodes** (+) button on the left. Under **Data sources**, choose **AWS Glue Data Catalog** and click the canvas to place the node.

1. Choose the node to open its configuration panel.

1. For **Database**, choose `sagemaker_sample_db`.

1. For **Table**, choose `churn`.

![\[The visual ETL canvas with an AWS Glue Data Catalog source node configured to read from the sagemaker_sample_db database and churn table.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-add-source.png)


**Other data sources**  
You can also read from Amazon S3, Amazon Redshift, JDBC connections, and other sources. For this tutorial, you use the sample data that's already available in your project's data catalog.

## Step 3: Add transformations
<a name="gs-etl-step3"></a>

Now clean and reshape the data. You add two transformation nodes.

1. Choose the add icon on the right edge of the source node, or choose the **Add Nodes** button on the left of the canvas and choose **Transforms**.

1. Choose **Filter**. A filter node appears on the canvas, connected to the source node.

1. Choose the filter node to open its configuration panel.

1. Set the filter condition: `custserv_calls > 5`. This keeps only customers who contacted customer service more than 5 times.

After you set the filter condition, the data preview updates to show only the rows that match.

![\[The filter transform node configured with the condition custserv_calls greater than 5.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-filter-transform.png)


1. Choose the add icon on the right edge of the filter node, or choose the **Add Nodes** button on the left of the canvas and choose **Transforms**.

1. Choose **Select Columns**. A select columns node appears on the canvas.

1. Choose the select columns node to open its configuration panel.

1. Choose the columns to keep: `state`, `day_mins`, `eve_mins`, `custserv_calls`, and `churn`.

![\[The select columns transform node configured to keep the state, day_mins, eve_mins, custserv_calls, and churn columns.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-select-fields.png)


**More transforms available**  
The visual ETL editor includes dozens of built-in transforms: joins, aggregations, derived columns, deduplication, and more. For example, if you noticed in the previous tutorial that the `churn` column contains `True.` instead of `True`, you could add a **Derived Column** transform to clean those values as part of your pipeline. You can also write custom transforms using SQL or Python.

## Step 4: Add a target
<a name="gs-etl-step4"></a>

1. Choose the add icon on the right edge of the select columns node, or choose the **Add Nodes** button on the left of the canvas and choose **Data targets**.

1. Choose **Amazon S3**. A target node appears on the canvas.

1. Choose the target node to open its configuration panel.

1. For **Format**, choose **Parquet**.

1. For **S3 Target Location**, choose **Browse S3**, select your project's S3 bucket, select the `shared/` folder, and then add `filtered-churn/` at the end of the path in the S3 URI field.

![\[The Amazon S3 target node configured with Parquet format and an S3 target location.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-add-target.png)


Your pipeline now reads sample data from the AWS Glue Data Catalog, filters for customers with more than 5 service calls, selects specific columns, and writes the result to Amazon S3.
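Conceptually, the pipeline behaves like the following pandas sketch. The real job runs on Apache Spark via AWS Glue, and the sample rows here are invented:

```python
import pandas as pd

# A few invented rows standing in for the churn table.
churn = pd.DataFrame({
    "state": ["OH", "NJ", "TX"],
    "account_length": [128, 107, 84],  # example of a column the pipeline drops
    "day_mins": [180.0, 210.5, 95.2],
    "eve_mins": [200.1, 150.3, 210.0],
    "custserv_calls": [2, 7, 9],
    "churn": ["False.", "True.", "True."],
})

# Filter node: keep customers with more than 5 service calls.
filtered = churn[churn["custserv_calls"] > 5]

# Select columns node: keep only the fields the analysis needs.
result = filtered[["state", "day_mins", "eve_mins", "custserv_calls", "churn"]]

# Target node: the job writes Parquet to Amazon S3 instead of printing.
print(result)
```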

## Step 5: Save and run the job
<a name="gs-etl-step5"></a>

1. Enter a name for your job in the title field at the top of the canvas.

1. Choose **Save** to save your job.

1. Choose **Run**.

1. Choose the **View runs** tab at the top of the canvas to see the job progress. A job on this sample data typically completes in 2 to 3 minutes.

![\[The Runs tab showing the job execution progress.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-run-job.png)


![\[The View runs tab showing the job run in progress.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-view-runs.png)


![\[The completed job run showing success status.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-run-complete.png)


**Scheduling**  
In production, you'd schedule this job to run on a recurring basis (hourly, daily, or triggered by new data arriving). You can set up schedules directly from the job configuration.

## Step 6: Verify the output
<a name="gs-etl-step6"></a>

After the job completes, you can find it listed under **Data processing jobs** in the left navigation pane.

![\[The Data processing jobs page showing the completed visual ETL job.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-data-processing-jobs.png)


Choose the job to view its details, including run history, status, and configuration.

![\[The job details page showing run history and job configuration.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-job-details.png)


To verify the output:

1. In the left navigation pane, choose **Data**.

1. Under **S3 buckets**, expand your project's bucket.

1. Navigate to the output folder you specified in the target node (for example, `shared/filtered-churn/`).

1. You should see Parquet files containing only the filtered rows and selected columns.

![\[The S3 output folder in the Data explorer showing Parquet files generated by the visual ETL job.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-etl/gs-etl-s3-output.png)


**Viewing the output data**  
Parquet is a compressed columnar format optimized for analytics. To view the contents of these files, you can create an external table pointing to the S3 location and query it using the query editor, or load the files into a notebook using pandas or Spark.

You now have a clean, filtered dataset in Amazon S3. This prepared data is ready for downstream use: analysts can query it directly with SQL, data scientists can load it into a notebook for deeper analysis, or it can serve as input for machine learning training.

## What you learned
<a name="gs-etl-learned"></a>

In this tutorial, you:
+ Created a visual ETL job without writing code
+ Connected an AWS Glue Data Catalog source to read sample data
+ Applied filter and select column transformations to reshape the data
+ Wrote transformed data to Amazon S3
+ Ran the job and verified the output in the Data explorer

# Train an ML model
<a name="gs-ml"></a>

**Time:** 15 minutes

**Prerequisites:** As a member of a SageMaker Unified Studio project, your IAM role needs the following managed policies:
+ [SageMakerStudioUserIAMConsolePolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMConsolePolicy.html) to sign in and access the project.
+ [SageMakerStudioUserIAMDefaultExecutionPolicy](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/security-iam-awsmanpol-SageMakerStudioUserIAMDefaultExecutionPolicy.html) to access data and resources within the project.

If you don't have access, contact your administrator. If you are the administrator who set up the project, you already have the required permissions. Completing "Analyze and visualize data" is helpful, but not required.

**Outcome:** You open a sample notebook, explore a customer churn dataset, train a classification model, and identify the key factors that predict churn.

## What you will do
<a name="gs-ml-what-you-will-do"></a>

In this tutorial, you will:
+ Open a sample notebook in your project
+ Load and explore a customer churn dataset
+ Prepare features for model training
+ Train and compare two classification models
+ Identify the top factors that drive customer churn
+ Save the trained model for future use

Machine learning uses historical data to find patterns and make predictions. In this tutorial, you train a model to predict which telecom customers are likely to cancel their service (churn). SageMaker Unified Studio provides a notebook environment with popular ML libraries pre-installed, so you can start training models immediately without any setup.

## Step 1: Open the sample notebook
<a name="gs-ml-step1"></a>

1. Go to your project using the menu at the top of the page.

1. On the project overview page, find the **Customer Churn Prediction** sample notebook.

1. Choose the notebook to open it.

1. Choose **Open in notebook**.

![\[The project overview page with the Customer Churn Prediction sample notebook highlighted.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-overview-page.png)


![\[The Customer Churn Prediction sample notebook opened in the notebook editor.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-sample-notebook.png)


The notebook contains pre-written code cells that walk through the complete ML workflow. You run each cell in order.

**What is a sample notebook?**  
Sample notebooks are pre-built tutorials included in your project. They contain working code and explanations for common ML and data science tasks. You can run them as-is or modify them to use your own data.

## Step 2: Set up and load the data
<a name="gs-ml-step2"></a>

Run the first cell to import the required libraries. Choose the **Run** button (▶) in the top left corner of the cell:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import boto3
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
import warnings
np.random.seed(2)
warnings.filterwarnings('ignore')
```

In this cell, `np.random.seed(2)` sets a random seed so you get the same results each time you run the notebook. The `warnings.filterwarnings` line suppresses deprecation warnings for cleaner output.

![\[The notebook cell output after running the imports and setup.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-setup.png)
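You can see the effect of seeding for yourself with this short snippet, which is illustrative and not part of the sample notebook:

```python
import numpy as np

# The same seed makes the "random" numbers, and therefore the
# train/test split and model results, identical on every run.
np.random.seed(2)
first = np.random.rand(3)

np.random.seed(2)
second = np.random.rand(3)

print(np.allclose(first, second))  # True: the two draws are identical
```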


Run the next cell to load the customer churn dataset:

**How to know when a cell finishes running**  
When a cell completes, a check mark appears next to it along with the elapsed time. Wait for this before running the next cell.

```
session = boto3.Session()
aws_region = session.region_name or 'us-west-2'

s3 = boto3.client('s3')
os.makedirs('notebook_outputs', exist_ok=True)

s3.download_file(
    f'sagemaker-example-files-prod-{aws_region}',
    'datasets/tabular/synthetic/churn.txt',
    'notebook_outputs/churn.txt'
)

df = pd.read_csv('notebook_outputs/churn.txt')
print(f'Dataset: {df.shape[0]:,} customers with {df.shape[1]} data points each')
df.head()
```

**Note**  
The `sagemaker-example-files-prod` bucket is an AWS-managed public bucket that contains sample datasets. You do not need to create this bucket. The code downloads the dataset from this bucket to your notebook's local storage.

![\[The notebook output showing the loaded dataset with customer records and a preview of the first rows.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-load-data.png)


The dataset contains telecom customers with attributes including call minutes, service calls, charges, and whether the customer churned.

## Step 3: Explore the churn problem
<a name="gs-ml-step3"></a>

Run the next cell to calculate the churn rate and visualize the problem:

```
total_customers = len(df)
churned_customers = len(df[df['Churn?'] == 'True.'])
churn_rate = churned_customers / total_customers

print(f'Total Customers: {total_customers:,}')
print(f'Customers Lost: {churned_customers:,}')
print(f'Churn Rate: {churn_rate:.1%}')

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

churn_counts = df['Churn?'].value_counts()
colors = ['#2ecc71', '#e74c3c']
axes[0].pie(churn_counts.values, labels=['Retained', 'Churned'],
           autopct='%1.1f%%', colors=colors, startangle=90,
           explode=(0, 0.1))
axes[0].set_title('Customer Retention vs Churn')

plt.tight_layout()
plt.show()
```

![\[A pie chart showing the proportion of retained versus churned customers.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-churn-visualization.png)


The visualization shows the split between retained and churned customers. Understanding this distribution helps you choose the right approach for training your model.

**Why explore before training?**  
Understanding your data before building a model helps you choose the right approach. For example, if the classes are heavily imbalanced (far more retained than churned customers), that affects how you evaluate model performance.

## Step 4: Prepare features and train models
<a name="gs-ml-step4"></a>

Before training, you need to convert the data into a format that ML algorithms can process. The following code encodes text columns as numbers, creates new features, and splits the data into training and test sets. Run the next cell:

```
df_processed = df.copy()
df_processed['Churn'] = (df_processed['Churn?'] == 'True.').astype(int)
df_processed.drop('Churn?', axis=1, inplace=True)
df_processed.drop('Phone', axis=1, inplace=True)

categorical_cols = ['State', "Int'l Plan", 'VMail Plan']
label_encoders = {}
for col in categorical_cols:
    le = LabelEncoder()
    df_processed[col] = le.fit_transform(df_processed[col])
    label_encoders[col] = le

df_processed['Total_Charge'] = (df_processed['Day Charge'] +
                               df_processed['Eve Charge'] +
                               df_processed['Night Charge'] +
                               df_processed['Intl Charge'])

df_processed['High_Service_Calls'] = (df_processed['CustServ Calls'] >= 4).astype(int)

X = df_processed.drop('Churn', axis=1)
y = df_processed['Churn']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f'Training samples: {X_train.shape[0]:,}')
print(f'Test samples: {X_test.shape[0]:,}')
```

![\[The notebook output showing the preprocessing results with training and test sample counts.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-preprocessing.png)


Now train two different classification models and compare their performance. Run the next cell:

```
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=2),
    'Logistic Regression': LogisticRegression(random_state=2, max_iter=1000)
}

model_results = {}

for name, model in models.items():
    print(f'Training {name}...')

    if 'Logistic' in name:
        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)
        y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]
    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1]

    auc_score = roc_auc_score(y_test, y_pred_proba)

    model_results[name] = {
        'model': model,
        'predictions': y_pred,
        'probabilities': y_pred_proba,
        'auc_score': auc_score
    }

    print(f'  AUC Score: {auc_score:.4f}')
    print(f'  Accuracy: {(y_pred == y_test).mean():.1%}')

best_model_name = max(model_results.keys(),
                      key=lambda k: model_results[k]['auc_score'])
print(f'\nBest model: {best_model_name}')
print(f'AUC Score: {model_results[best_model_name]["auc_score"]:.4f}')
```

![\[The notebook output showing the training results for Random Forest and Logistic Regression, with AUC scores and the winning model.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-model-training.png)


**What are these models?**  
A *Random Forest* builds many decision trees and combines their predictions. A *Logistic Regression* finds a mathematical boundary between the two classes. AUC (Area Under the Curve) measures how well the model distinguishes between churners and non-churners, where 1.0 is perfect and 0.5 is random guessing.

## Step 5: Understand what drives churn
<a name="gs-ml-step5"></a>

The model can tell you which customer attributes are the strongest predictors of churn. Run the next cell to see the top churn drivers:

```
rf_model = model_results['Random Forest']['model']
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print('Top 5 churn drivers:')
for i, (_, row) in enumerate(feature_importance.head(5).iterrows(), 1):
    print(f'  {i}. {row["feature"]} (Impact: {row["importance"]:.1%})')
```

![\[The notebook output showing the top 5 features that predict customer churn, ranked by importance.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-feature-importance.png)


Feature importance reveals which factors have the biggest impact on churn predictions. These insights help the business focus retention efforts on the areas that matter most.

**Use the Data Agent for deeper analysis**  
You don't need ML expertise to interpret these results. The **Data Agent** can help you understand feature importance, suggest next steps, and generate code for additional analysis. Open the Data Agent from the top navigation bar and ask questions like *"Why is night charge the top predictor of churn?"* or *"Write code to plot feature importance as a bar chart."*

## Step 6: Save the model
<a name="gs-ml-step6"></a>

Run the final cell to save the trained model and its supporting artifacts. You can use these artifacts to load the model later for batch predictions, deploy it to a real-time SageMaker endpoint, or share it with your team through the model registry.

```
import joblib

best_model = model_results[best_model_name]['model']
joblib.dump(best_model, 'notebook_outputs/churn_prediction_model.pkl')
joblib.dump(scaler, 'notebook_outputs/feature_scaler.pkl')
joblib.dump(label_encoders, 'notebook_outputs/label_encoders.pkl')

print('Model artifacts saved:')
print('  churn_prediction_model.pkl - Trained ML model')
print('  feature_scaler.pkl - Data preprocessing scaler')
print('  label_encoders.pkl - Categorical encoders')
```

![\[The notebook output confirming that the model artifacts have been saved.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/gs-ml/gs-ml-save-model.png)


To reuse this model later, load the saved `.pkl` files using `joblib.load()` and call `model.predict()` on your data. For production use cases like real-time predictions or sharing the model with your team, see the What's next section below.
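As a sketch, reloading the artifacts might look like this. The helper function and its `directory` argument are illustrative, not part of the sample notebook; the file names match what the final cell saved:

```python
import joblib

def load_churn_artifacts(directory: str = "notebook_outputs"):
    """Reload the saved model and preprocessing objects for reuse.

    Assumes the files written by the notebook's final cell exist in
    `directory`. New data passed to model.predict() must have the same
    columns, in the same order, as the training features X, including
    the derived Total_Charge and High_Service_Calls columns.
    """
    model = joblib.load(f"{directory}/churn_prediction_model.pkl")
    scaler = joblib.load(f"{directory}/feature_scaler.pkl")
    encoders = joblib.load(f"{directory}/label_encoders.pkl")
    return model, scaler, encoders

# model, scaler, encoders = load_churn_artifacts()
# churn_probability = model.predict_proba(new_customers)[:, 1]
```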

## What's next
<a name="gs-ml-next-steps"></a>

You trained a model using a sample notebook. Here are ways to go further:
+ **Track experiments with MLflow**: Log your model parameters, metrics, and artifacts so you can compare runs and reproduce results. To set up MLflow for your project, see [Track experiments using MLflow](sagemaker-experiments.xml.md).
+ **Deploy the model**: Serve your trained model as a real-time endpoint for predictions. To learn about model deployment, see [Machine learning](sagemaker.md).
+ **Use your own data**: Use similar techniques to load data from your lakehouse tables instead of the sample dataset. The Data Agent is already aware of the tables available in your catalog and can help you build and train your models.

## What you learned
<a name="gs-ml-learned"></a>

In this tutorial, you:
+ Opened a sample notebook and loaded a customer churn dataset
+ Explored the data and visualized the churn problem
+ Prepared features and split data into training and test sets
+ Trained and compared two classification models
+ Identified the top factors that drive customer churn
+ Saved the trained model for future use

# Get started with Amazon Bedrock in SageMaker Unified Studio
<a name="getting-started-use-amazon-bedrock-ide"></a>

Get started with Amazon Bedrock in SageMaker Unified Studio by experimenting with a model in a [playground](bedrock-playgrounds.md).

Amazon Bedrock in SageMaker Unified Studio provides playgrounds that let you easily experiment with Amazon Bedrock models. The [chat](bedrock-explore-chat-playground.md) playground lets you chat with a model by providing text and image prompts (not all models support images). The [image and video](explore-image-playground.md) playground lets you generate images and videos with a suitable model. In both playgrounds you can experiment by making configuration changes. For example, you can influence the response from a model by changing [inference parameters](explore-prompts.md#inference-parameters).

After trying the chat and image playgrounds, you can try creating a chat agent app or flows app. A chat agent app allows users to chat with an Amazon Bedrock model through a conversational interface, typically by sending prompts (text or image) and receiving responses. To create a chat agent app, see [Build a chat agent app with Amazon Bedrock](create-chat-app.md).

You can also create a [flows app](create-flows-app.md) that lets you visually design the flow of an app.

## Chat with a model in the chat playground
<a name="getting-started-use-amazon-bedrock-ide-playground"></a>

In these instructions, you use the Amazon Bedrock in SageMaker Unified Studio chat playground to chat with an Amazon Bedrock model. You chat by sending a prompt to the model and following up on the response that the model generates. For more information, see [Experiment with the Amazon Bedrock playgrounds](bedrock-playgrounds.md).

If you don't have access to a model, contact your administrator.

**To chat with a model**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. At the top of the page, choose **Discover**.

1. In the **Generative AI** section, choose **Chat playground** to open the chat playground.  
![\[Open Amazon Bedrock in SageMaker Unified Studio chat playground.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/bedrock/bedrock-ide-discover.png)

1. For **Type**, select **Model**, and then for **Model**, select the model that you want to use. For full information about the model, choose **View full model details** in the information panel. For more information, see [Find serverless models with the Amazon Bedrock model catalog](model-catalog.md). If you don't have access to an appropriate model, contact your administrator. Not all models support all features.

1. In the **Enter prompt** text box, enter **What is Avebury stone circle?**.

1. (Optional) If the model you chose is a reasoning model, you can choose **Reason** to have the model include its reasoning in the response. For more information, see [Enhance model responses with model reasoning](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-reasoning.html) in the *Amazon Bedrock User Guide*.

1. Press Enter on your keyboard, or choose the run button, to send the prompt to the model. Amazon Bedrock in SageMaker Unified Studio shows the response from the model in the playground.  
![\[Run prompt in Amazon Bedrock in SageMaker Unified Studio chat playground.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/bedrock/bedrock-ide-chat-playground-run-prompt.png)

1. Continue the chat by entering the prompt **Is there a museum there?** and pressing Enter. 

   The response shows how the model uses the previous prompt as context for generating its next response.

1. Choose **Reset** to start a new chat with the model.

1. Influence the model response by doing the following:

   1. Enter and run a prompt. Note the response from the model.

   1. Choose the configurations menu to open the **Configurations** pane.  
![\[Inference parameters in Amazon Bedrock in SageMaker Unified Studio chat playground.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/bedrock/bedrock-ide-chat-playground-inference.png)

   1. Influence the model response by making [inference parameters](explore-prompts.md#inference-parameters) changes.

   1. (Optional) In **System instructions**, enter any overarching system instructions that you want the model to apply for future interactions.

   1. Run the prompt again and compare the response with the previous response. 

1. Choose **Reset** to start a new chat with the model.

1. Try sending an image to a model by doing the following:

   1. For **Model**, choose a model that supports [images](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).

   1. Choose the attachment button at the left of the **Enter prompt** text box.   
![\[Run prompt in Amazon Bedrock in SageMaker Unified Studio chat playground.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/bedrock/bedrock-ide-chat-playground-run-prompt-attach.png)

   1. In the open file dialog box, choose an image from your local computer.

   1. In the text box, next to the image that you uploaded, enter **What's in this image?**. 

   1. Press Enter on your keyboard to send the prompt to the model. The response from the model describes the image.

1. (Optional) Try using another model and different prompts. Different models have different recommendations for creating, or engineering, prompts. For more information, see [Prompt engineering guides](explore-prompts.md#prompt-guides).

1. (Optional) Compare the output from multiple models, or [shared apps](bedrock-explore-chat-playground-app.md).

   1. In the playground, turn on **Compare mode**.

   1. In both panes, select the model that you want to compare. If you want to use a shared app, select **App** in **Type** and then select the app in **App**.

   1. Enter a prompt in the text box and run the prompt. The output from each model is shown. You can choose the copy icon to copy the prompt or model response to the clipboard.

   1. (Optional) Choose **View configs** to make configuration changes, such as [inference parameters](explore-prompts.md#inference-parameters). Choose **View chats** to return to the chat page.

   1. (Optional) Choose **Add chat window** to add a third window. You can compare up to 3 models or apps.

   1. Turn off **Compare mode** to stop comparing models.
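The playground is a no-code interface. If you later want to send similar prompts from code, the Amazon Bedrock Runtime Converse API is the programmatic path. The following sketch only builds the request arguments, and the model ID shown in the commented usage is a placeholder; choose a model your domain has access to:

```python
def build_converse_request(model_id: str, prompt: str,
                           temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Build keyword arguments for the Bedrock Runtime Converse API.

    The inferenceConfig keys mirror the inference parameters you can
    adjust in the playground's Configurations pane.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"temperature": temperature, "maxTokens": max_tokens},
    }

# Hypothetical usage (requires AWS credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_converse_request(
#     "<model-id>", "What is Avebury stone circle?"))
# print(response["output"]["message"]["content"][0]["text"])
```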

# Get started with the query editor in Amazon SageMaker Unified Studio
<a name="getting-started-querying"></a>

You can use the query editor to perform analysis using SQL. The query editor tool provides a place to write and run queries, view results, and share your work with your team.

## Prerequisites
<a name="start-querying-prerequisites"></a>

Before you get started with the query editor, you must access Amazon SageMaker Unified Studio and create a project with the **SQL analytics** project profile.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

   For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Create a project with a **SQL analytics** project profile. This project profile sets up your project with access to Amazon Redshift Serverless and Amazon Athena resources. For more information, see [Create a new project](create-new-project.md).

## Query sample data using Amazon Athena in Amazon SageMaker Unified Studio
<a name="start-querying-create-with-athena"></a>

After you create a project, you can use the query editor to write and run queries.

1. Navigate to the project you created in the top center menu of the Amazon SageMaker Unified Studio home page.

1. Expand the **Build** menu in the top navigation bar, then choose **Query editor**.

1. Create a new querybook tab. A querybook is a kind of SQL notebook where you can draw from multiple engines to design and visualize data analytics solutions.

1. Select a data source for your queries by using the menu in the upper-right corner of the querybook.

   1. Under **Connections**, choose **Athena (Lakehouse)** to connect to your Lakehouse resources.

   1. Under **Catalogs**, choose **AwsDataCatalog**.

   1. Under **Databases**, choose the name of the AWS Glue database. This database was created for use when the project was created.

1. Choose **Choose** to connect to the database and query engine.

1. Copy the following SQL query into the querybook cell to create a table in the database.

   ```
   CREATE TABLE mkt_sls_table AS
   SELECT 146776932 AS ord_num, 23 AS sales_qty_sld, 23.4 AS wholesale_cost, 45.0 as lst_pr, 43.0 as sell_pr, 2.0 as disnt, 12 as ship_mode,13 as warehouse_id, 23 as item_id, 34 as ctlg_page, 232 as ship_cust_id, 4556 as bill_cust_id
   UNION ALL SELECT 46776931, 24, 24.4, 46, 44, 1, 14, 15, 24, 35, 222, 4551
   UNION ALL SELECT 46777394, 42, 43.4, 60, 50, 10, 30, 20, 27, 43, 241, 4565
   UNION ALL SELECT 46777831, 33, 40.4, 51, 46, 15, 16, 26, 33, 40, 234, 4563
   UNION ALL SELECT 46779160, 29, 26.4, 50, 61, 8, 31, 15, 36, 40, 242, 4562
   UNION ALL SELECT 46778595, 43, 28.4, 49, 47, 7, 28, 22, 27, 43, 224, 4555
   UNION ALL SELECT 46779482, 34, 33.4, 64, 44, 10, 17, 27, 43, 52, 222, 4556
   UNION ALL SELECT 46779650, 39, 37.4, 51, 62, 13, 31, 25, 31, 52, 224, 4551
   UNION ALL SELECT 46780524, 33, 40.4, 60, 53, 18, 32, 31, 31, 39, 232, 4563
   UNION ALL SELECT 46780634, 39, 35.4, 46, 44, 16, 33, 19, 31, 52, 242, 4557
   UNION ALL SELECT 46781887, 24, 30.4, 54, 62, 13, 18, 29, 24, 52, 223, 4561
   ```

1. Choose the **Run cell** icon. ![\[The run cell icon used in the Amazon SageMaker Unified Studio query editor.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/qev2/qev2-run.png)

   When the query finishes running, a **Result** tab appears below the cell to display the outcome.

1. Refresh the **Data explorer** navigation pane, and view the table you created in the **Lakehouse** section.

1. Choose **Add SQL** to add another cell to the querybook. Then enter the following script:

   ```
   select * from mkt_sls_table limit 10
   ```

1. Choose the **Run cell** icon.

   In the **Results** tab, the first ten rows of the table you created are displayed.

1. Choose **Add SQL** to add another cell to the querybook. Then enter the following script:

   ```
   select item_id, sales_qty_sld 
   from mkt_sls_table 
   where sales_qty_sld > 30
   ```

1. Choose the **Run cell** icon.

   In the **Results** tab, only data that fulfills the specified requirements is displayed.

1. In the **Results** tab, choose the **Chart view** icon. ![\[The chart icon used in the Amazon SageMaker Unified Studio query editor.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/qev2/qev2-chart.png)

   This opens up a chart view with a line graph as a default.

1. Set up the chart to display a pie chart.

   1. For **Type**, choose **Pie**.

   1. For **Values**, choose **sales\_qty\_sld**.

   1. For **Labels**, choose **item\_id**.

   This displays a pie chart so you can visualize results.

After you've finished querying the data, you can choose to view the queries in your query history and save them to share with other project members.
+ For more information about reviewing query history, see [Review query history](sql-query-save-share.md#sql-query-history).
+ For more information about other operations you can do with the query editor, such as using generative AI to create SQL queries, see [Query data with SQL](sql-query.md).

# Get started adding on-demand Amazon EMR on EC2 instances
<a name="getting-started-emr-ec2-page"></a>

## Overview
<a name="getting-started-emr-ec2-overview"></a>

 [Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html) on EC2 is a managed big data platform that simplifies running distributed data processing frameworks like Apache Spark, Hadoop, and Hive on [Amazon EC2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html). Amazon EMR handles the complexities of cluster provisioning, configuration, and scaling, allowing you to focus on your data processing tasks. For more details on Amazon EMR, visit the [Amazon EMR webpage](https://aws.amazon.com/emr/). 

 The Amazon EMR on EC2 integration with Amazon SageMaker Unified Studio streamlines your data analytics workflow, giving you a unified data and compute experience. This integration lets you easily access and create Amazon EMR clusters alongside other data tools in a single interface. You can organize Amazon EMR resources within Amazon SageMaker Unified Studio projects, connect Amazon EMR workloads with your data catalog, and provision clusters on-demand. With this integration, you can experiment by creating and terminating Amazon EMR clusters as needed, optimizing costs while maintaining a cohesive data experience. 

This getting started guide shows you how to configure Amazon EMR cluster settings for EC2 deployment and launch Amazon EMR clusters.

## Prerequisites
<a name="getting-started-emr-ec2-prerec"></a>

You must complete the following procedure through the AWS Management Console to create an Amazon EMR on EC2 cluster in an Amazon SageMaker Unified Studio project.

### Set up Amazon SageMaker Unified Studio
<a name="getting-started-emr-ec2-set-up"></a>

Before you get started with creating an Amazon EMR on EC2 cluster, you must access Amazon SageMaker Unified Studio and create a project with the **All capabilities** project profile.

1. If you haven't created an Amazon SageMaker Unified Studio domain, follow the steps in [Create an Amazon SageMaker Unified Studio domain - quick setup](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/create-domain-sagemaker-unified-studio-quick.html).

1. To access Amazon SageMaker Unified Studio:

   1. Open the Amazon SageMaker Unified Studio console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

   1. Choose **Studio**.

   1. Choose **Open Studio**.

   1. Sign in using your SSO or AWS credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Create a project with the **All capabilities** profile:

   1. In Amazon SageMaker Unified Studio, choose the **Projects** icon in the left sidebar.

   1. Choose **Create project**.

   1. Select the **All capabilities** project profile.

   1. Follow the prompts to complete project creation.

   1. This profile grants you access to Amazon EMR resources. For more information, see [Create a project](getting-started-create-a-project.md). 

### PEM certificate configuration
<a name="getting-started-emr-ec2-pem-cert"></a>

1. Create a PEM certificate, which saves your ZIP file on your local machine:

   1. Open your terminal on your local machine.

   1. The following commands demonstrate how to use [OpenSSL](https://www.openssl.org/) to generate a self-signed X.509 certificate with a 2048-bit RSA private key. Consider changing `us-west-2` to the AWS Region you are using throughout this tutorial. Optional subject fields such as country (C), state (ST), and locality (L) are also specified. 
**Important**  
This example is a proof-of-concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates. For more information, see [Providing certificates for encrypting data in transit with Amazon EMR encryption](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-encryption-certificates).

      ```
      $ openssl req -x509 -newkey rsa:2048 -keyout privateKey.pem -out certificateChain.pem -days 365 -nodes -subj '/C=US/ST=Washington/L=Seattle/O=MyOrg/OU=MyDept/CN=*.us-west-2.compute.internal'
      $ cp certificateChain.pem trustedCertificates.pem
      $ zip -r -X my-certs.zip certificateChain.pem privateKey.pem trustedCertificates.pem
      ```

1. Upload the PEM certificate ZIP file to an Amazon S3 bucket:

   1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

   1. Under **General purpose buckets**, choose your `amazon-sagemaker` bucket.

   1. Navigate to your domain folder. For multiple domains, locate the folder matching your Domain ID. You can find your Domain ID in the **project details** tab of Amazon SageMaker Unified Studio.

   1. Choose **Create folder** and enter **certificate\_location** as the folder name. You do not need to specify an encryption key during folder creation. 
**Note**  
The name **certificate\_location** is required for this folder and cannot be customized.

   1. Select your new folder to open it.

   1. Under **Objects**, select **Upload** and **Add files**. Select your PEM certificate ZIP file (`my-certs.zip`) from your local machine, then choose **Upload**.

   1. Select the uploaded ZIP file and choose **Copy S3 URI**. You'll need this location value in step 3.

1. Specify your certificate location in Amazon SageMaker Unified Studio, following the instructions in [ Specify PEM certificate for EmrOnEc2 blueprint](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/blueprints.html#enable-emr-on-ec2-blueprint).
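If you prefer to script the packaging step rather than run the `zip` command by hand, you can build the same archive with Python's standard library. The following is a minimal sketch that assumes the three PEM files produced by the OpenSSL example above are in one directory; the helper name `bundle_certs` is illustrative:

```python
import zipfile
from pathlib import Path

def bundle_certs(cert_dir: str, zip_name: str = "my-certs.zip") -> Path:
    """Bundle the PEM files from the OpenSSL example into one ZIP archive."""
    members = ["certificateChain.pem", "privateKey.pem", "trustedCertificates.pem"]
    zip_path = Path(cert_dir) / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in members:
            # arcname keeps each file at the root of the archive, matching
            # the layout produced by the zip command in the example above.
            zf.write(Path(cert_dir) / name, arcname=name)
    return zip_path
```

The resulting ZIP file is what you upload to the **certificate\_location** folder in the previous step.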

## Creating your Amazon EMR cluster
<a name="getting-started-emr-ec2-create"></a>

1. In Amazon SageMaker Unified Studio, choose your project to enter the project overview page and select **Compute** from the navigation bar.

1. In the **Compute** panel, select the **Data processing** tab.

1. To create a new Amazon EMR on EC2 cluster, choose **Add compute**.

1. In the **Add compute** dialog box, select the type of compute you would like to add to your project. Select **Create new compute resources**.

1. Select **Amazon EMR on EC2 cluster** and choose **Next**.

1. In the **Add compute** dialog box, specify a name for the Amazon EMR on EC2 cluster. The default Amazon EMR settings are sufficient for this tutorial; otherwise, choose the EMR configuration that matches the choices you made in the prerequisites.

1. After you finish configuring any settings, select **Add compute**. After some time, your Amazon EMR on EC2 cluster is added to your project.

# Use the sample notebook
<a name="getting-started-use-sample-notebook"></a>

You can get started using Amazon SageMaker Unified Studio by using the sample notebook in the JupyterLab IDE within your project. This getting\_started.ipynb notebook provides information about using AWS Glue, Amazon Redshift, Amazon Athena, and more. It is a multi-service, poly-compute notebook, designed to enable end-to-end development in a single place.

In an Amazon SageMaker Unified Studio notebook, you can select the language and framework for each cell based on the compute options or connections configured in your project. You can add or modify these compute connections from the project's compute management screen. The compute choices differ based on your project’s profile. However, all default profiles come with local Python, serverless Spark powered by AWS Glue, and Trino with Amazon Athena. There is a README file with additional information about the sample notebook and Amazon SageMaker Unified Studio.

You can also create new notebooks to input new code from scratch. For more information about using the JupyterLab IDE in Amazon SageMaker Unified Studio, see [Using the JupyterLab IDE in Amazon SageMaker Unified Studio](jupyterlab.md).

To navigate to the sample notebook, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project. To do this, choose **Select project** from the center menu.

1. Expand the **Build** menu, then choose **JupyterLab**.

# Getting started with Amazon Q Developer generative AI chat and command line tools
<a name="qdeveloper-integration"></a>

**Note**  
Powered by Amazon Bedrock: Amazon Q Developer is built on Amazon Bedrock and includes [automated abuse detection](https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-detection.html) implemented in Amazon Bedrock to enforce safety, security, and the responsible use of AI.

In this Getting Started procedure, you will use Amazon SageMaker Unified Studio, SageMaker Catalog, SageMaker Lakehouse sample data, and Amazon Q Developer generative AI tools to analyze code in the JupyterLab IDE. The Amazon Q Developer tools include Q chat and Q CLI.

Amazon Q Developer provides an agentic chat feature supporting read and write operations in the notebook (Code Editor, JupyterLab) with workspace context awareness. With Amazon Q chat, you can chat about AWS services, your development project, your data pipelines, and related topics. The Amazon Q CLI provides intelligent, contextual assistance for error debugging and development tasks, and it can run complex command line tasks for you.

**Warning**  
Generative AI may give inaccurate responses. Avoid sharing sensitive information. Chats may be visible to others in your organization.

For reference information about implementing Amazon Q Developer in Amazon SageMaker Unified Studio, see [Using Amazon Q Developer with Amazon SageMaker Unified Studio](q-actions.md).

**Topics**
+ [Discover Amazon Q Developer in Amazon SageMaker Unified Studio](#qdeveloper-integration-overview)
+ [Considerations for using the Amazon Q Developer feature](#qdeveloper-integration-considerations)
+ [Prerequisites for using the Amazon Q Developer feature](#qdeveloper-integration-prerequisites)
+ [Getting started using Q chat](qdeveloper-integration-start-chat.md)
+ [Getting started with Q CLI](qdeveloper-integration-start-CLI.md)

## Discover Amazon Q Developer in Amazon SageMaker Unified Studio
<a name="qdeveloper-integration-overview"></a>

Amazon Q Developer provides agentic AI tools that use context and agents to summarize, analyze, perform tasks, and work on your code with you. In your JupyterLab notebook or Code Editor, you can use the Amazon Q chat and Amazon Q CLI tools to understand and configure your Amazon SageMaker Unified Studio project files. For more information about Amazon Q Developer, see [What is Amazon Q Developer](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html) in the *Amazon Q Developer User Guide*.

## Considerations for using the Amazon Q Developer feature
<a name="qdeveloper-integration-considerations"></a>

The following considerations apply for working with Amazon Q Developer in Amazon SageMaker Unified Studio.
+ For Q CLI, in domains using the Amazon Q Free Tier, you are logged in automatically. In domains using the Amazon Q Pro Tier, you are prompted to log in. Use the AWS access portal URL (also called the start URL) associated with the IAM Identity Center instance attached to the domain, along with the IAM Identity Center Region. Q CLI then uses the profile and subscription the admin creates by following the steps in [Enable Amazon Q Developer Pro](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/amazonq.html#amazonq-enable).
**Note**  
If there is only one profile set up, then that is the profile that Q CLI will use. If there are multiple profiles set up, then Q CLI prompts you to choose one. Choose the profile associated with the domain.
+ When you enable Amazon Q, you can choose between the Free or Pro tiers of the service. JupyterLab in the default space supports both the free and paid tiers. However, in additional spaces, JupyterLab and Code Editor support the Free Tier only.
+ The level of use for the Q chat and Q CLI are set by the tier availability as detailed on the pricing page at [Amazon Q Developer Pricing](https://aws.amazon.com/q/developer/pricing/).

**Note**  
When using the Free Tier, request limits are shared at the account level, meaning that one customer can potentially use up all requests. The Pro Tier of Amazon Q is charged at the user level, with limits set at the user level as well. The Pro Tier also lets you manage users and policies with enterprise access control.

## Prerequisites for using the Amazon Q Developer feature
<a name="qdeveloper-integration-prerequisites"></a>

The following prerequisites are required for this getting started procedure.
+ You must have access to a SageMaker Unified Studio domain and project. Create a project with an **All capabilities** project profile. This project profile sets up your project with access to S3 and Athena resources. For more information, see [Projects](projects.md).
+ To use the Amazon Q Developer chat and CLI features in Amazon SageMaker Unified Studio, you need access to a domain where Amazon Q Developer is configured.

  If the domain is set to use the Free Tier, you have access to Q chat and Q CLI in JupyterLab without any additional login. For the Pro Tier, your administrator must set up a profile, subscribe users, and attach the profile to the Amazon SageMaker Unified Studio domain. In Q CLI, you can then use the start URL and IAM Identity Center Region to sign in with a Pro Tier license. See [Enable Amazon Q Developer Pro](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/amazonq.html#amazonq-enable).

  For more information, see [Using the coding assistant](using-the-coding-assistant.md).

# Getting started using Q chat
<a name="qdeveloper-integration-start-chat"></a>

Use Q chat as follows. Make sure you are signed in with an ID that is configured for Q chat access.

1. Log in to your AWS account and navigate to the access portal, such as with your SSO login.

   Open the SageMaker Unified Studio console through the access portal, and then navigate to your project.

1. Open a Jupyter notebook by choosing **Build**, and then choosing **JupyterLab**. A Jupyter notebook cell page opens.

1. Choose the icon on the left for Q chat with Amazon Q Developer. If this is your first time, a message prompts you to acknowledge the AWS policies for responsible AI.  
![\[An image of the Q chat icon.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat_icon.png)

1. Keep the **Agentic coding** toggle set to ON.

1. Type questions to interact with Q chat, entering them in the **Ask a question...** field.

You can get started using Q chat with the following examples.

## Example 1: Ask for information about your project
<a name="qdeveloper-integration-chat-exampleinfo"></a>

This example shows how Q chat can provide context aware responses for your project resources.

1. To open JupyterLab, choose **Build**, and then choose **JupyterLab**. When you chat with Q from within JupyterLab, Amazon Q chat has additional contextual awareness of your workspace.

1. In the Q chat field, enter the following.

   ```
   Can you tell me about my project?
   ```

   Q responds by asking follow-up questions and showing your files.

## Example 2: Create and run a data pipeline
<a name="qdeveloper-integration-chat-examplepipeline"></a>

This example shows how Q chat can perform complex tasks for you, such as creating and running a data pipeline in your project.

1. To open JupyterLab, choose **Build**, and then choose **JupyterLab**. When you chat with Q from within JupyterLab, Amazon Q chat has additional contextual awareness of your workspace.

1. In the Q chat field, enter the following.

   ```
   Can you help me set up and run a data pipeline?
   ```

   The following diagram shows the response.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-1.png)

   The following image shows Q asking questions and explaining the task.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-2.png)

   The following image shows Q creating the shell file for you in your workspace.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-4.png)

   The following image shows Q creating the files and describing them.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-5.png)

   The following image shows Q providing the instructions to run the pipeline.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-6.png)

   The following image shows the notebook file that Q created for you in your workspace.  
![\[An example response.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_chat-pipeline-notebook.png)

1. **Get access to data**

   Before visualizing data, you might need to request access to the data by subscribing to data in Amazon SageMaker Catalog.

1. **Create new connections**

   You can create connections directly to Amazon Redshift and other third-party sources like Oracle and Snowflake from Amazon SageMaker Unified Studio. You configure connection details and credentials securely, and you can manage them within the project. For detailed steps, see [Amazon Redshift compute connections](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/compute-redshift.html) and [Data connections in lakehouse architecture](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse-data-connection.html).

# Getting started with Q CLI
<a name="qdeveloper-integration-start-CLI"></a>

Use Q CLI as follows. Make sure you are signed in with an ID that is configured for Q CLI access. For more information about signing up, see [About signing up](q-actions.md#q-actions-aboutsignup).

1. Log in to your AWS account and navigate to the access portal, such as with your SSO login.

   Open the SageMaker Unified Studio through the access portal, and then navigate to your project.

1. Open a Jupyter notebook by choosing **Build**, and then choosing **JupyterLab**. Choose the icon for the Python or console interface. A Jupyter notebook cell page opens.

1. Open a terminal window by choosing **New**, and then **Terminal**.

1. Type the following to open Q CLI.

   ```
   q chat
   ```

You can get started using Q CLI with the following examples.

## Example 1: Create an AWS Glue table and a Python notebook for analysis
<a name="qdeveloper-integration-CLI-exampletable"></a>

This example shows how Q CLI can perform complex command line procedures for you, such as creating a sample Python notebook that analyzes an AWS Glue table in your project's SageMaker Lakehouse sample data source and visualizes the data.

1. Download the diabetic data sample data set from the [sample data](https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008) site.

1. Create a new AWS Glue table named `diabetic_data` and add the sample data that you just downloaded as a data source. Choose **Create table**. A schema is displayed for the sample table.  
![\[An image of the Add data screen\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_cli_notebook-1.png)

1. In the terminal for Q CLI, enter the following.

   ```
   You are a machine learning engineer, and you are working with data from the data engineer. Your responsibility is to analyze the output data in your notebook. Can you help me to create a python notebook for the following.
   - Use the diabetic_data dataset in SageMaker Lakehouse.
   - Create a notebook to perform typical data engineering tasks for the machine learning experience in JupyterLab.
   - Make sure to handle missing values, perform descriptive analysis, feature analysis
   - Create a comprehensive README.md file
   ```

   The following diagram shows the response where Q CLI asks questions and creates sample files.  
![\[An example image with the terminal window Q CLI page.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_cli_notebook-2.png)

1. The following diagram shows the response where Q CLI interacts with you while creating the files.  
![\[An example image with the terminal window Q CLI page.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_cli_notebook-3.png)

1. The following diagram shows the response where Q CLI provides the outline and description of what will be created.  
![\[An example image with the terminal window Q CLI page.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_cli_notebook-4.png)

1. The following diagram shows the response where Q CLI summarizes the files and their purpose.  
![\[An example image with the terminal window Q CLI page.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_cli_notebook-5.png)
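The data-preparation tasks in the prompt above can also be sketched by hand. The following is a minimal, hypothetical example of the missing-value handling and descriptive analysis that the generated notebook typically performs, using a small synthetic frame in place of the real `diabetic_data` table (this data set encodes missing values as the string `?`):

```python
import pandas as pd

# Synthetic stand-in for a few diabetic_data columns; the real table
# encodes missing values as the string "?".
df = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "Asian"],
    "time_in_hospital": [3, 5, 2, 7],
})

# Handle missing values: treat "?" as NA, then count per column
# before deciding whether to drop or impute.
df = df.replace("?", pd.NA)
missing_counts = df.isna().sum()

# Descriptive analysis of a numeric column.
summary = df["time_in_hospital"].describe()
print(missing_counts)
print(summary)
```

In a full notebook, these steps would be followed by feature analysis and imputation, as described in the prompt.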

## Example 2: Ask Q CLI to list project information
<a name="qdeveloper-integration-CLI-examplefunction"></a>

This example shows how Q CLI can provide context aware and complex command line help for your projects and data.
+ In the terminal, enter the following.

  ```
  Can you tell me my project and domain information?
  ```

  The response provides you with project information.