

# Use the SageMaker Data Agent


The SageMaker Data Agent in Amazon SageMaker Unified Studio provides intelligent assistance for code generation, error diagnosis, and data analysis recommendations in Notebooks and Query Editor. The agent helps data engineers, analysts, and data scientists who spend significant time on manual setup tasks and boilerplate code when building analytics and ML applications. It generates code and execution plans from natural language prompts and integrates with data catalogs and business metadata to streamline the development process.

**Note**  
SageMaker Data Agent is available in Notebooks and Query Editor in IAM-based domains.

## Prerequisites

+ An open notebook or querybook in Amazon SageMaker Unified Studio
+ Access to the SageMaker Data Agent features in your domain

# Basic usage


**To generate code with the SageMaker Data Agent**

1. Type a natural language prompt describing what you want to accomplish.

1. Press Alt A (or Opt A on Mac) to trigger code generation.

1. Review the generated code in the suggestion panel.

1. Choose **Accept** to add the code to your cell or **Reject** to dismiss it.

**To use the SageMaker Data Agent chat panel**

1. Open the agent chat panel from the notebook interface.

1. Enter a prompt describing your data analysis needs.

1. Review the agent's plan and suggested approach.

1. The agent generates a collection of cells that implement the analysis plan.

1. Review and execute the generated cells as needed.

**To diagnose errors with the SageMaker Data Agent**

1. When a cell execution results in an exception, look for the **Diagnose with agent** button.

1. Choose the button to get agent-generated suggestions for fixing the error.

1. Review the suggested fixes and code improvements.

1. Apply the suggestions or use them as guidance for manual fixes.

# Getting started with the SageMaker Data Agent for Notebooks


The SageMaker Data Agent is available directly within your SageMaker AI notebook interface through two interaction modes:

Agent panel  
Access comprehensive multi-step workflows by opening the agent panel from the notebook interface. Use this mode for complex analytical tasks requiring planning, data discovery, and multiple operations.

In-line assistance  
Generate code directly within notebook cells by using the inline prompt interface. Use this mode for focused tasks like writing specific queries, creating visualizations, or modifying existing code.

# Getting started with the SageMaker Data Agent for Query Editor


The SageMaker Data Agent in Query Editor provides a conversational SQL development experience. Unlike single-turn SQL generation, the agent supports multi-turn conversations where you can ask follow-up questions, request modifications to generated queries, and receive contextual guidance on query optimization.

The agent is accessible directly within the Query Editor interface through the agent panel.

![The SageMaker Data Agent welcome screen in Query Editor, showing sample prompts and the chat interface.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/data-agent/qe-welcome.png)


Step-by-step planner  
When you describe a complex analytical task, the agent proposes a step-by-step plan to guide your workflow. You can review and approve the plan before the agent generates SQL for each step.  

![The SageMaker Data Agent proposing a multi-step analysis plan in Query Editor, with options to cancel or run step-by-step.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/data-agent/qe-plan.png)


Auto-injection of generated SQL  
The agent automatically creates cells with generated SQL directly in your querybook, matching the notebook experience. You can review and run the generated SQL in place.  

![The SageMaker Data Agent auto-injecting SQL cells into a querybook, with options to accept, reject, or accept and run the generated code.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/data-agent/qe-step-cells.png)


Fix with AI  
When a query fails, the agent can analyze the error and suggest corrections. Use the Fix with AI capability to get agent-generated fixes for failed queries.  

![The SageMaker Data Agent diagnosing a query error and suggesting a corrected SQL query in the Query Editor.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/data-agent/qe-fix-with-ai.png)


**To use the SageMaker Data Agent in Query Editor**

1. Navigate to a project and open the Query Editor from the **Build** menu.

1. Open the agent panel from the Query Editor interface.

1. Enter a natural language prompt describing your SQL task. For example: *"Write a query that calculates monthly recurring revenue by customer segment for Q4 2025, using the billing.invoices and customer.segments tables."*

1. Review the agent's proposed plan and choose to accept or modify it.

1. The agent generates SQL and injects it into your querybook cells.

1. Review and run the generated SQL. If a query fails, use **Fix with AI** to get suggested corrections.

**Example: Multi-turn SQL development**  
Initial prompt: *"Which Redshift tables in the analytics schema have columns related to customer churn? Show me their schemas."*  
The agent queries your data catalog and returns schema information. You can then follow up:  
Follow-up prompt: *"Help me find all customers who downgraded their subscription in the last 90 days and calculate the revenue impact by region."*  
The agent builds on the previous context to generate a multi-step query plan.

# Using the SageMaker Data Agent for data analytics tasks


## Data exploration and discovery


The agent helps you explore available data sources by querying AWS Glue Data Catalog metadata through MCP connections. When you request data exploration, the agent retrieves table information, schemas, and relationships from your catalog to understand what data is available.

**Example prompt for data analyst**  
*"Show me what customer tables are available and their schemas"*  
The agent queries your AWS Glue Data Catalog and generates code to display table metadata.

**Example prompt**  
*"Calculate monthly revenue by product category from the sales_transactions table"*  
The agent accesses your AWS Glue Data Catalog to verify the sales_transactions table structure and generates:  

```
SELECT 
    DATE_TRUNC('month', transaction_date) as month,
    product_category,
    SUM(transaction_amount) as total_revenue,
    COUNT(DISTINCT customer_id) as unique_customers,
    AVG(transaction_amount) as avg_transaction_value
FROM sales_db.sales_transactions
WHERE transaction_date >= DATE_ADD(CURRENT_DATE(), -180)
GROUP BY DATE_TRUNC('month', transaction_date), product_category
ORDER BY month DESC, total_revenue DESC
```
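To make the aggregation concrete outside of a live catalog, the same shape of query can be sketched locally with Python's built-in `sqlite3` module. The table name and sample rows below are hypothetical, and SQLite's `strftime('%Y-%m', ...)` stands in for `DATE_TRUNC('month', ...)`:

```python
import sqlite3

# Hypothetical miniature version of sales_db.sales_transactions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales_transactions (
    transaction_date TEXT, product_category TEXT,
    transaction_amount REAL, customer_id INTEGER)""")
conn.executemany(
    "INSERT INTO sales_transactions VALUES (?, ?, ?, ?)",
    [
        ("2025-01-05", "electronics", 120.0, 1),
        ("2025-01-20", "electronics", 80.0, 2),
        ("2025-02-03", "books", 30.0, 1),
    ],
)

# strftime('%Y-%m', ...) buckets rows by month, like DATE_TRUNC('month', ...).
rows = conn.execute("""
    SELECT strftime('%Y-%m', transaction_date) AS month,
           product_category,
           SUM(transaction_amount) AS total_revenue,
           COUNT(DISTINCT customer_id) AS unique_customers
    FROM sales_transactions
    GROUP BY month, product_category
    ORDER BY month DESC, total_revenue DESC
""").fetchall()
print(rows)
# [('2025-02', 'books', 30.0, 1), ('2025-01', 'electronics', 200.0, 2)]
```

In a notebook, the agent would run the generated SQL against the real catalog table instead; this sketch only illustrates the grouping and aggregation logic.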

## Building multi-step analytical workflows


For complex analysis, the agent proposes step-by-step plans that break down your objective into discrete operations, requesting approval before executing each step.

**Example prompt for data scientist**  
*"Analyze the churn table and visualize retention rate by region"*  
The agent proposes a multi-step plan:  
Now I'll create a plan to analyze the churn table and visualize retention rate by region. Since the table has a 'state' column representing regions and a 'churn' boolean column, I can calculate the retention rate (percentage of customers who didn't churn) for each state.  

1. **Step 1: Load Churn Data from Catalog** - Use PySpark to read the churn table from the AWS Glue catalog (sagemaker_sample_db.churn) and preview the data structure.

1. **Step 2: Calculate Retention Rate by Region** - Aggregate the data by state to calculate the retention rate (percentage of customers where churn = False) for each state/region.

1. **Step 3: Visualize Retention Rates** - Create a bar chart showing the retention rate by region/state, sorted for better readability.

Each step generates code in a new cell, maintaining context from previous cells and building upon earlier results.

## Cell-level inline code modification


The agent provides inline assistance to modify existing code based on your requirements. Use the agent icon on any cell to open a prompt input box, then type your request in natural language instead of writing the code yourself.

**Example workflow**  
Initial cell content:  

```
# Load sales data
sales_df = spark.table("sales_db.transactions")
sales_df.show(10)
```
Inline prompt: *"Add filtering for electronics category and last quarter only"*  
Updated cell content:  

```
# Load sales data with filters
from datetime import datetime, timedelta
from pyspark.sql.functions import col

quarter_start = datetime.now() - timedelta(days=90)
sales_df = spark.table("sales_db.transactions") \
    .filter(col("category") == "electronics") \
    .filter(col("transaction_date") >= quarter_start)
sales_df.show(10)
```

## Referencing previous cell results


The agent understands your notebook context and can generate code that references data frames and variables from previous cells.

**Example workflow**  
Cell 1:  

```
# Customer purchase analysis
customer_purchases = spark.sql("""
    SELECT customer_id, product_category, 
           SUM(amount) as total_spent
    FROM sales_db.transactions
    GROUP BY customer_id, product_category
""")
customer_purchases.createOrReplaceTempView("customer_summary")
```
Agent prompt: *"Create a visualization showing the distribution of customer spending from the results above"*  
Cell 2 (generated by agent):  

```
# Visualize customer spending distribution
import matplotlib.pyplot as plt
import pandas as pd

# Convert to pandas for visualization
spending_data = customer_purchases.toPandas()

plt.figure(figsize=(12, 6))
plt.hist(spending_data['total_spent'], bins=50, edgecolor='black')
plt.xlabel('Total Spending')
plt.ylabel('Number of Customers')
plt.title('Customer Spending Distribution')
plt.grid(True, alpha=0.3)
plt.show()

# Show summary statistics
print(spending_data['total_spent'].describe())
```

## Error handling and fixes with the SageMaker Data Agent


When code execution fails, the SageMaker Data Agent can analyze the error message and suggest corrections. Use the **Diagnose with agent** capability to generate corrected code automatically.

# Best practices for the SageMaker Data Agent


Be specific about your data sources  
Good: *"Calculate total sales by region from the sales_transactions table"*  
Less effective: *"Show me sales data"*

Specify desired output format  
Good: *"Create a bar chart showing top 10 products by revenue with labels"*  
Less effective: *"Visualize product performance"*

Provide context for multi-step workflows  
Good: *"Using the customer segments @customer_segment, calculate average lifetime value for each segment"*  
Less effective: *"Calculate lifetime value"*

# Security and access control for the SageMaker Data Agent


**Topics**
+ [Required IAM permissions to use SageMaker Data Agent](#data-agent-control-actions)

## Required IAM permissions to use SageMaker Data Agent


To use the SageMaker Data Agent in Notebooks or Query Editor, your project role must have IAM permissions to invoke the following Amazon DataZone APIs: SendMessage, GenerateCode, StartConversation, GetConversation, and ListConversations.
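As a sketch, an IAM policy statement granting these permissions might look like the following. The `datazone:` action names here simply mirror the API names listed above; verify the exact action names against the service authorization reference, and scope the `Resource` element to your domain rather than `*` where possible:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datazone:SendMessage",
        "datazone:GenerateCode",
        "datazone:StartConversation",
        "datazone:GetConversation",
        "datazone:ListConversations"
      ],
      "Resource": "*"
    }
  ]
}
```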

# Data storage in the SageMaker Data Agent


## Content that is stored by SageMaker Data Agent

+ Your natural language prompts and questions to the SageMaker Data Agent
+ The code and responses generated by the SageMaker Data Agent

## Content that is not stored by SageMaker Data Agent

+ Code that you write yourself in notebook cells
+ Code that you manually modify or change
+ Notebook context and metadata
+ Data from your AWS Glue Data Catalog or other data sources

## AWS Regions where content is processed and stored


Your content collected by the SageMaker Data Agent is stored in the AWS Region where your Amazon SageMaker Unified Studio domain was created. When you use the SageMaker Data Agent in notebooks, your prompts and generated responses remain in your domain's Region.

With cross-region inference, your requests to the SageMaker Data Agent may be processed in a Region other than the one where your content is stored. For more information, see [Cross-region processing in the SageMaker Data Agent](data-agent-cross-region.md).

# Service improvement


To help the SageMaker Data Agent provide the most relevant information, AWS may use certain content for service improvement. This content includes your natural language prompts to the SageMaker Data Agent and the responses generated by the agent.

AWS may use this content, for example, to help the SageMaker Data Agent provide better responses to common questions, fix operational issues, or for debugging.

## Content that AWS may use for service improvement

+ Your natural language prompts and questions to the SageMaker Data Agent
+ The code and responses generated by the SageMaker Data Agent

## Content that AWS does not use for service improvement

+ Code that you write yourself in notebook cells
+ Code that you manually modify or change
+ Notebook context and metadata
+ Data from your AWS Glue Data Catalog or other data sources

Only Amazon employees have access to the data. Your trust, privacy, and the security of your Customer Content are our highest priority, and we ensure that our use complies with our commitments to you. For more information, see [Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/).

## How to opt out


To opt out of your data being used to improve AWS services, you can configure an AI services opt-out policy for Amazon DataZone in AWS Organizations. This policy controls data usage across Amazon SageMaker Unified Studio, including the SageMaker Data Agent. For more information, see AI services opt-out policies in the AWS Organizations User Guide.

When you configure an AI services opt-out policy, it has the following effects:
+ AWS will delete the data that it collected and stored for service improvement prior to your opt out (if any).
+ After you opt out, AWS will no longer collect or store this data.
+ AWS will no longer use your content for service improvement.

# Cross-region processing in the SageMaker Data Agent


## Cross-region inference


The SageMaker Data Agent for Notebooks uses cross-region inference to process natural language requests and generate code responses. With cross-region inference, the agent automatically routes your inference request to optimize performance, maximizing available compute resources and model availability. The type of cross-region inference used depends on your Amazon SageMaker Unified Studio domain's Region. Most Regions use geographic cross-region inference, which keeps requests within the same geography. However, some Regions use global cross-region inference, which may route requests to any commercial AWS Region.

### How cross-region inference works


The SageMaker Data Agent is powered by Amazon Bedrock and uses cross-region inference to distribute traffic across different AWS Regions to enhance large language model (LLM) inference performance and reliability. With cross-region inference, you get:
+ Increased throughput and resilience during high demand periods
+ Improved performance

Although cross-region inference does not change where your notebook data or generated code is stored, your natural language prompts, code context, and AWS Glue Data Catalog metadata may be transmitted to different Regions for inference processing. All data is encrypted in transit across Amazon's secure network.

There is no additional cost for using cross-region inference.

## Supported Regions for cross-region inference


**Regions using geographic cross-region inference**

For most Regions, cross-region inference requests are kept within the AWS Regions that are part of the same geography where your Amazon SageMaker Unified Studio domain resides. For example, a request made from a notebook in the US East (N. Virginia) Region is routed only to AWS Regions within the United States geography. The following table describes the Regions that your requests might be routed to, depending on the geography where the request originated.


| Supported geography | Inference regions | 
| --- | --- | 
|   United States  |   US East (N. Virginia) (us-east-1), US West (Oregon) (us-west-2), US East (Ohio) (us-east-2)  | 
|   Europe  |   Europe (Frankfurt) (eu-central-1), Europe (Ireland) (eu-west-1), Europe (Paris) (eu-west-3), Europe (Stockholm) (eu-north-1)  | 
|   Japan  |   Asia Pacific (Tokyo) (ap-northeast-1), Asia Pacific (Osaka) (ap-northeast-3)  | 
|   Australia  |   Asia Pacific (Sydney) (ap-southeast-2), Asia Pacific (Melbourne) (ap-southeast-4)  | 

## Regions using global cross-region inference


**Important**  
The following AWS Regions use global cross-region inference. When your Amazon SageMaker Unified Studio domain's Region is listed below, inference requests made by the SageMaker Data Agent are securely routed to available compute resources across all commercial AWS Regions to optimize performance and availability:
+ Asia Pacific (Mumbai) (ap-south-1)
+ Asia Pacific (Seoul) (ap-northeast-2)
+ Asia Pacific (Singapore) (ap-southeast-1)
+ South America (São Paulo) (sa-east-1)
+ Canada (Central) (ca-central-1)