

# Working with Aurora DSQL EXPLAIN plans
<a name="working-with-explain-plans"></a>

Aurora DSQL uses a similar EXPLAIN plan structure to PostgreSQL, but with key additions that reflect its distributed architecture and execution model.

In this documentation, we'll provide an overview of Aurora DSQL EXPLAIN plans, highlighting the similarities and differences compared to PostgreSQL. We'll cover the various types of scan operations available in Aurora DSQL and help you understand the cost of running your queries.

## PostgreSQL vs. Aurora DSQL EXPLAIN plans
<a name="postgresql-explain-plans"></a>

Aurora DSQL is built on top of the PostgreSQL database and shares most plan structures with PostgreSQL, but has key architectural differences that affect query execution and optimization:


| Feature | PostgreSQL | Aurora DSQL | 
| --- | --- | --- | 
|  Data Storage  |  Heap Storage  |  No heap, all rows are indexed by a unique identifier  | 
|  Primary Key  |  Primary key index is separate from table data  |  Primary key index is the table with all extra columns as INCLUDE columns  | 
|  Secondary Indexes  |  Standard secondary indexes  |  Work the same as PostgreSQL, with ability to include non-key columns  | 
|  Filtering Capabilities  |  Index Condition, Heap Filter  |  Index Condition, Storage Filter, Query Processor Filter   | 
|  Scan Types  |  Sequential Scan, Index Scan, Index Only Scan  |  Full Scan, Index Only Scan, Index Scan  | 
|  Query Execution  |  Local to the Database  |  Distributed (compute and storage are separate)  | 

Aurora DSQL stores table data directly in primary key order rather than in a separate heap. Each row is identified by a unique key, typically the primary key, which allows the database to perform lookups more efficiently. This architectural difference explains why Aurora DSQL often uses Index Only Scans in cases where PostgreSQL might choose a sequential scan.
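As a minimal sketch of this behavior (the table and column names here are hypothetical, not from the examples later in this guide), a table defined with a primary key stores its remaining columns with the primary key index itself, so a lookup by key can typically be satisfied by an Index Only Scan:

```
-- Hypothetical table: in Aurora DSQL, the primary key index on order_id
-- *is* the table, with the remaining columns stored alongside the key.
CREATE TABLE orders (
    order_id uuid PRIMARY KEY,
    total numeric,
    placed_at timestamp
);

-- A lookup by primary key can typically be satisfied by an Index Only Scan,
-- where PostgreSQL might instead read from the heap.
EXPLAIN SELECT total FROM orders
WHERE order_id = '00000000-0000-0000-0000-000000000000';
```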

Another key distinction is that Aurora DSQL separates compute from storage, enabling filters to be applied earlier in the execution path to reduce data movement and improve performance.

For more information about using EXPLAIN plans with PostgreSQL, see the [PostgreSQL EXPLAIN documentation](https://www.postgresql.org/docs/current/using-explain.html).

## Key elements in Aurora DSQL EXPLAIN plans
<a name="explain-plan-elements"></a>

Aurora DSQL EXPLAIN plans provide detailed information about how queries are executed, including where filtering occurs and which columns are retrieved from storage. Understanding this output helps you optimize query performance.

Index Cond  
Conditions used to navigate the index. Most efficient filtering that reduces data scanned. In Aurora DSQL, index conditions can be applied at multiple layers of the execution plan.

Projections  
Columns retrieved from storage. Fewer projections mean better performance.

Storage Filter  
Conditions applied at storage level. More efficient than query processor filters.

Query Processor Filter  
Conditions applied at the query processor level. Requires transferring all data before filtering, which results in higher data movement and processing overhead.

## Filters in Aurora DSQL
<a name="filtering-and-projection"></a>

Aurora DSQL separates compute from storage, which means that the point where filters are applied during query execution has a significant impact on performance. Filters applied before large volumes of data are transferred reduce latency and improve efficiency. The earlier a filter is applied, the less data needs to be processed, moved, and scanned, resulting in faster queries.

Aurora DSQL can apply filters at multiple stages in the query path. Understanding these stages is key to interpreting query plans and optimizing performance.


| Level | Filter Type | Description | 
| --- | --- | --- | 
| 1 | Index Condition |  Applied while scanning the index. Limits how much data is read from storage and reduces the data sent to the compute layer.  | 
| 2 | Storage Filter | Applied after data is read from storage but before it’s sent to compute. An example here is a filter on an include column of an index. Reduces data transfer but not the amount read. | 
| 3 | Query Processor Filter | Applied after data reaches the compute layer. All data must be transferred first, which increases latency and cost. Currently, Aurora DSQL cannot perform all filtering and projection operations on storage, so some queries might be forced to fall back to this type of filtering. | 

# Reading Aurora DSQL EXPLAIN plans
<a name="reading-dsql-explain-plans"></a>

Understanding how to read EXPLAIN plans is key to optimizing query performance. In this section, we’ll walk through real examples of Aurora DSQL query plans, show how different scan types behave, explain where filters are applied, and highlight opportunities for optimization.

## Sample tables used in these examples
<a name="explain-plan-sample-tables"></a>

The examples below reference two tables: `transaction` and `account`.

The `transaction` table does not have a primary key, which causes Aurora DSQL to perform full table scans when querying it.

The `account` table has an index on `customer_id`. This index includes `balance` and `status` as covering columns, which allows certain queries to be satisfied directly from the index without reading from the base table. However, the index does not include `created_at`, so queries that reference this column require additional table access.

```
CREATE TABLE transaction (
    account_id uuid,
    transaction_date timestamp,
    description text
);

CREATE TABLE account (
    customer_id uuid,
    balance numeric,
    status varchar,
    created_at timestamp
);

CREATE INDEX ASYNC idx1 ON account (customer_id) INCLUDE (balance, status);
```

## Full Scan example
<a name="full-scan-example"></a>

Aurora DSQL supports both Sequential Scans, which are functionally identical to those in PostgreSQL, and Full Scans. The only difference between the two is that Full Scans can apply extra filtering at the storage layer, so the planner almost always selects them over Sequential Scans. Because of this similarity, we only cover examples of the more interesting Full Scans.

Full Scans are mostly used on tables with no primary key. Because Aurora DSQL primary keys are fully covering indexes by default, Aurora DSQL will most likely use an Index Only Scan on the primary key in many situations where PostgreSQL would use a Sequential Scan. As with most other databases, a table with no indexes on it will scale badly.
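As a sketch of how to avoid Full Scans entirely, the following hypothetical variant of the `transaction` table (the name `transaction_pk` and the `transaction_id` column are assumptions for illustration) adds a primary key, which gives the table a covering index that queries on the key can use instead:

```
-- Hypothetical variant of the transaction table with a primary key.
-- Aurora DSQL stores the table in primary key order, so queries that
-- filter on transaction_id can use the primary key index rather than
-- falling back to a Full Scan.
CREATE TABLE transaction_pk (
    transaction_id uuid PRIMARY KEY,
    account_id uuid,
    transaction_date timestamp,
    description text
);
```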

```
EXPLAIN SELECT account_id FROM transaction WHERE transaction_date > '2025-01-01' AND description LIKE '%external%';
```

```
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Full Scan (btree-table) on transaction  (cost=125100.05..177933.38 rows=33333 width=16)
   Filter: (description ~~ '%external%'::text)
   -> Storage Scan on transaction (cost=12510.05..17793.38 rows=66666 width=16)
        Projections: account_id, description
        Filters: (transaction_date > '2025-01-01 00:00:00'::timestamp without time zone)
        -> B-Tree Scan on transaction (cost=12510.05..17793.38 rows=100000 width=30)
```

This plan shows two filters applied at different stages. The `transaction_date > '2025-01-01'` condition is applied at the storage layer, reducing how much data is returned. The `description LIKE '%external%'` condition is applied later in the query processor, after data is transferred, making it less efficient. Pushing more selective filters into the storage or index layers generally improves performance.
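One way to push the more selective date condition even earlier, sketched here with a hypothetical index name (`idx_txn_date` is an assumption, not part of the sample schema), is to index `transaction_date` so the comparison can become an index condition rather than a storage filter:

```
-- Hypothetical index: with transaction_date as the index key, the date
-- comparison can be applied as an Index Cond, which limits how much data
-- is read from storage in the first place. The INCLUDE columns let the
-- query above avoid a base-table lookup.
CREATE INDEX ASYNC idx_txn_date ON transaction (transaction_date)
    INCLUDE (account_id, description);
```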

## Index Only Scan example
<a name="index-only-scan-example"></a>

Index Only Scans are the most efficient scan type in Aurora DSQL because they require the fewest round trips to the storage layer and allow the most filtering to be pushed down. However, seeing an Index Only Scan does not mean that you have the best plan. Because filtering can occur at several different levels, it is essential to still pay attention to where each filter is applied.

```
EXPLAIN SELECT balance FROM account 
WHERE customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb' 
AND balance > 100 
AND status = 'pending';
```

```
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Only Scan using idx1 on account  (cost=725.05..1025.08 rows=8 width=18)
   Index Cond: (customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'::uuid)
   Filter: (balance > '100'::numeric)
   -> Storage Scan on idx1 (cost=12510.05..17793.38 rows=9 width=16)
        Projections: balance
        Filters: ((status)::text = 'pending'::text)
        -> B-Tree Scan on idx1 (cost=12510.05..17793.38 rows=10 width=30)
            Index Cond: (customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'::uuid)
```

In this plan, the index condition, `customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'`, is evaluated first during the index scan, which is the most efficient stage because it limits how much data is read from storage. The storage filter, `status = 'pending'`, is applied after data is read but before it’s sent to the compute layer, reducing the amount of data transferred. Finally, the query processor filter, `balance > 100`, runs last, after the data has been moved, making it the least efficient. Of these, the index condition provides the greatest performance benefit because it directly controls how much data is scanned.

## Index Scan example
<a name="index-scan-example"></a>

Index Scans are similar to Index Only Scans, except they include the extra step of calling into the base table. Because Aurora DSQL can specify storage filters, it can apply them on both the index call and the lookup call.

To make this clear, Aurora DSQL presents the plan as two nodes. This way, you can clearly see how much adding an include column will help in terms of rows returned from storage.

```
EXPLAIN SELECT balance FROM account 
WHERE customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'
AND balance > 100 
AND status = 'pending' 
AND created_at > '2025-01-01';
```

```
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using idx1 on account  (cost=728.18..1132.20 rows=3 width=18)
   Filter: (balance > '100'::numeric)
   Index Cond: (customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'::uuid)
   -> Storage Scan on idx1 (cost=12510.05..17793.38 rows=8 width=16)
        Projections: balance
        Filters: ((status)::text = 'pending'::text)
        -> B-Tree Scan on account (cost=12510.05..17793.38 rows=10 width=30)
            Index Cond: (customer_id = '4b18a761-5870-4d7c-95ce-0a48eca3fceb'::uuid)
   -> Storage Lookup on account (cost=12510.05..17793.38 rows=4 width=16)
        Filters: (created_at > '2025-01-01 00:00:00'::timestamp without time zone)
        -> B-Tree Lookup on account (cost=12510.05..17793.38 rows=8 width=30)
```

This plan shows how filtering happens across multiple stages: 
+ The index condition on `customer_id` filters data early. 
+ The storage filter on `status` further narrows results before they’re sent to compute. 
+ The query processor filter on `balance` is applied later, after transfer.
+ The lookup filter on `created_at` is evaluated when fetching additional columns from the base table. 

Adding frequently used columns as `INCLUDE` fields can often eliminate this lookup and improve performance. 
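For example, a variant of `idx1` that also covers `created_at` (the index name `idx2` is hypothetical) would let the query above be answered entirely from the index, eliminating the Storage Lookup node:

```
-- Hypothetical replacement for idx1: including created_at means every
-- column the query needs is available in the index, so the plan can use
-- an Index Only Scan with no base-table lookup.
CREATE INDEX ASYNC idx2 ON account (customer_id)
    INCLUDE (balance, status, created_at);
```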

## Best Practices
<a name="best-practices"></a>
+ **Align filters with indexed columns** to push filtering earlier.
+ **Use INCLUDE columns** to allow Index-Only Scans and avoid lookups.
+ **Validate row estimates** when investigating performance issues. Aurora DSQL manages statistics automatically by running `ANALYZE` in the background based on data change rates. If estimates appear inaccurate, you can run `ANALYZE` manually to refresh statistics immediately.
+ **Avoid unindexed queries** on large tables to prevent expensive Full Scans.

# Understanding DPUs in EXPLAIN ANALYZE
<a name="understanding-dpus-explain-analyze"></a>

Aurora DSQL provides **statement-level** Distributed Processing Unit (DPU) information in `EXPLAIN ANALYZE VERBOSE` plan output, giving you deeper visibility into query cost during development. This section explains what DPUs are and how to interpret them in the `EXPLAIN ANALYZE VERBOSE` output.

## What is a DPU?
<a name="what-is-dpu"></a>

A Distributed Processing Unit (DPU) is the normalized measure of work done by Aurora DSQL. It is composed of:
+ **ComputeDPU** – Time spent executing SQL queries
+ **ReadDPU** – Resources used to read data from storage
+ **WriteDPU** – Resources used to write data to storage
+ **MultiRegionWriteDPU** – Resources used to replicate writes to peered clusters in multi-Region configurations

## DPU usage in EXPLAIN ANALYZE VERBOSE
<a name="dpu-usage-explain-analyze"></a>

Aurora DSQL extends `EXPLAIN ANALYZE VERBOSE` to include a statement-level DPU usage estimate to the end of the output. This provides immediate visibility into query cost, helping you identify workload cost drivers, tune query performance, and better forecast resource usage.

The following examples show how to interpret the statement-level DPU estimates included in EXPLAIN ANALYZE VERBOSE output.

### Example 1: SELECT Query
<a name="select-query-example"></a>

```
EXPLAIN ANALYZE VERBOSE SELECT * FROM test_table;
```

```
QUERY PLAN
----------------------------------------------------
Index Only Scan using test_table_pkey on public.test_table  (cost=125100.05..171100.05 rows=1000000 width=36) (actual time=2.973..4.482 rows=120 loops=1)
  Output: id, context
  -> Storage Scan on test_table_pkey (cost=125100.05..171100.05 rows=1000000 width=36) (actual rows=120 loops=1)
      Projections: id, context
      -> B-Tree Scan on test_table_pkey (cost=125100.05..171100.05 rows=1000000 width=36) (actual rows=120 loops=1)
Query Identifier: qymgw1m77maoe
Planning Time: 11.415 ms
Execution Time: 4.528 ms
Statement DPU Estimate:
  Compute: 0.01607 DPU
  Read: 0.04312 DPU
  Write: 0.00000 DPU
  Total: 0.05919 DPU
```

In this example, the SELECT statement performs an index-only scan, so most of the cost comes from Read DPU (0.04312), representing the data retrieved from storage, and Compute DPU (0.01607), which reflects the compute resources used to process and return the results. There is no Write DPU since the query doesn't modify data. The total DPU (0.05919) is the sum of Compute + Read + Write.

### Example 2: INSERT Query
<a name="insert-query-example"></a>

```
EXPLAIN ANALYZE VERBOSE INSERT INTO test_table VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
```

```
QUERY PLAN
----------------------------------------------------
Insert on public.test_table  (cost=0.00..0.04 rows=0 width=0) (actual time=0.055..0.056 rows=0 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=122) (actual time=0.003..0.008 rows=3 loops=1)
        Output: "*VALUES*".column1, "*VALUES*".column2
Query Identifier: jtkjkexhjotbo
Planning Time: 0.068 ms
Execution Time: 0.543 ms
Statement DPU Estimate:
  Compute: 0.01550 DPU
  Read: 0.00307 DPU (Transaction minimum: 0.00375)
  Write: 0.01875 DPU (Transaction minimum: 0.05000)
  Total: 0.03732 DPU
```

This statement primarily performs writes, so most of the cost is associated with Write DPU. The Compute DPU (0.01550) represents the work done to process and insert the values. The Read DPU (0.00307) reflects minor system reads (for catalog lookups or index checks).

Notice the Transaction minimums shown next to Read and Write DPUs. These indicate the baseline per-transaction costs that apply *only when the operation includes reads or writes*. They do not mean that every transaction automatically incurs a 0.00375 Read DPU or 0.05 Write DPU charge. Instead, these minimums are applied at the transaction level during cost aggregation and only if reads or writes occur within that transaction. Because of this difference in scope, statement-level estimates in `EXPLAIN ANALYZE VERBOSE` may not exactly match the transaction-level metrics reported in CloudWatch or billing data.

## Using DPU Information for Optimization
<a name="using-dpu-information-optimization"></a>

Per-statement DPU estimates give you a powerful way to optimize queries beyond just execution time. Common use cases include:
+ **Cost Awareness:** Understand how expensive a query is relative to others.
+ **Schema Optimization:** Compare the impact of indexes or schema changes on both performance and resource efficiency.
+ **Budget Planning:** Estimate workload cost based on observed DPU usage.
+ **Query Comparison:** Evaluate alternative query approaches by their relative DPU consumption.

## Interpreting DPU Information
<a name="interpreting-dpu-information"></a>

Keep the following best practices in mind when using DPU data from `EXPLAIN ANALYZE VERBOSE`:
+ **Use it directionally:** Treat the reported DPU as a way to understand the *relative* cost of a query rather than an exact match with CloudWatch metrics or billing data. Differences are expected because `EXPLAIN ANALYZE VERBOSE` reports statement-level cost, while CloudWatch aggregates transaction-level activity. CloudWatch also includes background operations (such as ANALYZE or compactions) and transaction overhead (`BEGIN`/`COMMIT`) that `EXPLAIN ANALYZE VERBOSE` intentionally excludes.
+ **DPU variability across runs is normal** in distributed systems and does not indicate errors. Factors such as caching, execution plan changes, concurrency, or shifts in data distribution can all cause the same query to consume different resources from one run to the next.
+ **Batch small operations:** If your workload issues many small statements, consider batching them into larger operations (not to exceed 10MB). This reduces rounding overhead and produces more meaningful cost estimates.
+ **Use for tuning, not billing:** DPU data in `EXPLAIN ANALYZE VERBOSE` is designed for cost awareness, query tuning, and optimization. It is not a billing-grade metric. Always rely on CloudWatch metrics or monthly billing reports for authoritative cost and usage data.
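As a sketch of the batching advice above (assuming each standalone statement runs in its own implicit transaction), several single-row statements can be combined into one multi-row INSERT so the per-transaction read and write minimums are incurred once rather than once per statement:

```
-- Instead of three separate statements, each potentially paying the
-- per-transaction Read and Write DPU minimums:
--   INSERT INTO test_table VALUES (1, 'name1');
--   INSERT INTO test_table VALUES (2, 'name2');
--   INSERT INTO test_table VALUES (3, 'name3');

-- Batch them into a single multi-row INSERT:
INSERT INTO test_table VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
```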