

# Best practices for using secondary indexes in DynamoDB
<a name="bp-indexes"></a>

Secondary indexes are often essential to support the query patterns that your application requires. At the same time, overusing secondary indexes or using them inefficiently can add cost and reduce performance unnecessarily.

**Contents**
+ [General guidelines for secondary indexes in DynamoDB](bp-indexes-general.md)
  + [Use indexes efficiently](bp-indexes-general.md#bp-indexes-general-efficiency)
  + [Choose projections carefully](bp-indexes-general.md#bp-indexes-general-projections)
  + [Optimize frequent queries to avoid fetches](bp-indexes-general.md#bp-indexes-general-fetches)
  + [Be aware of item-collection size limits when creating local secondary indexes](bp-indexes-general.md#bp-indexes-general-expanding-collections)
+ [Take advantage of sparse indexes](bp-indexes-general-sparse-indexes.md)
  + [Examples of sparse indexes in DynamoDB](bp-indexes-general-sparse-indexes.md#bp-indexes-sparse-examples)
+ [Using Global Secondary Indexes for materialized aggregation queries in DynamoDB](bp-gsi-aggregation.md)
  + [Example scenario and access patterns](bp-gsi-aggregation.md#bp-gsi-aggregation-scenario)
  + [Why pre-compute aggregations](bp-gsi-aggregation.md#bp-gsi-aggregation-why)
  + [Table design](bp-gsi-aggregation.md#bp-gsi-aggregation-table-design)
  + [Aggregation pipeline with Streams and AWS Lambda](bp-gsi-aggregation.md#bp-gsi-aggregation-pipeline)
  + [Sparse GSI design](bp-gsi-aggregation.md#bp-gsi-aggregation-sparse-gsi)
  + [Querying the GSI](bp-gsi-aggregation.md#bp-gsi-aggregation-querying)
  + [Considerations](bp-gsi-aggregation.md#bp-gsi-aggregation-considerations)
+ [Overloading Global Secondary Indexes in DynamoDB](bp-gsi-overloading.md)
+ [Using Global Secondary Index write sharding for selective table queries in DynamoDB](bp-indexes-gsi-sharding.md)
  + [Pattern design](bp-indexes-gsi-sharding.md#bp-indexes-gsi-sharding-pattern-design)
  + [Sharding strategy](bp-indexes-gsi-sharding.md#bp-indexes-gsi-sharding-strategy)
  + [Querying the sharded GSI](bp-indexes-gsi-sharding.md#bp-indexes-gsi-querying-the-sharded-GSI)
  + [Parallel query execution considerations](bp-indexes-gsi-sharding.md#bp-indexes-gsi-parallel-query-execution-considerations)
  + [Code example](bp-indexes-gsi-sharding.md#bp-indexes-gsi-code-example)
+ [Using Global Secondary Indexes to create an eventually consistent replica in DynamoDB](bp-indexes-gsi-replica.md)

# General guidelines for secondary indexes in DynamoDB
<a name="bp-indexes-general"></a>

Amazon DynamoDB supports two types of secondary indexes:
+ **Global secondary index (GSI)** – An index with a partition key and a sort key that can be different from those on the base table. A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions. A global secondary index has no size limitations and has its own provisioned throughput settings for read and write activity that are separate from those of the table.
+ **Local secondary index (LSI)** – An index that has the same partition key as the base table, but a different sort key. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value. As a result, the total size of indexed items for any one partition key value can't exceed 10 GB. Also, a local secondary index shares provisioned throughput settings for read and write activity with the table it is indexing.

Each table in DynamoDB can have up to 20 global secondary indexes (default quota) and 5 local secondary indexes. 

Global secondary indexes are often more useful than local secondary indexes. Determining which type of index to use will also depend on your application's requirements. For a comparison of global secondary indexes and local secondary indexes, and more information on how to choose between them, see [Improving data access with secondary indexes in DynamoDB](SecondaryIndexes.md). 

The following are some general principles and design patterns to keep in mind when creating indexes in DynamoDB:

**Topics**
+ [Use indexes efficiently](#bp-indexes-general-efficiency)
+ [Choose projections carefully](#bp-indexes-general-projections)
+ [Optimize frequent queries to avoid fetches](#bp-indexes-general-fetches)
+ [Be aware of item-collection size limits when creating local secondary indexes](#bp-indexes-general-expanding-collections)

## Use indexes efficiently
<a name="bp-indexes-general-efficiency"></a>

**Keep the number of indexes to a minimum.** Don't create secondary indexes on attributes that you don't query often. Indexes that are seldom used contribute to increased storage and I/O costs without improving application performance. 

## Choose projections carefully
<a name="bp-indexes-general-projections"></a>

Because secondary indexes consume storage and provisioned throughput, you should keep the size of the index as small as possible. Also, the smaller the index, the greater the performance advantage compared to querying the full table. If your queries usually return only a small subset of attributes, and the total size of those attributes is much smaller than the whole item, project only the attributes that you regularly request.

If you expect a lot of write activity on a table compared to reads, follow these best practices:
+ Consider projecting fewer attributes to minimize the size of items written to the index. However, this only applies if the size of projected attributes would otherwise be larger than a single write capacity unit (1 KB). For example, if the size of an index entry is only 200 bytes, DynamoDB rounds this up to 1 KB. In other words, as long as the index items are small, you can project more attributes at no extra cost.
+ Avoid projecting attributes that you know will rarely be needed in queries. Every time you update an attribute that is projected in an index, you incur the extra cost of updating the index as well. You can still retrieve non-projected attributes in a `Query` at a higher provisioned throughput cost, but the query cost may be significantly lower than the cost of updating the index frequently.
+ Specify `ALL` only if you want your queries to return the entire table item sorted by a different sort key. Projecting all attributes eliminates the need for table fetches, but in most cases, it doubles your costs for storage and write activity.

Balance the need to keep your indexes as small as possible against the need to keep fetches to a minimum, as explained in the next section.
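
The 1 KB rounding described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not an official cost calculator: the function name `index_write_units` is hypothetical, and it only models a single write of one index entry.

```python
import math

WCU_BYTES = 1024  # one write capacity unit covers up to 1 KB


def index_write_units(entry_size_bytes: int) -> int:
    """Write capacity units for one index entry write, rounded up to whole 1 KB units."""
    return max(1, math.ceil(entry_size_bytes / WCU_BYTES))


# A 200-byte entry and a 1,000-byte entry both cost a single unit, so
# projecting more attributes is free until the entry crosses 1 KB.
print(index_write_units(200))   # 1
print(index_write_units(1000))  # 1
print(index_write_units(1500))  # 2
```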

## Optimize frequent queries to avoid fetches
<a name="bp-indexes-general-fetches"></a>

To get the fastest queries with the lowest possible latency, project all the attributes that you expect those queries to return. In particular, if you query a local secondary index for attributes that are not projected, DynamoDB automatically fetches those attributes from the table, which requires reading the entire item from the table. This introduces latency and additional I/O operations that you can avoid.

Keep in mind that "occasional" queries can often turn into "essential" queries. If there are attributes that you don't intend to project because you anticipate querying them only occasionally, consider whether circumstances might change and you might regret not projecting those attributes after all.

For more information about table fetches, see [Provisioned throughput considerations for Local Secondary Indexes](LSI.md#LSI.ThroughputConsiderations).

## Be aware of item-collection size limits when creating local secondary indexes
<a name="bp-indexes-general-expanding-collections"></a>

An *item collection* is all the items in a table and its local secondary indexes that have the same partition key. No item collection can exceed 10 GB, so it's possible to run out of space for a particular partition key value.

When you add or update a table item, DynamoDB updates all local secondary indexes that are affected. If the indexed attributes are defined in the table, the local secondary indexes grow too.

When you create a local secondary index, think about how much data will be written to it, and how many of those data items will have the same partition key value. If you expect that the sum of table and index items for a particular partition key value might exceed 10 GB, consider whether you should avoid creating the index.

If you can't avoid creating the local secondary index, you must anticipate the item collection size limit and take action before you exceed it. As a best practice, use the [`ReturnItemCollectionMetrics`](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/model/ReturnItemCollectionMetrics.html) parameter when writing items so that you can monitor and alert on item collection sizes that approach the 10 GB limit. Writes that would exceed the maximum item collection size fail, so monitoring and alerting on item collection sizes lets you act before the limit affects your application.
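
As a rough illustration of that monitoring, the sketch below inspects the `ItemCollectionMetrics` section that a single-item write returns when the request sets `ReturnItemCollectionMetrics` to `SIZE`. The function name, the 80 percent warning threshold, and the sample response values are assumptions made for this example.

```python
SIZE_LIMIT_GB = 10.0  # maximum item collection size


def collection_near_limit(response: dict, warn_fraction: float = 0.8) -> bool:
    """Return True if the upper size estimate reaches warn_fraction of the 10 GB limit."""
    metrics = response.get("ItemCollectionMetrics")
    if not metrics:
        # Not returned when metrics were not requested or the table has no LSI.
        return False
    lower_gb, upper_gb = metrics["SizeEstimateRangeGB"]
    return upper_gb >= SIZE_LIMIT_GB * warn_fraction


# Hypothetical response from a write that requested ReturnItemCollectionMetrics="SIZE"
sample_response = {
    "ItemCollectionMetrics": {
        "ItemCollectionKey": {"CustomerId": {"S": "customer-1"}},
        "SizeEstimateRangeGB": [8.0, 9.0],
    }
}
print(collection_near_limit(sample_response))  # True
```

In an application, a `True` result might publish a custom metric or raise an alarm rather than print.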

**Note**  
Once created, you cannot delete a local secondary index.

For strategies on working within the limit and taking corrective action, see [Item collection size limit](LSI.md#LSI.ItemCollections.SizeLimit).

# Take advantage of sparse indexes
<a name="bp-indexes-general-sparse-indexes"></a>

For any item in a table, DynamoDB writes a corresponding index entry **only if the index key attributes are present in the item**. For a global secondary index, this means the index partition key must be defined on the item, and if the index also has a sort key, that attribute must be present too. If either key attribute is missing from an item, that item does not appear in the index. An index where only a subset of items from the base table appear is called a *sparse* index.

Sparse indexes are useful for queries over a small subsection of a table. For example, suppose that you have a table where you store all your customer orders, with the following key attributes:
+ Partition key: `CustomerId`
+ Sort key: `OrderId`

To track open orders, you can insert an attribute named `isOpen` in order items that have not already shipped. Then when the order ships, you can delete the attribute. If you then create an index on `CustomerId` (partition key) and `isOpen` (sort key), only those orders with `isOpen` defined appear in it. When you have thousands of orders of which only a small number are open, it's faster and less expensive to query that index for open orders than to scan the entire table.

Instead of a flag attribute like `isOpen`, you could use an attribute whose value results in a useful sort order in the index. For example, you could use an `OrderOpenDate` attribute set to the date on which each order was placed, and then delete it after the order is fulfilled. That way, when you query the sparse index, the items are returned sorted by the date on which each order was placed.
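
To make the membership rule concrete, the following sketch simulates in plain Python which table items would appear in a sparse index. It does not call DynamoDB; the helper name `sparse_index_view` and the sample orders are hypothetical.

```python
def sparse_index_view(items, partition_key, sort_key=None):
    """Return the items that carry every key attribute of the index."""
    required = [partition_key] + ([sort_key] if sort_key else [])
    return [item for item in items if all(attr in item for attr in required)]


orders = [
    {"CustomerId": "c1", "OrderId": "o1", "OrderOpenDate": "2024-05-02"},
    {"CustomerId": "c1", "OrderId": "o2"},  # shipped: OrderOpenDate was removed
    {"CustomerId": "c2", "OrderId": "o3", "OrderOpenDate": "2024-05-01"},
]

open_orders = sparse_index_view(orders, "CustomerId", "OrderOpenDate")
print([order["OrderId"] for order in open_orders])  # ['o1', 'o3']
```

In a real table, removing the attribute when an order ships would typically be an `UpdateItem` call with an update expression such as `REMOVE OrderOpenDate`, after which the item drops out of the index.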

## Examples of sparse indexes in DynamoDB
<a name="bp-indexes-sparse-examples"></a>

Global secondary indexes are sparse by default. When you create a global secondary index, you specify a partition key and optionally a sort key. Only items in the base table that contain the required key attributes appear in the index. If an item is missing the index partition key—or the sort key, when one is defined—that item is excluded from the index.

By designing a global secondary index to be sparse, you can provision it with lower write throughput than that of the base table, while still achieving excellent performance.

For example, a gaming application might track all scores of every user, but generally only needs to query a few high scores. The following design handles this scenario efficiently:

![\[Sparse GSI example.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/SparseIndex_A.png)


Here, Rick has played three games and achieved `Champ` status in one of them. Padma has played four games and achieved `Champ` status in two of them. Notice that the `Award` attribute is present only in items where the user achieved an award. The associated global secondary index looks like the following:

![\[Sparse GSI example.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/SparseIndex_B.png)


The global secondary index contains only the high scores that are frequently queried, which are a small subset of the items in the base table.

# Using Global Secondary Indexes for materialized aggregation queries in DynamoDB
<a name="bp-gsi-aggregation"></a>

Maintaining near real-time aggregations and key metrics on top of rapidly changing data is becoming increasingly valuable to businesses for making rapid decisions. For example, a music library might want to showcase its most downloaded songs in near-real time, or an e-commerce platform might need to display trending products by category.

Because DynamoDB doesn't natively support aggregation operations like `SUM` or `COUNT` across items, computing these values at read time would require scanning large numbers of items—which may be slow and expensive. Instead, you can *pre-compute* aggregations as data changes and store the results as regular items in your table. This pattern is called *materialized aggregation*.

**Topics**
+ [Example scenario and access patterns](#bp-gsi-aggregation-scenario)
+ [Why pre-compute aggregations](#bp-gsi-aggregation-why)
+ [Table design](#bp-gsi-aggregation-table-design)
+ [Aggregation pipeline with Streams and AWS Lambda](#bp-gsi-aggregation-pipeline)
+ [Sparse GSI design](#bp-gsi-aggregation-sparse-gsi)
+ [Querying the GSI](#bp-gsi-aggregation-querying)
+ [Considerations](#bp-gsi-aggregation-considerations)

## Example scenario and access patterns
<a name="bp-gsi-aggregation-scenario"></a>

Consider a music library application with the following requirements:
+ The application records individual song downloads at high volume (thousands per second).
+ Users need to see the most downloaded songs for a given month with single-digit millisecond latency.
+ The application also needs to support queries like "top 10 songs this month" and "all songs downloaded in a given month."

Computing download counts at read time by scanning all download records may be expensive at this scale. Instead, you can maintain a running count that updates as each download occurs, and store it in a way that supports efficient querying.

## Why pre-compute aggregations
<a name="bp-gsi-aggregation-why"></a>

There are several approaches to computing aggregations. The following table compares common alternatives and explains why materialized aggregation in DynamoDB is often the best fit for this type of use case.


| Approach | Tradeoffs | When to use | 
| --- | --- | --- | 
| Scan and count at read time | Requires reading all download records for every query. Latency grows with data volume and consumes significant read capacity. | Only suitable for very small datasets where latency isn't a concern. | 
| External aggregation store (for example, Amazon ElastiCache) | Adds operational complexity with a separate service to manage. Requires synchronization logic between DynamoDB and the cache. | When you need sub-millisecond reads or complex aggregation logic that goes beyond simple counts. | 
| Application-level aggregation on write | Couples the aggregation logic to the write path. If the application fails after recording the download but before updating the count, the aggregation becomes inconsistent. | When you need synchronous, strongly consistent aggregation and can tolerate added write latency. | 
| Materialized aggregation with Streams and Lambda | Decouples aggregation from the write path. Aggregation is eventually consistent (typically seconds behind). Adds Lambda invocation costs. | When you need near real-time aggregations with low read latency and can tolerate eventual consistency. This is the approach described on this page. | 

The materialized aggregation approach keeps the write path simple (just record the download), offloads the aggregation to an asynchronous process, and stores the result in DynamoDB where it can be queried with single-digit millisecond latency.

## Table design
<a name="bp-gsi-aggregation-table-design"></a>

This design uses a single table with two item types that share the same partition key (`songID`) but use different sort key patterns to distinguish between them:
+ **Download records** – Individual download events. The sort key is the `DownloadID` (a unique identifier for each download).
+ **Monthly aggregation items** – Pre-computed download counts per song per month. The sort key is the month in `YYYY-MM` format (for example, `2018-01`). These items also contain a `DownloadCount` attribute with the running total.

Only the monthly aggregation items contain the `Month` attribute. This distinction is important for the sparse GSI design described later.

The following diagram shows the table layout with both item types:

![\[Music library table layout showing download records and monthly aggregation items sharing the same partition key (songID).\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/AggregationQueries.png)



| Item type | Partition key (songID) | Sort key | Additional attributes | 
| --- | --- | --- | --- | 
| Download record | song1 | download-abc123 | UserID, Timestamp | 
| Monthly aggregation | song1 | 2018-01 | Month=2018-01, DownloadCount=1,746,992 | 

## Aggregation pipeline with Streams and AWS Lambda
<a name="bp-gsi-aggregation-pipeline"></a>

The aggregation pipeline works as follows:

1. When a song is downloaded, the application writes a new item to the table with `Partition-Key=songID` and `Sort-Key=DownloadID`.

1. DynamoDB Streams captures this write as a stream record.

1. A Lambda function, attached to the stream, processes the new record. It identifies the `songID` and the current month, then updates the corresponding monthly aggregation item by incrementing the `DownloadCount` attribute.

1. The updated aggregation item is then available for querying through the sparse GSI.

The Lambda function uses an `UpdateItem` call with an `ADD` expression to atomically increment the download count. This avoids read-modify-write race conditions:

```
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('MusicLibrary')

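def handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            # Skip aggregation items: the first ADD for a new month creates
            # one, and that creation also arrives on the stream as an INSERT.
            if 'Month' in new_image:
                continue
            song_id = new_image['songID']['S']
            # Derive the month from the download timestamp
            timestamp = new_image['Timestamp']['S']
            month = timestamp[:7]  # Extract YYYY-MM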

            table.update_item(
                Key={
                    'songID': song_id,
                    'SK': month
                },
                UpdateExpression='ADD DownloadCount :inc SET #m = :month',
                ExpressionAttributeNames={
                    '#m': 'Month'
                },
                ExpressionAttributeValues={
                    ':inc': 1,
                    ':month': month
                }
            )
```

**Note**  
If a Lambda execution fails after writing the updated aggregation value, the stream record may be retried. Because the `ADD` operation increments the count each time it runs, a retry would increment the count more than once for the same download, leaving you with an *approximate* value. For most analytics and leaderboard use cases, this small margin of error is acceptable. If you need exact counts, consider adding idempotency logic—for example, by using a condition expression that checks whether the specific `DownloadID` has already been processed.
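
One possible shape for that idempotency logic is sketched below. It builds the parameters for a `TransactWriteItems` call that marks the download record as counted and increments the aggregation item in the same transaction. The `Counted` attribute and the function name are hypothetical; the `SK` sort key name follows the Lambda code above.

```python
def build_idempotent_increment(table_name, song_id, download_id, month):
    """Build TransactWriteItems parameters that count a download at most once."""
    return {
        "TransactItems": [
            {
                # Mark the download record as counted; the condition fails
                # (and cancels the transaction) if it was already marked.
                "Update": {
                    "TableName": table_name,
                    "Key": {"songID": {"S": song_id}, "SK": {"S": download_id}},
                    "UpdateExpression": "SET #c = :true",
                    "ConditionExpression": "attribute_not_exists(#c)",
                    "ExpressionAttributeNames": {"#c": "Counted"},
                    "ExpressionAttributeValues": {":true": {"BOOL": True}},
                }
            },
            {
                # Increment the monthly aggregation item in the same transaction.
                "Update": {
                    "TableName": table_name,
                    "Key": {"songID": {"S": song_id}, "SK": {"S": month}},
                    "UpdateExpression": "ADD DownloadCount :inc SET #m = :month",
                    "ExpressionAttributeNames": {"#m": "Month"},
                    "ExpressionAttributeValues": {
                        ":inc": {"N": "1"},
                        ":month": {"S": month},
                    },
                }
            },
        ]
    }


params = build_idempotent_increment("MusicLibrary", "song1", "download-abc123", "2018-01")
# With boto3, this could be sent as:
#   boto3.client("dynamodb").transact_write_items(**params)
```

On a retry, the condition on the download record fails and the transaction is canceled, which the handler could treat as "already counted." Transactional writes consume more write capacity than a plain `UpdateItem`, so this trades cost for exactness.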

## Sparse GSI design
<a name="bp-gsi-aggregation-sparse-gsi"></a>

To efficiently query the aggregated results, create a global secondary index with the following key schema:
+ **GSI partition key:** `Month` (String)
+ **GSI sort key:** `DownloadCount` (Number)

This GSI is *sparse* because only the monthly aggregation items contain the `Month` attribute. The individual download records don't have this attribute, so they are automatically excluded from the index. This means the GSI contains only the pre-computed aggregation items—a small fraction of the total items in the table.

A sparse GSI provides two key benefits:
+ **Lower cost** – Because only aggregation items are replicated to the index, you consume far less write capacity and storage compared to an index that includes every item in the table.
+ **Faster queries** – The index contains only the data you need to query, so reads are efficient and return results with single-digit millisecond latency.

For more information about how sparse indexes work, see [Take advantage of sparse indexes](bp-indexes-general-sparse-indexes.md).
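
As a sketch of how this key schema might be expressed, the following builds `UpdateTable` parameters that add the index to an existing table. It assumes an on-demand table (a provisioned table would also need `ProvisionedThroughput` in the `Create` block) and a `KEYS_ONLY` projection, both of which are choices made for this example. The function name is hypothetical.

```python
def build_month_downloads_gsi(index_name="MonthDownloadsIndex"):
    """Build UpdateTable parameters that add the sparse GSI to an on-demand table."""
    return {
        "AttributeDefinitions": [
            {"AttributeName": "Month", "AttributeType": "S"},
            {"AttributeName": "DownloadCount", "AttributeType": "N"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": "Month", "KeyType": "HASH"},
                        {"AttributeName": "DownloadCount", "KeyType": "RANGE"},
                    ],
                    # KEYS_ONLY still includes the table keys (songID and the
                    # sort key) alongside the index keys.
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                }
            }
        ],
    }


gsi_params = build_month_downloads_gsi()
# With boto3, this could be sent as:
#   boto3.client("dynamodb").update_table(TableName="MusicLibrary", **gsi_params)
```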

## Querying the GSI
<a name="bp-gsi-aggregation-querying"></a>

With the sparse GSI in place, you can efficiently answer several types of queries:

**Get the most downloaded song for a given month:**

```
aws dynamodb query \
    --table-name "MusicLibrary" \
    --index-name "MonthDownloadsIndex" \
    --key-condition-expression "#m = :month" \
    --expression-attribute-names '{"#m": "Month"}' \
    --expression-attribute-values '{":month": {"S": "2018-01"}}' \
    --no-scan-index-forward \
    --limit 1
```

Setting `ScanIndexForward` to `false` (the `--no-scan-index-forward` flag in the AWS CLI) sorts results by `DownloadCount` in descending order, and `Limit=1` returns only the top song.

**Get the top 10 songs for a given month:**

```
aws dynamodb query \
    --table-name "MusicLibrary" \
    --index-name "MonthDownloadsIndex" \
    --key-condition-expression "#m = :month" \
    --expression-attribute-names '{"#m": "Month"}' \
    --expression-attribute-values '{":month": {"S": "2018-01"}}' \
    --no-scan-index-forward \
    --limit 10
```

**Get all songs downloaded in a given month** (sorted by download count):

```
aws dynamodb query \
    --table-name "MusicLibrary" \
    --index-name "MonthDownloadsIndex" \
    --key-condition-expression "#m = :month" \
    --expression-attribute-names '{"#m": "Month"}' \
    --expression-attribute-values '{":month": {"S": "2018-01"}}' \
    --no-scan-index-forward
```
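
The same queries can be issued from application code. The sketch below builds the parameters for a boto3 `query` call on the low-level client; the helper name `build_top_songs_query` is hypothetical, and the table and index names are taken from the CLI examples above.

```python
def build_top_songs_query(month, limit=None):
    """Build Query parameters for songs in a month, most downloaded first."""
    params = {
        "TableName": "MusicLibrary",
        "IndexName": "MonthDownloadsIndex",
        "KeyConditionExpression": "#m = :month",
        "ExpressionAttributeNames": {"#m": "Month"},
        "ExpressionAttributeValues": {":month": {"S": month}},
        "ScanIndexForward": False,  # descending DownloadCount
    }
    if limit is not None:
        params["Limit"] = limit  # omit to return every song for the month
    return params


top_10 = build_top_songs_query("2018-01", limit=10)
# With boto3, this could be sent as:
#   boto3.client("dynamodb").query(**top_10)
```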

## Considerations
<a name="bp-gsi-aggregation-considerations"></a>

Keep the following in mind when implementing this pattern:
+ **Eventual consistency** – The aggregation values are updated asynchronously through DynamoDB Streams and Lambda. There is typically a delay of a few seconds between a download being recorded and the aggregation being updated. This means the GSI reflects near real-time data, not real-time data.
+ **Lambda concurrency** – If your table has a high write volume, multiple Lambda invocations may attempt to update the same aggregation item concurrently. The atomic `ADD` operation handles this safely, but you should monitor Lambda concurrency and throttling metrics to ensure your function can keep up with the stream.
+ **GSI write capacity** – Because the sparse GSI only contains aggregation items, it requires significantly less write capacity than the base table. However, you should still provision enough capacity (or use on-demand mode) to handle the rate of aggregation updates.
+ **Approximate counts** – As noted earlier, Lambda retries can cause counts to be slightly over-counted. For use cases that require exact counts, implement idempotency checks in the Lambda function.

# Overloading Global Secondary Indexes in DynamoDB
<a name="bp-gsi-overloading"></a>

Although Amazon DynamoDB has a default quota of 20 global secondary indexes per table, in practice, you can index across far more than 20 data fields. As opposed to a table in a relational database management system (RDBMS), in which the schema is uniform, a table in DynamoDB can hold many different kinds of data items at one time. In addition, the same attribute in different items can contain entirely different kinds of information.

Consider the following example of a DynamoDB table layout that saves a variety of different kinds of data.

![\[Table schema for GSI Overloading.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/OverloadGSIexample.png)


The `Data` attribute, which is common to all the items, has different content depending on its parent item. If you create a global secondary index for the table that uses the table's sort key as its partition key and the `Data` attribute as its sort key, you can make a variety of different queries using that single global secondary index. These queries might include the following:
+ Look up an employee by name in the global secondary index, using `Employee_Name` as the partition key value and the employee's name (for example `Murphy, John`) as the sort key value.
+ Use the global secondary index to find all employees working in a particular warehouse by searching on a warehouse ID (such as `Warehouse_01`).
+ Get a list of recent hires, querying the global secondary index on `HR_confidential` as a partition key value and using a range of dates in the sort key value.

# Using Global Secondary Index write sharding for selective table queries in DynamoDB
<a name="bp-indexes-gsi-sharding"></a>

When you need to query recent data within a specific time window, DynamoDB's requirement of providing a partition key for most read operations can present a challenge. To address this scenario, you can implement an effective query pattern using a combination of write sharding and a Global Secondary Index (GSI).

This approach allows you to efficiently retrieve and analyze time-sensitive data without performing full table scans, which can be resource-intensive and costly. By strategically designing your table structure and indexing, you can create a flexible solution that supports time-based data retrieval while maintaining optimal performance.

**Topics**
+ [Pattern design](#bp-indexes-gsi-sharding-pattern-design)
+ [Sharding strategy](#bp-indexes-gsi-sharding-strategy)
+ [Querying the sharded GSI](#bp-indexes-gsi-querying-the-sharded-GSI)
+ [Parallel query execution considerations](#bp-indexes-gsi-parallel-query-execution-considerations)
+ [Code example](#bp-indexes-gsi-code-example)

## Pattern design
<a name="bp-indexes-gsi-sharding-pattern-design"></a>

You can overcome time-based data retrieval challenges in DynamoDB by combining write sharding with a Global Secondary Index, which lets you query recent data windows without scanning the table.

**Structure of the table**
+ Partition Key (PK): "Username"

**Structure of the GSI**
+ GSI Partition Key (PK_GSI): "ShardNumber#"
+ GSI Sort Key (SK_GSI): ISO 8601 timestamp (e.g., "2030-04-01T12:00:00Z")

![\[Pattern designs for time-series data.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/BestPractices-44-TimeBoundedTable-2.png)


## Sharding strategy
<a name="bp-indexes-gsi-sharding-strategy"></a>

Assuming you decide to use 10 shards, your shard numbers could range from 0 to 9. When logging an activity, you would calculate the shard number (for example, by using a hash function on the user ID and then taking the modulus of the number of shards) and append it to the `ShardNumber#` prefix to form the GSI partition key value (for example, `ShardNumber#3`). This method distributes the entries across different shards, mitigating the risk of hot partitions.
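
The shard calculation can be sketched as follows. This example uses SHA-256 from the standard library so that the same user ID always maps to the same shard across processes; the choice of hash function and the name `gsi_partition_key` are assumptions for illustration.

```python
import hashlib

TOTAL_SHARDS = 10  # shards numbered 0 to 9


def gsi_partition_key(user_id: str, total_shards: int = TOTAL_SHARDS) -> str:
    """Derive a stable GSI partition key value such as 'ShardNumber#3' from a user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo the shard count.
    shard = int.from_bytes(digest[:8], "big") % total_shards
    return f"ShardNumber#{shard}"


print(gsi_partition_key("alice"))
```

Python's built-in `hash()` is avoided here because its output for strings varies between interpreter runs by default.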

## Querying the sharded GSI
<a name="bp-indexes-gsi-querying-the-sharded-GSI"></a>

Querying across all shards for items within a particular time range in a DynamoDB table, where data is sharded across multiple partition keys, requires a different approach than querying a single partition. Since DynamoDB queries are limited to a single partition key at a time, you can't directly query across multiple shards with a single query operation. However, you can achieve the desired result through application-level logic by performing multiple queries, each targeting a specific shard, and then aggregating the results. The procedure below explains how to do this. 

**To query and aggregate shards**

1. Identify the range of shard numbers used in your sharding strategy. For instance, if you have 10 shards, your shard numbers would range from 0-9.

1. For each shard, construct and execute a query to fetch items within the desired time range. These queries can be executed in parallel to improve efficiency. Use the partition key with the shard number and the sort key with your time range for these queries. Here's an example query for a single shard:

   ```
   aws dynamodb query \
       --table-name "YourTableName" \
       --index-name "YourIndexName" \
       --key-condition-expression "PK_GSI = :pk_val AND SK_GSI BETWEEN :start_date AND :end_date" \
       --expression-attribute-values '{
           ":pk_val": {"S": "ShardNumber#0"},
           ":start_date": {"S": "2024-04-01T00:00:00Z"},
           ":end_date": {"S": "2024-04-30T23:59:59Z"}
       }'
   ```

   ![\[Query for single shard example.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/BestPractices-44-single-shard-example.png)

   You would replicate this query for each shard, adjusting the partition key accordingly (e.g., "ShardNumber#1", "ShardNumber#2", ..., "ShardNumber#9").

1. Aggregate the results from each query after all queries are complete. Perform this aggregation in your application code, combining the results into a single dataset that represents the items from all shards within your specified time range.
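
The aggregation step can be sketched as a merge of already-sorted lists, because each shard query returns its items ordered by the sort key. The example below assumes every timestamp uses the same ISO 8601 format and time zone, so that string order matches time order. The function name and sample items are hypothetical.

```python
import heapq


def merge_shard_results(shard_results):
    """Merge per-shard item lists (each sorted by SK_GSI) into one sorted list."""
    return list(heapq.merge(*shard_results, key=lambda item: item["SK_GSI"]["S"]))


# Hypothetical query results from two shards, in DynamoDB attribute-value format
shard_0 = [
    {"SK_GSI": {"S": "2030-03-15T09:05:00Z"}},
    {"SK_GSI": {"S": "2030-03-15T09:40:00Z"}},
]
shard_1 = [{"SK_GSI": {"S": "2030-03-15T09:20:00Z"}}]

merged = merge_shard_results([shard_0, shard_1])
print([item["SK_GSI"]["S"] for item in merged])
# ['2030-03-15T09:05:00Z', '2030-03-15T09:20:00Z', '2030-03-15T09:40:00Z']
```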

## Parallel query execution considerations
<a name="bp-indexes-gsi-parallel-query-execution-considerations"></a>

Each query consumes read capacity from the global secondary index that it targets. If you're using provisioned throughput, ensure that the index is provisioned with enough read capacity to handle the burst of parallel queries. If you're using on-demand capacity, be mindful of the potential cost implications.
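
A back-of-the-envelope estimate can help size that burst. The sketch below assumes eventually consistent reads (the only kind a global secondary index supports), where each query's data is rounded up to 4 KB units at half a read capacity unit per unit. The function names and the 64 KB per-shard figure are made up for illustration.

```python
import math


def estimate_query_rcus(total_bytes_read: int) -> float:
    """Eventually consistent Query cost: bytes rounded up to 4 KB units, 0.5 RCU each."""
    return max(1, math.ceil(total_bytes_read / 4096)) * 0.5


def estimate_burst_rcus(bytes_per_shard: int, total_shards: int) -> float:
    """Total read capacity consumed when every shard is queried at once."""
    return estimate_query_rcus(bytes_per_shard) * total_shards


# 10 shards, each returning about 64 KB: 16 units x 0.5 RCU x 10 queries
print(estimate_burst_rcus(64 * 1024, 10))  # 80.0
```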

## Code example
<a name="bp-indexes-gsi-code-example"></a>

To execute parallel queries across shards in DynamoDB using Python, you can use the boto3 library, which is the Amazon Web Services SDK for Python. This example assumes you have boto3 installed and configured with appropriate AWS credentials.

The following Python code demonstrates how to perform parallel queries across multiple shards for a given time range. It uses concurrent futures to execute queries in parallel, reducing the overall execution time compared to sequential execution.

```
import boto3
from concurrent.futures import ThreadPoolExecutor, as_completed

# Initialize a DynamoDB client
dynamodb = boto3.client('dynamodb')

# Define your table name and the total number of shards
table_name = 'YourTableName'
total_shards = 10  # Example: 10 shards numbered 0 to 9
time_start = "2030-03-15T09:00:00Z"
time_end = "2030-03-15T10:00:00Z"

def query_shard(shard_number):
    """
    Query items in a specific shard for the given time range.
    """
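    query_kwargs = {
        'TableName': table_name,
        'IndexName': 'YourGSIName',  # Replace with your GSI name
        'KeyConditionExpression': "PK_GSI = :pk_val AND SK_GSI BETWEEN :date_start AND :date_end",
        'ExpressionAttributeValues': {
            ":pk_val": {"S": f"ShardNumber#{shard_number}"},
            ":date_start": {"S": time_start},
            ":date_end": {"S": time_end},
        },
    }
    items = []
    # A single Query call returns at most 1 MB of data, so keep following
    # LastEvaluatedKey until the shard has no more matching items.
    while True:
        response = dynamodb.query(**query_kwargs)
        items.extend(response['Items'])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        query_kwargs['ExclusiveStartKey'] = last_key
    return items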

# Use ThreadPoolExecutor to query across shards in parallel
with ThreadPoolExecutor(max_workers=total_shards) as executor:
    # Submit a future for each shard query
    futures = {executor.submit(query_shard, shard_number): shard_number for shard_number in range(total_shards)}
    
    # Collect and aggregate results from all shards
    all_items = []
    for future in as_completed(futures):
        shard_number = futures[future]
        try:
            shard_items = future.result()
            all_items.extend(shard_items)
            print(f"Shard {shard_number} returned {len(shard_items)} items")
        except Exception as exc:
            print(f"Shard {shard_number} generated an exception: {exc}")

# Process the aggregated results (e.g., sorting, filtering) as needed
# For example, simply printing the count of all retrieved items
print(f"Total items retrieved from all shards: {len(all_items)}")
```

Before running this code, make sure to replace `YourTableName` and `YourGSIName` with the actual table and GSI names from your DynamoDB setup. Also, adjust `total_shards`, `time_start`, and `time_end` variables according to your specific requirements.

This script queries each shard for items within the specified time range and aggregates the results.

# Using Global Secondary Indexes to create an eventually consistent replica in DynamoDB
<a name="bp-indexes-gsi-replica"></a>

You can use a global secondary index to create an eventually consistent replica of a table. Creating a replica can allow you to do the following:
+ **Set different provisioned read capacity for different readers.** For example, suppose that you have two applications: One application handles high-priority queries and needs the highest levels of read performance, whereas the other handles low-priority queries that can tolerate throttling of read activity.

  If both of these applications read from the same table, a heavy read load from the low-priority application could consume all the available read capacity for the table. This would throttle the high-priority application's read activity.

  Instead, you can create a replica through a global secondary index whose read capacity you can set separate from that of the table itself. You can then have your low-priority app query the replica instead of the table.
+ **Eliminate reads from a table entirely.** For example, you might have an application that captures a high volume of clickstream activity from a website, and you don't want to risk having reads interfere with that. You can isolate this table and prevent reads by other applications (see [Using IAM policy conditions for fine-grained access control](specifying-conditions.md)), while letting other applications read a replica created using a global secondary index.

To create a replica, set up a global secondary index that has the same key schema as the parent table, with some or all of the non-key attributes projected into it. In applications, you can direct some or all read activity to this global secondary index rather than to the parent table. You can then adjust the provisioned read capacity of the global secondary index to handle those reads without changing the parent table's provisioned read capacity.

There is always a short propagation delay between a write to the parent table and the time when the written data appears in the index. In other words, your applications should take into account that the global secondary index replica is only *eventually consistent* with the parent table.

You can create multiple global secondary index replicas to support different read patterns. When you create the replicas, project only the attributes that each read pattern actually requires. An application can then consume less provisioned read capacity to obtain only the data it needs rather than having to read the item from the parent table. This optimization can result in significant cost savings over time.
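
As an illustration, a replica index definition might be assembled like the following sketch, which reuses the table's own key schema and projects only the attributes one read pattern needs. The function name, the example key schema, the attribute names, and the capacity numbers are all hypothetical.

```python
def build_replica_gsi(index_name, table_key_schema, projected_attributes,
                      read_capacity, write_capacity):
    """Build a GSI definition that mirrors the table's key schema."""
    return {
        "IndexName": index_name,
        "KeySchema": list(table_key_schema),  # same keys as the parent table
        "Projection": {
            "ProjectionType": "INCLUDE",
            "NonKeyAttributes": list(projected_attributes),
        },
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        },
    }


# Hypothetical parent table keyed on SessionId (partition) and EventTime (sort)
table_keys = [
    {"AttributeName": "SessionId", "KeyType": "HASH"},
    {"AttributeName": "EventTime", "KeyType": "RANGE"},
]
low_priority_replica = build_replica_gsi(
    "LowPriorityReplica", table_keys, ["PageUrl"], read_capacity=50, write_capacity=500
)
```

A definition like this could be passed in the `GlobalSecondaryIndexes` list of a `CreateTable` request. The index read capacity can be tuned independently of the table, while its write capacity generally needs to keep pace with writes to the parent table.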