

# Best practices for designing and using partition keys effectively in DynamoDB
<a name="bp-partition-key-design"></a>

The primary key that uniquely identifies each item in an Amazon DynamoDB table can be simple (a partition key only) or composite (a partition key combined with a sort key). 

You should design your application for uniform activity across all partition keys in the table and its secondary indexes. Start by identifying the access patterns that your application requires, and the read and write units that each table and secondary index will consume.

**Note**  
Adaptive capacity applies to both on-demand and provisioned capacity modes.

Every partition in a DynamoDB table is designed to deliver a maximum capacity of 3,000 read units per second and 1,000 write units per second. One read unit represents one strongly consistent read operation per second, or two eventually consistent read operations per second, for an item up to 4 KB in size. One write unit represents one write operation per second for an item up to 1 KB in size.

You must factor in the item size when evaluating the partition throughput limits for your table. For example, if the table has an item size of 20 KB, a single strongly consistent read operation will consume 5 read units (20 KB divided by the 4 KB read unit size). This means you can drive at most 600 strongly consistent read operations per second against that single item before reaching the partition limits. The total throughput across all partitions in the table can be constrained by the provisioned throughput in provisioned mode, or by the table-level throughput limit in on-demand mode. See [Service Quotas](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html) for more information.
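As a back-of-the-envelope check, the arithmetic above can be expressed in a few lines of Python. The constants are the per-partition limits and read unit size stated earlier; the function names are illustrative, not part of any AWS SDK:

```python
import math

READ_UNIT_ITEM_SIZE_KB = 4     # one read unit covers an item up to 4 KB
PARTITION_READ_UNITS = 3000    # per-partition read ceiling

def strongly_consistent_read_units(item_size_kb: float) -> int:
    """Read units consumed by one strongly consistent read of an item."""
    return math.ceil(item_size_kb / READ_UNIT_ITEM_SIZE_KB)

units_per_read = strongly_consistent_read_units(20)            # 5 read units
max_reads_per_second = PARTITION_READ_UNITS // units_per_read  # 600 reads/s
```

An eventually consistent read would consume half as many units, doubling the achievable read rate for the same item size.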

**Topics**
+ [Designing partition keys to distribute your workload in DynamoDB](bp-partition-key-uniform-load.md)
+ [Using write sharding to distribute workloads evenly in your DynamoDB table](bp-partition-key-sharding.md)
+ [Distributing write activity efficiently during data upload in DynamoDB](bp-partition-key-data-upload.md)

# Designing partition keys to distribute your workload in DynamoDB
<a name="bp-partition-key-uniform-load"></a>

The partition key portion of a table's primary key determines the logical partitions in which a table's data is stored. This in turn affects the underlying physical partitions. A partition key design that doesn't distribute I/O requests effectively can create "hot" partitions that result in throttling and use your provisioned I/O capacity inefficiently.

The optimal usage of a table's provisioned throughput depends not only on the workload patterns of individual items, but also on the partition key design. This doesn't mean that you must access all partition key values to achieve an efficient throughput level, or even that the percentage of accessed partition key values must be high. It does mean that the more distinct partition key values that your workload accesses, the more those requests will be spread across the partitioned space. In general, you'll use your provisioned throughput more efficiently as the ratio of partition key values accessed to the total number of partition key values increases.

The following is a comparison of the provisioned throughput efficiency of some common partition key schemas.



| Partition key value | Uniformity | 
| --- | --- | 
| User ID, where the application has many users. | Good | 
| Status code, where there are only a few possible status codes. | Bad | 
| Item creation date, rounded to the nearest time period (for example, day, hour, or minute). | Bad | 
| Device ID, where each device accesses data at relatively similar intervals. | Good | 
| Device ID, where even if there are many devices being tracked, one is by far more popular than all the others. | Bad | 

If a single table has only a small number of partition key values, consider distributing your write operations across more distinct partition key values. In other words, structure the primary key elements to avoid one "hot" (heavily requested) partition key value that slows overall performance.

For example, consider a table with a composite primary key. The partition key represents the item's creation date, rounded to the nearest day. The sort key is an item identifier. On a given day, say `2014-07-09`, **all** of the new items are written to that single partition key value (and corresponding physical partition). 

If the table fits entirely into a single partition (considering growth of your data over time), and if your application's read and write throughput requirements don't exceed the read and write capabilities of a single partition, your application won't encounter any unexpected throttling as a result of partitioning.

To use NoSQL Workbench for DynamoDB to help visualize your partition key design, see [Building data models with NoSQL Workbench](workbench.Modeler.md). 

# Using write sharding to distribute workloads evenly in your DynamoDB table
<a name="bp-partition-key-sharding"></a>

One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions. Or you can use a number that is calculated based on something that you're querying on.

## Sharding using random suffixes
<a name="bp-partition-key-sharding-random"></a>

One strategy for distributing loads more evenly across a partition key space is to add a random number to the end of the partition key values. Then you randomize the writes across the larger space.

For example, for a partition key that represents today's date, you might choose a random number between `1` and `200` and concatenate it as a suffix to the date. This yields partition key values like `2014-07-09.1`, `2014-07-09.2`, and so on, through `2014-07-09.200`. Because you are randomizing the partition key, the writes to the table on each day are spread evenly across multiple partitions. This results in better parallelism and higher overall throughput.
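A minimal sketch of the write side in Python follows. The `sharded_partition_key` helper is hypothetical; the commented-out `put_item` call assumes a boto3 `Table` resource with a string partition key attribute named `Date`:

```python
import random

SUFFIX_COUNT = 200  # size of the expanded key space, as in the example above

def sharded_partition_key(date_str: str) -> str:
    """Append a random suffix 1-200 to spread writes across partitions."""
    return f"{date_str}.{random.randint(1, SUFFIX_COUNT)}"

# With a boto3 Table resource, the write might look like:
# table.put_item(Item={"Date": sharded_partition_key("2014-07-09"), ...})
```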

However, to read all the items for a given day, you would have to query the items for all the suffixes and then merge the results. For example, you would first issue a `Query` request for the partition key value `2014-07-09.1`. Then issue another `Query` for `2014-07-09.2`, and so on, through `2014-07-09.200`. Finally, your application would have to merge the results from all those `Query` requests.
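The scatter-gather read described above can be sketched as follows. Here `query_fn` is a hypothetical callable wrapping a single `Query` request (for example, a boto3 `Table.query` on the partition key) that returns the matching items for one partition key value:

```python
def query_all_shards(query_fn, date_str: str, suffix_count: int = 200) -> list:
    """Issue one Query per suffix value and merge the results."""
    items = []
    for n in range(1, suffix_count + 1):
        # Each call reads one shard, e.g. "2014-07-09.1", "2014-07-09.2", ...
        items.extend(query_fn(f"{date_str}.{n}"))
    return items
```

In practice you might issue these `Query` requests in parallel rather than sequentially, since each one targets an independent partition key value.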

## Sharding using calculated suffixes
<a name="bp-partition-key-sharding-calculated"></a>

A randomizing strategy can greatly improve write throughput. But it's difficult to read a specific item because you don't know which suffix value was used when writing the item. To make it easier to read individual items, you can use a different strategy. Instead of using a random number to distribute the items among partitions, use a number that you can calculate based upon something that you want to query on.

Consider the previous example, in which a table uses today's date in the partition key. Now suppose that each item has an accessible `OrderId` attribute, and that you most often need to find items by order ID in addition to date. Before your application writes the item to the table, it could calculate a hash suffix based on the order ID and append it to the partition key date. The calculation might generate a number between 1 and 200 that is fairly evenly distributed, similar to what the random strategy produces.

A simple calculation would likely suffice, such as the product of the UTF-8 code point values for the characters in the order ID, modulo 200, + 1. The partition key value would then be the date concatenated with the calculation result.

With this strategy, the writes are spread evenly across the partition key values, and thus across the physical partitions. You can easily perform a `GetItem` operation for a particular item and date because you can calculate the partition key value for a specific `OrderId` value.
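One possible version of that calculation in Python is shown below. The helper names and the `Date` attribute are illustrative; the commented-out `get_item` call assumes a boto3 `Table` resource:

```python
def order_suffix(order_id: str) -> int:
    """Product of the UTF-8 code point values, modulo 200, plus 1."""
    product = 1
    for ch in order_id:
        product = (product * ord(ch)) % 200
    return product + 1

def partition_key_for(date_str: str, order_id: str) -> str:
    """Deterministic partition key: the same OrderId always maps to the same shard."""
    return f"{date_str}.{order_suffix(order_id)}"

# Because the suffix is derived from the OrderId, a single-item read is possible:
# table.get_item(Key={"Date": partition_key_for("2014-07-09", order_id),
#                     "OrderId": order_id})
```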

To read all the items for a given day, you still must `Query` each of the `2014-07-09.N` keys (where `N` is 1–200), and your application then has to merge all the results. The benefit is that you avoid having a single "hot" partition key value taking all of the workload.

**Note**  
For a more efficient strategy specifically designed to handle high-volume time series data, see [Time series data](bp-time-series.md).

# Distributing write activity efficiently during data upload in DynamoDB
<a name="bp-partition-key-data-upload"></a>

Amazon DynamoDB stores your table data across multiple servers. When you load data from other data sources, you get better performance if you upload the data to all the allocated servers simultaneously.

For example, suppose that you want to upload user messages to a DynamoDB table that uses a composite primary key with `UserID` as the partition key and `MessageID` as the sort key.

When you upload the data, one approach you can take is to upload all message items for each user, one user after another:



| UserID | MessageID | 
| --- | --- | 
| U1 | 1 | 
| U1 | 2 | 
| U1 | ... | 
| U1 | ... up to 100 | 
| U2 | 1 | 
| U2 | 2 | 
| U2 | ... | 
| U2 | ... up to 200 | 

The problem in this case is that you are not distributing your write requests to DynamoDB across your partition key values. You are taking one partition key value at a time and uploading all of its items before going to the next partition key value and doing the same.

Behind the scenes, DynamoDB is partitioning the data in your table across multiple servers. To fully use all the throughput capacity that is provisioned for the table, you must distribute your workload across your partition key values. By directing an uneven amount of upload work toward items that all have the same partition key value, you are not fully using all the resources that DynamoDB has provisioned for your table.

You can distribute your upload work by using the sort key to load one item from each partition key value, then another item from each partition key value, and so on: 



| UserID | MessageID | 
| --- | --- | 
| U1 | 1 | 
| U2 | 1 | 
| U3 | 1 | 
| ... | ... | 
| U1 | 2 | 
| U2 | 2 | 
| U3 | 2 | 
| ... | ... | 

Every upload in this sequence uses a different partition key value, keeping more DynamoDB servers busy simultaneously and improving your throughput performance.
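The reordering shown above amounts to a simple interleave: given one list of items per partition key value, emit the first item from each list, then the second, and so on. This is a minimal sketch in Python; in practice you would feed the resulting sequence to a batched write mechanism such as boto3's `batch_writer`:

```python
from itertools import chain, zip_longest

def interleave_by_user(per_user_items: list[list]) -> list:
    """Reorder per-user item lists so consecutive writes hit
    different partition key values."""
    # zip_longest pads the shorter lists with None; filter the padding out.
    merged = chain.from_iterable(zip_longest(*per_user_items))
    return [item for item in merged if item is not None]
```

For example, interleaving `[("U1", 1), ("U1", 2)]` with `[("U2", 1), ("U2", 2), ("U2", 3)]` yields writes that alternate between the `U1` and `U2` partition key values.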