Cache read behavior - AWS Prescriptive Guidance

Cache read behavior

The shim should cache only eventually consistent read calls made to DynamoDB. This includes get_item, batch_get_item, query, and scan. It should not cache strongly consistent read calls or transactional read calls, because callers of those operations expect current data, not a cached version.
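The eligibility rule above can be sketched as a small predicate. This is a minimal illustration, not the shim's actual code: the function name is_cacheable is hypothetical, and treating a batch_get_item request as uncacheable when any table in it asks for a consistent read is an assumption made here for simplicity.

```python
# Operations the text lists as cacheable when they are eventually consistent.
CACHEABLE_OPERATIONS = {"get_item", "batch_get_item", "query", "scan"}


def is_cacheable(operation: str, params: dict) -> bool:
    """Return True only for eventually consistent read calls (hypothetical helper)."""
    if operation not in CACHEABLE_OPERATIONS:
        # Covers transactional reads such as transact_get_items, and all writes.
        return False
    if operation == "batch_get_item":
        # batch_get_item sets ConsistentRead per table inside RequestItems.
        # Assumption: skip the cache if any table requests a consistent read.
        tables = params.get("RequestItems", {}).values()
        return not any(t.get("ConsistentRead", False) for t in tables)
    # ConsistentRead defaults to False (eventually consistent) when omitted.
    return not params.get("ConsistentRead", False)
```

A call such as is_cacheable("get_item", {"TableName": "test", "ConsistentRead": True}) returns False, so the shim would send that request straight to the database.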

The cache must key each entry on the signature of the request so that it can identify follow-on, equivalent requests. The signature of each call consists of all the parameters of the request that affect the result. For a get_item call, the signature includes the TableName, Key, ProjectionExpression, and ExpressionAttributeNames parameters. For a query call, it includes the TableName, IndexName, KeyConditions, FilterExpression, ScanIndexForward, and Limit parameters. If two calls to the same function have the same signature, that's a possible cache hit.

Here is a sample logic flow for a get_item call:

  • Check if ConsistentRead=true. If so, directly call the database and return the result. Strongly consistent calls shouldn't use a cache.

  • Calculate the signature of the call. Hash together the TableName, Key, ProjectionExpression, and ExpressionAttributeNames parameters to get a single string signature value.

  • See if a cache entry exists with this signature key. If so, it's a cache hit, so return it.

  • If not, pass the call to the database, retrieve the result, populate the cache entry for this signature, and return the result. When you store the entry, specify a time to live (TTL) so that it expires.
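The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the shim's real implementation: the CachingClient class and its names are hypothetical, an in-memory dictionary stands in for ElastiCache, the 60-second TTL is arbitrary, and the signature uses SHA-256 over JSON, which would need extra handling for binary attribute values.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 60  # assumed value; choose a TTL that suits your workload
SIGNATURE_FIELDS = ("TableName", "Key", "ProjectionExpression", "ExpressionAttributeNames")


class CachingClient:
    """Hypothetical read-through wrapper around a DynamoDB-style client."""

    def __init__(self, dynamodb_client, ttl=CACHE_TTL_SECONDS):
        self._db = dynamodb_client
        self._ttl = ttl
        self._cache = {}  # signature -> (expires_at, response); stand-in for ElastiCache

    def _signature(self, params):
        # Hash only the parameters that affect the result of get_item.
        relevant = {name: params.get(name) for name in SIGNATURE_FIELDS}
        payload = json.dumps(["get", relevant], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_item(self, **params):
        # Step 1: strongly consistent reads bypass the cache.
        if params.get("ConsistentRead", False):
            return self._db.get_item(**params)
        # Step 2: calculate the signature of the call.
        signature = self._signature(params)
        # Step 3: return the cached response if an unexpired entry exists.
        entry = self._cache.get(signature)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
        # Step 4: read from the database and populate the cache with a TTL.
        response = self._db.get_item(**params)
        self._cache[signature] = (time.monotonic() + self._ttl, response)
        return response
```

Any object with a compatible get_item method can be passed in, for example a client created with boto3.client("dynamodb"). With a real cache server, the response would also need to be serialized before it is stored.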

For example, assume that you have this code:

cache_client.get_item(
    TableName='test',
    Key={ 'PK': { 'S': '123' } },
    ProjectionExpression='#attr1, #attr2',
    ExpressionAttributeNames={ '#attr1': 'agent', '#attr2': 'count' },
    ConsistentRead=False
)

Inside cache_client, the code calculates the hash signature of the call. The signature is derived from hashing the concatenation of the TableName, Key, ProjectionExpression, and ExpressionAttributeNames parameters. Any hash system can be used as long as it's deterministic and produces a single string value. In this case, let's assume that it hashes down to 0xad4c812a. This hash identifies this set of parameters.

What if another call is made that's the same, except that #attr1 is renamed #attribute1? Should that call be considered to have the same signature? Ideally yes, because it's semantically identical, but the overhead of figuring out semantic equivalence on each and every call just isn't practical. It's much faster to hash parameter values blindly and require exact matches. The rule is: Calls are eligible for a cache hit if they're actually the same call, but not if they're basically the same call.
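To make the exact-match rule concrete, here is a small sketch that hashes the parameters blindly. The function name get_item_signature and the choice of SHA-256 over a JSON encoding are assumptions for illustration; any deterministic hash would serve. Sorting dictionary keys is the only normalization applied, so that the order in which a caller builds a dictionary doesn't change the hash.

```python
import hashlib
import json


def get_item_signature(table_name, key, projection_expression=None,
                       expression_attribute_names=None):
    """Hash the result-affecting get_item parameters (hypothetical helper)."""
    payload = json.dumps(
        ["get", table_name, key, projection_expression, expression_attribute_names],
        sort_keys=True,  # dictionary ordering should not change the signature
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = {"PK": {"S": "123"}}
original = get_item_signature(
    "test", key, "#attr1, #attr2", {"#attr1": "agent", "#attr2": "count"})
repeated = get_item_signature(
    "test", key, "#attr1, #attr2", {"#attr2": "count", "#attr1": "agent"})
renamed = get_item_signature(
    "test", key, "#attribute1, #attr2", {"#attribute1": "agent", "#attr2": "count"})

print(original == repeated)  # True: the same call, so eligible for a cache hit
print(original == renamed)   # False: semantically equivalent, but not an exact match
```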

Inside cache_client, the code then looks to ElastiCache for an entry in the item cache that's stored under 0xad4c812a. If the entry exists, that's a cache hit. If not, the value is fetched from the database and stored in ElastiCache for a later cache hit.

Here's what the cache looks like for three get_item calls that have three different sets of table, key, and projection parameters.

Pseudocode            | ElastiCache key calculation           | ElastiCache value
----------------------+---------------------------------------+------------------
get_item(t1, k1, p1)  | hash('get', t1, k1, p1) = 0xad4c812a  | { 'Item': … }
get_item(t1, k1, p2)  | hash('get', t1, k1, p2) = 0x045deaab  | { 'Item': … }
get_item(t1, k2, p1)  | hash('get', t1, k2, p1) = 0x9cda78af  | { 'Item': … }

Other calls, such as query and scan, work the same way, but different parameters make up their signatures.
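One way to handle several call types is a per-operation list of signature fields, sketched below. The names SIGNATURE_FIELDS and call_signature are hypothetical. The field lists are the ones named earlier in this section; a real shim would likely need to extend them with any other parameter that changes the result, such as KeyConditionExpression, ExpressionAttributeValues, or ExclusiveStartKey for query.

```python
import hashlib
import json

# Field lists taken from the text above; extend them as needed (assumption).
SIGNATURE_FIELDS = {
    "get_item": ("TableName", "Key", "ProjectionExpression",
                 "ExpressionAttributeNames"),
    "query": ("TableName", "IndexName", "KeyConditions", "FilterExpression",
              "ScanIndexForward", "Limit"),
}


def call_signature(operation, params):
    """Hash the operation name with its result-affecting parameters."""
    fields = SIGNATURE_FIELDS[operation]
    relevant = [params.get(name) for name in fields]
    # Including the operation name keeps get_item and query entries apart.
    payload = json.dumps([operation, relevant], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Parameters outside the list, such as ReturnConsumedCapacity, are ignored, so two query calls that differ only in that parameter share a signature, while a change to Limit produces a different one.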

This design might remind you of how an HTTP caching proxy works. If it sees the same request again, it can return the response from the earlier request. The definition of same request in HTTP is based mostly on the request method and the URL, including the query string. With DynamoDB, the call type and the set of parameters that are passed to the function influence its results.

There's one more step. It's important for item caching to keep a mapping between each item's primary key and the list of hashes actively used to cache that item. This will come into play during write operations, as described in the next section. So in addition to the previous keys and values, there will be extra cache entries such as these:

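Operation                                   | ElastiCache key calculation  | ElastiCache value
--------------------------------------------+------------------------------+--------------------------
Track list of entries for table t1, key k1  | hash('list', t1, k1)         | ( 0xad4c812a, 0x045deaab )
Track list of entries for table t1, key k2  | hash('list', t1, k2)         | ( 0x9cda78af )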
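The tracking entries can be sketched as follows. This is a minimal illustration with hypothetical names (ItemCache, put, signatures_for); plain Python dictionaries and sets stand in for ElastiCache, and the projection is passed as a single value for brevity. With a Redis-compatible cache, a set maintained with SADD and read with SMEMBERS would be one possible way to store each list.

```python
import hashlib
import json
from collections import defaultdict


def _hash(*parts):
    """Deterministic string hash of the given parts (assumed scheme)."""
    payload = json.dumps(parts, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ItemCache:
    """Hypothetical in-memory stand-in for the item cache and its tracking lists."""

    def __init__(self):
        self.entries = {}                 # hash('get', ...) -> cached response
        self.tracking = defaultdict(set)  # hash('list', table, key) -> signatures

    def put(self, table, key, projection, response):
        signature = _hash("get", table, key, projection)
        self.entries[signature] = response
        # Record that this signature caches the item with this primary key.
        self.tracking[_hash("list", table, key)].add(signature)
        return signature

    def signatures_for(self, table, key):
        # Return a copy so that callers can't mutate the tracking set.
        return set(self.tracking.get(_hash("list", table, key), set()))


cache = ItemCache()
k1, k2 = {"PK": {"S": "1"}}, {"PK": {"S": "2"}}
a = cache.put("t1", k1, "p1", {"Item": {}})
b = cache.put("t1", k1, "p2", {"Item": {}})
c = cache.put("t1", k2, "p1", {"Item": {}})
print(cache.signatures_for("t1", k1) == {a, b})  # True
print(cache.signatures_for("t1", k2) == {c})     # True
```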