

# Preparation checklist for DynamoDB global tables

Use the following checklist for decisions and tasks when you deploy global tables.
+ Determine if your use case benefits more from an MRSC or MREC consistency mode. Do you need strong consistency, even with the higher latency and other tradeoffs?
+ Determine how many and which Regions should participate in the global table. If you plan to use MRSC, decide if you want the third Region to be a replica or a witness.
+ Determine your application’s write mode. This is not the same as the consistency mode. For more information, see [Write modes with DynamoDB global tables](bp-global-table-design.prescriptive-guidance.writemodes.md).
+ Plan your routing strategy, based on your write mode. For more information, see [Routing strategies in DynamoDB](bp-global-table-design.prescriptive-guidance.request-routing.md).
+ Define your [Evacuation processes](bp-global-table-design.prescriptive-guidance.evacuation.md), based on your consistency mode, write mode, and routing strategy.
+ Capture metrics on the health, latency, and errors across each Region. For a list of DynamoDB metrics to observe, see the AWS blog post [Monitoring Amazon DynamoDB for Operational Awareness](https://aws.amazon.com/blogs/database/monitoring-amazon-dynamodb-for-operational-awareness/). You should also use [synthetic canaries](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html) (artificial requests designed to detect failures, named after the canary in the coal mine), as well as live observation of customer traffic, because not all issues will appear in the DynamoDB metrics.
+ If you're using MREC, set alarms for any sustained increase in `ReplicationLatency`. An increase might indicate an accidental misconfiguration in which the global table has different write settings in different Regions, which leads to failed replicated requests and increased latencies. It could also indicate a Regional disruption. A [good example](https://aws.amazon.com/blogs/database/monitoring-amazon-dynamodb-for-operational-awareness/) is to generate an alert when the recent average exceeds 180,000 milliseconds. You might also watch for `ReplicationLatency` dropping to 0, which indicates stalled replication.
+ Assign sufficient maximum read and write settings for each global table.
+ Identify the reasons for evacuating a Region in advance. If the decision involves human judgment, document all considerations. This work should be done carefully in advance, not under stress.
+ Maintain a runbook for every action that must take place when you evacuate a Region. Usually very little work is involved for the global tables, but moving the rest of the stack might be complex. 
**Note**  
With failover procedures, it's best practice to rely only on data plane operations and not on control plane operations, because some control plane operations could be degraded during Region failures.

   For more information, see the AWS blog post [ Build resilient applications with Amazon DynamoDB global tables: Part 4](https://aws.amazon.com/blogs/database/part-4-build-resilient-applications-with-amazon-dynamodb-global-tables/).
+ Test all aspects of the runbook periodically, including Region evacuations. An untested runbook is an unreliable runbook.
+ Consider using [AWS Resilience Hub](https://docs.aws.amazon.com/resilience-hub/latest/userguide/what-is.html) to evaluate the resilience of your entire application (including global tables). It provides a comprehensive view of your overall application portfolio resilience status through its dashboard.
+ Consider using Amazon Route 53 Application Recovery Controller (ARC) readiness checks to evaluate the current configuration of your application and track any deviations from best practices.
+ When you write health checks for use with Route 53 or Global Accelerator, make a set of calls that cover the full database flow. If you limit your check to confirming only that the DynamoDB endpoint is up, you won’t cover many failure modes, such as AWS Identity and Access Management (IAM) configuration errors, code deployment problems, failures in the stack outside DynamoDB, higher than average read or write latencies, and so on.
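As a sketch of the `ReplicationLatency` alarm described in the checklist above, the following builds the request parameters you might pass to the CloudWatch `PutMetricAlarm` API (for example, through the boto3 `put_metric_alarm` call, which isn't invoked here). The table name, Region, SNS topic ARN, and evaluation window are placeholder assumptions.

```python
def replication_latency_alarm_params(table_name, receiving_region, sns_topic_arn):
    """Build PutMetricAlarm parameters that alert when the recent average
    ReplicationLatency exceeds 180,000 ms (3 minutes), per the example above.
    The table name, Region, and topic ARN are hypothetical placeholders."""
    return {
        "AlarmName": f"{table_name}-{receiving_region}-replication-latency",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ReplicationLatency",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "ReceivingRegion", "Value": receiving_region},
        ],
        "Statistic": "Average",
        "Period": 60,                # evaluate one-minute averages
        "EvaluationPeriods": 5,      # require a sustained five-minute breach
        "Threshold": 180_000.0,      # milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

params = replication_latency_alarm_params(
    "MyGlobalTable", "us-west-2",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts")
# With boto3 you would then call:
# boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["Threshold"])  # 180000.0
```

Requiring several consecutive evaluation periods avoids alerting on a single latency spike, which matches the "sustained increase" guidance above.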
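One way to structure the deep health check described above is to exercise a full write-then-read round trip under a latency budget, rather than just pinging the endpoint. This is an illustrative sketch: the `write_item` and `read_item` callables stand in for your real data-access layer (for example, boto3 calls), and the probe key and latency budget are assumed values.

```python
import time

def deep_health_check(write_item, read_item, max_latency_seconds=0.5):
    """Return True only if a full write-then-read round trip succeeds within
    the latency budget. write_item/read_item are injected so the check runs
    through the same IAM credentials, code path, and stack as real traffic,
    catching failures that a simple endpoint ping would miss."""
    probe_key = "health-check-probe"
    probe_value = str(time.time())   # unique value per probe
    start = time.monotonic()
    try:
        write_item(probe_key, probe_value)       # canary write
        if read_item(probe_key) != probe_value:  # canary read must see it
            return False
    except Exception:
        return False                             # any stack failure is unhealthy
    return (time.monotonic() - start) <= max_latency_seconds

# Usage with in-memory stubs (replace with real DynamoDB calls):
store = {}
print(deep_health_check(store.__setitem__, store.get))  # True
```

Because the check returns unhealthy on high latency as well as on errors, a Route 53 or Global Accelerator health check built on it can shift traffic away from a Region that is degraded but not fully down.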

## Frequently Asked Questions (FAQ) for deploying global tables

**What is the pricing for global tables?**
+ A write operation on a traditional DynamoDB table is priced in write capacity units (WCUs) for provisioned tables or write request units (WRUs) for on-demand tables. If you write a 5 KB item, it incurs a charge of 5 units. A write to a global table is priced in replicated write capacity units (rWCUs) for provisioned tables or replicated write request units (rWRUs) for on-demand tables. rWCUs and rWRUs are priced the same as WCUs and WRUs.
+ rWCU and rWRU charges are incurred in every Region where the item is written, either directly or through replication. Cross-Region data transfer fees also apply.
+ Writing to a global secondary index (GSI) is considered a local write operation and uses regular write units.
+ There is no reserved capacity available for rWCUs or rWRUs at this time. Purchasing reserved capacity for WCUs can be beneficial for tables where GSIs consume write units.
+ When you add a new Region to a global table, DynamoDB bootstraps the new Region automatically and charges you as if it were a table restore, based on the GB size of the table. It also charges cross-Region data transfer fees.
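To make the write-pricing arithmetic above concrete, here is a minimal sketch. It assumes on-demand tables, rounds item size up to 1 KB units (1 KB per write unit), and ignores data transfer fees and GSI writes; per-unit prices vary by Region, so only unit counts are computed.

```python
import math

def replicated_write_units(item_size_kb, region_count):
    """rWRUs consumed by one write to a global table: the write is metered
    in every Region it lands in (directly or through replication), and each
    Region charges ceil(item_size_kb) units (1 KB per write unit)."""
    return math.ceil(item_size_kb) * region_count

# A 5 KB item written to a three-Region global table:
print(replicated_write_units(5, 3))  # 15 rWRUs

# The same item on a single-Region table would use 5 WRUs:
print(replicated_write_units(5, 1))  # 5
```

The Region multiplier is why trimming item size or the number of replicas has a proportionally larger cost effect on global tables than on single-Region tables.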

**Which Regions does global tables support?**

[Global Tables version 2019.11.21 (Current)](GlobalTables.md) supports all AWS Regions for MREC tables and the following Region sets for MRSC tables:
+ US Region set: US East (N. Virginia), US East (Ohio), and US West (Oregon)
+ EU Region set: Europe (Ireland), Europe (London), Europe (Paris), and Europe (Frankfurt)
+ AP Region set: Asia Pacific (Tokyo), Asia Pacific (Seoul), and Asia Pacific (Osaka)

**How are GSIs handled with global tables?**

In [Global Tables version 2019.11.21 (Current)](GlobalTables.md), when you create a GSI in one Region it’s automatically created in other participating Regions and automatically backfilled. 

**How do I stop replication of a global table?** 
+ You can delete a replica table the same way you would delete any other table. Deleting a replica stops replication to that Region and deletes the table copy kept in that Region. However, you can't stop replication while keeping copies of the table as independent entities, nor can you pause replication.
+ An MRSC table must be deployed in exactly three Regions. You can't delete a single replica; you must delete all of the replicas and the witness together, so that the MRSC table becomes a single-Region table.

**How do DynamoDB Streams interact with global tables?**
+ Each global table produces an independent stream based on all its write operations, regardless of where they originated. You can choose to consume the DynamoDB stream in one Region or in all Regions (independently). If you want to process local but not replicated write operations, you can add your own Region attribute to each item to identify the writing Region, and then use a Lambda event filter to invoke the Lambda function only for write operations in the local Region. This helps with insert and update operations, but not delete operations.
+ Global tables that are configured for multi-Region eventual consistency (MREC tables) replicate changes by reading them from a DynamoDB stream on a replica table and applying them to all other replica tables. Therefore, DynamoDB Streams is enabled by default on all replicas in an MREC global table and can't be disabled on those replicas. The MREC replication process can combine multiple changes made in a short period of time into a single replicated write operation. As a result, each replica's stream might contain slightly different records. DynamoDB Streams records on MREC replicas are always ordered on a per-item basis, but ordering between items might differ between replicas.
+ Global tables that are configured for multi-Region strong consistency (MRSC tables) don’t use DynamoDB Streams for replication, so this feature isn’t enabled by default on MRSC replicas. You can enable DynamoDB Streams on an MRSC replica. DynamoDB Streams records on MRSC replicas are identical for every replica and are always ordered on a per-item basis, but ordering between items might differ between replicas.
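As a sketch of the Region-attribute filtering described in the first bullet above: assume each item carries a hypothetical `writeRegion` string attribute set by the writer. A Lambda event-source filter pattern for the local Region (here `us-east-1`) could look like the dictionary below; the helper function reproduces the matching logic locally so you can see which stream records would invoke the function.

```python
# Hypothetical filter pattern for a Lambda event source mapping on the
# us-east-1 replica's stream: only invoke for items written locally.
# The writeRegion attribute is an application-defined convention, not
# something DynamoDB adds for you.
LOCAL_REGION_FILTER = {
    "dynamodb": {
        "NewImage": {
            "writeRegion": {"S": ["us-east-1"]}
        }
    }
}

def matches_local_write(record, pattern=LOCAL_REGION_FILTER):
    """Approximate the filter's behavior: the record's NewImage must carry a
    writeRegion equal to one of the allowed values. Delete records have no
    NewImage, so they never match (which is why deletes aren't filterable
    this way, as noted above)."""
    new_image = record.get("dynamodb", {}).get("NewImage", {})
    allowed = pattern["dynamodb"]["NewImage"]["writeRegion"]["S"]
    return new_image.get("writeRegion", {}).get("S") in allowed

local_put = {"dynamodb": {"NewImage": {"writeRegion": {"S": "us-east-1"}}}}
replicated_put = {"dynamodb": {"NewImage": {"writeRegion": {"S": "eu-west-1"}}}}
print(matches_local_write(local_put), matches_local_write(replicated_put))  # True False
```

In a real deployment you would pass the JSON form of `LOCAL_REGION_FILTER` as the event source mapping's filter criteria, so non-matching records never invoke (or bill) your function.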

**How do global tables handle transactions?** 
+ Transactional operations on MRSC tables will generate errors.
+ Transactional operations on MREC tables provide atomicity, consistency, isolation, and durability (ACID) guarantees only within the Region where the write operation originally occurred. Transactions are not supported across Regions in global tables. For example, if you have an MREC global table with replicas in the US East (Ohio) and US West (Oregon) Regions and perform a `TransactWriteItems` operation in the US East (Ohio) Region, you might observe partially completed transactions in the US West (Oregon) Region as changes are replicated. Changes are replicated to other Regions only after they have been committed in the source Region.

**How do global tables interact with the DynamoDB Accelerator (DAX) cache?**

Global tables bypass DAX by updating DynamoDB directly, so DAX isn’t aware that it’s holding stale data. The DAX cache is refreshed only when the cache’s TTL expires.

**Do tags on tables propagate?**

No, tags do not automatically propagate.

**Should I back up tables in all Regions or just one?**

The answer depends on the purpose of the backup.
+ If you want to ensure data durability, DynamoDB already provides that safeguard as part of the service.
+ If you want to keep a snapshot for historical records (for example, to meet regulatory requirements), backing up in one Region should suffice. You can copy the backup to additional Regions by using AWS Backup.
+ If you want to recover erroneously deleted or modified data, use [DynamoDB point-in-time recovery (PITR)](PointInTimeRecovery_Howitworks.md) in one Region.

**How do I deploy global tables using CloudFormation?**
+ CloudFormation represents a DynamoDB table and a global table as two separate resources: `AWS::DynamoDB::Table` and `AWS::DynamoDB::GlobalTable`. One approach is to create all tables that might become global by using the `GlobalTable` resource type, keeping them as standalone tables initially, and adding Regions later if necessary.
+ In CloudFormation, each global table is controlled by a single stack, in a single Region, regardless of the number of replicas. When you deploy your template, CloudFormation creates and updates all replicas as part of a single stack operation. You should not deploy the same [AWS::DynamoDB::GlobalTable](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-dynamodb-globaltable.html) resource in multiple Regions. This will result in errors and is unsupported. If you deploy your application template in multiple Regions, you can use conditions to create the `AWS::DynamoDB::GlobalTable` resource in a single Region. Alternatively, you can choose to define your `AWS::DynamoDB::GlobalTable` resources in a stack that’s separate from your application stack, and make sure that it’s deployed to a single Region. 
+ If you have a regular table and you want to convert it to a global table while keeping it managed by CloudFormation, set the deletion policy to `Retain`, remove the table from the stack, convert the table to a global table in the console, and then import the global table into the stack as a new resource. For more information, see the [AWS GitHub repository](https://github.com/aws-samples/amazon-dynamodb-table-to-global-table-cdk).
+ Cross-account replication is not supported at this time.