

# PERF 4: How do you select your database solution?


 The optimal database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency. 

**Topics**
+ [PERF04-BP01 Understand data characteristics](perf_right_database_solution_understand_char.md)
+ [PERF04-BP02 Evaluate the available options](perf_right_database_solution_evaluate_options.md)
+ [PERF04-BP03 Collect and record database performance metrics](perf_right_database_solution_collect_metrics.md)
+ [PERF04-BP04 Choose data storage based on access patterns](perf_right_database_solution_access_patterns.md)
+ [PERF04-BP05 Optimize data storage based on access patterns and metrics](perf_right_database_solution_optimize_metrics.md)

# PERF04-BP01 Understand data characteristics

 Choose your data management solutions to optimally match the characteristics, access patterns, and requirements of your workload datasets. When selecting and implementing a data management solution, you must ensure that the querying, scaling, and storage characteristics support the workload data requirements. Learn how various database options match your data models, and which configuration options are best for your use-case.  

 AWS provides numerous database engines including relational, key-value, document, in-memory, graph, time series, and ledger databases. Each data management solution has options and configurations available to you to support your use-cases and data models. Your workload might be able to use several different database solutions, based on the data characteristics. By selecting the best database solution for a specific problem, you can break away from restrictive, one-size-fits-all monolithic databases and focus on managing data to meet your customers' needs. 

 **Desired outcome:** The workload data characteristics are documented with enough detail to facilitate selection and configuration of supporting database solutions, and provide insight into potential alternatives. 

 **Common anti-patterns:** 
+  Not considering ways to segment large datasets into smaller collections of data that have similar characteristics, resulting in missing opportunities to use more purpose-built databases that better match data and growth characteristics. 
+  Not identifying the data access patterns up front, which leads to costly and complex rework later. 
+  Limiting growth by using data storage strategies that don't scale as quickly as needed. 
+  Choosing one database type and vendor for all workloads. 
+  Sticking to one database solution because there is internal experience and knowledge of one particular type of database solution. 
+  Keeping a database solution because it worked well in an on-premises environment. 

 **Benefits of establishing this best practice:** Be familiar with all of the AWS database solutions so that you can determine the correct database solution for your various workloads. After you select the appropriate database solution for your workload, you can quickly experiment on each of those database offerings to determine if they continue to meet your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 
+  Potential cost savings may not be identified. 
+  Data may not be secured to the level required. 
+  Data access and storage performance may not be optimal. 

## Implementation guidance

 Define the data characteristics and access patterns of your workload. Review all available database solutions to identify which solution supports your data requirements. Within a given workload, multiple databases may be selected. Evaluate each service or group of services and assess them individually. If potential alternative data management solutions are identified for part or all of the data, experiment with alternative implementations that might unlock cost, security, performance, and reliability benefits. Update existing documentation, should a new data management approach be adopted. 


|  **Type**  |  **AWS Services**  |  **Key Characteristics**  |  **Common use-cases**  | 
| --- | --- | --- | --- | 
|  Relational  |  Amazon RDS, Amazon Aurora  |  Referential integrity, ACID transactions, schema on write  |  ERP, CRM, Commercial off-the-shelf software  | 
|  Key Value  |  Amazon DynamoDB  |  High throughput, low latency, near-infinite scalability  |  Shopping carts (ecommerce), product catalogs, chat applications  | 
|  Document  |  Amazon DocumentDB  |  Store JSON documents and query on any attribute  |  Content Management (CMS), customer profiles, mobile applications  | 
|  In Memory  |  Amazon ElastiCache, Amazon MemoryDB  |  Microsecond latency  |  Caching, game leaderboards  | 
|  Graph  |  Amazon Neptune  |  Highly relational data where the relationships between data have meaning  |  Social networks, personalization engines, fraud detection  | 
|  Time Series  |  Amazon Timestream  |  Data where the primary dimension is time  |  DevOps, IoT, Monitoring  | 
|  Wide column  |  Amazon Keyspaces  |  Cassandra workloads.  |  Industrial equipment maintenance, route optimization  | 
|  Ledger  |  Amazon QLDB  |  Immutable and cryptographically verifiable ledger of changes  |  Systems of record, healthcare, supply chains, financial institutions  | 
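As a rough illustrative aid (not an official or exhaustive mapping), the table above can be encoded as a small lookup. The helper name `suggest_engines` is hypothetical:

```python
# Hypothetical helper mirroring the table above: map a data model to
# candidate AWS database services. Illustrative only, not exhaustive.
CANDIDATES = {
    "relational": ["Amazon RDS", "Amazon Aurora"],
    "key-value": ["Amazon DynamoDB"],
    "document": ["Amazon DocumentDB"],
    "in-memory": ["Amazon ElastiCache", "Amazon MemoryDB"],
    "graph": ["Amazon Neptune"],
    "time-series": ["Amazon Timestream"],
    "wide-column": ["Amazon Keyspaces"],
    "ledger": ["Amazon QLDB"],
}

def suggest_engines(data_model: str) -> list[str]:
    """Return candidate services for a data model, or an empty list."""
    return CANDIDATES.get(data_model.lower(), [])
```

A lookup like this is only a starting point; the implementation steps below refine the choice using consistency, durability, and access-pattern requirements.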

 **Implementation steps** 

1.  How is the data structured? (for example, unstructured, key-value, semi-structured, relational) 

   1.  If the data is unstructured, consider an object store such as [Amazon S3](https://aws.amazon.com/products/storage/data-lake-storage/) or a NoSQL database such as [Amazon DocumentDB](https://aws.amazon.com/documentdb/). 

   1.  For key-value data, consider [DynamoDB](https://aws.amazon.com/dynamodb/), [ElastiCache for Redis](https://aws.amazon.com/elasticache/redis/), or [MemoryDB](https://aws.amazon.com/memorydb/). 

   1.  If the data has a relational structure, what level of referential integrity is required? 

      1.  For foreign key constraints, relational databases such as [Amazon RDS](https://aws.amazon.com/rds/) and [Aurora](https://aws.amazon.com/rds/aurora/) can provide this level of integrity. 

      1.  Typically, within a NoSQL data model, you would denormalize your data into a single document or collection of documents to be retrieved in a single request rather than joining across documents or tables. 

1.  Is ACID (atomicity, consistency, isolation, durability) compliance required? 

   1.  If the ACID properties associated with relational databases are required, consider a relational database such as [Amazon RDS](https://aws.amazon.com/rds/) and [Aurora](https://aws.amazon.com/rds/aurora/). 

1.  What consistency model is required? 

   1.  If your application can tolerate eventual consistency, consider a NoSQL implementation. Review the other characteristics to help choose which [NoSQL database](https://aws.amazon.com/nosql/) is most appropriate. 

   1.  If strong consistency is required, you can use strongly consistent reads with [DynamoDB](https://aws.amazon.com/dynamodb/) or a relational database such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  What query and result formats must be supported? (for example, SQL, CSV, Parquet, Avro, JSON) 

1.  What data types, field sizes and overall quantities are present? (for example, text, numeric, spatial, time-series calculated, binary or blob, document) 

1.  How will the storage requirements change over time? How does this impact scalability? 

   1.  Serverless databases such as [DynamoDB](https://aws.amazon.com/dynamodb/) and [Amazon Quantum Ledger Database](https://aws.amazon.com/qldb/) will scale dynamically up to near-unlimited storage. 

   1.  Relational databases have upper bounds on provisioned storage, and often must be horizontally partitioned via mechanisms such as sharding once they reach these limits. 

1.  What is the proportion of read queries in relation to write queries? Would caching be likely to improve performance? 

   1.  Read-heavy workloads can benefit from a caching layer such as [ElastiCache](https://aws.amazon.com/elasticache/), or [DAX](https://aws.amazon.com/dynamodb/dax/) if the database is DynamoDB. 

   1.  Reads can also be offloaded to read replicas with relational databases such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  Does storage and modification (OLTP - Online Transaction Processing) or retrieval and reporting (OLAP - Online Analytical Processing) have a higher priority? 

   1.  For high-throughput transactional processing, consider a NoSQL database such as DynamoDB or Amazon DocumentDB. 

   1.  For analytical queries, consider a columnar database such as [Amazon Redshift](https://aws.amazon.com/redshift/) or exporting the data to Amazon S3 and performing analytics using [Athena](https://aws.amazon.com/athena/) or [QuickSight](https://aws.amazon.com/quicksight/). 

1.  How sensitive is this data and what level of protection and encryption does it require? 

   1.  All Amazon RDS and Aurora engines support data encryption at rest using AWS KMS. Microsoft SQL Server and Oracle also support native Transparent Data Encryption (TDE) when using Amazon RDS. 

   1.  For DynamoDB, you can use fine-grained access control with [IAM](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/access-control-overview.html) to control who has access to what data at the key level. 

1.  What level of durability does the data require? 

   1.  Aurora automatically replicates your data across three Availability Zones within a Region, meaning your data is highly durable with less chance of data loss. 

   1.  DynamoDB is automatically replicated across multiple Availability Zones, providing high availability and data durability. 

   1.  Amazon S3 provides 11 9s of durability. Many database services such as Amazon RDS and DynamoDB support exporting data to Amazon S3 for long-term retention and archival. 

1.  Do [Recovery Time Objective (RTO) or Recovery Point Objectives (RPO)](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/plan-for-disaster-recovery-dr.html) requirements influence the solution? 

   1.  Amazon RDS, Aurora, DynamoDB, Amazon DocumentDB, and Neptune all support point in time recovery and on-demand backup and restore.  

   1.  For high availability requirements, DynamoDB tables can be replicated globally using the [Global Tables](https://aws.amazon.com/dynamodb/global-tables/) feature and Aurora clusters can be replicated across multiple Regions using the Global database feature. Additionally, S3 buckets can be replicated across AWS Regions using cross-region replication.  

1.  Is there a desire to move away from commercial database engines / licensing costs? 

   1.  Consider open-source engines such as PostgreSQL and MySQL on Amazon RDS or Aurora. 

   1.  Leverage [AWS DMS](https://aws.amazon.com/dms/) and [AWS SCT](https://aws.amazon.com/dms/schema-conversion-tool/) to perform migrations from commercial database engines to open-source engines. 

1.  What is the operational expectation for the database? Is moving to managed services a primary concern? 

   1.  Leveraging Amazon RDS instead of Amazon EC2, and DynamoDB or Amazon DocumentDB instead of self-hosting a NoSQL database can reduce operational overhead. 

1.  How is the database currently accessed? Is it only application access, or are there Business Intelligence (BI) users and other connected off-the-shelf applications? 

   1.  If you have dependencies on external tooling, you may have to maintain compatibility with the databases that those tools support. Amazon RDS is fully compatible with the different engine versions that it supports, including Microsoft SQL Server, Oracle, MySQL, and PostgreSQL. 

1.  The following is a list of potential data management services, and where these can best be used: 

   1.  Relational databases store data with predefined schemas and relationships between them. These databases are designed to support ACID (atomicity, consistency, isolation, durability) transactions, and maintain referential integrity and strong data consistency. Many traditional applications, enterprise resource planning (ERP), customer relationship management (CRM), and ecommerce use relational databases to store their data. You can run many of these database engines on Amazon EC2, or choose from one of the AWS-managed [database services](https://aws.amazon.com/products/databases/): [Amazon Aurora](https://aws.amazon.com/rds/aurora), [Amazon RDS](https://aws.amazon.com/rds), and [Amazon Redshift](https://aws.amazon.com/redshift). 

   1.  Key-value databases are optimized for common access patterns, typically to store and retrieve large volumes of data. These databases deliver quick response times, even in extreme volumes of concurrent requests. High-traffic web apps, ecommerce systems, and gaming applications are typical use-cases for key-value databases. In AWS, you can utilize [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), a fully managed, multi-Region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. 

   1.  In-memory databases are used for applications that require real-time access to data, lowest latency, and highest throughput. By storing data directly in memory, these databases deliver microsecond latency to applications where millisecond latency is not enough. You may use in-memory databases for application caching, session management, gaming leaderboards, and geospatial applications. [Amazon ElastiCache](https://aws.amazon.com/elasticache/) is a fully managed in-memory data store, compatible with [Redis](https://aws.amazon.com/elasticache/redis/) or [Memcached](https://aws.amazon.com/elasticache/memcached). If the application also has durability requirements, [Amazon MemoryDB for Redis](https://aws.amazon.com/memorydb/) is a durable, in-memory database service that offers ultra-fast performance. 

   1.  A document database is designed to store semistructured data as JSON-like documents. These databases help developers build and update applications such as content management, catalogs, and user profiles quickly. [Amazon DocumentDB](https://aws.amazon.com/documentdb/) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. 

   1.  A wide column store is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. You typically see a wide column store in high scale industrial apps for equipment maintenance, fleet management, and route optimization. [Amazon Keyspaces (for Apache Cassandra)](https://aws.amazon.com/mcs/) is a scalable, highly available, managed wide column database service compatible with Apache Cassandra. 

   1.  Graph databases are for applications that must navigate and query millions of relationships between highly connected graph datasets with millisecond latency at large scale. Many companies use graph databases for fraud detection, social networking, and recommendation engines. [Amazon Neptune](https://aws.amazon.com/neptune/) is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. 

   1.  Time-series databases efficiently collect, synthesize, and derive insights from data that changes over time. IoT applications, DevOps, and industrial telemetry can utilize time-series databases. [Amazon Timestream](https://aws.amazon.com/timestream/) is a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day. 

   1.  Ledger databases provide a centralized and trusted authority to maintain a scalable, immutable, and cryptographically verifiable record of transactions for every application. We see ledger databases used for systems of record, supply chain, registrations, and even banking transactions. [Amazon Quantum Ledger Database (Amazon QLDB)](https://aws.amazon.com/qldb/) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. Amazon QLDB tracks every application data change and maintains a complete and verifiable history of changes over time. 
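The denormalization guidance above (a single document serves an access pattern in one request, instead of a join) can be sketched in plain Python. The order and customer data here are hypothetical and no specific database is assumed:

```python
# Normalized, relational-style data: fetching an order's details
# requires joining orders to customers.
customers = {1: {"name": "Ana"}}
orders = [{"order_id": 100, "customer_id": 1, "total": 42.50}]

def to_document(order: dict) -> dict:
    """Build a denormalized document: customer details are embedded in
    the order so a single key lookup returns everything the access
    pattern needs. The duplication is deliberate."""
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "customer_name": customer["name"],  # duplicated by design
        "total": order["total"],
    }

doc = to_document(orders[0])
```

The trade-off is that duplicated attributes (here, the customer name) must be updated everywhere they appear, which is why this pattern fits read-heavy access patterns best.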

 **Level of effort for the implementation plan:** If a workload is moving from one database solution to another, there could be a *high* level of effort involved in refactoring the data and application. 
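The read-offloading guidance above (ElastiCache or DAX in front of the primary store) commonly follows a cache-aside pattern. A minimal sketch, with a plain dict standing in for the cache and another for the primary database (both stand-ins are assumptions, no AWS client is used):

```python
cache: dict[str, str] = {}                # stand-in for ElastiCache/DAX
db = {"user:1": "Ana", "user:2": "Ben"}   # stand-in for the primary database

def query_database(key: str) -> str:
    """Pretend this is the expensive call to the primary database."""
    return db[key]

def get(key: str) -> str:
    """Cache-aside read: try the cache first, fall back to the database,
    then populate the cache so repeat reads are cheap."""
    if key in cache:
        return cache[key]
    value = query_database(key)
    cache[key] = value
    return value
```

In a real deployment, the cache entry would also carry a TTL and the write path would invalidate or update it, so stale reads are bounded.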

## Resources

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 
+  [Choose between EC2 and Amazon RDS](https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-sql-server/comparison.html) 
+  [Best Practices for Implementing Amazon ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+ [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Optimize Data Pattern using Amazon Redshift Data Sharing](https://wellarchitectedlabs.com/sustainability/300_labs/300_optimize_data_pattern_using_redshift_data_sharing/) 
+  [Database Migrations](https://github.com/aws-samples/aws-database-migration-samples) 
+  [MS SQL Server - AWS Database Migration Service (DMS) Replication Demo](https://github.com/aws-samples/aws-dms-sql-server) 
+  [Database Modernization Hands On Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Amazon Neptune Samples](https://github.com/aws-samples/amazon-neptune-samples) 

# PERF04-BP02 Evaluate the available options

 Understand the available database options and how they can optimize your performance before you select your data management solution. Use load testing to identify database metrics that matter for your workload. While you explore the database options, consider aspects such as parameter groups, storage options, memory, compute, read replicas, eventual consistency, connection pooling, and caching options. Experiment with these configuration options to improve the metrics. 

 **Desired outcome:** A workload may use one or more database solutions, depending on its data types. The database functionality and benefits optimally match the data characteristics, access patterns, and workload requirements. To optimize database performance and cost, evaluate the data access patterns to determine the appropriate database options, and evaluate the acceptable query times to confirm that the selected options can meet the requirements. 

 **Common anti-patterns:** 
+  Not identifying the data access patterns. 
+  Not being aware of the configuration options of your chosen data management solution. 
+  Relying solely on increasing the instance size without looking at other available configuration options. 
+  Not testing the scaling characteristics of the chosen solution. 

 

 **Benefits of establishing this best practice:** By exploring and experimenting with the database options you may be able to reduce the cost of infrastructure, improve performance and scalability and lower the effort required to maintain your workloads. 

 **Level of risk exposed if this best practice is not established:** High 
+  Having to optimize for a *one size fits all* database means making unnecessary compromises. 
+  Higher costs as a result of not configuring the database solution to match the traffic patterns. 
+  Operational issues may emerge from scaling issues. 
+  Data may not be secured to the level required. 

## Implementation guidance

 Understand your workload data characteristics so that you can configure your database options. Run load tests to identify your key performance metrics and bottlenecks. Use these characteristics and metrics to evaluate database options and experiment with different configurations. 
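When summarizing those load tests, percentiles are usually more informative than averages, because tail latency drives user experience and exposes bottlenecks that a mean hides. A small sketch using the nearest-rank method on synthetic sample data (the latency values are made up for illustration):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; good enough for load-test summaries."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# Synthetic per-request latencies from a hypothetical load test (ms).
latencies_ms = [12.0, 15.0, 11.0, 95.0, 14.0, 13.0, 12.5, 16.0, 14.5, 13.5]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # tail, dominated by the 95 ms outlier
```

Note how a single slow request barely moves the median but defines the p95; comparing both across configurations shows whether a change helps typical or tail behavior.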


|  AWS Services  |  Amazon RDS, Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Compute  |  Increase instance size, Aurora Serverless instances autoscale in response to changes in load  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Increase instance size  |  Increase instance size, add nodes to cluster  |  Increase instance size  |  Automatically scales to adjust capacity  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Automatically scales to adjust capacity  | 
|  Scaling-out reads  |  All engines support read replicas. Aurora supports automatic scaling of read replica instances  |  Increase provisioned read capacity units  |  Read replicas  |  Read replicas  |  Read replicas. Supports automatic scaling of read replica instances  |  Automatically scales  |  Increase provisioned read capacity units  |  Automatically scales up to documented concurrency limits  | 
|  Scaling-out writes  |  Increasing instance size, batching writes in the application or adding a queue in front of the database. Horizontal scaling via application-level sharding across multiple instances  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Increasing primary instance size  |  Using Redis in cluster mode to distribute writes across shards  |  Increasing instance size  |  Write requests may be throttled while scaling. If you encounter throttling exceptions, continue to send data at the same (or higher) throughput to automatically scale. Batch writes to reduce concurrent write requests  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Automatically scales up to documented concurrency limits  | 
|  Engine configuration  |  Parameter groups  |  Not applicable  |  Parameter groups  |  Parameter groups  |  Parameter groups  |  Not applicable  |  Not applicable  |  Not applicable  | 
|  Caching  |  In-memory caching, configurable via parameter groups. Pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  DynamoDB Accelerator (DAX) fully managed cache available  |  In-memory caching. Optionally, pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Primary function is caching  |  Use the query results cache to cache the result of a read-only query  |  Timestream has two storage tiers; one of these is a high-performance in-memory tier  |  Deploy a separate dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Not applicable  | 
|  High availability / disaster recovery  |  Recommended configuration for production workloads is to run a standby instance in a second Availability Zone to provide resiliency within a Region.  For resiliency across Regions, Aurora Global Database can be used  |  Highly available within a Region. Tables can be replicated across Regions using DynamoDB global tables  |  Create multiple instances across Availability Zones for availability.  Snapshots can be shared across Regions and clusters can be replicated using DMS to provide Cross-Region Replication / disaster recovery  |  Recommended configuration for production clusters is to create at least one node in a secondary Availability Zone.  ElastiCache Global Datastore can be used to replicate clusters across Regions.  |  Read replicas in other Availability Zones serve as failover targets.  Snapshots can be shared across Regions and clusters can be replicated using Neptune streams to replicate data between two clusters in two different Regions.  |  Highly available within a Region.  Cross-Region Replication requires custom application development using the Timestream SDK  |  Highly available within a Region.  Cross-Region Replication requires custom application logic or third-party tools  |  Highly available within a Region.  To replicate across Regions, export the contents of the Amazon QLDB journal to an S3 bucket and configure the bucket for Cross-Region Replication.  | 
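Several services in the table above (for example, DynamoDB and Timestream) may throttle writes while capacity adjusts, and the standard client-side response is to retry with exponential backoff and jitter. A generic sketch follows; the exception type and the injectable `sleep` callable are placeholders, not AWS SDK types (real SDK clients raise their own throttling exceptions and often retry for you):

```python
import random

class ThrottlingError(Exception):
    """Placeholder for an SDK throttling / limit-exceeded exception."""

def call_with_backoff(send, max_attempts=5, base_delay=0.1,
                      sleep=lambda seconds: None):
    """Retry send() on ThrottlingError with exponential backoff + jitter.

    `sleep` is injectable so tests run instantly; pass time.sleep in
    real code. Delay doubles each attempt, jittered between 0 and the cap.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Full jitter (a uniform draw up to the backoff cap) spreads retries from many throttled clients apart, which matters precisely when throttling means the whole fleet is retrying at once.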

 

 **Implementation steps** 

1.  What configuration options are available for the selected databases? 

   1.  Parameter groups for Amazon RDS and Aurora allow you to adjust common database engine level settings, such as the memory allocated for the cache or the time zone of the database. 

   1.  For provisioned database services such as Amazon RDS, Aurora, Neptune, Amazon DocumentDB and those deployed on Amazon EC2 you can change the instance type, provisioned storage and add read replicas. 

   1.  DynamoDB allows you to specify two capacity modes: on-demand and provisioned. To account for differing workloads, you can change between these modes and increase the allocated capacity in provisioned mode at any time. 

1.  Is the workload read or write heavy?  

   1.  What solutions are available for offloading reads (read replicas, caching, etc.)?  

      1.  For DynamoDB tables, you can offload reads using DAX for caching. 

      1.  For relational databases, you can create an ElastiCache for Redis cluster and configure your application to read from the cache first, falling back to the database if the requested item is not present. 

      1.  Relational databases such as Amazon RDS and Aurora, and provisioned NoSQL databases such as Neptune and Amazon DocumentDB all support adding read replicas to offload the read portions of the workload. 

      1.  Serverless databases such as DynamoDB will scale automatically. Ensure that you have enough read capacity units (RCU) provisioned to handle the workload. 

   1.  What solutions are available for scaling writes (partition key sharding, introducing a queue, etc.)? 

      1.  For relational databases, you can increase the size of the instance to accommodate an increased workload or increase the provisioned IOPs to allow for an increased throughput to the underlying storage. 
         +  You can also introduce a queue in front of your database rather than writing directly to the database. This pattern allows you to decouple the ingestion from the database and control the flow-rate so the database does not get overwhelmed.  
         +  Batching your write requests rather than creating many short-lived transactions can help improve throughput in high-write volume relational databases. 

      1.  Serverless databases like DynamoDB can scale the write throughput automatically or by adjusting the provisioned write capacity units (WCU) depending on the capacity mode.  
         +  You can still run into issues with *hot* partitions though, when you reach the throughput limits for a given partition key. This can be mitigated by choosing a more evenly distributed partition key or by write-sharding the partition key.  

1.  What are the current or expected peak transactions per second (TPS)? Test using this volume of traffic and above to understand the scaling characteristics. 

   1.  Native tools such as pgbench for PostgreSQL can be used to stress-test the database and understand the bottlenecks and scaling characteristics. 

   1.  Production-like traffic should be captured so that it can be replayed to simulate real-world conditions in addition to synthetic workloads. 

1.  If using serverless or elastically scalable compute, test the impact of scaling this on the database. If appropriate, introduce connection management or pooling to lower impact on the database.  

   1.  RDS Proxy can be used with Amazon RDS and Aurora to manage connections to the database.  

   1.  Serverless databases such as DynamoDB do not have connections associated with them, but consider the provisioned capacity and automatic scaling policies to deal with spikes in load. 

1.  Is the load predictable, are there spikes in load and periods of inactivity? 

   1.  If there are periods of inactivity consider scaling down the provisioned capacity or instance size during these times. Aurora Serverless V2 will automatically scale up and down based on load. 

   1.  For non-production instances, consider pausing or stopping these during non-work hours. 

1.  Do you need to segment and break apart your data models based on access patterns and data characteristics? 

   1.  Consider using AWS DMS or AWS SCT to move your data to other services. 
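Write-sharding a hot partition key, as mentioned in the steps above, appends a bounded suffix so that writes for one logical key spread across several physical partitions. A minimal sketch; the key scheme and shard count are hypothetical:

```python
import hashlib

SHARD_COUNT = 10  # number of suffixes; tune to the required write throughput

def sharded_key(logical_key: str, sort_value: str) -> str:
    """Deterministically spread one logical partition key across shards
    by hashing the sort value into a bounded suffix."""
    digest = hashlib.sha256(sort_value.encode()).digest()
    shard = digest[0] % SHARD_COUNT
    return f"{logical_key}#{shard}"

def all_shard_keys(logical_key: str) -> list[str]:
    """Reads for the logical key must fan out across every suffix."""
    return [f"{logical_key}#{s}" for s in range(SHARD_COUNT)]
```

The cost of this pattern is on the read side: a query for the logical key becomes `SHARD_COUNT` queries whose results are merged in the application, so choose the smallest shard count that relieves the hot partition.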

## Level of effort for the implementation plan

To establish this best practice, you must be aware of your current data characteristics and metrics. Gathering those metrics, establishing a baseline and then using those metrics to identify the ideal database configuration options is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 
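Once a baseline is captured, fresh metrics can be checked against it mechanically after each configuration experiment. A toy sketch with hypothetical metric names and an assumed 10% tolerance:

```python
# Hypothetical baseline captured from an earlier load test.
baseline = {"p95_latency_ms": 20.0, "cpu_percent": 60.0}

def regressions(current: dict, tolerance: float = 0.10) -> list[str]:
    """Return the names of metrics that exceed the baseline by more
    than `tolerance` (ignoring metrics with no baseline entry)."""
    return [
        name for name, value in current.items()
        if name in baseline and value > baseline[name] * (1 + tolerance)
    ]
```

A check like this turns "did the new parameter group help?" into a repeatable pass/fail step rather than a visual comparison of dashboards.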

## Resources

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 

 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28)
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Amazon DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples) 
+  [AWS Database migration samples](https://github.com/aws-samples/aws-database-migration-samples) 
+  [Database Modernization Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Working with parameters on your Amazon RDS for PostgreSQL DB](https://github.com/awsdocs/amazon-rds-user-guide/blob/main/doc_source/Appendix.PostgreSQL.CommonDBATasks.Parameters.md) 

# PERF04-BP03 Collect and record database performance metrics

 To understand how your data management systems are performing, it is important to track relevant metrics. These metrics will help you to optimize your data management resources, to ensure that your workload requirements are met, and that you have a clear overview on how the workload performs. Use tools, libraries, and systems that record performance measurements related to database performance. 

 

 There are metrics related to the system on which the database is hosted (for example, CPU, storage, memory, IOPS), and there are metrics for accessing the data itself (for example, transactions per second, query rates, response times, errors). These metrics should be readily accessible to any support or operational staff, and have sufficient historical record to identify trends, anomalies, and bottlenecks. 
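The two metric families are most useful when correlated, so capturing them together in one record simplifies later analysis. A minimal sketch (the field names here are illustrative, not any specific tool's schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class DbMetricSnapshot:
    # System-level metrics for the host running the database
    cpu_percent: float
    free_storage_gb: float
    iops: int
    # Data-access metrics for the database itself
    tps: float
    p95_latency_ms: float
    error_rate: float

snap = DbMetricSnapshot(cpu_percent=62.0, free_storage_gb=120.5,
                        iops=3400, tps=850.0, p95_latency_ms=12.4,
                        error_rate=0.002)
print(asdict(snap))
```

Retaining a time series of such snapshots is what makes trend and anomaly analysis possible later.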

 

 **Desired outcome:** To monitor the performance of your database workloads, you must record multiple performance metrics over a period of time. This allows you to detect anomalies as well as measure performance against business metrics to ensure you are meeting your workload needs. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools used by your team and don’t have a comprehensive picture of your workload. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 
+  You only monitor system level metrics, not capturing data access or usage metrics. 

 **Benefits of establishing this best practice:** Establishing a performance baseline helps in understanding normal behavior and requirements of workloads. Abnormal patterns can be identified and debugged faster improving performance and reliability of the database. Database capacity can be configured to ensure optimal cost without compromising performance. 

 **Level of risk exposed if this best practice is not established:** High 
+  Inability to differentiate between normal and abnormal performance levels creates difficulties in issue identification and decision making. 
+  Potential cost savings may not be identified. 
+  Growth patterns will not be identified, which might result in reliability or performance degradation. 

## Implementation guidance

 Identify, collect, aggregate, and correlate database-related metrics. Metrics should include both the underlying system that is supporting the database and the database metrics. The underlying system metrics might include CPU utilization, memory, available disk storage, disk I/O, and network inbound and outbound metrics, while the database metrics might include transactions per second, top queries, average query rates, response times, index usage, table locks, query timeouts, and number of connections open. This data is crucial to understand how the workload is performing and how the database solution is used. Use these metrics as part of a data-driven approach to tune and optimize your workload's resources.  
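One data-driven use of these metrics is to establish a baseline and flag outliers against it. A minimal sketch using a simple z-score test (the threshold and sample data are illustrative; production monitoring tools use more robust statistics):

```python
import statistics

def find_anomalies(samples, threshold=2.0):
    """Flag values more than `threshold` standard deviations
    from the mean of the sample set."""
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return [x for x in samples if abs(x - mean) > threshold * stdev]

# p95 query latencies in ms collected over time; one obvious spike.
latencies = [12.1, 11.8, 12.4, 13.0, 12.2, 11.9, 95.0, 12.5]
print(find_anomalies(latencies))  # → [95.0]
```

A single spike inflates the standard deviation, which is why a low threshold is used here; services like DevOps Guru for RDS apply machine learning to avoid hand-tuning such thresholds.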

 **Implementation steps:** 

1.  Which database metrics are important to track? 

   1.  [Monitoring metrics for Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) 

   1.  [Monitoring with Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) 

   1.  [Enhanced monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.overview.html) 

   1.  [DynamoDB metrics](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html) 

   1.  [Monitoring DynamoDB DAX](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.Monitoring.html) 

   1.  [Monitoring MemoryDB](https://docs.aws.amazon.com/memorydb/latest/devguide/monitoring-cloudwatch.html) 

   1.  [Monitoring Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/metrics.html) 

   1.  [Timeseries metrics and dimensions](https://docs.aws.amazon.com/timestream/latest/developerguide/metrics-dimensions.html) 

   1.  [Cluster level metrics for Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html) 

   1.  [Monitoring Amazon Keyspaces](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring.html) 

   1.  [Monitoring Amazon Neptune](https://docs.aws.amazon.com/neptune/latest/userguide/monitoring.html) 

1.  Would the database monitoring benefit from a machine learning solution that detects operational anomalies and performance issues? 

   1.  [Amazon DevOps Guru for Amazon RDS](https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.overview.how-it-works.html) provides visibility into performance issues and makes recommendations for corrective actions. 

1.  Do you need application level details about SQL usage? 

   1.  [AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql) can be instrumented into the application to gain insights and encapsulate all the data points for a single query. 

1.  Do you currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 

1.  Have you identified and configured your data retention policies to match your security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 
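The custom metrics mentioned in step 4 are published through the CloudWatch `PutMetricData` API. A sketch of assembling the request payload (the dict mirrors the keyword arguments that boto3's `put_metric_data` accepts; the namespace, metric name, and dimension values here are illustrative):

```python
def build_metric_payload(namespace, metric_name, value, unit="Count",
                         dimensions=None):
    """Assemble a PutMetricData-shaped request for one data point."""
    datum = {"MetricName": metric_name, "Value": value, "Unit": unit}
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [{"Name": k, "Value": v}
                               for k, v in dimensions.items()]
    return {"Namespace": namespace, "MetricData": [datum]}

payload = build_metric_payload(
    "MyApp/Database", "QueryTimeouts", 3,
    dimensions={"DBInstanceIdentifier": "orders-db"})
print(payload)

# With boto3 this would be sent as (not run here):
#   boto3.client("cloudwatch").put_metric_data(**payload)
```

Publishing derived metrics like this alongside the default ones gives the business-level view that default monitoring alone lacks.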

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all database resources. 

## Resources

 **Related documents:** 
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/) 
+ [ Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)
+ [ Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html)
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/)
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+ [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+ [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+ [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Amazon RDS Performance Insights](https://aws.amazon.com/rds/performance-insights/) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [AWS Dataset Ingestion Metrics Collection Framework](https://github.com/awslabs/aws-dataset-ingestion-metrics-collection-framework) 
+  [Amazon RDS Monitoring Workshop](https://www.workshops.aws/?tag=Enhanced%20Monitoring) 

# PERF04-BP04 Choose data storage based on access patterns

 Use the access patterns of the workload to decide which services and technologies to use. In addition to non-functional requirements such as performance and scale, access patterns heavily influence the choice of database and storage solutions. The first dimension is the need for transactions, ACID compliance, and consistent reads. Not every database supports these, and most NoSQL databases provide an eventual consistency model. The second dimension is the distribution of writes and reads over time and space. Globally distributed applications need to consider traffic patterns, latency, and access requirements to identify the optimal storage solution. The third dimension is query pattern flexibility, such as random access patterns and one-time ad hoc queries. Considerations around highly specialized query functionality for text and natural language processing, time series, and graphs must also be taken into account. 

 **Desired outcome:** The data storage has been selected based on identified and documented data access patterns. This might include the most common read, write, and delete queries, the need for ad hoc calculations and aggregations, the complexity of the data, data interdependency, and the required consistency. 

 **Common anti-patterns:** 
+  You only select one database vendor to simplify operations management. 
+  You assume that data access patterns will stay consistent over time. 
+  You implement complex transactions, rollback, and consistency logic in the application. 
+  The database is configured to support a potential high traffic burst, which results in the database resources remaining idle most of the time. 
+  You use a shared database for transactional and analytical purposes. 

 **Benefits of establishing this best practice:** Selecting and optimizing your data storage based on access patterns will help decrease development complexity and optimize your performance opportunities. Understanding when to use read replicas, global tables, data partitioning, and caching will help you decrease operational overhead and scale based on your workload needs. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance

 Identify and evaluate your data access pattern to select the correct storage configuration. Each database solution has options to configure and optimize your storage solution. Use the collected metrics and logs and experiment with options to find the optimal configuration. Use the following table to review storage options per database service. 


|  AWS Services  |  Amazon RDS, Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Storage  |  Storage automatic scaling option available to automatically scale provisioned storage. IOPS can also be scaled independently of provisioned storage when using Provisioned IOPS storage types.  |  Automatically scales. Tables are unconstrained in terms of size.  |  Storage automatic scaling option available to scale provisioned storage  |  Storage is in-memory, tied to instance type or count  |  Storage automatic scaling option available to automatically scale provisioned storage  |  Configure retention period for in-memory and magnetic tiers in days  |  Scales table storage up and down automatically  |  Automatically scales. Tables are unconstrained in terms of size.  | 

 

 **Implementation steps:** 

1.  Identify and document the anticipated growth of the data and traffic. 

   1.  Amazon RDS and Aurora support storage automatic scaling up to documented limits. Beyond this, consider transitioning older data to Amazon S3 for archival, aggregating historical data for analytics, or scaling horizontally via sharding. 

   1.  DynamoDB and Amazon S3 will scale to near limitless storage volume automatically. 

   1.  Amazon RDS instances and databases running on EC2 can be manually resized and EC2 instances can have new EBS volumes added at a later date for additional storage.  

   1.  Instance types can be changed based on changes in activity. For example, you can start with a smaller instance while you are testing, then scale the instance as you begin to receive production traffic to the service. Aurora Serverless v2 automatically scales in response to changes in load.  

1.  Document requirements around normal and peak performance (transactions per second, or TPS, and queries per second, or QPS) and consistency (ACID and eventual consistency). 

1.  Document solution deployment aspects and the database access requirements (global, Multi-AZ, read replication, multiple write nodes). 
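The horizontal sharding option mentioned in step 1 can be sketched as key-to-shard routing. This is a minimal hash-modulo scheme for illustration; production systems typically use consistent hashing so that resharding moves less data.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a record to a shard by hashing its partition key.
    A stable digest (md5 here) keeps the mapping consistent across
    processes, unlike Python's built-in hash()."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Every record with the same partition key lands on the same shard.
assert shard_for("customer-1234", 4) == shard_for("customer-1234", 4)

# A good hash spreads keys roughly evenly across shards.
counts = [0, 0, 0, 0]
for i in range(1000):
    counts[shard_for(f"customer-{i}", 4)] += 1
print(counts)
```

This is also how partitioned services like DynamoDB distribute items internally, which is why choosing a high-cardinality partition key matters.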

 **Level of effort for the implementation plan:** If you do not have logs or metrics for your data management solution, you will need to complete that before identifying and documenting your data access patterns. Once your data access pattern is understood, selecting and configuring your data storage is a *low* level of effort. 

## Resources

 **Related documents:** 
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/)
+ [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+ [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+ [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/) 
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+ [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+ [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/)
+  [Amazon RDS Storage Types](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+ [ Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Experiment and test with Distributed Load Testing on AWS](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 

# PERF04-BP05 Optimize data storage based on access patterns and metrics

 Use performance characteristics and access patterns that optimize how data is stored or queried to achieve the best possible performance. Measure how optimizations such as indexing, key distribution, data warehouse design, or caching strategies impact system performance or overall efficiency. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools. 

 **Benefits of establishing this best practice:** In order to ensure you are meeting the metrics required for the workload, you must monitor database performance metrics related to both reads and writes. You can use this data to add new optimizations for both reads and writes to the data storage layer. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance

 Optimize data storage based on metrics and patterns: Use reported metrics to identify any underperforming areas in your workload and optimize your database components. Each database system has different performance related characteristics to evaluate, such as how data is indexed, cached, or distributed among multiple systems. Measure the impact of your optimizations. 
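As an example of one such optimization, a read-through cache in front of the database with its hit rate tracked, so the impact can be measured rather than assumed. This is a bare sketch with a hypothetical `query_db` stand-in; services like ElastiCache or DynamoDB Accelerator (DAX) provide this pattern managed, with eviction and TTLs.

```python
class ReadThroughCache:
    """Serve reads from an in-memory map, falling back to the
    backing store on a miss, and record hit/miss counts."""

    def __init__(self, fetch_from_db):
        self._fetch = fetch_from_db
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Hypothetical backing query standing in for a real database call.
db_calls = []
def query_db(key):
    db_calls.append(key)
    return f"row-for-{key}"

cache = ReadThroughCache(query_db)
for key in ["a", "b", "a", "a", "c", "b"]:
    cache.get(key)
print(cache.hit_rate(), len(db_calls))  # → 0.5 3
```

Here half the reads never reached the database; comparing hit rate and backend call counts before and after a change is exactly the kind of measurement this best practice calls for.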

## Resources

 **Related documents:** 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+  [Amazon DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Analyzing performance anomalies with DevOps Guru for RDS](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/devops-guru-for-rds.html) 
+  [Read/Write Capacity Mode for DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R)](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1)](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Hands-on Labs for Amazon DynamoDB](https://amazon-dynamodb-labs.workshop.aws/hands-on-labs.html) 