

# Selection
<a name="a-selection"></a>

**Topics**
+ [PERF 1. How do you select the best performing architecture?](perf-01.md)
+ [PERF 2. How do you select your compute solution?](perf-02.md)
+ [PERF 3. How do you select your storage solution?](perf-03.md)
+ [PERF 4. How do you select your database solution?](perf-04.md)
+ [PERF 5. How do you configure your networking solution?](perf-05.md)

# PERF 1. How do you select the best performing architecture?
<a name="perf-01"></a>

 Often, multiple approaches are required to achieve efficient performance across a workload. Well-architected systems use multiple solutions and features to improve performance. 

**Topics**
+ [PERF01-BP01 Understand the available services and resources](perf_performing_architecture_evaluate_resources.md)
+ [PERF01-BP02 Define a process for architectural choices](perf_performing_architecture_process.md)
+ [PERF01-BP03 Factor cost requirements into decisions](perf_performing_architecture_cost.md)
+ [PERF01-BP04 Use policies or reference architectures](perf_performing_architecture_use_policies.md)
+ [PERF01-BP05 Use guidance from your cloud provider or an appropriate partner](perf_performing_architecture_external_guidance.md)
+ [PERF01-BP06 Benchmark existing workloads](perf_performing_architecture_benchmark.md)
+ [PERF01-BP07 Load test your workload](perf_performing_architecture_load_test.md)

# PERF01-BP01 Understand the available services and resources
<a name="perf_performing_architecture_evaluate_resources"></a>

 Learn about and understand the wide range of services and resources available in the cloud. Identify the relevant services and configuration options for your workload, and understand how to achieve optimal performance. 

 If you are evaluating an existing workload, you must generate an inventory of the various services and resources it consumes. Your inventory helps you evaluate which components can be replaced with managed services and newer technologies. 

 **Common anti-patterns:** 
+  You use the cloud as a colocated data center. 
+  You use shared storage for all things that need persistent storage. 
+  You do not use automatic scaling. 
+  You select instance types that most closely match your current standards, sized up where needed, rather than types matched to your workload's requirements. 
+  You deploy and manage technologies that are available as managed services. 

 **Benefits of establishing this best practice:** By considering services you may be unfamiliar with, you may be able to greatly reduce the cost of infrastructure and the effort required to maintain your services. You may be able to accelerate your time to market by deploying new services and features. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="perf01-bp01-implementation-guidance"></a>

 Inventory your workload software and architecture for related services: Gather an inventory of your workload and decide which category of products to learn more about. Identify workload components that can be replaced with managed services to increase performance and reduce operational complexity. 
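As a minimal sketch of this inventory step, components can be mapped against a list of managed-service alternatives programmatically. The component names and the mapping below are illustrative examples, not an AWS API:

```python
# Sketch: categorize a workload inventory to flag components that could be
# replaced with managed services. The inventory and mapping are illustrative.
MANAGED_ALTERNATIVES = {
    "self-managed-mysql": "Amazon RDS / Amazon Aurora",
    "self-managed-redis": "Amazon ElastiCache",
    "self-managed-kafka": "Amazon MSK",
}

def find_managed_candidates(inventory):
    """Return (component, suggested managed service) pairs."""
    return [(c, MANAGED_ALTERNATIVES[c]) for c in inventory
            if c in MANAGED_ALTERNATIVES]

inventory = ["self-managed-mysql", "nginx", "self-managed-redis"]
candidates = find_managed_candidates(inventory)
```

In practice the inventory would come from configuration management or discovery tooling; the point is to make replacement candidates an explicit, reviewable output.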

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP02 Define a process for architectural choices
<a name="perf_performing_architecture_process"></a>

 Use internal experience and knowledge of the cloud, or external resources such as published use cases, relevant documentation, or whitepapers, to define a process to choose resources and services. You should define a process that encourages experimentation and benchmarking with the services that could be used in your workload. 

 When you write critical user stories for your architecture, you should include performance requirements, such as specifying how quickly each critical story should run. For these critical stories, you should implement additional scripted user journeys to ensure that you have visibility into how these stories perform against your requirements. 

 **Common anti-patterns:** 
+  You assume your current architecture is static and will not be updated over time. 
+  You introduce architecture changes over time without justification. 

 **Benefits of establishing this best practice:** A defined process for making architectural changes enables you to use gathered data to influence your workload design over time. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Select an architectural approach: Identify the kind of architecture that meets your performance requirements. Identify constraints, such as the media for delivery (desktop, web, mobile, IoT), legacy requirements, and integrations. Identify opportunities for reuse, including refactoring. Consult other teams, architecture diagrams, and resources such as AWS Solution Architects, AWS Reference Architectures, and AWS Partners to help you choose an architecture. 

 Define performance requirements: Use the customer experience to identify the most important metrics. For each metric, identify the target, measurement approach, and priority. Define the customer experience. Document the performance experience required by customers, including how customers will judge the performance of the workload. Prioritize experience concerns for critical user stories. Include performance requirements and implement scripted user journeys to ensure that you know how the stories perform against your requirements. 
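A performance requirement defined this way can be captured as structured data rather than prose, so scripted user journeys have something concrete to check against. The metric name, target, and measurement approach below are illustrative:

```python
# Sketch: record a performance requirement for a critical user story,
# with its target, measurement approach, and priority. Values are examples.
from dataclasses import dataclass

@dataclass
class PerformanceRequirement:
    metric: str          # what is measured
    target_ms: float     # target value, in milliseconds
    measurement: str     # how it is measured
    priority: int        # 1 = highest

    def is_met(self, observed_ms: float) -> bool:
        return observed_ms <= self.target_ms

checkout_latency = PerformanceRequirement(
    metric="checkout p95 latency",
    target_ms=300.0,
    measurement="scripted user journey, CloudWatch Synthetics",
    priority=1,
)
```

Keeping requirements in this form lets the same definitions drive both dashboards and automated checks.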

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP03 Factor cost requirements into decisions
<a name="perf_performing_architecture_cost"></a>

 Workloads often have cost requirements for operation. Use internal cost controls to select resource types and sizes based on predicted resource need. 

 Determine which workload components could be replaced with fully managed services, such as managed databases, in-memory caches, and ETL services. Reducing your operational workload allows you to focus resources on business outcomes. 

 For cost requirement best practices, refer to the *Cost-Effective Resources* section of the [Cost Optimization Pillar whitepaper](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html). 

 **Common anti-patterns:** 
+  You only use one family of instances. 
+  You do not evaluate licensed solutions against open-source solutions. 
+  You only use block storage. 
+  You deploy software on EC2 instances and Amazon EBS or ephemeral volumes when it is available as a managed service. 

 **Benefits of establishing this best practice:** Considering cost when making your selections frees resources for other investments. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize workload components to reduce cost: Right size workload components and allow elasticity to reduce cost and maximize component efficiency. Determine which workload components can be replaced with managed services when appropriate, such as managed databases, in-memory caches, and reverse proxies. 
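Right-sizing decisions like this can start from a simple utilization heuristic. The thresholds below are illustrative only; tools such as AWS Compute Optimizer use far richer signals (memory, network, burst behavior):

```python
# Sketch: a first-pass right-sizing heuristic based on observed CPU
# utilization. Thresholds are illustrative, not AWS recommendations.
def rightsizing_recommendation(avg_cpu_pct, peak_cpu_pct):
    if peak_cpu_pct > 90:
        return "upsize"      # sustained risk of saturation
    if avg_cpu_pct < 20 and peak_cpu_pct < 40:
        return "downsize"    # paying for idle capacity
    return "keep"
```

A rule like this is only a triage filter; final sizing should be validated against metrics collected under representative load.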

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1) ](https://www.youtube.com/watch?v=zt6jYJLK8sg&ref=wellarchitected) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF01-BP04 Use policies or reference architectures
<a name="perf_performing_architecture_use_policies"></a>

 Maximize performance and efficiency by evaluating internal policies and existing reference architectures and using your analysis to select services and configurations for your workload. 

 **Common anti-patterns:** 
+  You allow a wide range of technology choices, which can increase the management overhead for your company. 

 **Benefits of establishing this best practice:** Establishing a policy for architecture, technology, and vendor choices will allow decisions to be made quickly. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Deploy your workload using existing policies or reference architectures: Integrate the services into your cloud deployment, then use your performance tests to ensure that you can continue to meet your performance requirements. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP05 Use guidance from your cloud provider or an appropriate partner
<a name="perf_performing_architecture_external_guidance"></a>

 Use cloud provider resources, such as solutions architects, professional services, or an appropriate partner, to guide your decisions. These resources can help you review and improve your architecture for optimal performance. 

 Reach out to AWS for assistance when you need additional guidance or product information. AWS Solutions Architects and [AWS Professional Services](https://aws.amazon.com/professional-services/) provide guidance for solution implementation. [AWS Partners](https://aws.amazon.com/partners/) provide AWS expertise to help you unlock agility and innovation for your business. 

 **Common anti-patterns:** 
+  You use AWS as a common data center provider. 
+  You use AWS services in ways they were not designed for. 

 **Benefits of establishing this best practice:** Consulting with your provider or a partner will give you confidence in your decisions. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Reach out to AWS resources for assistance: AWS Solutions Architects and Professional Services provide guidance for solution implementation. APN Partners provide AWS expertise to help you unlock agility and innovation for your business. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP06 Benchmark existing workloads
<a name="perf_performing_architecture_benchmark"></a>

 Benchmark the performance of an existing workload to understand how it performs on the cloud. Use the data collected from benchmarks to drive architectural decisions. 

 Use benchmarking with synthetic tests and real-user monitoring to generate data about how your workload’s components perform. Benchmarking is generally quicker to set up than load testing and is used to evaluate the technology for a particular component. Benchmarking is often used at the start of a new project, when you lack a full solution to load test. 

 You can either build your own custom benchmark tests, or you can use an industry standard test, such as [TPC-DS](http://www.tpc.org/tpcds/) to benchmark your data warehousing workloads. Industry benchmarks are helpful when comparing environments. Custom benchmarks are useful for targeting specific types of operations that you expect to make in your architecture. 

 When benchmarking, it is important to pre-warm your test environment to ensure valid results. Run the same benchmark multiple times to ensure that you’ve captured any variance over time. 
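The pre-warm and repeat guidance above can be sketched as a small harness that discards warm-up iterations and reports both the mean and the spread, so variance over time is visible. The workload function here is a stand-in for your real benchmark:

```python
# Sketch: run a benchmark repeatedly, drop warm-up runs, and report mean
# and standard deviation so run-to-run variance is captured.
import statistics
import time

def run_benchmark(workload, runs=10, warmup=2):
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    measured = timings[warmup:]  # discard warm-up (pre-warm) iterations
    return statistics.mean(measured), statistics.stdev(measured)

# Stand-in workload; replace with the component operation you are evaluating.
mean_s, stdev_s = run_benchmark(lambda: sum(range(100_000)))
```

A large standard deviation relative to the mean is itself a finding: it suggests the environment was not warmed, or that an external factor is interfering with the measurement.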

 Because benchmarks are generally faster to run than load tests, they can be used earlier in the deployment pipeline and provide faster feedback on performance deviations. When you evaluate a significant change in a component or service, a benchmark can be a quick way to see if you can justify the effort to make the change. Using benchmarking in conjunction with load testing is important because load testing informs you about how your workload will perform in production. 

 **Common anti-patterns:** 
+  You rely on common benchmarks that are not indicative of your workload characteristics. 
+  You rely on customer feedback and perceptions as your only benchmark. 

 **Benefits of establishing this best practice:** Benchmarking your current implementation allows you to measure the improvement in performance. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Monitor performance during development: Implement processes that provide visibility into performance as your workload evolves. 

 Integrate into your delivery pipeline: Automatically run load tests in your delivery pipeline. Compare the test results against pre-defined key performance indicators (KPIs) and thresholds to ensure that you continue to meet performance requirements. 

 Test user journeys: Use synthetic or sanitized versions of production data (remove sensitive or identifying information) for load testing. Exercise your entire architecture by using replayed or pre-programmed user journeys through your application at scale. 
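Sanitizing production records before replay can be as simple as stripping identifying fields. The field names below are illustrative; your own data classification should drive the real list:

```python
# Sketch: sanitize production records for load testing by removing
# sensitive or identifying fields. The field list is illustrative.
SENSITIVE_FIELDS = {"email", "name", "ip_address"}

def sanitize(record):
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

raw = {"order_id": 42, "email": "user@example.com", "total": 19.99}
clean = sanitize(raw)
```

For realistic load shapes you may prefer replacing fields with synthetic values of the same length and type rather than dropping them outright.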

 Real-user monitoring: Use CloudWatch RUM to help you collect and view client-side data about your application performance. Use this data to help establish your real-user performance benchmarks. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 
+  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 
+  [Amazon CloudWatch Synthetics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 
+  [Optimize applications through Amazon CloudWatch RUM](https://www.youtube.com/watch?v=NMaeujY9A9Y) 
+  [Demo of Amazon CloudWatch Synthetics](https://www.youtube.com/watch?v=hF3NM9j-u7I) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 
+  [Distributed Load Tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 
+  [Measure page load time with Amazon CloudWatch Synthetics](https://github.com/aws-samples/amazon-cloudwatch-synthetics-page-performance) 
+  [Amazon CloudWatch RUM Web Client](https://github.com/aws-observability/aws-rum-web) 

# PERF01-BP07 Load test your workload
<a name="perf_performing_architecture_load_test"></a>

 Deploy your latest workload architecture on the cloud using different resource types and sizes. Monitor the deployment to capture performance metrics that identify bottlenecks or excess capacity. Use this performance information to design or improve your architecture and resource selection. 

 Load testing uses your *actual* workload so that you can see how your solution performs in a production environment. Load tests must be run using synthetic or sanitized versions of production data (remove sensitive or identifying information). Use replayed or pre-programmed user journeys through your workload at scale that exercise your entire architecture. Automatically carry out load tests as part of your delivery pipeline, and compare the results against pre-defined KPIs and thresholds. This ensures that you continue to achieve required performance. 

 **Common anti-patterns:** 
+  You load test individual parts of your workload but not your entire workload. 
+  You load test on infrastructure that is not the same as your production environment. 
+  You only load test to your expected load and not beyond it, so you cannot foresee where you may have future problems. 
+  You perform load testing without informing AWS Support, and your test is blocked because it looks like a denial-of-service event. 

 **Benefits of establishing this best practice:** Measuring your performance under a load test will show you where you will be impacted as load increases. This lets you anticipate needed changes before they impact your workload. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Validate your approach with load testing: Load test a proof-of-concept to find out if you meet your performance requirements. You can use AWS services to run production-scale environments to test your architecture. Because you only pay for the test environment when it is needed, you can carry out full-scale testing at a fraction of the cost of using an on-premises environment. 

 Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 
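Threshold alarms of the kind described above typically trigger only when several recent datapoints breach the threshold, which avoids alarming on a single spike. A minimal sketch of that evaluation logic (the window sizes and values are illustrative, not CloudWatch's exact algorithm):

```python
# Sketch: CloudWatch-style "m out of n datapoints" alarm evaluation.
# Triggers only when at least m of the last n datapoints breach the
# threshold, so a single transient spike does not raise an alarm.
def alarm_state(datapoints, threshold, n=5, m=3):
    recent = datapoints[-n:]
    breaches = sum(1 for v in recent if v > threshold)
    return "ALARM" if breaches >= m else "OK"
```

Tuning n and m trades alerting speed against noise: larger windows suppress flapping but delay detection.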

 Test at scale: Load testing uses your actual workload so you can see how your solution performs in a production environment. You can use AWS services to run production-scale environments to test your architecture. Because you only pay for the test environment when it is needed, you can run full-scale testing at a lower cost than using an on-premises environment. Take advantage of the AWS Cloud to test your workload to discover where it fails to scale, or if it scales in a non-linear way. For example, use Spot Instances to generate loads at low cost and discover bottlenecks before they are experienced in production. 
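Non-linear scaling can be detected by comparing measured throughput against what perfectly linear scaling would predict. A minimal sketch, with illustrative numbers:

```python
# Sketch: check whether throughput scales roughly linearly with offered
# load. If quadrupling the load yields much less than quadruple the
# throughput, the workload has hit a bottleneck.
def scaling_efficiency(load_pairs):
    """load_pairs: [(offered_load, measured_throughput), ...] sorted by load."""
    (l0, t0), (l1, t1) = load_pairs[0], load_pairs[-1]
    ideal = t0 * (l1 / l0)  # throughput if scaling were perfectly linear
    return t1 / ideal

# Illustrative load-test results: throughput flattens as load grows.
results = [(100, 1000), (200, 1900), (400, 2600)]
efficiency = scaling_efficiency(results)  # 2600 / 4000 = 0.65
```

An efficiency well below 1.0 at the high end of the load range is the signal to profile for the bottleneck before it is experienced in production.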

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) 
+  [Building AWS CloudFormation Templates using CloudFormer](https://aws.amazon.com/blogs/devops/building-aws-cloudformation-templates-using-cloudformer/) 
+  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 
+  [Amazon CloudWatch Synthetics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html) 
+  [Distributed Load Testing on AWS](https://docs.aws.amazon.com/solutions/latest/distributed-load-testing-on-aws/welcome.html) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [Optimize applications through Amazon CloudWatch RUM](https://www.youtube.com/watch?v=NMaeujY9A9Y) 
+  [Demo of Amazon CloudWatch Synthetics](https://www.youtube.com/watch?v=hF3NM9j-u7I) 

 **Related examples:** 
+  [Distributed Load Testing on AWS](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 

# PERF 2. How do you select your compute solution?
<a name="perf-02"></a>

The most effective compute solution for a workload varies based on application design, usage patterns, and configuration settings. Architectures can use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

**Topics**
+ [PERF02-BP01 Evaluate the available compute options](perf_select_compute_evaluate_options.md)
+ [PERF02-BP02 Understand the available compute configuration options](perf_select_compute_config_options.md)
+ [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md)
+ [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md)
+ [PERF02-BP05 Use the available elasticity of resources](perf_select_compute_elasticity.md)
+ [PERF02-BP06 Continually evaluate compute needs based on metrics](perf_select_compute_use_metrics.md)

# PERF02-BP01 Evaluate the available compute options
<a name="perf_select_compute_evaluate_options"></a>

 Understand how your workload can benefit from the use of different compute options, such as instances, containers, and functions. 

 **Desired outcome:** By understanding all of the compute options available, you will be aware of the opportunities to increase performance, reduce unnecessary infrastructure costs, and lower the operational effort required to maintain your workload. You can also accelerate your time to market when you deploy new services and features. 

 **Common anti-patterns:** 
+  In a post-migration workload, using the same compute solution that was being used on premises. 
+  Lacking awareness of the cloud compute solutions and how those solutions might improve your compute performance. 
+  Oversizing an existing compute solution to meet scaling or performance requirements, when an alternative compute solution would align to your workload characteristics more precisely. 

 **Benefits of establishing this best practice:** By identifying the compute requirements and evaluating the available compute solutions, business stakeholders and engineering teams will understand the benefits and limitations of using the selected compute solution. The selected compute solution should fit the workload performance criteria. Key criteria include processing needs, traffic patterns, data access patterns, scaling needs, and latency requirements. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Understand the virtualization, containerization, and management solutions that can benefit your workload and meet your performance requirements. A workload can contain multiple types of compute solutions. Each compute solution has differing characteristics. Based on your workload scale and compute requirements, a compute solution can be selected and configured to meet your needs. The cloud architect should learn the advantages and disadvantages of instances, containers, and functions. The following steps will help you select a compute solution that matches your workload characteristics and performance requirements. 


|  **Type**  |  **Server**  |  **Containers**  |  **Function**  | 
| --- | --- | --- | --- | 
|  AWS service  |  Amazon Elastic Compute Cloud (Amazon EC2)  |  Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS)  |  AWS Lambda  | 
|  Key characteristics  |  Dedicated options for hardware licensing requirements, placement options, and a large selection of instance families based on compute metrics  |  Easy deployment, consistent environments, runs on top of EC2 instances, scalable  |  Short runtime (15 minutes or less), lower maximum memory and CPU than other services, managed hardware layer, scales to millions of concurrent requests  | 
|  Common use cases  |  Lift-and-shift migrations, monolithic applications, hybrid environments, enterprise applications  |  Microservices, hybrid environments  |  Microservices, event-driven applications  | 

 

 **Implementation steps:** 

1.  Select the location where the compute solution must reside by evaluating [PERF05-BP06 Choose your workload’s location based on network requirements](perf_select_network_location.md). This location will limit the types of compute solutions available to you. 

1.  Identify the type of compute solution that works with your location and application requirements. 

   1.  [Amazon EC2](https://aws.amazon.com/ec2/) virtual server instances come in many different families and sizes and offer a wide variety of capabilities, including solid state drives (SSDs) and graphics processing units (GPUs). EC2 instances offer the greatest flexibility of instance choice. When you launch an EC2 instance, the instance type that you specify determines the hardware of your instance. Each instance type offers different compute, memory, and storage capabilities. Instance types are grouped in instance families based on these capabilities. Typical use cases include running enterprise applications, high performance computing (HPC), training and deploying machine learning applications, and running cloud native applications. 

   1.  [Amazon ECS](https://aws.amazon.com/ecs/) is a fully managed container orchestration service that allows you to automatically run and manage containers on a cluster of EC2 instances or serverless instances using AWS Fargate. You can use Amazon ECS with other services such as Amazon Route 53, AWS Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch. Amazon ECS is recommended if your application is containerized and your engineering team prefers Docker containers. 

   1.  [Amazon EKS](https://aws.amazon.com/eks/) is a fully managed Kubernetes service. You can choose to run your EKS clusters using AWS Fargate, removing the need to provision and manage servers. Managing Amazon EKS is simplified by integrations with AWS services such as Amazon CloudWatch, Auto Scaling groups, AWS Identity and Access Management (IAM), and Amazon Virtual Private Cloud (VPC). When using containers, you must use compute metrics to select the optimal type for your workload, similar to how you use compute metrics to select your EC2 or AWS Fargate instance types. Amazon EKS is recommended if your application is containerized and your engineering team prefers Kubernetes. 

   1.  You can use [AWS Lambda](https://aws.amazon.com/lambda/) to run code that fits within the allowed runtime, memory, and CPU options. Upload your code, and AWS Lambda manages everything required to run and scale it. You can set up your code to run automatically from other AWS services or call it directly. Lambda is recommended for short-running microservice architectures developed for the cloud.  

1.  After you have experimented with your new compute solution, plan your migration and validate your performance metrics. This is a continual process, see [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md). 

 **Level of effort for the implementation plan:** If a workload is moving from one compute solution to another, there could be a *moderate* level of effort involved in refactoring the application.   
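The filtering described in the steps above can be sketched as a first-pass helper. The 15-minute bound mirrors Lambda's documented maximum runtime; the other criteria and the function itself are illustrative simplifications, and real selection should weigh many more factors:

```python
# Sketch: first-pass filter over compute options based on a few workload
# characteristics. Criteria are illustrative; Lambda's 15-minute maximum
# runtime is a documented service limit.
def candidate_compute(runtime_minutes, containerized, needs_full_os_control):
    if needs_full_os_control:
        return ["Amazon EC2"]          # only instances give OS-level control
    options = ["Amazon EC2"]           # instances fit nearly any workload
    if containerized:
        options += ["Amazon ECS", "Amazon EKS"]
    if runtime_minutes <= 15:
        options.append("AWS Lambda")   # within Lambda's runtime limit
    return options
```

For example, a short containerized job qualifies for all four options, while a long-running monolith that needs OS control narrows immediately to EC2.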

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 
+  [Prescriptive Guidance for Containers](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23containers&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 
+  [Prescriptive Guidance for Serverless](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23serverless&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 

 **Related videos:** 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Deliver high-performance ML inference with AWS Inferentia (CMP324-R1) ](https://www.youtube.com/watch?v=17r1EapAxpk&ref=wellarchitected) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1) ](https://www.youtube.com/watch?v=_dvh4P2FVbw&ref=wellarchitected) 

 **Related examples:** 
+  [Migrating the web application to containers](https://application-migration-with-aws.workshop.aws/en/container-migration.html) 
+  [Run a Serverless Hello World](https://aws.amazon.com/getting-started/hands-on/run-serverless-code/) 

# PERF02-BP02 Understand the available compute configuration options
<a name="perf_select_compute_config_options"></a>

 Each compute solution has options and configurations available to you to support your workload characteristics. Learn how various options complement your workload, and which configuration options are best for your application. Examples of these options include instance family, sizes, features (GPU, I/O), bursting, time-outs, function sizes, container instances, and concurrency. 

 **Desired outcome:** The workload characteristics, including CPU, memory, network throughput, GPU, IOPS, traffic patterns, and data access patterns, are documented and used to configure a compute solution that matches them. Each of these metrics, plus custom metrics specific to your workload, are recorded, monitored, and then used to optimize the compute configuration to best meet the requirements. 

 **Common anti-patterns:** 
+  You use the same compute solution that was used on premises. 
+  You do not review compute options or instance families to match workload characteristics. 
+  You oversize the compute solution to ensure bursting capability. 
+  You use multiple compute management platforms for the same workload. 

 **Benefits of establishing this best practice:** Be familiar with the AWS compute offerings so that you can determine the correct solution for each of your workloads. After you have selected the compute offerings for your workload, you can quickly experiment with them to determine how well they meet your workload needs. A compute solution that is optimized to meet your workload characteristics will increase your performance, lower your cost, and increase your reliability.

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 If your workload has been using the same compute option for more than four weeks and you anticipate that the characteristics will remain the same in the future, you can use [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to provide a recommendation based on your compute characteristics. If AWS Compute Optimizer is not an option due to a lack of metrics, [a non-supported instance type](https://docs.aws.amazon.com/compute-optimizer/latest/ug/requirements.html#requirements-ec2-instances), or a foreseeable change in your characteristics, you must predict your metrics based on load testing and experimentation.  

 **Implementation steps:** 

1.  Are you running on EC2 instances or containers with the EC2 Launch Type? 

   1.  Can your workload use GPUs to increase performance? 

      1.  [Accelerated Computing](https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing) instances are GPU-based instances that provide the highest performance for machine learning training, inference, and high performance computing. 

   1.  Does your workload run machine learning inference applications? 

      1.  [AWS Inferentia (Inf1)](https://aws.amazon.com/ec2/instance-types/inf1/) — Inf1 instances are built to support machine learning inference applications. Using Inf1 instances, you can run large-scale machine learning inference applications, such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. You can build a model in one of the popular machine learning frameworks, such as TensorFlow, PyTorch, or MXNet, and use GPU instances to train your model. After your machine learning model is trained to meet your requirements, you can deploy your model on Inf1 instances by using [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/), a specialized software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the machine learning inference performance of Inferentia chips. 

   1.  Does your workload integrate with the low-level hardware to improve performance?  

      1.  [Field Programmable Gate Arrays (FPGA)](https://aws.amazon.com/ec2/instance-types/f1/) — Using FPGAs, you can accelerate your most demanding workloads with custom hardware-accelerated operations. You can define your algorithms by using supported general-purpose programming languages such as C or Go, or hardware-oriented languages such as Verilog or VHDL. 

   1.  Do you have at least four weeks of metrics and can predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Is your workload performance constrained by the CPU metrics?  

      1.  [Compute-optimized](https://aws.amazon.com/ec2/instance-types/#Compute_Optimized) instances are ideal for workloads that require high-performing processors. 

   1.  Is your workload performance constrained by the memory metrics?  

      1.  [Memory-optimized](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) instances deliver large amounts of memory to support memory-intensive workloads. 

   1.  Is your workload performance constrained by IOPS? 

      1.  [Storage-optimized](https://aws.amazon.com/ec2/instance-types/#Storage_Optimized) instances are designed for workloads that require high, sequential read and write access (IOPS) to local storage. 

   1.  Do your workload characteristics represent a balanced need across all metrics? 

      1.  Does your workload CPU need to burst to handle spikes in traffic? 

         1.  [Burstable Performance](https://aws.amazon.com/ec2/instance-types/#Instance_Features) instances provide a baseline level of CPU performance with the ability to burst above that baseline when traffic spikes. 

      1.  [General Purpose](https://aws.amazon.com/ec2/instance-types/#General_Purpose) instances provide a balance of all characteristics to support a variety of workloads. 

   1.  Is your compute instance running on Linux and constrained by network throughput on the network interface card? 

      1.  Review [Performance Question 5, Best Practice 2: Evaluate available networking features](https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/network-architecture-selection.html) to find the right instance type and family to meet your performance needs. 

   1.  Does your workload need consistent and predictable instances in a specific Availability Zone that you can commit to for a year?  

      1.  [Reserved Instances](https://aws.amazon.com/ec2/pricing/reserved-instances/) provide a capacity reservation in a specific Availability Zone, and are ideal when you require consistent compute power in that Availability Zone. 

   1.  Does your workload have licenses that require dedicated hardware? 

      1.  [Dedicated Hosts](https://aws.amazon.com/ec2/dedicated-hosts/) support existing software licenses and help you meet compliance requirements. 

   1.  Does your compute solution burst and require synchronous processing? 

      1.  [On-Demand Instances](https://aws.amazon.com/ec2/pricing/on-demand/) let you pay for compute capacity by the hour or second with no long-term commitment. These instances are good for bursting above performance baseline needs. 

   1.  Is your compute solution stateless, fault-tolerant, and asynchronous?  

      1.  [Spot Instances](https://aws.amazon.com/ec2/spot/) let you take advantage of unused instance capacity for your stateless, fault-tolerant workloads.  

1.  Are you running containers on [Fargate](https://aws.amazon.com/fargate/)? 

   1.  Is your task performance constrained by the memory or CPU? 

      1.  Use the [task size](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-tasksize.html) settings to adjust your memory or CPU. 

   1.  Is your performance being affected by your traffic pattern bursts? 

      1.  Use the [Auto Scaling](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-autoscaling.html) configuration to match your traffic patterns. 

1.  Is your compute solution on [Lambda](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-features.html)? 

   1.  Do you have at least four weeks of metrics and can predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Do you lack sufficient metrics to use AWS Compute Optimizer? 

      1.  If you do not have metrics available to use Compute Optimizer, use [AWS Lambda Power Tuning](https://docs.aws.amazon.com/lambda/latest/operatorguide/profile-functions.html) to help select the best configuration. 

   1.  Is your function performance constrained by the memory or CPU? 

      1.  Configure your [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console) to meet your performance needs. 

   1.  Is your function timing out when running? 

      1.  Change the [timeout settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html). 

   1.  Is your function performance constrained by bursts of activity and concurrency?  

      1.  Configure the [concurrency settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html) to meet your performance requirements. 

   1.  Does your function run asynchronously and fail on retries? 

      1.  Configure the maximum age of the event and the maximum retry limit in the [asynchronous configuration](https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html) settings. 
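
The decision tree above can be sketched as a small lookup that maps a workload's documented bottleneck to the EC2 instance family to experiment with first. This is an illustrative helper only; the thresholds, family names, and groupings are simplified assumptions, not an official AWS mapping or API:

```python
# Hypothetical helper encoding part of the decision tree above: given
# the constraint your metrics identify, suggest an instance family to
# load test first. Family labels are illustrative simplifications.

def suggest_instance_family(constraint: str, bursty_cpu: bool = False) -> str:
    """Return an EC2 instance family to evaluate for a given bottleneck."""
    families = {
        "cpu": "Compute optimized (C family)",
        "memory": "Memory optimized (R family)",
        "iops": "Storage optimized (I family)",
        "gpu": "Accelerated computing (P/G family)",
    }
    if constraint == "balanced":
        # Balanced workloads with occasional CPU spikes can start on
        # burstable instances; otherwise, general purpose.
        return "Burstable (T family)" if bursty_cpu else "General purpose (M family)"
    return families.get(constraint, "General purpose (M family)")

print(suggest_instance_family("memory"))
print(suggest_instance_family("balanced", bursty_cpu=True))
```

The suggestion is only a starting point; as the steps above note, validate each candidate with load tests before committing.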

 **Level of effort for the implementation plan:** To establish this best practice, you must be aware of your current compute characteristics and metrics. Gathering those metrics, establishing a baseline, and then using those metrics to identify the ideal compute option is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1) ](https://www.youtube.com/watch?v=zt6jYJLK8sg&ref=wellarchitected) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP03 Collect compute-related metrics
<a name="perf_select_compute_collect_metrics"></a>

To understand how your compute resources are performing, you must record and track the utilization of various systems. This data can be used to make more accurate determinations about resource requirements.  

 Workloads can generate large volumes of data such as metrics, logs, and events. Determine whether your existing storage, monitoring, and observability services can manage the data generated. Identify which metrics reflect resource utilization and can be collected, aggregated, and correlated on a single platform. Those metrics should represent all your workload resources, applications, and services, so you can easily gain system-wide visibility and quickly identify performance improvement opportunities and issues.

 **Desired outcome:** All metrics related to the compute-related resources are identified, collected, aggregated, and correlated on a single platform with retention implemented to support cost and operational goals. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics.  
+  You only publish metrics to internal tools. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 

 

 **Benefits of establishing this best practice:** To monitor the performance of your workloads, you must record multiple performance metrics over a period of time. These metrics allow you to detect anomalies in performance. They will also help gauge performance against business metrics to ensure that you are meeting your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify, collect, aggregate, and correlate compute-related metrics. Using a service such as Amazon CloudWatch can make the implementation quicker and easier to maintain. In addition to the default metrics recorded, identify and track additional system-level metrics within your workload. Record data such as CPU utilization, memory, disk I/O, and network inbound and outbound metrics to gain insight into utilization levels or bottlenecks. This data is crucial to understand how the workload is performing and how the compute solution is utilized. Use these metrics as part of a data-driven approach to actively tune and optimize your workload's resources.  

 **Implementation steps:** 

1.  Which compute solution metrics are important to track? 

   1.  [EC2 default metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html) 

   1.  [Amazon ECS default metrics](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html) 

   1.  [EKS default metrics](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/kubernetes-eks-metrics.html) 

   1.  [Lambda default metrics](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-access-metrics.html) 

   1.  [EC2 memory and disk metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html) 

1.  Do you currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 

   1.  [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/) 

   1.  [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/grafana/latest/userguide/prometheus-data-source.html) 

1.  Have you identified and configured your data retention policies to match your security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 

1.  How do you deploy your metric and log aggregation agents? 

   1.  [AWS Systems Manager automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html?ref=wellarchitected) 

   1.  [OpenTelemetry Collector](https://aws-otel.github.io/docs/getting-started/collector) 
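
Once utilization samples are collected (for example, by the CloudWatch agent), they need to be aggregated into the statistics that drive sizing decisions. A minimal pure-Python sketch, using made-up sample data, shows why collecting more than an average matters — short spikes disappear in the mean but show up in the p95 and maximum:

```python
# Aggregate raw utilization samples into the statistics used for
# right-sizing. Sample data is fabricated for demonstration only.
import math

def aggregate(samples: list[float]) -> dict[str, float]:
    """Return average, p95 (nearest-rank), and max for utilization samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered)) - 1  # nearest-rank p95 index
    return {
        "avg": sum(ordered) / len(ordered),
        "p95": ordered[rank],
        "max": ordered[-1],
    }

cpu = [12.0, 15.0, 11.0, 14.0, 88.0, 13.0, 12.0, 91.0, 14.0, 13.0]
stats = aggregate(cpu)
# The ~28% average hides two spikes above 85%; p95 and max reveal them.
print(stats)
```

In practice a monitoring platform computes these statistics for you; the point is to retain and review percentiles, not just averages.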

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all compute resources. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon CloudWatch documentation](https://docs.aws.amazon.com/cloudwatch/index.html?ref=wellarchitected) 
+  [Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html?ref=wellarchitected) 
+  [Accessing Amazon CloudWatch Logs for AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html?ref=wellarchitected) 
+  [Using CloudWatch Logs with container instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html?ref=wellarchitected) 
+  [Publish custom metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html?ref=wellarchitected) 
+  [AWS Answers: Centralized Logging](https://aws.amazon.com/answers/logging/centralized-logging/?ref=wellarchitected) 
+  [AWS Services That Publish CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html?ref=wellarchitected) 
+  [Monitoring Amazon EKS on AWS Fargate](https://aws.amazon.com/blogs/containers/monitoring-amazon-eks-on-aws-fargate-using-prometheus-and-grafana/) 

 

 **Related videos:** 
+  [Application Performance Management on AWS](https://www.youtube.com/watch?v=5T4stR-HFas&ref=wellarchitected) 
+  [Build a Monitoring Plan](https://www.youtube.com/watch?v=OMmiGETJpfU&ref=wellarchitected) 

 

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [Level 100: Monitoring Windows EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_windows_ec2_cloudwatch/) 
+  [Level 100: Monitoring an Amazon Linux EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_linux_ec2_cloudwatch/) 

# PERF02-BP04 Determine the required configuration by right-sizing
<a name="perf_select_compute_right_sizing"></a>

Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, I/O, and CPU usage. Use this data to choose resources that best match your workload’s profile. For example, a memory-intensive workload like a database may benefit from a higher memory per core ratio. However, a compute intensive workload may need a higher core count and frequency, but can be satisfied with a lower amount of memory per core.
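
The memory-per-core ratio can be made concrete with a short worked example. The figures below use representative published specs for the "large" size (2 vCPUs) of the C, M, and R instance families:

```python
# Worked example of the memory-per-core ratio described above, using
# the "large" size of the compute-optimized (C), general-purpose (M),
# and memory-optimized (R) families.
instances = {
    "c6i.large": {"vcpu": 2, "memory_gib": 4},
    "m6i.large": {"vcpu": 2, "memory_gib": 8},
    "r6i.large": {"vcpu": 2, "memory_gib": 16},
}

for name, spec in instances.items():
    ratio = spec["memory_gib"] / spec["vcpu"]
    print(f"{name}: {ratio:.0f} GiB per vCPU")
# A database needing ~6 GiB per core fits the R family's 8 GiB/vCPU
# ratio; satisfying that memory need on a C instance (2 GiB/vCPU)
# would mean paying for roughly 3x the vCPUs it actually uses.
```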

 **Common anti-patterns:** 
+  You choose an instance with the largest values across all performance characteristics and use it for all workloads. 
+  You standardize all instance types to one type for ease of management. 
+  You optimize against standard synthetic benchmarks without validating the actual requirements of a particular workload. 
+  You keep the same infrastructure for a long period of time without reevaluating and integrating new offerings. 

 **Benefits of establishing this best practice:** When you are familiar with the requirements of your workload, you can compare these needs with available compute offerings and quickly experiment to determine which ones meet the needs of your workload most efficiently. This allows for optimal performance without overpaying for resources not required. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

Modify your workload configuration by right-sizing. To optimize performance, overall efficiency, and cost effectiveness, first determine which resources your workload needs. Choose memory-optimized instances, such as the R-family of instances, for memory-intensive workloads like a database. For workloads that require higher compute capacity, choose the C-family of instances, or choose instances with higher core counts or higher core frequency. Choose I/O performance based on the needs of your workload instead of comparing against standard, synthetic benchmarks. For higher I/O performance, choose instances from the I-family of instances, [select I/O optimized Amazon EBS volumes](https://aws.amazon.com/premiumsupport/knowledge-center/optimize-ebs-provisioned-iops/), or choose instances with [instance store](https://aws.amazon.com/premiumsupport/knowledge-center/instance-store-vs-ebs/). For more detail on particular instance types, see [Amazon EC2 instance types](https://aws.amazon.com/ec2/instance-types/).

 Right-sizing verifies that your workloads perform as well as possible while you avoid paying for resources you do not need. 

 **Implementation steps** 
+  Know your workload or analyze its resource requirements. 
+  Evaluate workloads separately. The AWS Cloud gives you flexibility and agility to right-size each workload on its own without needing to compromise. 
+  Create test environments to find the best match of compute offerings against your workload. 
+  Continually reevaluate new compute offerings, and compare against your workload’s needs. 
+  Routinely review new service offers for better price performance. 
+  Regularly conduct Well-Architected Framework Reviews. 
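
The steps above can be sketched as a simple selection rule: pick the cheapest candidate whose capacity covers observed peak demand plus headroom. This is a hedged illustration only — the candidate catalog, prices, and the 20% headroom factor are assumptions for demonstration, not AWS recommendations:

```python
# Illustrative right-sizing helper: smallest/cheapest candidate that
# covers peak demand times a headroom factor. Catalog entries and
# hourly prices below are hypothetical.
def right_size(peak_vcpus: float, peak_mem_gib: float,
               candidates: list[dict], headroom: float = 1.2) -> str:
    """Return the cheapest candidate covering peak demand * headroom."""
    need_cpu = peak_vcpus * headroom
    need_mem = peak_mem_gib * headroom
    viable = [c for c in candidates
              if c["vcpu"] >= need_cpu and c["mem_gib"] >= need_mem]
    if not viable:
        raise ValueError("no candidate is large enough")
    return min(viable, key=lambda c: c["hourly_usd"])["name"]

catalog = [  # hypothetical specs and prices
    {"name": "m6i.large",  "vcpu": 2, "mem_gib": 8,  "hourly_usd": 0.096},
    {"name": "m6i.xlarge", "vcpu": 4, "mem_gib": 16, "hourly_usd": 0.192},
    {"name": "r6i.large",  "vcpu": 2, "mem_gib": 16, "hourly_usd": 0.126},
]
# A peak of 1.5 vCPUs and 10 GiB is memory-bound: the memory-optimized
# instance covers it at a lower price than a larger general-purpose one.
print(right_size(1.5, 10.0, catalog))
```

Validate any such selection with the test environments and load tests described in the steps above rather than relying on the calculation alone.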

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md) 
+  [PERF02-BP06 Continually evaluate compute needs based on metrics](perf_select_compute_use_metrics.md) 

 **Related documents:** 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/)  
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [Amazon EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [Amazon ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [Amazon EKS Containers: Amazon EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP05 Use the available elasticity of resources
<a name="perf_select_compute_elasticity"></a>

The cloud provides the flexibility to expand and reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Combining this elasticity with compute-related metrics, a workload can automatically respond to changes to use the resources it needs and only the resources it needs.

 **Common anti-patterns:** 
+  You overprovision to cover possible spikes. 
+  You react to alarms by manually increasing capacity. 
+  You increase capacity without considering provisioning time. 
+  You leave increased capacity after a scaling event instead of scaling back down. 
+  You monitor metrics that don’t directly reflect your workload’s true requirements. 

 **Benefits of establishing this best practice:** Demand can be fixed, variable, spiky, or follow a pattern. Matching supply to demand delivers the lowest cost for a workload. Monitoring, testing, and configuring workload elasticity will optimize performance, save money, and improve reliability as usage demands change. Although a manual approach is possible, it is impractical at larger scales. An automated, metrics-based approach ensures that resources meet demand at any given time. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

Metrics-based automation should be used to take advantage of elasticity, so that the supply of resources matches the demand of your workload. For example, you can use [Amazon CloudWatch metrics to monitor your resources](https://aws.amazon.com/startups/start-building/how-to-monitor-resources/), or use Amazon CloudWatch metrics for your Auto Scaling groups.

 Combined with compute-related metrics, a workload can automatically respond to changes and use the optimal set of resources to achieve its goal. You also must plan for provisioning time and potential resource failures. 

 Instances, containers, and functions provide mechanisms for elasticity either as a feature of the service, in the form of [Application Auto Scaling](https://aws.amazon.com/autoscaling/), or in combination with [Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html). Use elasticity in your architecture to verify that you have sufficient capacity to meet performance requirements at a wide variety of scales of use. 

 Validate the metrics you use to scale elastic resources up or down against the type of workload being deployed. For example, if you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric. Instead, measure the queue depth of transcoding jobs waiting to be processed, and scale on that metric. 
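
The queue-depth metric described above translates into a backlog-per-instance calculation. A minimal sketch, assuming a hypothetical per-instance backlog target (in practice you would feed such a metric to Application Auto Scaling rather than computing capacity yourself):

```python
# Sketch of queue-depth-based scaling: for a transcoding fleet, CPU
# sits near 100% by design, so size the fleet from the backlog per
# instance instead. The target of 10 jobs per instance is an
# illustrative assumption, not an AWS-recommended value.
import math

def desired_capacity(queue_depth: int, jobs_per_instance_target: int = 10) -> int:
    """Instances needed to keep the backlog near the per-instance target."""
    needed = math.ceil(queue_depth / jobs_per_instance_target)
    return max(needed, 1)  # keep at least one instance running

# 85 queued jobs with a target of 10 per instance sizes the fleet to 9,
# whether that means scaling out from 4 or scaling in from 12.
print(desired_capacity(85))
```

The same calculation drives both scale-out and scale-in, which is why the next paragraph stresses handling both directions.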

 Workload deployments need to handle both scale-up and scale-down events. Scaling down workload components safely is as critical as scaling up resources when demand dictates. 

 Create test scenarios for scaling events to verify that the workload behaves as expected. 

 **Implementation steps** 
+  Leverage historical data to analyze your workload’s resource demands over time. Ask specific questions like: 
  +  Is your workload steady and increasing over time at a known rate? 
  +  Does your workload increase and decrease in seasonal, repeatable patterns? 
  +  Is your workload spiky? Can the spikes be anticipated or predicted? 
+  Leverage monitoring services and historical data as much as possible. 
+  Tagging resources can help with monitoring. When using tags, refer to [tagging best practices](https://docs.aws.amazon.com/whitepapers/latest/tagging-best-practices/tagging-best-practices.html). Additionally, [tags can help you manage, identify, and organize resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). 
+  With AWS, you can use a number of different approaches to match supply with demand. The cost optimization pillar best practices ([COST09-BP01 through COST09-BP03](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/manage-demand-and-supply-resources.html)) describe how to use the following approaches to manage demand and supply resources: 
  + [ COST09-BP01 Perform an analysis on the workload demand ](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/cost_manage_demand_resources_cost_analysis.html)
  + [ COST09-BP02 Implement a buffer or throttle to manage demand ](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/cost_manage_demand_resources_buffer_throttle.html)
  + [ COST09-BP03 Supply resources dynamically ](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/cost_manage_demand_resources_dynamic.html)
+  Create test scenarios for scale down events to verify that the workload behaves as expected. 
+  Most non-production instances should be stopped when they are not being used. 
+  For storage needs when using Amazon Elastic Block Store (Amazon EBS), take advantage of [volume-based elasticity](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html). 
+  For [Amazon Elastic Compute Cloud (Amazon EC2)](https://aws.amazon.com/ec2/), consider using [Auto Scaling groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html), which allow you to optimize performance and cost by automatically increasing the number of compute instances during demand spikes and decreasing capacity when demand decreases. 
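As one hedged example of the Auto Scaling approach above, a target-tracking policy can be expressed as the request body for boto3's `put_scaling_policy`; the group name and the 50% CPU target below are placeholder values:

```python
def target_tracking_policy(asg_name: str, target_cpu: float) -> dict:
    """Build the request for a CPU target-tracking scaling policy on an
    Auto Scaling group; the result is passed to boto3's autoscaling
    put_scaling_policy call."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }

policy = target_tracking_policy("web-asg", 50.0)
# boto3.client("autoscaling").put_scaling_policy(**policy)  # needs AWS credentials
```

With target tracking, the service adds and removes capacity to hold average CPU near the target, so you do not hand-write separate scale-up and scale-down rules.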

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md) 
+  [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md) 
+  [PERF02-BP06 Continually evaluate compute needs based on metrics](perf_select_compute_use_metrics.md) 

 **Related documents:** 
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [Amazon EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [Amazon ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [Amazon EKS Containers: Amazon EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 

 **Related examples:** 
+  [Amazon EC2 Auto Scaling Group Examples](https://github.com/aws-samples/amazon-ec2-auto-scaling-group-examples) 
+  [Amazon EFS Tutorials](https://github.com/aws-samples/amazon-efs-tutorial) 

# PERF02-BP06 Continually evaluate compute needs based on metrics
<a name="perf_select_compute_use_metrics"></a>

Use a data-driven approach to continually evaluate and optimize the compute resources for your workload over time.

 **Desired outcome:** Use system-level metrics to actively monitor the behavior and requirements of your workload over time. Evaluate the demands of your workload against available resources based on the collected data, and make changes to your compute environment to best match your workload's profile. For example, a workload might be observed over time to be more memory-intensive than initially specified, so moving to a different instance family or size could improve both performance and efficiency. 

 **Common anti-patterns:** 
+  Monitoring system-level metrics to gain insight into your workload and not re-evaluating compute needs. 
+  Architecting your compute needs for peak workload requirements. 
+  Oversizing the existing compute solution to meet scaling or performance requirements when moving to an alternative compute solution would more efficiently match your workload characteristics. 

 **Benefits of establishing this best practice:** Optimized compute resources based on real-world data and your desired balance of cost and performance. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

Use a data-driven approach to optimize compute resources based on observed workload behavior. To achieve maximum performance and efficiency, use the data gathered over time from your workload to continually tune and optimize your resources. Look at the trends in your workload's usage of current resources and determine where you can make changes to better match your workload's needs. When resources are over-committed, system performance degrades, and when resources are not adequately used, the system is operating less efficiently and at a higher cost. 

 To optimize performance and resource utilization, you need a unified operational view, real-time granular data, and a historical reference. You can create automated dashboards to visualize this data and derive operational and utilization insights. 

 **Implementation steps** 

1.  Collect compute-related metrics over time. 

1.  Compare workload metrics against available resources in your selected compute solution. 

1.  Determine any required configuration changes by right-sizing the existing solution or evaluating alternative compute solutions. 
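A minimal sketch of step 3, assuming simple utilization thresholds (real tooling such as AWS Compute Optimizer uses far richer heuristics than the 20%/80% cutoffs invented here):

```python
def sizing_recommendation(avg_cpu_pct: float, avg_mem_pct: float,
                          low: float = 20.0, high: float = 80.0) -> str:
    """Classify an instance from observed utilization over time.

    The thresholds are illustrative defaults, not Compute Optimizer's
    actual decision logic."""
    if avg_cpu_pct > high or avg_mem_pct > high:
        return "under-provisioned: move up a size or scale out"
    if avg_cpu_pct < low and avg_mem_pct < low:
        return "over-provisioned: consider a smaller size or different family"
    return "sized appropriately: keep monitoring"

# An instance averaging 12% CPU and 15% memory is a right-sizing candidate
print(sizing_recommendation(avg_cpu_pct=12.0, avg_mem_pct=15.0))
```

Feeding this kind of check with weeks of CloudWatch data, rather than a single snapshot, avoids resizing based on a momentary lull or spike.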

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [PERF02-BP01 Evaluate the available compute options](perf_select_compute_evaluate_options.md) 
+  [PERF02-BP02 Understand the available compute configuration options](perf_select_compute_config_options.md) 
+  [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md) 
+  [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md) 

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [Amazon ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [Amazon EKS Containers: Amazon EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+ [ Best practices for working with AWS Lambda functions ](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration)

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 
+ [ Selecting and optimizing Amazon EC2 instances ](https://www.youtube.com/watch?v=Vz0HZ6hlpgM)

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF 3. How do you select your storage solution?
<a name="perf-03"></a>

 The most effective storage solution for a system varies based on the kind of access operation (block, file, or object), patterns of access (random or sequential), required throughput, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-architected systems use multiple storage solutions and activate different features to improve performance and use resources efficiently. 

**Topics**
+ [PERF03-BP01 Understand storage characteristics and requirements](perf_right_storage_solution_understand_char.md)
+ [PERF03-BP02 Evaluate available configuration options](perf_right_storage_solution_evaluated_options.md)
+ [PERF03-BP03 Make decisions based on access patterns and metrics](perf_right_storage_solution_optimize_patterns.md)

# PERF03-BP01 Understand storage characteristics and requirements
<a name="perf_right_storage_solution_understand_char"></a>

 Identify and document the workload storage needs and define the storage characteristics of each location. Examples of storage characteristics include: shareable access, file size, growth rate, throughput, IOPS, latency, access patterns, and persistence of data. Use these characteristics to evaluate if block, file, object, or instance storage services are the most efficient solution for your storage needs. 

 **Desired outcome:** Identify and document the storage requirements for each storage location and evaluate the available storage solutions. Based on the key storage characteristics, your team will understand how the selected storage services will benefit your workload performance. Key criteria include data access patterns, growth rate, scaling needs, and latency requirements. 

 **Common anti-patterns:** 
+  You only use one storage type, such as Amazon Elastic Block Store (Amazon EBS), for all workloads. 
+  You assume that all workloads have similar storage access performance requirements. 

 **Benefits of establishing this best practice:** Selecting a storage solution based on the identified and required characteristics will help improve your workload's performance, decrease costs, and lower the operational effort of maintaining your workload. Your workload performance will benefit from the solution, configuration, and location of the storage service. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify your workload’s most important storage performance metrics and implement improvements as part of a data-driven approach, using benchmarking or load testing. Use this data to identify where your storage solution is constrained, and examine configuration options to improve the solution. Determine the expected growth rate for your workload and choose a storage solution that will meet those rates. Research the AWS storage offerings to determine the correct storage solution for your various workload needs. Provisioning storage solutions in AWS increases the opportunity for you to test storage offerings and determine if they are appropriate for your workload needs. 


| AWS service | Key characteristics | Common use cases | 
| --- | --- | --- | 
| Amazon S3 |  99.999999999% durability, unlimited growth, accessible from anywhere, several cost models based on access and resiliency  |  Cloud-native application data, data archiving, and backups, analytics, data lakes, static website hosting, IoT data   | 
| Amazon Glacier |  Seconds to hours latency, unlimited growth, lowest cost, long-term storage  |  Data archiving, media archives, long-term backup retention.  | 
| Amazon EBS | Storage size requires management and monitoring, low latency, persistent storage, 99.8% to 99.9% durability, most volume types are accessible only from one EC2 instance. |  COTS applications, I/O intensive applications, relational and NoSQL databases, backup and recovery  | 
| EC2 Instance Store |  Pre-determined storage size, lowest latency, not persisted, accessible only from one EC2 instance  |  COTS applications, I/O intensive applications, in-memory data store  | 
| Amazon EFS |  99.999999999% durability, unlimited growth, accessible by multiple compute services  |  Modernized applications sharing files across multiple compute services, file storage for scaling content management systems  | 
| Amazon FSx |  Supports four file systems (NetApp ONTAP, OpenZFS, Windows File Server, and Lustre), available storage differs per file system, accessible by multiple compute services  |  Cloud native workloads, private cloud bursting, migrated workloads that require a specific file system, VMC, ERP systems, on-premises file storage and backups   | 
| Snow family |  Portable devices, 256-bit encryption, NFS endpoint, on-board computing, TBs of storage  |  Migrating data to the cloud, storage, and computing in extreme on-premises conditions, disaster recovery, remote data collection  | 
| AWS Storage Gateway |  Provides low-latency on-premises access to cloud-backed storage, fully managed on-premises cache   |  On-premises data to cloud migrations, populate cloud data lakes from on-premises sources, modernized file sharing.  | 

 **Implementation steps:** 

1. Use benchmarking or load tests to collect the key characteristics of your storage needs. Key characteristics include: 

   1. Shareable (what components access this storage) 

   1. Growth rate 

   1. Throughput 

   1. Latency 

   1. I/O size 

   1. Durability 

   1. Access patterns (reads vs. writes, frequency, spiky, or consistent) 

1. Identify the type of storage solution that supports your storage characteristics. 

   1. [Amazon S3](https://aws.amazon.com/s3/) is an object storage service with unlimited scalability, high availability, and multiple options for accessibility. Transferring and accessing objects in and out of Amazon S3 can use a service, such as [Transfer Acceleration](https://aws.amazon.com/s3/transfer-acceleration/) or [Access Points](https://aws.amazon.com/s3/features/access-points/) to support your location, security needs, and access patterns. Use the [Amazon S3 performance guidelines](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html) to help you optimize your Amazon S3 configuration to meet your workload performance needs. 

   1. [Amazon Glacier](https://aws.amazon.com/s3/storage-classes/glacier/) is a storage class of Amazon S3 built for data archiving. You can choose from three archiving solutions ranging from millisecond access to 5-12 hour access with different cost and security options. Amazon Glacier can help you meet performance requirements by implementing a data lifecycle that supports your business requirements and data characteristics. 

   1. [Amazon Elastic Block Store (Amazon EBS)](https://aws.amazon.com/ebs/) is a high-performance block storage service designed for Amazon Elastic Compute Cloud (Amazon EC2). You can choose from [SSD- or HDD-based](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html) solutions with different characteristics that prioritize [IOPS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisioned-iops.html) or [throughput](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hdd-vols.html). EBS volumes are well suited for high-performance workloads, primary storage for file systems, databases, or applications that can only access attached storage systems. 

   1. [Amazon EC2 Instance Store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) is similar to Amazon EBS in that it attaches to an Amazon EC2 instance. However, an Instance Store provides only temporary storage, which should ideally be used as a buffer, cache, or other temporary content. You cannot detach an Instance Store, and all data is lost if the instance shuts down. Instance Stores can be used for high I/O performance and low-latency use cases where data doesn’t need to persist. 

   1. [Amazon Elastic File System (Amazon EFS)](https://aws.amazon.com/efs/) is a mountable file system that can be accessed by multiple types of compute solutions. Amazon EFS automatically grows and shrinks storage and is performance-optimized to deliver consistent low latencies. EFS has [two performance configuration modes](https://docs.aws.amazon.com/efs/latest/ug/performance.html): General Purpose and Max I/O. General Purpose has sub-millisecond read latency and single-digit millisecond write latency. Max I/O mode can support thousands of compute instances requiring a shared file system. Amazon EFS supports [two throughput modes](https://docs.aws.amazon.com/efs/latest/ug/managing-throughput.html): Bursting and Provisioned. A workload with a spiky access pattern will benefit from Bursting Throughput mode, while a workload with consistently high throughput will perform better with Provisioned Throughput mode. 

   1. [Amazon FSx](https://aws.amazon.com/fsx/) is built on the latest AWS compute solutions to support four commonly used file systems: NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. Amazon FSx [latency, throughput, and IOPS](https://aws.amazon.com/fsx/when-to-choose-fsx/) vary per file system and should be considered when selecting the right file system for your workload needs. 

   1. The [AWS Snow Family](https://aws.amazon.com/snow/) consists of storage and compute devices that support online and offline data migration to the cloud, as well as data storage and computing on premises. AWS Snow devices support collecting large amounts of on-premises data, processing that data, and moving it to the cloud. There are several [documented performance best practices](https://docs.aws.amazon.com/snowball/latest/developer-guide/performance.html) when it comes to the number of files, file sizes, and compression. 

   1. [AWS Storage Gateway](https://aws.amazon.com/storagegateway/) provides on-premises applications access to cloud-based storage. AWS Storage Gateway supports multiple cloud storage services including Amazon S3, Amazon Glacier, Amazon FSx, and Amazon EBS. It supports a number of protocols such as iSCSI, SMB, and NFS. It provides low-latency performance by caching frequently accessed data on premises and only sends changed data and compressed data to AWS. 

1. After you have experimented with your new storage solution and identified the optimal configuration, plan your migration and validate your performance metrics. This is a continual process, and should be revisited when key workload characteristics change or when new services or options become available. 
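The selection logic in step 2 can be sketched as a first-pass filter over the key characteristics gathered in step 1. This is a deliberate simplification of the comparison table above, not an official decision tree; real selection should also weigh throughput, IOPS, latency, and cost:

```python
def candidate_storage(shareable: bool, persistent: bool, access: str) -> str:
    """First-pass mapping from key storage characteristics to a candidate
    AWS storage service, condensed from the comparison table above."""
    if not persistent:
        return "EC2 Instance Store"           # lowest latency, data not persisted
    if access == "object":
        return "Amazon S3 (or S3 Glacier for archives)"
    if access == "file":
        # Shared file access: EFS for elastic NFS, FSx for specific file systems
        return "Amazon EFS or Amazon FSx" if shareable else "Amazon FSx"
    if access == "block":
        return "Amazon EBS"                   # single-instance block storage
    return "re-evaluate requirements"

print(candidate_storage(shareable=True, persistent=True, access="file"))
```

Treat the result as a shortlist to benchmark, per step 1, rather than a final answer.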

 **Level of effort for the implementation plan:** If a workload is moving from one storage solution to another, there could be a *moderate* level of effort involved in refactoring the application. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+ [Amazon FSx for NetApp ONTAP performance](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/performance.html)
+ [Amazon FSx for OpenZFS performance](https://docs.aws.amazon.com/fsx/latest/OpenZFSGuide/performance.html)
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/) 
+ [AWS Snow Family](https://aws.amazon.com/snow/#Feature_comparison)
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 
+ [Amazon FSx for Lustre Container Storage Interface (CSI) Driver](https://github.com/kubernetes-sigs/aws-fsx-csi-driver)

# PERF03-BP02 Evaluate available configuration options
<a name="perf_right_storage_solution_evaluated_options"></a>

 Evaluate the various characteristics and configuration options and how they relate to storage. Understand where and how to use provisioned IOPS, SSDs, magnetic storage, object storage, archival storage, or ephemeral storage to optimize storage space and performance for your workload. 

 [Amazon EBS](https://aws.amazon.com/ebs) provides a range of options that allow you to optimize storage performance and cost for your workload. These options are divided into two major categories: SSD-backed storage for transactional workloads, such as databases and boot volumes (performance depends primarily on IOPS), and HDD-backed storage for throughput-intensive workloads, such as MapReduce and log processing (performance depends primarily on MB/s). 

 SSD-backed volumes include the highest performance provisioned IOPS SSD for latency-sensitive transactional workloads and general-purpose SSD that balance price and performance for a wide variety of transactional data. 

 [Amazon S3 transfer acceleration](https://aws.amazon.com/s3/transfer-acceleration/) allows fast transfer of files over long distances between your client and your S3 bucket. Transfer acceleration leverages Amazon CloudFront's globally distributed edge locations to route data over an optimized network path. For a workload in an S3 bucket that has intensive GET requests, use Amazon S3 with CloudFront. When uploading large files, use multipart uploads with multiple parts uploading at the same time to help maximize network throughput. 
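To illustrate the sizing involved in multipart uploads, the sketch below picks a part size that respects S3's published limits (10,000 parts per upload, 5 MiB minimum part size except for the last part); the 64 MiB preferred part size is an arbitrary choice, and in practice boto3's `TransferConfig` handles this for you:

```python
import math

MiB = 1024 * 1024
MIN_PART = 5 * MiB        # S3 minimum part size (except the last part)
MAX_PARTS = 10_000        # S3 maximum number of parts per upload

def multipart_plan(object_size: int, preferred_part: int = 64 * MiB):
    """Pick a part size that keeps the upload within S3's 10,000-part
    limit, then return (part_size_bytes, part_count)."""
    part = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    return part, math.ceil(object_size / part)

part, count = multipart_plan(5 * 1024 * MiB)   # a 5 GiB object
print(part // MiB, count)                      # 64 MiB parts, 80 parts
```

Uploading those parts concurrently (rather than serially) is what recovers throughput on high-latency, long-distance paths.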

 [Amazon Elastic File System (Amazon EFS)](https://aws.amazon.com/efs/) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. To support a wide variety of cloud storage workloads, Amazon EFS offers two performance modes: general purpose performance mode, and max I/O performance mode. There are also two throughput modes to choose from for your file system: Bursting Throughput, and Provisioned Throughput. To determine which settings to use for your workload, see the [Amazon EFS User Guide](https://docs.aws.amazon.com/efs/latest/ug/performance.html). 

 [Amazon FSx](https://aws.amazon.com/fsx/) provides four file systems to choose from: [Amazon FSx for Windows File Server](https://aws.amazon.com/fsx/windows/) for enterprise workloads, [Amazon FSx for Lustre](https://aws.amazon.com/fsx/lustre/) for high-performance workloads, [Amazon FSx for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/index.html) for NetApp's popular ONTAP file system, and [Amazon FSx for OpenZFS](https://docs.aws.amazon.com/fsx/latest/OpenZFSGuide/what-is-fsx.html) for Linux-based file servers. FSx is SSD-backed and is designed to deliver fast, predictable, scalable, and consistent performance. Amazon FSx file systems deliver sustained high read and write speeds and consistent low latency data access. You can choose the throughput level you need to match your workload’s needs. 

 **Common anti-patterns:** 
+  You only use one storage type, such as Amazon EBS, for all workloads. 
+  You use Provisioned IOPS for all workloads without real-world testing against all storage tiers. 
+  You assume that all workloads have similar storage access performance requirements. 

 **Benefits of establishing this best practice:** Evaluating all storage service options can reduce the cost of infrastructure and the effort required to maintain your workloads. It can potentially accelerate your time to market for deploying new services and features. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Determine storage characteristics: When you evaluate a storage solution, determine which storage characteristics you require, such as ability to share, file size, cache size, latency, throughput, and persistence of data. Then match your requirements to the AWS service that best fits your needs. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/?ref=wellarchitected) 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 

# PERF03-BP03 Make decisions based on access patterns and metrics
<a name="perf_right_storage_solution_optimize_patterns"></a>

 Choose storage systems based on your workload's access patterns and configure them by determining how the workload accesses data. Increase storage efficiency by choosing object storage over block storage. Configure the storage options you choose to match your data access patterns. 

 How you access data impacts how the storage solution performs. Select the storage solution that aligns best to your access patterns, or consider changing your access patterns to align with the storage solution to maximize performance. 

 Creating a RAID 0 array allows you to achieve a higher level of performance for a file system than what you can provision on a single volume. Consider using RAID 0 when I/O performance is more important than fault tolerance. For example, you could use it with a heavily used database where data replication is already set up separately. 
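The performance motivation for RAID 0 is simple aggregation: striping across N volumes multiplies the theoretical IOPS and throughput ceiling by N. The gp3 baseline figures below are assumptions to check against current volume specifications, and the EC2 instance's own EBS limits still cap the real result:

```python
def raid0_aggregate(n_volumes: int, vol_iops: int, vol_mbps: int):
    """Theoretical ceiling of a RAID 0 stripe set: performance scales
    linearly with volume count, but losing any single volume loses the
    entire array, and the attached instance's EBS bandwidth still applies."""
    return n_volumes * vol_iops, n_volumes * vol_mbps

# Four gp3 volumes at their assumed baseline of 3,000 IOPS / 125 MiB/s each
iops, mbps = raid0_aggregate(4, 3000, 125)
print(iops, mbps)   # 12000 500
```

Benchmark the striped file system before relying on these ceilings; small-block random I/O and the instance type often bind first.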

 Select appropriate storage metrics for your workload across all of the storage options consumed by the workload. When using file systems that rely on burst credits, create alarms to let you know when you are approaching those credit limits. Create storage dashboards to show the overall workload storage health. 

 For storage systems that are a fixed size, such as Amazon EBS or Amazon FSx, monitor the amount of storage used versus the overall storage size, and create automation, if possible, to increase the storage size when it reaches a threshold. 

 **Common anti-patterns:** 
+  You assume that storage performance is adequate if customers are not complaining. 
+  You only use one tier of storage, assuming all workloads fit within that tier. 

 **Benefits of establishing this best practice:** You need a unified operational view, real-time granular data, and a historical reference to optimize performance and resource utilization. You can create automated dashboards with data at one-second granularity, perform metric math on your data, and derive operational and utilization insights for your storage needs. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize your storage usage and access patterns: Choose storage systems based on your workload's access patterns and the characteristics of the available storage options. Determine the best place to store data that will allow you to meet your requirements while reducing overhead. Use performance optimizations and access patterns when configuring and interacting with data based on the characteristics of your storage (for example, striping volumes or partitioning data). 

 Select appropriate metrics for storage options: Ensure that you select the appropriate storage metrics for the workload. Each storage option offers various metrics to track how your workload performs over time. Ensure that you are measuring against any storage burst metrics (for example, monitoring burst credits for Amazon EFS). For storage systems that are a fixed size, such as Amazon Elastic Block Store or Amazon FSx, ensure that you are monitoring the amount of storage used versus the overall storage size. Create automation when possible to increase the storage size when reaching a threshold. 
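As a sketch of the burst-credit guidance, the check below could back an alarm on a metric such as Amazon EFS's `BurstCreditBalance`. The function name and the 20% warning ratio are illustrative assumptions:

```python
def burst_credits_low(balance_bytes: float, max_credit_bytes: float,
                      warn_ratio: float = 0.2) -> bool:
    """Return True when the remaining burst credit balance drops below
    warn_ratio of the file system's maximum credit balance."""
    return balance_bytes < max_credit_bytes * warn_ratio
```

In practice you would express the same comparison as a CloudWatch alarm threshold rather than polling the metric yourself.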

 Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 
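A custom metric like the derived storage-utilization figure described above can be published through the CloudWatch `PutMetricData` API. The sketch below assembles the payload; the namespace, metric name, and dimension are hypothetical choices for this example, while the payload shape follows boto3's CloudWatch client:

```python
def build_storage_metric(file_system_id: str, used_bytes: int,
                         total_bytes: int) -> dict:
    """Assemble a PutMetricData payload for a derived utilization metric;
    publish it with boto3.client("cloudwatch").put_metric_data(**payload)."""
    return {
        "Namespace": "Workload/Storage",  # hypothetical custom namespace
        "MetricData": [{
            "MetricName": "StorageUtilization",
            "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
            "Unit": "Percent",
            "Value": 100.0 * used_bytes / total_bytes,
        }],
    }
```

Once published, the metric can drive both dashboards and the threshold alarms discussed earlier.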

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/) 
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 
+  [Monitoring and understanding Amazon EBS performance using Amazon CloudWatch](https://aws.amazon.com/blogs/storage/valuable-tips-for-monitoring-and-understanding-amazon-ebs-performance-using-amazon-cloudwatch/) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 

# PERF 4. How do you select your database solution?
<a name="perf-04"></a>

 The most effective database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and activate different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency. 

**Topics**
+ [PERF04-BP01 Understand data characteristics](perf_right_database_solution_understand_char.md)
+ [PERF04-BP02 Evaluate the available options](perf_right_database_solution_evaluate_options.md)
+ [PERF04-BP03 Collect and record database performance metrics](perf_right_database_solution_collect_metrics.md)
+ [PERF04-BP04 Choose data storage based on access patterns](perf_right_database_solution_access_patterns.md)
+ [PERF04-BP05 Optimize data storage based on access patterns and metrics](perf_right_database_solution_optimize_metrics.md)

# PERF04-BP01 Understand data characteristics
<a name="perf_right_database_solution_understand_char"></a>

 Choose your data management solutions to optimally match the characteristics, access patterns, and requirements of your workload datasets. When selecting and implementing a data management solution, you must ensure that the querying, scaling, and storage characteristics support the workload data requirements. Learn how various database options match your data models, and which configuration options are best for your use-case.  

 AWS provides numerous database engines, including relational, key-value, document, in-memory, graph, time series, and ledger databases. Each data management solution has options and configurations available to support your use-cases and data models. Your workload might be able to use several different database solutions, based on the data characteristics. By selecting the best database solution for a specific problem, you can break away from restrictive, one-size-fits-all monolithic databases and focus on managing data to meet your customers' needs. 

 **Desired outcome:** The workload data characteristics are documented with enough detail to facilitate selection and configuration of supporting database solutions, and provide insight into potential alternatives. 

 **Common anti-patterns:** 
+  Not considering ways to segment large datasets into smaller collections of data that have similar characteristics, resulting in missing opportunities to use more purpose-built databases that better match data and growth characteristics. 
+  Not identifying the data access patterns up front, which leads to costly and complex rework later. 
+  Limiting growth by using data storage strategies that don’t scale as quickly as needed. 
+  Choosing one database type and vendor for all workloads. 
+  Sticking to one database solution because there is internal experience and knowledge of one particular type of database solution. 
+  Keeping a database solution because it worked well in an on-premises environment. 

 **Benefits of establishing this best practice:** Be familiar with all of the AWS database solutions so that you can determine the correct database solution for your various workloads. After you select the appropriate database solution for your workload, you can quickly experiment on each of those database offerings to determine if they continue to meet your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 
+  Potential cost savings may not be identified. 
+  Data may not be secured to the level required. 
+  Data access and storage performance may not be optimal. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Define the data characteristics and access patterns of your workload. Review all available database solutions to identify which solution supports your data requirements. Within a given workload, multiple databases may be selected. Evaluate each service or group of services and assess them individually. If potential alternative data management solutions are identified for part or all of the data, experiment with alternative implementations that might unlock cost, security, performance, and reliability benefits. Update existing documentation, should a new data management approach be adopted. 


|  **Type**  |  **AWS Services**  |  **Key Characteristics**  |  **Common use-cases**  | 
| --- | --- | --- | --- | 
|  Relational  |  Amazon RDS, Amazon Aurora  |  Referential integrity, ACID transactions, schema on write  |  ERP, CRM, Commercial off-the-shelf software  | 
|  Key Value  |  Amazon DynamoDB  |  High throughput, low latency, near-infinite scalability  |  Shopping carts (ecommerce), product catalogs, chat applications  | 
|  Document  |  Amazon DocumentDB  |  Store JSON documents and query on any attribute  |  Content Management (CMS), customer profiles, mobile applications  | 
|  In Memory  |  Amazon ElastiCache, Amazon MemoryDB  |  Microsecond latency  |  Caching, game leaderboards  | 
|  Graph  |  Amazon Neptune  |  Highly relational data where the relationships between data have meaning  |  Social networks, personalization engines, fraud detection  | 
|  Time Series  |  Amazon Timestream  |  Data where the primary dimension is time  |  DevOps, IoT, Monitoring  | 
|  Wide column  |  Amazon Keyspaces  |  Cassandra workloads  |  Industrial equipment maintenance, route optimization  | 
|  Ledger  |  Amazon QLDB  |  Immutable and cryptographically verifiable ledger of changes  |  Systems of record, healthcare, supply chains, financial institutions  | 

 **Implementation steps** 

1.  How is the data structured? (for example, unstructured, key-value, semi-structured, relational) 

   1.  If the data is unstructured, consider an object store such as [Amazon S3](https://aws.amazon.com/products/storage/data-lake-storage/) or a NoSQL database such as [Amazon DocumentDB](https://aws.amazon.com/documentdb/). 

   1.  For key-value data, consider [DynamoDB](https://aws.amazon.com/dynamodb/), [ElastiCache for Redis](https://aws.amazon.com/elasticache/redis/), or [MemoryDB](https://aws.amazon.com/memorydb/). 

   1.  If the data has a relational structure, what level of referential integrity is required? 

      1.  For foreign key constraints, relational databases such as [Amazon RDS](https://aws.amazon.com/rds/) and [Aurora](https://aws.amazon.com/rds/aurora/) can provide this level of integrity. 

      1.  Typically, within a NoSQL data-model, you would de-normalize your data into a single document or collection of documents to be retrieved in a single request rather than joining across documents or tables.  

1.  Is ACID (atomicity, consistency, isolation, durability) compliance required? 

   1.  If the ACID properties associated with relational databases are required, consider a relational database such as [Amazon RDS](https://aws.amazon.com/rds/) and [Aurora](https://aws.amazon.com/rds/aurora/). 

1.  What consistency model is required? 

   1.  If your application can tolerate eventual consistency, consider a NoSQL implementation. Review the other characteristics to help choose which [NoSQL database](https://aws.amazon.com/nosql/) is most appropriate. 

   1.  If strong consistency is required, you can use strongly consistent reads with [DynamoDB](https://aws.amazon.com/dynamodb/) or a relational database such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  What query and result formats must be supported? (for example, SQL, CSV, Parquet, Avro, JSON) 

1.  What data types, field sizes, and overall quantities are present? (for example, text, numeric, spatial, time-series, calculated, binary or blob, document) 

1.  How will the storage requirements change over time? How does this impact scalability? 

   1.  Serverless databases such as [DynamoDB](https://aws.amazon.com/dynamodb/) and [Amazon Quantum Ledger Database](https://aws.amazon.com/qldb/) will scale dynamically up to near-unlimited storage. 

   1.  Relational databases have upper bounds on provisioned storage, and often must be horizontally partitioned via mechanisms such as sharding once they reach these limits. 

1.  What is the proportion of read queries in relation to write queries? Would caching be likely to improve performance? 

   1.  Read-heavy workloads can benefit from a caching layer; this could be [ElastiCache](https://aws.amazon.com/elasticache/) or [DAX](https://aws.amazon.com/dynamodb/dax/) if the database is DynamoDB. 

   1.  Reads can also be offloaded to read replicas with relational databases such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  Does storage and modification (OLTP - Online Transaction Processing) or retrieval and reporting (OLAP - Online Analytical Processing) have a higher priority? 

   1.  For high-throughput transactional processing, consider a NoSQL database such as DynamoDB or Amazon DocumentDB. 

   1.  For analytical queries, consider a columnar database such as [Amazon Redshift](https://aws.amazon.com/redshift/) or exporting the data to Amazon S3 and performing analytics using [Athena](https://aws.amazon.com/athena/) or [QuickSight](https://aws.amazon.com/quicksight/). 

1.  How sensitive is this data and what level of protection and encryption does it require? 

   1.  All Amazon RDS and Aurora engines support data encryption at rest using AWS KMS. Microsoft SQL Server and Oracle also support native Transparent Data Encryption (TDE) when using Amazon RDS. 

   1.  For DynamoDB, you can use fine-grained access control with [IAM](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/access-control-overview.html) to control who has access to what data at the key level. 

1.  What level of durability does the data require? 

   1.  Aurora automatically replicates your data across three Availability Zones within a Region, meaning your data is highly durable with less chance of data loss. 

   1.  DynamoDB is automatically replicated across multiple Availability Zones, providing high availability and data durability. 

   1.  Amazon S3 provides 11 9s of durability. Many database services such as Amazon RDS and DynamoDB support exporting data to Amazon S3 for long-term retention and archival. 

1.  Do [Recovery Time Objective (RTO) or Recovery Point Objectives (RPO)](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/plan-for-disaster-recovery-dr.html) requirements influence the solution? 

   1.  Amazon RDS, Aurora, DynamoDB, Amazon DocumentDB, and Neptune all support point in time recovery and on-demand backup and restore.  

   1.  For high availability requirements, DynamoDB tables can be replicated globally using the [Global Tables](https://aws.amazon.com/dynamodb/global-tables/) feature and Aurora clusters can be replicated across multiple Regions using the Global database feature. Additionally, S3 buckets can be replicated across AWS Regions using cross-region replication.  

1.  Is there a desire to move away from commercial database engines / licensing costs? 

   1.  Consider open-source engines such as PostgreSQL and MySQL on Amazon RDS or Aurora. 

   1.  Leverage [AWS DMS](https://aws.amazon.com/dms/) and [AWS SCT](https://aws.amazon.com/dms/schema-conversion-tool/) to perform migrations from commercial database engines to open-source alternatives. 

1.  What is the operational expectation for the database? Is moving to managed services a primary concern? 

   1.  Leveraging Amazon RDS instead of Amazon EC2, and DynamoDB or Amazon DocumentDB instead of self-hosting a NoSQL database can reduce operational overhead. 

1.  How is the database currently accessed? Is it only application access, or are there Business Intelligence (BI) users and other connected off-the-shelf applications? 

   1.  If you have dependencies on external tooling, then you may have to maintain compatibility with the databases they support. Amazon RDS is fully compatible with the different engine versions that it supports, including Microsoft SQL Server, Oracle, MySQL, and PostgreSQL. 

1.  The following is a list of potential data management services, and where these can best be used: 

   1.  Relational databases store data with predefined schemas and relationships between them. These databases are designed to support ACID (atomicity, consistency, isolation, durability) transactions, and maintain referential integrity and strong data consistency. Many traditional applications, enterprise resource planning (ERP), customer relationship management (CRM), and ecommerce use relational databases to store their data. You can run many of these database engines on Amazon EC2, or choose from one of the AWS-managed [database services](https://aws.amazon.com/products/databases/): [Amazon Aurora](https://aws.amazon.com/rds/aurora), [Amazon RDS](https://aws.amazon.com/rds), and [Amazon Redshift](https://aws.amazon.com/redshift). 

   1.  Key-value databases are optimized for common access patterns, typically to store and retrieve large volumes of data. These databases deliver quick response times, even in extreme volumes of concurrent requests. High-traffic web apps, ecommerce systems, and gaming applications are typical use-cases for key-value databases. In AWS, you can utilize [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), a fully managed, multi-Region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. 

   1.  In-memory databases are used for applications that require real-time access to data, the lowest latency, and the highest throughput. By storing data directly in memory, these databases deliver microsecond latency to applications where millisecond latency is not enough. You may use in-memory databases for application caching, session management, gaming leaderboards, and geospatial applications. [Amazon ElastiCache](https://aws.amazon.com/elasticache/) is a fully managed in-memory data store, compatible with [Redis](https://aws.amazon.com/elasticache/redis/) or [Memcached](https://aws.amazon.com/elasticache/memcached). If the application also has higher durability requirements, [Amazon MemoryDB for Redis](https://aws.amazon.com/memorydb/) meets this need as a durable, in-memory database service for ultra-fast performance. 

   1.  A document database is designed to store semistructured data as JSON-like documents. These databases help developers build and update applications such as content management, catalogs, and user profiles quickly. [Amazon DocumentDB](https://aws.amazon.com/documentdb/) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. 

   1.  A wide column store is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. You typically see a wide column store in high-scale industrial apps for equipment maintenance, fleet management, and route optimization. [Amazon Keyspaces (for Apache Cassandra)](https://aws.amazon.com/mcs/) is a scalable, highly available, and managed Apache Cassandra–compatible wide column database service. 

   1.  Graph databases are for applications that must navigate and query millions of relationships between highly connected graph datasets with millisecond latency at large scale. Many companies use graph databases for fraud detection, social networking, and recommendation engines. [Amazon Neptune](https://aws.amazon.com/neptune/) is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. 

   1.  Time-series databases efficiently collect, synthesize, and derive insights from data that changes over time. IoT applications, DevOps, and industrial telemetry can utilize time-series databases. [Amazon Timestream](https://aws.amazon.com/timestream/) is a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day. 

   1.  Ledger databases provide a centralized and trusted authority to maintain a scalable, immutable, and cryptographically verifiable record of transactions for every application. We see ledger databases used for systems of record, supply chain, registrations, and even banking transactions. [Amazon Quantum Ledger Database (Amazon QLDB)](https://aws.amazon.com/qldb/) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. Amazon QLDB tracks every application data change and maintains a complete and verifiable history of changes over time. 
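The de-normalization guidance in the steps above (retrieve a single document in one request instead of joining across tables) can be made concrete with a short sketch. The sample data and function names below are hypothetical; the document shape is what you would fetch with a single DynamoDB `GetItem` or an Amazon DocumentDB `findOne`:

```python
# Relational shape: an order assembled by joining two tables at read time.
orders_table = {"o-1001": {"customer": "alice", "status": "shipped"}}
order_items_table = [
    {"order_id": "o-1001", "sku": "widget", "qty": 2},
    {"order_id": "o-1001", "sku": "gadget", "qty": 1},
]

def read_order_relational(order_id: str) -> dict:
    order = dict(orders_table[order_id])
    order["items"] = [{"sku": i["sku"], "qty": i["qty"]}
                      for i in order_items_table if i["order_id"] == order_id]
    return order

# Document shape: the same order de-normalized into one item, so the full
# entity comes back in a single request with no join.
order_documents = {
    "o-1001": {
        "customer": "alice",
        "status": "shipped",
        "items": [{"sku": "widget", "qty": 2}, {"sku": "gadget", "qty": 1}],
    }
}

def read_order_document(order_id: str) -> dict:
    return order_documents[order_id]
```

Both reads return the same logical entity; the document model trades write-time duplication for a single cheap read.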

 **Level of effort for the implementation plan: **If a workload is moving from one database solution to another, there could be a *high* level of effort involved in refactoring the data and application.   

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 
+  [Choose between EC2 and Amazon RDS](https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-sql-server/comparison.html) 
+  [Best Practices for Implementing Amazon ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+ [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Optimize Data Pattern using Amazon Redshift Data Sharing](https://wellarchitectedlabs.com/sustainability/300_labs/300_optimize_data_pattern_using_redshift_data_sharing/) 
+  [Database Migrations](https://github.com/aws-samples/aws-database-migration-samples) 
+  [MS SQL Server - AWS Database Migration Service (DMS) Replication Demo](https://github.com/aws-samples/aws-dms-sql-server) 
+  [Database Modernization Hands On Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Amazon Neptune Samples](https://github.com/aws-samples/amazon-neptune-samples) 

# PERF04-BP02 Evaluate the available options
<a name="perf_right_database_solution_evaluate_options"></a>

 Understand the available database options and how they can optimize your performance before you select your data management solution. Use load testing to identify database metrics that matter for your workload. While you explore the database options, take into consideration various aspects such as the parameter groups, storage options, memory, compute, read replicas, eventual consistency, connection pooling, and caching options. Experiment with these various configuration options to improve the metrics. 

 **Desired outcome:** A workload may use one or more database solutions, depending on its data types. The database functionality and benefits optimally match the data characteristics, access patterns, and workload requirements. To optimize your database performance and cost, you must evaluate the data access patterns to determine the appropriate database options. Evaluate the acceptable query times to ensure that the selected database options can meet the requirements. 

 **Common anti-patterns:** 
+  Not identifying the data access patterns. 
+  Not being aware of the configuration options of your chosen data management solution. 
+  Relying solely on increasing the instance size without looking at other available configuration options. 
+  Not testing the scaling characteristics of the chosen solution. 

 

 **Benefits of establishing this best practice:** By exploring and experimenting with the database options, you may be able to reduce the cost of infrastructure, improve performance and scalability, and lower the effort required to maintain your workloads. 

 **Level of risk exposed if this best practice is not established:** High 
+  Having to optimize for a *one size fits all* database means making unnecessary compromises. 
+  Higher costs as a result of not configuring the database solution to match the traffic patterns. 
+  Operational issues may emerge from scaling issues. 
+  Data may not be secured to the level required. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Understand your workload data characteristics so that you can configure your database options. Run load tests to identify your key performance metrics and bottlenecks. Use these characteristics and metrics to evaluate database options and experiment with different configurations. 


|  AWS Services  |  Amazon RDS, Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Compute  |  Increase instance size, Aurora Serverless instances autoscale in response to changes in load  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Increase instance size  |  Increase instance size, add nodes to cluster  |  Increase instance size  |  Automatically scales to adjust capacity  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Automatically scales to adjust capacity  | 
|  Scaling-out reads  |  All engines support read replicas. Aurora supports automatic scaling of read replica instances  |  Increase provisioned read capacity units  |  Read replicas  |  Read replicas  |  Read replicas. Supports automatic scaling of read replica instances  |  Automatically scales  |  Increase provisioned read capacity units  |  Automatically scales up to documented concurrency limits  | 
|  Scaling-out writes  |  Increasing instance size, batching writes in the application or adding a queue in front of the database. Horizontal scaling via application-level sharding across multiple instances  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Increasing primary instance size  |  Using Redis in cluster mode to distribute writes across shards  |  Increasing instance size  |  Write requests may be throttled while scaling. If you encounter throttling exceptions, continue to send data at the same (or higher) throughput to automatically scale. Batch writes to reduce concurrent write requests  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Automatically scales up to documented concurrency limits  | 
|  Engine configuration  |  Parameter groups  |  Not applicable  |  Parameter groups  |  Parameter groups  |  Parameter groups  |  Not applicable  |  Not applicable  |  Not applicable  | 
|  Caching  |  In-memory caching, configurable via parameter groups. Pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  DynamoDB Accelerator (DAX) fully managed cache available  |  In-memory caching. Optionally, pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Primary function is caching  |  Use the query results cache to cache the result of a read-only query  |  Timestream has two storage tiers; one of these is a high-performance in-memory tier  |  Deploy a separate dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Not applicable  | 
|  High availability / disaster recovery  |  Recommended configuration for production workloads is to run a standby instance in a second Availability Zone to provide resiliency within a Region. For resiliency across Regions, Aurora Global Database can be used  |  Highly available within a Region. Tables can be replicated across Regions using DynamoDB global tables  |  Create multiple instances across Availability Zones for availability. Snapshots can be shared across Regions and clusters can be replicated using DMS to provide cross-Region replication / disaster recovery  |  Recommended configuration for production clusters is to create at least one node in a secondary Availability Zone. ElastiCache Global Datastore can be used to replicate clusters across Regions  |  Read replicas in other Availability Zones serve as failover targets. Snapshots can be shared across Regions and clusters can be replicated using Neptune streams to replicate data between two clusters in two different Regions  |  Highly available within a Region. Cross-Region replication requires custom application development using the Timestream SDK  |  Highly available within a Region. Cross-Region replication requires custom application logic or third-party tools  |  Highly available within a Region. To replicate across Regions, export the contents of the Amazon QLDB journal to an S3 bucket and configure the bucket for Cross-Region Replication  | 

 

 **Implementation steps** 

1.  What configuration options are available for the selected databases? 

   1.  Parameter groups for Amazon RDS and Aurora allow you to adjust common database engine-level settings, such as the memory allocated for the cache or the time zone of the database. 

   1.  For provisioned database services such as Amazon RDS, Aurora, Neptune, and Amazon DocumentDB, and those deployed on Amazon EC2, you can change the instance type and provisioned storage, and add read replicas. 

   1.  DynamoDB allows you to specify two capacity modes: on-demand and provisioned. To account for differing workloads, you can change between these modes and increase the allocated capacity in provisioned mode at any time. 

1.  Is the workload read or write heavy?  

   1.  What solutions are available for offloading reads (read replicas, caching, etc.)?  

      1.  For DynamoDB tables, you can offload reads using DAX for caching. 

      1.  For relational databases, you can create an ElastiCache for Redis cluster and configure your application to read from the cache first, falling back to the database if the requested item is not present. 

      1.  Relational databases such as Amazon RDS and Aurora, and provisioned NoSQL databases such as Neptune and Amazon DocumentDB all support adding read replicas to offload the read portions of the workload. 

      1.  Serverless databases such as DynamoDB will scale automatically. Ensure that you have enough read capacity units (RCU) provisioned to handle the workload. 

   1.  What solutions are available for scaling writes (partition key sharding, introducing a queue, etc.)? 

      1.  For relational databases, you can increase the size of the instance to accommodate an increased workload, or increase the provisioned IOPS to allow for increased throughput to the underlying storage. 
         +  You can also introduce a queue in front of your database rather than writing directly to the database. This pattern allows you to decouple the ingestion from the database and control the flow-rate so the database does not get overwhelmed.  
         +  Batching your write requests rather than creating many short-lived transactions can help improve throughput in high-write volume relational databases. 

      1.  Serverless databases like DynamoDB can scale the write throughput automatically or by adjusting the provisioned write capacity units (WCU) depending on the capacity mode.  
         +  You can still run into issues with *hot* partitions though, when you reach the throughput limits for a given partition key. This can be mitigated by choosing a more evenly distributed partition key or by write-sharding the partition key.  

1.  What are the current or expected peak transactions per second (TPS)? Test using this volume of traffic and this volume plus X% to understand the scaling characteristics. 

   1.  Native tools such as pgbench for PostgreSQL can be used to stress-test the database and understand the bottlenecks and scaling characteristics. 

   1.  Production-like traffic should be captured so that it can be replayed to simulate real-world conditions in addition to synthetic workloads. 

1.  If using serverless or elastically scalable compute, test the impact of scaling this on the database. If appropriate, introduce connection management or pooling to lower impact on the database.  

   1.  RDS Proxy can be used with Amazon RDS and Aurora to manage connections to the database.  

   1.  Serverless databases such as DynamoDB do not have connections associated with them, but consider the provisioned capacity and automatic scaling policies to deal with spikes in load. 

1.  Is the load predictable? Are there spikes in load and periods of inactivity? 

   1.  If there are periods of inactivity, consider scaling down the provisioned capacity or instance size during these times. Aurora Serverless V2 automatically scales up and down based on load. 

   1.  For non-production instances, consider pausing or stopping these during non-work hours. 

1.  Do you need to segment and break apart your data models based on access patterns and data characteristics? 

   1.  Consider using AWS DMS or AWS SCT to move your data to other services. 
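
The write-sharding approach mentioned in the steps above can be sketched in a few lines of Python. This is a minimal illustration rather than DynamoDB-specific code; the shard count and the use of an MD5 digest over a per-item attribute are illustrative assumptions:

```python
import hashlib

def sharded_partition_key(base_key: str, item_id: str, num_shards: int = 10) -> str:
    """Spread writes for a hot logical key across num_shards physical
    partition keys by deriving a stable shard suffix from a per-item
    attribute (here, a hypothetical order ID)."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{base_key}#{shard}"

# Writes for the hot key "2024-06-01" now land on 10 distinct partition keys.
keys = {sharded_partition_key("2024-06-01", f"order-{i}") for i in range(1000)}
```

Reads for a logical key must then fan out across all shards and merge the results, so the shard count trades write scalability against read cost.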

 **Level of effort for the implementation plan:** To establish this best practice, you must be aware of your current data characteristics and metrics. Gathering those metrics, establishing a baseline, and then using those metrics to identify the ideal database configuration options is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 

 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28)
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Amazon DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples) 
+  [AWS Database migration samples](https://github.com/aws-samples/aws-database-migration-samples) 
+  [Database Modernization Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Working with parameters on your Amazon RDS for PostgreSQL DB instance](https://github.com/awsdocs/amazon-rds-user-guide/blob/main/doc_source/Appendix.PostgreSQL.CommonDBATasks.Parameters.md) 

# PERF04-BP03 Collect and record database performance metrics
<a name="perf_right_database_solution_collect_metrics"></a>

 To understand how your data management systems are performing, it is important to track relevant metrics. These metrics will help you to optimize your data management resources, to ensure that your workload requirements are met, and that you have a clear overview on how the workload performs. Use tools, libraries, and systems that record performance measurements related to database performance. 

 

 There are metrics that are related to the system on which the database is being hosted (for example, CPU, storage, memory, IOPS), and there are metrics for accessing the data itself (for example, transactions per second, query rates, response times, errors). These metrics should be readily accessible to any support or operational staff, and have a sufficient historical record to identify trends, anomalies, and bottlenecks. 

 

 **Desired outcome:** To monitor the performance of your database workloads, you must record multiple performance metrics over a period of time. This allows you to detect anomalies as well as measure performance against business metrics to ensure you are meeting your workload needs. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools used by your team and don’t have a comprehensive picture of your workload. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 
+  You only monitor system level metrics, not capturing data access or usage metrics. 

 **Benefits of establishing this best practice:** Establishing a performance baseline helps in understanding the normal behavior and requirements of workloads. Abnormal patterns can be identified and debugged faster, improving the performance and reliability of the database. Database capacity can be configured to ensure optimal cost without compromising performance. 

 **Level of risk exposed if this best practice is not established:** High 
+  An inability to differentiate abnormal from normal performance levels makes issue identification and decision making difficult. 
+  Potential cost savings may not be identified. 
+  Growth patterns will not be identified which might result in reliability or performance degradation. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify, collect, aggregate, and correlate database-related metrics. Metrics should cover both the underlying system that is supporting the database and the database itself. The underlying system metrics might include CPU utilization, memory, available disk storage, disk I/O, and network inbound and outbound metrics, while the database metrics might include transactions per second, top queries, average query rates, response times, index usage, table locks, query timeouts, and the number of open connections. This data is crucial to understand how the workload is performing and how the database solution is used. Use these metrics as part of a data-driven approach to tune and optimize your workload's resources.  

 **Implementation steps:** 

1.  Which database metrics are important to track? 

   1.  [Monitoring metrics for Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) 

   1.  [Monitoring with Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) 

   1.  [Enhanced monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.overview.html) 

   1.  [DynamoDB metrics](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html) 

   1.  [Monitoring DynamoDB DAX](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.Monitoring.html) 

   1.  [Monitoring MemoryDB](https://docs.aws.amazon.com/memorydb/latest/devguide/monitoring-cloudwatch.html) 

   1.  [Monitoring Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/metrics.html) 

   1.  [Timeseries metrics and dimensions](https://docs.aws.amazon.com/timestream/latest/developerguide/metrics-dimensions.html) 

   1.  [Cluster level metrics for Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html) 

   1.  [Monitoring Amazon Keyspaces](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring.html) 

   1.  [Monitoring Amazon Neptune](https://docs.aws.amazon.com/neptune/latest/userguide/monitoring.html) 

1.  Would the database monitoring benefit from a machine learning solution that detects operational anomalies and performance issues? 

   1.  [Amazon DevOps Guru for Amazon RDS](https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.overview.how-it-works.html) provides visibility into performance issues and makes recommendations for corrective actions. 

1.  Do you need application level details about SQL usage? 

   1.  [AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql) can be instrumented into the application to gain insights and encapsulate all the data points for a single query. 

1.  Do you currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 

1.  Have you identified and configured data retention policies to match your security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 
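
As a sketch of publishing a custom database metric (step 4), the code below assembles a payload in the shape expected by CloudWatch's `PutMetricData` API. The namespace, metric name, and dimension are hypothetical choices for illustration; an application would hand the payload to a CloudWatch client, for example `boto3.client("cloudwatch").put_metric_data(**payload)`:

```python
from datetime import datetime, timezone

def build_query_latency_metric(namespace: str, query_name: str, latency_ms: float) -> dict:
    """Assemble a PutMetricData payload recording one query-latency sample."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": "QueryLatency",
                "Dimensions": [{"Name": "QueryName", "Value": query_name}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": latency_ms,
                "Unit": "Milliseconds",
            }
        ],
    }

payload = build_query_latency_metric("MyApp/Database", "GetOrderById", 12.4)
```

Publishing per-query latency this way lets you alarm on derived metrics (for example, p99 latency per query name) rather than only on the default system-level metrics.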

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all database resources. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/) 
+ [ Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)
+ [ Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html)
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/)
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+ [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+ [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+ [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Amazon RDS Performance Insights](https://aws.amazon.com/rds/performance-insights/) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [AWS Dataset Ingestion Metrics Collection Framework](https://github.com/awslabs/aws-dataset-ingestion-metrics-collection-framework) 
+  [Amazon RDS Monitoring Workshop](https://www.workshops.aws/?tag=Enhanced%20Monitoring) 

# PERF04-BP04 Choose data storage based on access patterns
<a name="perf_right_database_solution_access_patterns"></a>

Use the access patterns of the workload and requirements of the applications to decide on optimal data services and technologies to use. 

 **Desired outcome:** Data storage has been selected based on identified and documented data access patterns. This might include the most common read, write, and delete queries, the need for calculations and aggregations, the complexity of the data, data interdependencies, and the required consistency needs. 

 **Common anti-patterns:** 
+ You only select one database engine to simplify operations management.
+  You assume that data access patterns will stay consistent over time. 
+  You implement complex transactions, rollback, and consistency logic in the application. 
+  The database is configured to support a potential high traffic burst, which results in the database resources remaining idle most of the time. 
+  You use a shared database for transactional and analytical workloads. 

 **Benefits of establishing this best practice:** Selecting and optimizing your data storage based on access patterns will help decrease development complexity and optimize your performance opportunities. Understanding when to use read replicas, global tables, data partitioning, and caching will help you decrease operational overhead and scale based on your workload needs. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

Identify and evaluate your data access patterns to select the correct storage configuration. Each database solution has options to configure and optimize your storage solution. Use the collected metrics and logs and experiment with options to find the optimal configuration. Use the following table to review storage options per database service.


|  AWS Services  |  Amazon RDS  |  Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Storage  | Storage can be scaled up manually or configured to scale automatically to a maximum of 64 TiB, depending on the engine type. Provisioned storage cannot be decreased. |  Storage scales automatically up to a maximum of 128 TiB and decreases when data is removed. The maximum storage size also depends on the specific Aurora MySQL or Aurora PostgreSQL engine version.  | Storage scales automatically. Tables are unconstrained in size. | Storage scales automatically up to a maximum of 64 TiB. Starting with Amazon DocumentDB 4.0, storage can decrease by comparable amounts when data is removed by dropping a collection or index. With Amazon DocumentDB 3.6, allocated space remains the same and free space is reused as data volume increases. |  Storage is in-memory and tied to the instance type and count.  |  Storage scales automatically up to 128 TiB (64 TiB in some Regions). When data is removed, total allocated space remains the same and is reused later.  | Organizes your time series data to optimize query processing and reduce storage costs. The retention period can be configured through in-memory and magnetic tiers. | Scales table storage up and down automatically as your application writes, updates, and deletes data. | Storage scales automatically. Tables are unconstrained in size. | 

 

 **Implementation steps:** 

1.  Understand the requirements for transactions, atomicity, consistency, isolation, and durability (ACID) compliance, and consistent reads. Not every database supports these, and most NoSQL databases provide an eventual consistency model. 

1.  Consider the traffic patterns, latency, and access requirements for a globally distributed application in order to identify the optimal storage solution. 

1.  Analyze query patterns, random access patterns, and one-time queries. Also take into account any need for highly specialized query functionality, such as text and natural language processing, time series, and graphs. 

1.  Identify and document the anticipated growth of the data and traffic. 

   1.  Amazon RDS and Aurora support storage automatic scaling up to documented limits. Beyond this, consider transitioning older data to Amazon S3 for archival, aggregating historical data for analytics or scaling horizontally using sharding. 

   1.  DynamoDB and Amazon S3 will scale to near limitless storage volume automatically. 

   1.  Amazon RDS instances and databases running on EC2 can be manually resized and EC2 instances can have new EBS volumes added at a later date for additional storage.  

   1.  Instance types can be changed based on changes in activity. For example, you can start with a smaller instance while you are testing, then scale the instance as you begin to receive production traffic to the service. Aurora Serverless V2 automatically scales in response to changes in load.  

1. Baseline requirements around normal and peak performance, such as transactions per second (TPS) and queries per second (QPS), and consistency needs (ACID or eventual consistency).

1.  Document solution deployment aspects and the database access requirements (like global replication, Multi-AZ, read replication, and multiple write nodes). 
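
The TPS baseline called for in step 5 can be approximated directly from request logs. A minimal sketch, assuming request timestamps are available in epoch seconds:

```python
import math
from collections import Counter

def tps_baseline(timestamps):
    """Return (average, peak) transactions per second over the
    observed window, given request timestamps in epoch seconds."""
    if not timestamps:
        return 0.0, 0
    per_second = Counter(math.floor(t) for t in timestamps)
    window = max(per_second) - min(per_second) + 1
    return len(timestamps) / window, max(per_second.values())

# Six requests over a four-second window, with a burst in second 100.
avg, peak = tps_baseline([100.1, 100.2, 100.9, 101.5, 103.0, 103.4])
```

Load test at the measured peak plus headroom, and revisit the baseline as documented growth projections materialize.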

 **Level of effort for the implementation plan:** Low. If you do not have logs or metrics for your data management solution, you will need to collect them before identifying and documenting your data access patterns. Once your data access patterns are understood, selecting and configuring your data storage is a low level of effort. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [ Cloud Databases with AWS](https://aws.amazon.com/products/databases/)
+ [ Working with storage for Amazon RDS DB instances ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html)
+ [ Amazon DocumentDB Storage ](https://docs.aws.amazon.com/documentdb/latest/developerguide/how-it-works.html#how-it-works.storage)
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/)
+ [ Amazon Timestream Storage ](https://docs.aws.amazon.com/timestream/latest/developerguide/storage.html)
+ [ Storage in Amazon Keyspaces ](https://docs.aws.amazon.com/keyspaces/latest/devguide/Storage.html)
+ [ Amazon ElastiCache FAQs ](https://aws.amazon.com/elasticache/faqs/)
+ [ Amazon Neptune storage, reliability, and availability ](https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-storage.html)
+ [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+ [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/) 
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+  [Amazon RDS Storage Types](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html) 
+ [ Hardware specifications for Amazon RDS instance classes ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html#Concepts.DBInstanceClass.Types)
+ [ Aurora Storage limits ](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Limits.html#RDS_Limits.FileSize.Aurora)

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+ [ Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Experiment and test with Distributed Load Testing on AWS](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 

# PERF04-BP05 Optimize data storage based on access patterns and metrics
<a name="perf_right_database_solution_optimize_metrics"></a>

 Use performance characteristics and access patterns that optimize how data is stored or queried to achieve the best possible performance. Measure how optimizations such as indexing, key distribution, data warehouse design, or caching strategies impact system performance or overall efficiency. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools. 

 **Benefits of establishing this best practice:** To ensure you are meeting the metrics required for the workload, you must monitor database performance metrics related to both reads and writes. You can use this data to add new optimizations for both reads and writes to the data storage layer. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize data storage based on metrics and patterns: Use reported metrics to identify any underperforming areas in your workload and optimize your database components. Each database system has different performance related characteristics to evaluate, such as how data is indexed, cached, or distributed among multiple systems. Measure the impact of your optimizations. 
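
Measuring the impact of an optimization means capturing the same query's behavior before and after the change. The self-contained sketch below uses SQLite (standing in for a managed database engine) to time a lookup before and after adding an index; on a managed service the same comparison would come from recorded metrics such as query latency rather than local timing:

```python
import sqlite3
import time

def timed_lookup(conn, user_id):
    """Run the lookup once and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT id FROM events WHERE user_id = ?", (user_id,)
    ).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events (user_id) VALUES (?)",
    [(i % 1000,) for i in range(200_000)],
)

rows_before, t_before = timed_lookup(conn, 42)   # full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
rows_after, t_after = timed_lookup(conn, 42)     # index lookup
```

The key discipline is that both runs return identical results; only the elapsed time (or, in production, the recorded latency metric) should change.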

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+  [Amazon DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Analyzing performance anomalies with DevOps Guru for RDS](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/devops-guru-for-rds.html) 
+  [Read/Write Capacity Mode for DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R)](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1)](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Hands-on Labs for Amazon DynamoDB](https://amazon-dynamodb-labs.workshop.aws/hands-on-labs.html) 

# PERF 5. How do you configure your networking solution?
<a name="perf-05"></a>

 The most effective network solution for a workload varies based on latency, throughput requirements, jitter, and bandwidth. Physical constraints, such as user or on-premises resources, determine location options. These constraints can be offset with edge locations or resource placement. 

**Topics**
+ [PERF05-BP01 Understand how networking impacts performance](perf_select_network_understand_impact.md)
+ [PERF05-BP02 Evaluate available networking features](perf_select_network_evaluate_features.md)
+ [PERF05-BP03 Choose appropriately sized dedicated connectivity or VPN for hybrid workloads](perf_select_network_hybrid.md)
+ [PERF05-BP04 Leverage load-balancing and encryption offloading](perf_select_network_encryption_offload.md)
+ [PERF05-BP05 Choose network protocols to improve performance](perf_select_network_protocols.md)
+ [PERF05-BP06 Choose your workload’s location based on network requirements](perf_select_network_location.md)
+ [PERF05-BP07 Optimize network configuration based on metrics](perf_select_network_optimize.md)

# PERF05-BP01 Understand how networking impacts performance
<a name="perf_select_network_understand_impact"></a>

 Analyze and understand how network-related decisions impact workload performance. The network is responsible for the connectivity between application components, cloud services, edge networks, and on-premises data, and therefore it can greatly impact workload performance. In addition to workload performance, user experience is also impacted by network latency, bandwidth, protocols, location, network congestion, jitter, throughput, and routing rules. 

 **Desired outcome:** Have a documented list of networking requirements from the workload including latency, packet size, routing rules, protocols, and supporting traffic patterns. Review the available networking solutions and identify which service meets your workload networking characteristics. Cloud-based networks can be quickly rebuilt, so evolving your network architecture over time is necessary to improve performance efficiency. 

 **Common anti-patterns:** 
+  All traffic flows through your existing data centers. 
+  You overbuild Direct Connect sessions without understanding the actual usage requirements. 
+  You don’t consider workload characteristics and encryption overhead when defining your networking solutions. 
+  You use on-premises concepts and strategies for networking solutions in the cloud. 

 **Benefits of establishing this best practice:** Understanding how networking impacts workload performance will help you identify potential bottlenecks, improve user experience, increase reliability, and lower operational maintenance as the workload changes. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify important network performance metrics of your workload and capture its networking characteristics. Define and document requirements as part of a data-driven approach, using benchmarking or load testing. Use this data to identify where your network solution is constrained, and examine configuration options that could improve the workload. Understand the cloud-native networking features and options available and how they can impact your workload performance based on the requirements. Each networking feature has advantages and disadvantages and can be configured to meet your workload characteristics and scale based on your needs. 

 **Implementation steps:** 

1.  Define and document networking performance requirements: 

   1.  Include metrics such as network latency, bandwidth, protocols, locations, traffic patterns (spikes and frequency), throughput, encryption, inspection, and routing rules 

1.  Capture your foundational networking characteristics: 

   1.  [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

   1.  [AWS Transit Gateway metrics](https://docs.aws.amazon.com/vpc/latest/tgw/transit-gateway-cloudwatch-metrics.html) 

   1.  [AWS PrivateLink metrics](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-cloudwatch-metrics.html) 

1.  Capture your application networking characteristics: 

   1.  [Elastic Network Adaptor](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html) 

   1.  [AWS App Mesh metrics](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-metrics.html) 

   1.  [Amazon API Gateway metrics](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-metrics-and-dimensions.html) 

1.  Capture your edge networking characteristics: 

   1.  [Amazon CloudFront metrics](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/viewing-cloudfront-metrics.html) 

   1.  [Amazon Route 53 metrics](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/monitoring-cloudwatch.html) 

   1.  [AWS Global Accelerator metrics](https://docs.aws.amazon.com/global-accelerator/latest/dg/cloudwatch-monitoring.html) 

1.  Capture your hybrid networking characteristics: 

   1.  [Direct Connect metrics](https://docs.aws.amazon.com/directconnect/latest/UserGuide/monitoring-cloudwatch.html) 

   1.  [AWS Site-to-Site VPN metrics](https://docs.aws.amazon.com/vpn/latest/s2svpn/monitoring-cloudwatch-vpn.html) 

   1.  [AWS Client VPN metrics](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/monitoring-cloudwatch.html) 

   1.  [AWS Cloud WAN metrics](https://docs.aws.amazon.com/vpc/latest/cloudwan/cloudwan-cloudwatch-metrics.html) 

1.  Capture your security networking characteristics: 

   1.  [AWS Shield, WAF, and Network Firewall metrics](https://docs.aws.amazon.com/waf/latest/developerguide/monitoring-cloudwatch.html) 

1.  Capture end-to-end performance metrics with tracing tools: 

   1.  [AWS X-Ray](https://aws.amazon.com/xray/) 

   1.  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 

1.  Benchmark and test network performance: 

   1.  [Benchmark](https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/) network throughput: several factors can affect Amazon EC2 network performance when the instances are in the same VPC. Measure the network bandwidth between EC2 Linux instances in the same VPC to establish a baseline. 

   1.  Perform [load tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) to experiment with networking solutions and options. 
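
The latency requirements documented in step 1 are most useful as percentiles rather than averages, because tail latency often dominates user experience. A minimal nearest-rank percentile sketch over collected round-trip samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# One slow outlier dominates p99 while leaving p50 untouched.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 13]
summary = {p: percentile(latencies_ms, p) for p in (50, 90, 99)}
```

Recording p50/p90/p99 alongside throughput gives the documented requirements a form that benchmarking and load tests can be compared against directly.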

 **Level of effort for the implementation plan:** There is a *medium* level of effort to document workload networking requirements, options, and available solutions. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+ [EC2 Enhanced Networking on Linux ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+ [EC2 Enhanced Networking on Windows ](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+ [EC2 Placement Groups ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+ [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+ [Network Load Balancer ](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+ [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway ](https://docs.aws.amazon.com/vpc/latest/tgw)
+ [Transitioning to latency-based routing in Amazon Route 53 ](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+ [VPC Endpoints ](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+ [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+ [Connectivity to AWS and hybrid AWS network architectures (NET317-R1) ](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+ [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1) ](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [Improve Global Network Performance for Applications](https://youtu.be/vNIALfLTW9M) 
+  [EC2 Instances and Performance Optimization Best Practices](https://youtu.be/W0PKclqP3U0) 
+  [Optimizing Network Performance for Amazon EC2 Instances](https://youtu.be/DWiwuYtIgu0) 
+  [Networking best practices and tips with the Well-Architected Framework](https://youtu.be/wOMNpG49BeM) 
+  [AWS networking best practices in large-scale migrations](https://youtu.be/qCQvwLBjcbs) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP02 Evaluate available networking features
<a name="perf_select_network_evaluate_features"></a>

Evaluate networking features in the cloud that may increase performance. Measure the impact of these features through testing, metrics, and analysis. For example, take advantage of network-level features that are available to reduce latency, packet loss, or jitter. 

**Desired outcome:** You have documented the inventory of components within your workload and have identified which networking configurations per component will help you meet your performance requirements. After evaluating the networking features, you have experimented and measured the performance metrics to identify how to use the features available to you. 

**Common anti-patterns:** 
+ You put all your workloads into an AWS Region closest to your headquarters instead of an AWS Region close to your users. 
+ You fail to benchmark your workload performance and do not continually evaluate your workload performance against that benchmark.
+ You do not review service configurations for performance improving options. 

**Benefits of establishing this best practice:** Evaluating all service features and options can increase your workload performance, reduce the cost of infrastructure, decrease the effort required to maintain your workload, and increase your overall security posture. You can use the global AWS backbone to ensure that you provide the optimal networking experience for your customers. 

**Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Review which network-related configuration options are available to you, and review how they could impact your workload. Performance optimization depends on understanding how these options interact with your architecture and the impact that they will have on both measured performance and user experience. 

 Some services, such as AWS Global Accelerator and Amazon CloudFront, exist primarily to improve performance, while most other services include features that optimize network traffic. Review service features, such as Amazon EC2 instance network capability, enhanced networking instance types, Amazon EBS-optimized instances, Amazon S3 Transfer Acceleration, and CloudFront, to improve your workload performance. 

**Implementation steps:** 

1. Create a list of workload components. 

   1. Build, manage, and monitor your organization's network using [AWS Cloud WAN](https://aws.amazon.com/cloud-wan/) when building a unified global network. 

   1.  Monitor your global and core networks with [Amazon CloudWatch metrics](https://docs.aws.amazon.com/network-manager/latest/tgwnm/monitoring-cloudwatch-metrics.html). Leverage [CloudWatch Real-User Monitoring (RUM)](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-rum-applications-client-side-performance/), which provides insights to help to identify, understand, and enhance users’ digital experience. 

   1.  View aggregate network latency between AWS Regions and Availability Zones, as well as within each Availability Zone, using [AWS Network Manager](https://aws.amazon.com/transit-gateway/network-manager/) to gain insight into how your application performance relates to the performance of the underlying AWS network. 

   1.  Use an existing configuration management database (CMDB) tool or a service such as [AWS Config](https://aws.amazon.com/config/) to create an inventory of your workload and how it’s configured. 

1.  If this is an existing workload, identify and document the benchmark for your performance metrics, focusing on the bottlenecks and areas to improve. Performance-related networking metrics will differ per workload based on business requirements and workload characteristics. As a start, these metrics might be important to review for your workload: bandwidth, latency, packet loss, jitter, and retransmits. 

1.  If this is a new workload, perform [load tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) to identify performance bottlenecks. 

1.  For the performance bottlenecks you identify, review the configuration options for your solutions to identify performance improvement opportunities. 

1.  If you don’t know your network path or routes, use [Network Access Analyzer](https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-vaa.html) to identify them. 

1.  Review your network protocols to further reduce your latency (see [PERF05-BP05 Choose network protocols to improve performance](perf_select_network_protocols.md)). 

1.  When connection from on-premises environments to AWS is required, review available configuration options for connectivity and estimate the bandwidth and latency requirements for your hybrid workload (see [PERF05-BP03 Choose appropriately sized dedicated connectivity or VPN for hybrid workloads](perf_select_network_hybrid.md)). 
   +  If you are using an AWS Site-to-Site VPN across multiple locations to connect to an AWS Region, then use an [accelerated Site-to-Site VPN connection](https://docs.aws.amazon.com/vpn/latest/s2svpn/accelerated-vpn.html) for the opportunity to improve network performance. 
   +  If your hybrid network design consists of an IPsec VPN connection over [AWS Direct Connect](https://aws.amazon.com/directconnect/), consider using Private IP VPN to improve security and achieve segmentation. [AWS Site-to-Site Private IP VPN](https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-aws-site-to-site-vpn-private-ip-vpns/) is deployed on top of a transit virtual interface (VIF). 
   +  [AWS Direct Connect SiteLink](https://aws.amazon.com/blogs/aws/new-site-to-site-connectivity-with-aws-direct-connect-sitelink/) lets you create low-latency, redundant connections between your data centers worldwide by sending data over the fastest path between [AWS Direct Connect locations](https://aws.amazon.com/directconnect/locations/), bypassing AWS Regions. 

1. When your workload traffic is spread across multiple accounts, evaluate your network topology and services to reduce latency. 
   + Evaluate your operational and performance tradeoffs between [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) and [AWS Transit Gateway](https://aws.amazon.com/transit-gateway/) when connecting multiple accounts. AWS Transit Gateway supports Equal Cost Multipath (ECMP) across multiple AWS Site-to-Site VPN connections in order to deliver additional bandwidth. Traffic between an Amazon VPC and AWS Transit Gateway remains on the private AWS network and is not exposed to the internet. AWS Transit Gateway simplifies how you interconnect all of your VPCs, which can span across thousands of AWS accounts and into on-premises networks. Share your AWS Transit Gateway between multiple accounts using [Resource Access Manager](https://aws.amazon.com/ram/).

1. Review your user locations and minimize the distance between your users and the workload.

   1. [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) is a networking service that improves the performance of your users’ traffic by up to 60% using the AWS global network infrastructure. When the internet is congested, AWS Global Accelerator optimizes the path to your application to keep packet loss, jitter, and latency consistently low. It also provides static IP addresses that simplify moving endpoints between Availability Zones or AWS Regions without needing to update your DNS configuration or change client-facing applications. Add an accelerator when creating a load balancer to improve the performance and availability of your workload by taking advantage of the AWS backbone.

   1. [Amazon CloudFront](https://aws.amazon.com/cloudfront/) can improve the performance of your workload's content delivery and lower latency globally. CloudFront has over 410 globally dispersed points of presence that can cache your content and reduce latency to the end user. Use Lambda@Edge to run functions that customize the content CloudFront delivers closer to your users, which reduces latency and improves performance.

   1. Amazon Route 53 offers [latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html), [geolocation routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geo.html), [geoproximity routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geoproximity.html), and [IP-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-ipbased.html) options to help you improve your workload’s performance for a global audience. Identify which routing option would optimize your workload performance by reviewing your workload traffic and user location when your workload is distributed globally. 

1. Evaluate additional Amazon S3 features to improve storage performance. 

   1.  [Amazon S3 Transfer Acceleration](https://aws.amazon.com/s3/transfer-acceleration/) is a feature that lets external users benefit from the networking optimizations of CloudFront to upload data to Amazon S3. This improves the ability to transfer large amounts of data from remote locations that don’t have dedicated connectivity to the AWS Cloud. 

   1.  [Amazon S3 Multi-Region Access Points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiRegionAccessPoints.html) replicate content to multiple Regions and simplify the workload by providing one access point. When a Multi-Region Access Point is used, you can read or write data to Amazon S3 and the service routes the request to the lowest-latency bucket. 

1. Review your compute resource network bandwidth.

   1. Network bandwidth for EC2 instances, containers, and Lambda functions is limited on a per-flow basis. Review your placement groups to optimize your [EC2 networking throughput](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html). To avoid per-flow bottlenecks, design your application to use multiple flows. To monitor and get visibility into your compute-related networking metrics, use [CloudWatch metrics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-instance-network-bandwidth.html) and [monitor network performance with the Elastic Network Adapter (ENA)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html). `ethtool` is included in the ENA driver and exposes additional network-related metrics that can be published as a [custom metric](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html) to CloudWatch. 

   1. Newer Amazon EC2 instances can leverage enhanced networking. C7gn instances, which feature AWS Nitro Cards (Nitro v5) with enhanced networking, offer the highest network bandwidth and packet rate performance across Amazon EC2 network-optimized instances. 

   1. [Amazon Elastic Network Adapters](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) (ENA) provide further optimization by delivering better throughput for your instances within a [cluster placement group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-cluster%23placement-groups-limitations-cluster). Elastic Network Adapter Express (ENA Express) is an ENA feature that uses the scalable reliable datagram (SRD) protocol to improve single-flow bandwidth and lower tail latency. It increases the maximum single-flow bandwidth of Amazon EC2 instances from 5 Gbps up to 25 Gbps, and it can provide up to 85% improvement in P99.9 latency for high-throughput workloads. ENA Express is currently available for C6gn.16xl.

   1. [Elastic Fabric Adapter](https://aws.amazon.com/hpc/efa/) (EFA) is a network interface for Amazon EC2 instances that allows you to run workloads requiring high levels of internode communications at scale on AWS. With EFA, High Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs. 

   1. [Amazon EBS-optimized](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) instances use an optimized configuration stack and provide additional, dedicated capacity to increase the Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance. 
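
The `ethtool -S ethN` counters mentioned above can reveal whether an instance is hitting its network allowances. As a sketch of how you might flag them before publishing to CloudWatch, assuming the allowance-exceeded counter names exposed by recent ENA driver versions (verify the exact names against your driver):

```python
# Parse `ethtool -S`-style output and flag ENA allowance counters that
# indicate the instance hit a network limit (non-zero exceeded counts).
ALLOWANCE_COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
)

def parse_ethtool_stats(text):
    """Turn 'name: value' lines into a {name: int} dict."""
    stats = {}
    for line in text.splitlines():
        if ":" in line:
            name, _, value = line.partition(":")
            value = value.strip()
            if value.lstrip("-").isdigit():
                stats[name.strip()] = int(value)
    return stats

def exceeded_allowances(stats):
    """Return the allowance counters whose value is non-zero."""
    return {k: v for k, v in stats.items()
            if k in ALLOWANCE_COUNTERS and v > 0}

sample = """\
NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 1284
     pps_allowance_exceeded: 0
"""
print(exceeded_allowances(parse_ethtool_stats(sample)))
# → {'bw_out_allowance_exceeded': 1284}
```

A non-zero `bw_out_allowance_exceeded` here would suggest outbound traffic is being shaped; publishing these values as CloudWatch custom metrics lets you alarm on them.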

**Level of effort for the implementation plan:** Low to medium. To establish this best practice, you must be aware of your current workload component options that impact network performance. Gathering the components, evaluating network improvement options, experimenting, implementing, and documenting those improvements is a *low* to *medium* level of effort. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [Amazon EC2 instance network bandwidth](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 
+  [Building a cloud CMDB](https://aws.amazon.com/blogs/mt/building-a-cloud-cmdb-on-aws-for-consistent-resource-configuration-in-hybrid-environments/) 
+  [Scaling VPN throughput using AWS Transit Gateway](https://aws.amazon.com/blogs/networking-and-content-delivery/scaling-vpn-throughput-using-aws-transit-gateway/) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [AWS Global Accelerator](https://www.youtube.com/watch?v=lAOhr-5Urfk) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP03 Choose appropriately sized dedicated connectivity or VPN for hybrid workloads
<a name="perf_select_network_hybrid"></a>

 When a common network is required to connect on-premises and cloud resources in AWS, verify that you have adequate bandwidth to meet your performance requirements. Estimate the bandwidth and latency requirements for your hybrid workload. These numbers will drive the sizing requirements for your connectivity options. 

 **Desired outcome:** When deploying a workload that will need hybrid networking, you have multiple configuration options for connectivity, such as a dedicated connection or virtual private network (VPN). Select the appropriate connection type for each workload while verifying that you have adequate bandwidth and encryption requirements between your location and the cloud. 

 **Common anti-patterns:** 
+ You fail to identify all workload requirements (bandwidth, latency, jitter, encryption, and traffic needs).
+  You don’t evaluate backup or parallel connectivity options. 

 **Benefits of establishing this best practice:** Selecting and configuring appropriately sized hybrid network solutions will increase the reliability of your workload and maximize performance opportunities. By identifying workload requirements, planning ahead, and evaluating hybrid solutions, you will minimize expensive physical network changes and operational overhead while decreasing your time to market. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

Develop a hybrid networking architecture based on your bandwidth requirements. Estimate the bandwidth and latency requirements of your hybrid applications. Choose between a dedicated network connection and an internet-based VPN as your connectivity option.

A dedicated connection establishes a network connection over private lines. It is suitable when you need high bandwidth and low latency with consistent performance. A VPN connection establishes a secure connection over the internet. It is suitable when you need an encrypted connection and can use an existing internet connection.

Based on your bandwidth requirements, a single VPN or dedicated connection might not be enough, and you must architect a hybrid setup to permit traffic load balancing across multiple connections. 
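
As a back-of-the-envelope aid for this sizing decision, the sketch below estimates how many Site-to-Site VPN tunnels (at the 1.25 Gbps per-tunnel IPsec limit noted later in this section) you would need with ECMP, or which standard Direct Connect port size would cover the requirement. The 30% growth headroom is an illustrative assumption, not AWS guidance:

```python
import math

VPN_TUNNEL_GBPS = 1.25             # per-tunnel IPsec throughput limit
DX_PORT_SIZES_GBPS = (1, 10, 100)  # common dedicated-connection port sizes

def vpn_tunnels_needed(required_gbps, headroom=1.3):
    """ECMP tunnel count to cover required_gbps with growth headroom."""
    return math.ceil(required_gbps * headroom / VPN_TUNNEL_GBPS)

def smallest_dx_port(required_gbps, headroom=1.3):
    """Smallest standard Direct Connect port covering the requirement."""
    target = required_gbps * headroom
    for size in DX_PORT_SIZES_GBPS:
        if size >= target:
            return size
    return None  # exceeds a single port; consider LAG or ECMP

print(vpn_tunnels_needed(3.0))   # → 4 (tunnels)
print(smallest_dx_port(3.0))     # → 10 (Gbps port)
```

For a 3 Gbps requirement with headroom, four ECMP VPN tunnels or a 10 Gbps Direct Connect port would cover it; the dedicated connection also gives you the consistent latency a VPN cannot guarantee.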

 **Implementation steps** 

1.  Estimate the bandwidth and latency requirements of your hybrid applications. 

   1.  For existing apps that are moving to AWS, leverage the data from your internal network monitoring systems. 

   1.  For new apps or existing apps for which you don’t have monitoring data, consult with the product owners to derive adequate performance metrics and provide a good user experience. 

1.  Select dedicated connection or VPN as your connectivity option. Based on all workload requirements (encryption, bandwidth and traffic needs), you can either choose AWS Direct Connect or AWS Site-to-Site VPN (or both). The following diagram will help you choose the appropriate connection type. 

   1.  If you are considering a dedicated connection, AWS Direct Connect offers more predictable and consistent performance due to its private network connectivity. AWS Direct Connect provides dedicated connectivity to the AWS environment, from 50 Mbps up to 100 Gbps, using either dedicated connections or hosted connections. This gives you managed and controlled latency and provisioned bandwidth so your workload can connect efficiently to other environments. Using an AWS Direct Connect partner, you can have end-to-end connectivity from multiple environments, providing an extended network with consistent performance. You can scale Direct Connect bandwidth using native 100 Gbps connections, a Link Aggregation Group (LAG), or BGP equal-cost multipath (ECMP). 

   1.  If you are considering a VPN connection, an AWS managed VPN is the recommended option. AWS Site-to-Site VPN provides a managed VPN service supporting the Internet Protocol security (IPsec) protocol. Each VPN connection includes two tunnels for high availability. With AWS Transit Gateway, you can simplify connectivity between multiple VPCs and reach any VPC attached to the Transit Gateway with a single VPN connection. AWS Transit Gateway also lets you scale beyond the 1.25 Gbps IPsec throughput limit of a single VPN tunnel through equal-cost multipath (ECMP) routing over multiple VPN tunnels. 

![\[A flowchart that describes the options you should consider when determining if you need deterministic performance in your networking or not.\]](http://docs.aws.amazon.com/wellarchitected/2023-04-10/framework/images/deterministic-performance-flowchart.png)


 

 **Level of effort for the implementation plan:** High. There is significant effort in evaluating workload needs for hybrid networks and implementing hybrid networking solutions. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [Network Load Balancer ](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+ [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+ [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+ [Transitioning to latency-based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+ [VPC Endpoints ](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+ [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 
+  [AWS Site-to-Site VPN](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html) 
+  [Building a Scalable and Secure Multi-VPC AWS Network Infrastructure](https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/welcome.html) 
+  [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) 
+  [Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html) 

 **Related videos:** 
+ [Connectivity to AWS and hybrid AWS network architectures (NET317-R1) ](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+ [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1) ](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [AWS Global Accelerator](https://www.youtube.com/watch?v=lAOhr-5Urfk) 
+  [AWS Direct Connect](https://www.youtube.com/watch?v=DXFooR95BYc&t=6s) 
+  [Transit Gateway Connect](https://www.youtube.com/watch?v=_MPY_LHSKtM&t=491s) 
+  [VPN Solutions](https://www.youtube.com/watch?v=qmKkbuS9gRs) 
+  [Security with VPN Solutions](https://www.youtube.com/watch?v=FrhVV9nG4UM) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP04 Leverage load-balancing and encryption offloading
<a name="perf_select_network_encryption_offload"></a>

Use load balancers to achieve optimal performance efficiency of your target resources and improve the responsiveness of your system.

 **Desired outcome:** Reduce the number of compute resources needed to serve your traffic. Avoid resource consumption imbalance across your targets. Offload compute-intensive tasks to the load balancer. Leverage cloud elasticity and flexibility to improve performance and optimize your architecture. 

 **Common anti-patterns:** 
+ You don’t consider your workload requirements when choosing the load balancer type.
+ You don’t leverage the load balancer features for performance optimization.
+  The workload is exposed directly to the internet without a load balancer. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Load balancers act as the entry point for your workload, distributing traffic from there to your back-end targets, such as compute instances or containers. Choosing the right load balancer type is the first step in optimizing your architecture. 

 Start by listing your workload characteristics, such as protocol (like TCP, HTTP, TLS, or WebSockets), target type (like instances, containers, or serverless), application requirements (like long running connections, user authentication, or stickiness), and placement (like Region, Local Zone, Outpost, or zonal isolation). 

 After choosing the right load balancer, you can start leveraging its features to reduce the work your back end must do to serve the traffic. 

 For example, using both Application Load Balancer (ALB) and Network Load Balancer (NLB), you can perform SSL/TLS encryption offloading, which is an opportunity to avoid the CPU-intensive TLS handshake from being completed by your targets and also to improve certificate management. 

 When you configure SSL/TLS offloading in your load balancer, it becomes responsible for the encryption of the traffic from and to clients while delivering the traffic unencrypted to your back-ends, freeing up your back-end resources and improving the response time for the clients. 

 Application Load Balancer can also serve HTTP/2 traffic without requiring HTTP/2 support on your targets. This simple decision can improve your application response time, as HTTP/2 uses TCP connections more efficiently. 

 Load balancers can also be used to make your architecture more flexible by distributing traffic across different back-end types such as containers and serverless. For example, Application Load Balancer can be configured with [listener rules](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-update-rules.html) that forward traffic to different target groups based on the request parameters such as header, method or pattern. 
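
Conceptually, ALB evaluates listener rules in priority order and forwards to the first match, falling back to a default action. The sketch below illustrates that matching model; the rule set and target-group names are invented for illustration, and real rules support more condition types than shown here:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    priority: int
    target_group: str
    path_prefix: str = ""
    headers: dict = field(default_factory=dict)

    def matches(self, path, headers):
        if self.path_prefix and not path.startswith(self.path_prefix):
            return False
        return all(headers.get(k) == v for k, v in self.headers.items())

def route(rules, default_target, path, headers=None):
    """Return the target group of the highest-priority matching rule."""
    headers = headers or {}
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(path, headers):
            return rule.target_group
    return default_target

rules = [
    Rule(priority=10, target_group="api-containers", path_prefix="/api/"),
    Rule(priority=20, target_group="canary-lambda", headers={"x-canary": "1"}),
]
print(route(rules, "web-instances", "/api/orders"))           # → api-containers
print(route(rules, "web-instances", "/", {"x-canary": "1"}))  # → canary-lambda
print(route(rules, "web-instances", "/home"))                 # → web-instances
```

This is what lets a single ALB front a mixed architecture: path-based rules can send API traffic to containers, a header rule can divert canary users to a Lambda target group, and everything else falls through to the default target group.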

 Your workload latency requirements should also be considered when defining the architecture. As an example, if you have a latency-sensitive application, you may decide to use Network Load Balancer, which offers extremely low latencies. Alternatively, you may decide to bring your workload closer to your customers by leveraging Application Load Balancer in [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/) or even [AWS Outposts](https://aws.amazon.com/outposts/rack/). 

 Another consideration for latency-sensitive workloads is cross-zone load balancing. With cross-zone load balancing, each load balancer node distributes traffic across the registered targets in all allowed Availability Zones. This improves availability, although it can add single-digit milliseconds to round-trip latency. 
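
The trade-off can be made concrete with a toy model: with cross-zone off, each load balancer node only sends to targets in its own Availability Zone, so uneven target counts produce uneven per-target load. The AZ layout below is hypothetical, and the model assumes each AZ's node receives an equal share of incoming traffic:

```python
def per_target_share(targets_per_az, cross_zone):
    """Fraction of total traffic each target in a given AZ receives.

    With cross_zone on, every node spreads across all targets;
    with it off, a node only uses targets in its own AZ.
    """
    node_share = 1 / len(targets_per_az)       # traffic per LB node
    total_targets = sum(targets_per_az.values())
    shares = {}
    for az, count in targets_per_az.items():
        if cross_zone:
            shares[az] = 1 / total_targets     # even across all targets
        else:
            shares[az] = node_share / count    # node traffic split locally
    return shares

layout = {"use1-az1": 4, "use1-az2": 1}
print(per_target_share(layout, cross_zone=True))   # every target gets 20%
print(per_target_share(layout, cross_zone=False))  # the lone az2 target gets 50%
```

With cross-zone on, all five targets carry 20% each; with it off, the single target in `use1-az2` absorbs half of all traffic while each `use1-az1` target carries 12.5%, illustrating why zonal isolation needs balanced target counts per AZ.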

 Lastly, both ALB and NLB offer monitoring resources such as logs and metrics. Properly setting up monitoring can help with gathering performance insights of your application. For example, you can use ALB access logs to find which requests are taking longer to be answered or which back-end targets are causing performance issues. 

 **Implementation steps** 

1.  Choose the right load balancer for your workload. 

   1.  Use Application Load Balancer for HTTP/HTTPS workloads. 

   1.  Use Network Load Balancer for non-HTTP workloads that run on TCP or UDP. 

   1.  Use a combination of both ([ALB as a target of NLB](https://aws.amazon.com/blogs/networking-and-content-delivery/application-load-balancer-type-target-group-for-network-load-balancer/)) if you want to leverage features of both products. For example, you can do this if you want to use the static IPs of NLB together with HTTP header based routing from ALB, or if you want to expose your HTTP workload to an [AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html). 

   1.  For a full comparison of load balancers, see [ELB product comparison](https://aws.amazon.com/elasticloadbalancing/features/). 

1.  Use SSL/TLS offloading. 

   1.  Configure HTTPS/TLS listeners with both [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html) and [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-tls-listener.html) integrated with [AWS Certificate Manager](https://aws.amazon.com/certificate-manager/). 

   1.  Note that some workloads may require end-to-end encryption for compliance reasons. In this case, it is a requirement to allow encryption at the targets. 

   1.  For security best practices, see [SEC09-BP02 Enforce encryption in transit](https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/sec_protect_data_transit_encrypt.html). 

1.  Select the right routing algorithm. 

   1.  The routing algorithm affects how evenly your back-end targets are used and therefore how they impact performance. For example, ALB provides [two options for routing algorithms](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#modify-routing-algorithm): 

   1.  **Least outstanding requests:** Use to achieve a better load distribution to your back-end targets for cases when the requests for your application vary in complexity or your targets vary in processing capability. 

   1.  **Round robin:** Use when the requests and targets are similar, or if you need to distribute requests equally among targets. 

1.  Consider cross-zone or zonal isolation. 

   1.  Turn cross-zone load balancing off (zonal isolation) to reduce latency and keep failures contained within a zone. It is turned off by default in NLB, and in [ALB you can turn it off per target group](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/disable-cross-zone.html). 

   1.  Turn cross-zone load balancing on for increased availability and flexibility. By default, cross-zone is turned on for ALB, and in [NLB you can turn it on per target group](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-cross-zone.html). 

1.  Turn on HTTP keep-alives for your HTTP workloads. 

   1.  For HTTP workloads, turn on HTTP keep-alive in the web server settings for your back-end targets. With this feature, the load balancer can reuse back-end connections until the keep-alive timeout expires, improving your HTTP request and response times and reducing resource utilization on your back-end targets. For details on how to do this for Apache and Nginx, see [What are the optimal settings for using Apache or NGINX as a backend server for ELB?](https://aws.amazon.com/premiumsupport/knowledge-center/apache-backend-elb/) 

1.  Use Elastic Load Balancing integrations for better orchestration of compute resources. 

   1.  Use Auto Scaling integrated with your load balancer. A key aspect of a performance-efficient system is right-sizing your back-end resources. To do this, you can leverage load balancer integrations for back-end target resources. With the load balancer integration with Auto Scaling groups, targets are added to or removed from the load balancer as required in response to incoming traffic. 

   1.  Load balancers can also integrate with Amazon ECS and Amazon EKS for containerized workloads. 
      + [ Use Elastic Load Balancing to distribute traffic across the instances in your Auto Scaling group ](https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-load-balancer.html)
      + [ Amazon ECS - Service load balancing ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html)
      + [ Application load balancing on Amazon EKS ](https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html)
      + [ Network load balancing on Amazon EKS ](https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html)

1.  Monitor your load balancer to find performance bottlenecks. 

   1.  Turn on access logs for your [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/enable-access-logging.html) and [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html). 

   1.  The main fields to consider for ALB are `request_processing_time`, `target_processing_time`, and `response_processing_time`. 

   1.  The main fields to consider for NLB are `connection_time` and `tls_handshake_time`. 

   1.  Be ready to query the logs when you need them. You can use Amazon Athena to query both [ALB logs](https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html) and [NLB Logs](https://docs.aws.amazon.com/athena/latest/ug/networkloadbalancer-classic-logs.html). 

   1.  Create alarms for performance related metrics such as [`TargetResponseTime` for ALB](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html). 
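To make the timing fields above concrete, an ALB access log entry is a space-separated line with quoted fields, and the three processing-time fields appear right after the target address. The following is a minimal parsing sketch: the sample log line is fabricated for illustration, and the 200 ms "slow" budget is an arbitrary assumption, not an ELB default.

```python
import shlex

# Fabricated ALB access log entry, abbreviated to the first 13 fields.
# Field order (0-indexed): 0 type, 1 time, 2 elb, 3 client:port, 4 target:port,
# 5 request_processing_time, 6 target_processing_time, 7 response_processing_time,
# 8 elb_status_code, 9 target_status_code, 10 received_bytes, 11 sent_bytes, 12 "request"
line = ('http 2024-01-01T00:00:00.000000Z app/my-alb/1a2b3c 10.0.0.1:54321 '
        '10.0.1.5:80 0.001 0.245 0.002 200 200 34 366 '
        '"GET http://example.com/ HTTP/1.1"')

fields = shlex.split(line)  # shlex keeps the quoted request string as one field
timings = {
    "request_processing_time": float(fields[5]),
    "target_processing_time": float(fields[6]),
    "response_processing_time": float(fields[7]),
}
total = sum(timings.values())
slow = total > 0.2  # flag requests slower than an assumed 200 ms budget
print(timings, round(total, 3), slow)
```

In this sample the target processing time dominates, which points at the back-end target rather than the load balancer as the bottleneck.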

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [SEC09-BP02 Enforce encryption in transit](https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/sec_protect_data_transit_encrypt.html) 

 **Related documents:** 
+ [ ELB product comparison ](https://aws.amazon.com/elasticloadbalancing/features/)
+ [AWS Global Infrastructure ](https://aws.amazon.com/about-aws/global-infrastructure/)
+ [ Improving Performance and Reducing Cost Using Availability Zone Affinity ](https://aws.amazon.com/blogs/architecture/improving-performance-and-reducing-cost-using-availability-zone-affinity/)
+ [ Step by step for Log Analysis with Amazon Athena ](https://github.com/aws/elastic-load-balancing-tools/tree/master/amazon-athena-for-elb)
+ [ Querying Application Load Balancer logs ](https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html)
+ [ Monitor your Application Load Balancers ](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-monitoring.html)
+ [ Monitor your Network Load Balancers ](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-monitoring.html)

 **Related videos:** 
+ [AWS re:Invent 2018: [REPEAT 1] Elastic Load Balancing: Deep Dive and Best Practices (NET404-R1) ](https://www.youtube.com/watch?v=VIgAT7vjol8)
+ [AWS re:Invent 2021 - How to choose the right load balancer for your AWS workloads ](https://www.youtube.com/watch?v=p0YZBF03r5A)
+ [AWS re:Inforce 2022 - How to use Elastic Load Balancing to enhance your security posture at scale (NIS203) ](https://www.youtube.com/watch?v=YhNc5VSzOGQ)
+ [AWS re:Invent 2019: Get the most from Elastic Load Balancing for different workloads (NET407-R2) ](https://www.youtube.com/watch?v=HKh54BkaOK0)

 **Related examples:** 
+ [ CDK and CloudFormation samples for Log Analysis with Amazon Athena ](https://github.com/aws/elastic-load-balancing-tools/tree/master/log-analysis-elb-cdk-cf-template)

# PERF05-BP05 Choose network protocols to improve performance
<a name="perf_select_network_protocols"></a>

Assess the performance requirements for your workload, and choose the network protocols that optimize your workload’s overall performance.

There is a relationship between latency and bandwidth to achieve throughput. For instance, if your file transfer is using Transmission Control Protocol (TCP), higher latencies will reduce overall throughput. There are approaches to fix this with TCP tuning and optimized transfer protocols (some approaches use User Datagram Protocol (UDP)).
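The latency-throughput relationship for a single TCP flow follows directly from the bandwidth-delay product: throughput cannot exceed the window size divided by the round-trip time. A quick back-of-the-envelope calculation shows how a fixed window collapses under growing RTT:

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for a single TCP flow: window / RTT, in megabits per second."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1e6

# A fixed 64 KiB receive window over increasing round-trip times:
for rtt in (1, 10, 100):
    print(f"RTT {rtt:>3} ms -> {max_tcp_throughput_mbps(64 * 1024, rtt):8.2f} Mbit/s")
```

At 100 ms of latency the same window that sustains hundreds of Mbit/s on a local link is limited to about 5 Mbit/s, which is why TCP tuning (larger windows) or UDP-based transfer protocols help on high-latency paths.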

 The [scalable reliable datagram (SRD)](https://ieeexplore.ieee.org/document/9167399) protocol is a network transport protocol built by AWS for Elastic Fabric Adapters that provides reliable datagram delivery. Unlike TCP, SRD delivers packets out of order, which lets it send packets in parallel over multiple alternate paths and thereby increase throughput. 

 **Common anti-patterns:** 
+  Using TCP for all workloads regardless of performance requirements. 

 **Benefits of establishing this best practice:** 
+  Selecting the proper protocol for communication between workload components ensures that you are getting the best performance for that workload. 
+  Verifying that an appropriate protocol is used for communication between users and workload components helps improve overall user experience for your applications. For instance, by using both TCP and UDP together, VDI workloads can take advantage of the reliability of TCP for critical data and the speed of UDP for real-time data. 

 **Level of risk exposed if this best practice is not established:** Medium (Using an inappropriate network protocol can lead to poor performance, such as slow response times, high latency, and poor scalability) 

## Implementation guidance
<a name="implementation-guidance"></a>

A primary consideration for improving your workload’s performance is to understand the latency and throughput requirements, and then choose network protocols that optimize performance. 

 **When to consider using TCP** 

 TCP provides reliable data delivery, and can be used for communication between workload components where reliability and guaranteed delivery of data is important. Many web-based applications rely on TCP-based protocols, such as HTTP and HTTPS, to open TCP sockets for communication with servers on AWS. Email and file data transfer are common applications that also make use of TCP due to TCP’s ability to control the rate of data exchange and network congestion. Using TLS with TCP can add some overhead to the communication, which can result in increased latency and reduced throughput. The overhead comes mainly from the added overhead of the handshake process, which can take several round-trips to complete. Once the handshake is complete, the overhead of encrypting and decrypting data is relatively small. 
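The handshake overhead described above compounds with latency: the TCP three-way handshake costs roughly one round trip before data flows, a full TLS 1.2 handshake adds about two more, and TLS 1.3 reduces that to one. A rough model of connection setup time (a simplification that ignores resumption, 0-RTT, and TCP Fast Open):

```python
def setup_time_ms(rtt_ms, tls_version=None):
    """Approximate delay before the first byte of application data can be sent."""
    round_trips = 1          # TCP three-way handshake
    if tls_version == "1.2":
        round_trips += 2     # full TLS 1.2 handshake
    elif tls_version == "1.3":
        round_trips += 1     # full TLS 1.3 handshake
    return round_trips * rtt_ms

rtt = 80.0  # e.g. a cross-region client, in milliseconds
print(setup_time_ms(rtt), setup_time_ms(rtt, "1.3"), setup_time_ms(rtt, "1.2"))
```

On an 80 ms path, plain TCP is ready after ~80 ms, TLS 1.3 after ~160 ms, and TLS 1.2 after ~240 ms, which is why connection reuse and keep-alives matter for TLS-heavy workloads.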

 **When to consider using UDP** 

 UDP is a connectionless protocol and is therefore suitable for applications that need fast, efficient transmission, such as log, monitoring, and VoIP data. Also, consider using UDP if you have workload components that respond to small queries from large numbers of clients to ensure optimal performance of the workload. Datagram Transport Layer Security (DTLS) is the UDP equivalent of TLS. When using DTLS with UDP, the overhead comes from encrypting and decrypting the data, as the handshake process is simplified. DTLS also adds a small amount of overhead to the UDP packets, as it includes additional fields to indicate the security parameters and to detect tampering. 
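UDP's connectionless nature is visible with nothing more than stdlib sockets: there is no handshake, so the very first packet already carries application data. A minimal loopback sketch (the `metric=42` payload is just a placeholder for the kind of log or monitoring datagram mentioned above):

```python
import socket

# Receiver: bind a UDP socket to an ephemeral local port.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
port = recv_sock.getsockname()[1]

# Sender: no connection setup needed -- the first packet is application data.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"metric=42", ("127.0.0.1", port))

data, addr = recv_sock.recvfrom(1024)
print(data)
send_sock.close()
recv_sock.close()
```

The trade-off is that nothing here guarantees delivery or ordering; if the datagram is lost, the application (or a protocol layered on UDP, such as DTLS or QUIC) must handle it.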

 **When to consider using SRD** 

 Scalable reliable datagram (SRD) is a network transport protocol optimized for high-throughput workloads due to its ability to load-balance traffic across multiple paths and quickly recover from packet drops or link failures. SRD is therefore best used for high performance computing (HPC) workloads that require high throughput and low latency communication between compute nodes. This might include parallel processing tasks such as simulation, modeling, and data analysis that involve a large amount of data transfer between nodes. 

 **Implementation steps** 

1.  Use the [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) and [AWS Transfer Family](https://aws.amazon.com/aws-transfer-family/) services to improve the throughput of your online file transfer applications. The AWS Global Accelerator service helps you achieve lower latency between your client devices and your workload on AWS. With AWS Transfer Family, you can use TCP-based protocols such as Secure Shell File Transfer Protocol (SFTP) and File Transfer Protocol over SSL (FTPS) to securely scale and manage your file transfers to AWS storage services. 

1.  Use network latency to determine if TCP is appropriate for communication between workload components. If the network latency between your client application and server is high, then the TCP three-way handshake can take some time, thereby impacting the responsiveness of your application. Metrics such as Time to First Byte (TTFB) and Round-Trip Time (RTT) can be used to measure network latency. If your workload serves dynamic content to users, consider using [Amazon CloudFront](https://aws.amazon.com/cloudfront/), which establishes a persistent connection to each origin for dynamic content to eliminate the connection setup time that would otherwise slow down each client request. 

1.  Using TLS with TCP or UDP can result in increased latency and reduced throughput for your workload due to the impact of encryption and decryption. For such workloads, consider SSL/TLS offloading on [Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) to improve workload performance by letting the load balancer handle the SSL/TLS encryption and decryption process instead of your backend instances. This reduces CPU utilization on the backend instances, which can improve performance and increase capacity. 

1.  Use the [Network Load Balancer (NLB)](https://aws.amazon.com/elasticloadbalancing/network-load-balancer/) to deploy services that rely on the UDP protocol, such as authentication and authorization, logging, DNS, IoT, and streaming media, to improve the performance and reliability of your workload. The NLB distributes incoming UDP traffic across multiple targets, allowing you to scale your workload horizontally, increase capacity, and reduce the overhead of a single target. 

1.  For your High Performance Computing (HPC) workloads, consider using the [Elastic Network Adapter (ENA) Express](https://aws.amazon.com/about-aws/whats-new/2022/11/elastic-network-adapter-ena-express-amazon-ec2-instances/) functionality that uses the SRD protocol to improve network performance by providing a higher single flow bandwidth (25Gbps) and lower tail latency (99.9 percentile) for network traffic between EC2 instances. 

1.  Use the [Application Load Balancer (ALB)](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) to route and load balance your gRPC (Remote Procedure Calls) traffic between workload components or between gRPC clients and services. gRPC uses the TCP-based HTTP/2 protocol for transport and it provides performance benefits such as lighter network footprint, compression, efficient binary serialization, support for numerous languages, and bi-directional streaming. 
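As step 2 above suggests, RTT is a practical way to judge whether TCP handshake cost matters for a given path. Because the three-way handshake takes roughly one round trip, timing `socket.connect` against an endpoint gives a usable RTT estimate. A self-contained sketch, using a local listener as a stand-in for a real endpoint (so the measured value here is near zero):

```python
import socket
import time

# Stand-in for a real endpoint: a local listening socket (RTT ~0 on loopback).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
host, port = listener.getsockname()

def tcp_connect_time_ms(host, port):
    """Time the TCP three-way handshake; approximately one network round trip."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; close immediately
    return (time.perf_counter() - start) * 1000

rtt_ms = tcp_connect_time_ms(host, port)
print(f"TCP connect (~1 RTT): {rtt_ms:.3f} ms")
listener.close()
```

Pointing the same function at a remote host and port lets you compare candidate endpoints or Regions; repeat the measurement and take a percentile rather than a single sample.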

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+ [ Tuning Your Cloud: Improve Global Network Performance for Application ](https://www.youtube.com/watch?v=00ukhVcgWrs)
+ [ Application Scaling with EFA and SRD ](https://pages.awscloud.com/HPC-Application-Scaling-with-Elastic-Fabric-Adapter-EFA-and-Scalable-Reliable-Datagram-SRD_2020_0004-CMP_OD.html)

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP06 Choose your workload’s location based on network requirements
<a name="perf_select_network_location"></a>

Evaluate options for resource placement to reduce network latency and improve throughput, providing an optimal user experience by reducing page load and data transfer times.

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

Resources, such as Amazon EC2 instances, are placed into availability zones within [AWS Regions](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/), [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/), [AWS Outposts](https://aws.amazon.com/outposts/), or [AWS Wavelength](https://aws.amazon.com/wavelength/) zones. Selection of this location influences network latency and throughput from a given user location. Edge services such as [Amazon CloudFront](https://aws.amazon.com/cloudfront/) and [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) can also be used to improve network performance by either caching content at edge locations or providing users with an optimal path to the workload through the AWS global network.

 **Implementation steps** 

1.  Choose the appropriate AWS Region or Regions for your deployment based on the following key elements: 

   1.  **Where your users are located:** choosing a Region close to your workload’s users to ensure low latency when they use the workload. 

   1.  **Where your data is located:** for data-heavy applications, the major bottleneck in data transfer is latency. Application code should run as close to the data as possible. 

   1.  **Other constraints:** consider constraints such as security and compliance (for example, data residency requirements). 

1.  For a given workload, if a component consists of a group of interdependent Amazon EC2 instances requiring low latency, consider using [cluster placement groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) to influence placement of those instances to meet the requirements of the workload. Instances in the same cluster placement group enjoy a higher per-flow throughput limit for TCP/IP traffic and are placed in the same high-bisection bandwidth segment of the network. Cluster placement groups are recommended for applications that benefit from low network latency, high network throughput, or both. 

1.  For a workload that is location-sensitive, for example with low-latency or data residency requirements, review [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/) or [AWS Outposts](https://aws.amazon.com/outposts/). 

   1.  AWS Local Zones are a type of infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industry centers. 

   1.  AWS Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. 

1.  Applications such as high-resolution live video streaming, high-fidelity audio, and augmented reality/virtual reality (AR/VR) require ultra-low-latency for 5G devices. For such applications, consider [AWS Wavelength](https://aws.amazon.com/wavelength/). AWS Wavelength embeds AWS compute and storage services within 5G networks, providing mobile edge computing infrastructure for developing, deploying, and scaling ultra-low-latency applications. 

1.  If you have geographically distributed users, a content distribution network (CDN) may be used to accelerate distribution of static and dynamic web content by delivering data through globally dispersed points of presence (PoPs). CDNs typically also provide edge computing capabilities, performing latency sensitive operations such as HTTP header manipulations and URL rewrites and redirects at large scale at the edge. [Amazon CloudFront](https://aws.amazon.com/cloudfront/) is a web service that speeds up distribution of your static and dynamic web content. Use cases for CloudFront include accelerating static website content delivery and serving video on demand or live streaming video. CloudFront can also be used to customize the content and experience for viewers, at reduced latency. 

1.  Some applications require fixed entry points or higher performance by reducing first byte latency and jitter, and increasing throughput. These applications can benefit from networking services that provide static anycast IP addresses and TCP termination at edge locations. [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) can improve performance for your applications by up to 60% and provide quick failover for multi-region architectures. AWS Global Accelerator provides you with static anycast IP addresses that serve as a fixed entry point for your applications hosted in one or more AWS Regions. These IP addresses permit traffic to ingress onto the AWS global network as close to your users as possible. AWS Global Accelerator reduces the initial connection setup time by establishing a TCP connection between the client and the AWS edge location closest to the client. Review the use of AWS Global Accelerator to improve the performance of your TCP/UDP workloads and provide quick failover for multi-region architectures. 

1.  If you have applications or users on-premises, you may benefit from having a dedicated network connection between your network and the cloud. A dedicated network connection can reduce the chance of encountering congestion or unexpected increases in latency. [AWS Direct Connect](https://aws.amazon.com/directconnect/) can improve application performance by connecting your network directly to AWS and bypassing the public internet. When creating a new connection, you can choose a hosted connection provided by an AWS Direct Connect Delivery Partner, or choose a dedicated connection from AWS and deploy at over 100 AWS Direct Connect locations around the globe. You can also reduce your networking costs with low data transfer rates out of AWS, and optionally configure a Site-to-Site VPN for failover. 

1.  If you configure a [Site-to-Site VPN](https://aws.amazon.com/vpn/site-to-site-vpn/) to connect to your resources within AWS, you can optionally turn on acceleration. An accelerated Site-to-Site VPN connection uses AWS Global Accelerator to route traffic from your on-premises network to an AWS edge location that is closest to your customer gateway device. 

1.  Identify which DNS routing option would optimize your workload performance by reviewing your workload traffic and user location. [Amazon Route 53](https://aws.amazon.com/route53) offers [latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html), [geolocation routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geo.html), [geoproximity routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geoproximity.html), and [IP-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-ipbased.html) options to help you improve your workload’s performance for a global audience. 

   1.  Route 53 also offers low query latency for your end users. Using a global anycast network of DNS servers around the world, Route 53 is designed to automatically answer queries from the optimal location depending on network conditions. 
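Conceptually, latency-based routing answers each DNS query with the record for the Region that has the lowest measured latency from the client's network. The toy sketch below illustrates only that selection logic; the Region latencies and endpoint hostnames are hypothetical, and in practice Route 53 performs this selection for you using its own latency measurements.

```python
# Hypothetical client-to-Region latency measurements, in milliseconds.
measured_latency_ms = {
    "us-east-1": 82.0,
    "eu-west-1": 14.0,
    "ap-southeast-1": 190.0,
}

# Hypothetical per-Region endpoints for the same workload.
region_endpoints = {
    "us-east-1": "app-use1.example.com",
    "eu-west-1": "app-euw1.example.com",
    "ap-southeast-1": "app-apse1.example.com",
}

# Latency-based routing, in essence: answer with the lowest-latency Region.
best_region = min(measured_latency_ms, key=measured_latency_ms.get)
print(best_region, region_endpoints[best_region])
```

For this client, the European endpoint wins; a client in another network would see different measurements and be routed elsewhere, which is the point of latency-based routing for a global audience.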

## Resources
<a name="resources"></a>

 **Related best practices:** 
+ [ COST07-BP02 Implement Regions based on cost ](https://docs.aws.amazon.com/wellarchitected/latest/framework/cost_pricing_model_region_cost.html)
+ [ COST08-BP03 Implement services to reduce data transfer costs ](https://docs.aws.amazon.com/wellarchitected/latest/framework/cost_data_transfer_implement_services.html)
+ [ REL10-BP01 Deploy the workload to multiple locations ](https://docs.aws.amazon.com/wellarchitected/latest/framework/rel_fault_isolation_multiaz_region_system.html)
+ [ REL10-BP02 Select the appropriate locations for your multi-location deployment ](https://docs.aws.amazon.com/wellarchitected/latest/framework/rel_fault_isolation_select_location.html)
+ [ SUS01-BP01 Choose Regions near Amazon renewable energy projects and Regions where the grid has a published carbon intensity that is lower than other locations (or Regions) ](https://docs.aws.amazon.com/wellarchitected/latest/framework/sus_sus_region_a2.html)
+ [ SUS02-BP04 Optimize geographic placement of workloads for user locations ](https://docs.aws.amazon.com/wellarchitected/latest/framework/sus_sus_user_a5.html)
+ [ SUS04-BP07 Minimize data movement across networks ](https://docs.aws.amazon.com/wellarchitected/latest/framework/sus_sus_data_a8.html)

 **Related documents:** 
+ [AWS Global Infrastructure ](https://aws.amazon.com/about-aws/global-infrastructure/)
+ [AWS Local Zones and AWS Outposts, choosing the right technology for your edge workload ](https://aws.amazon.com/blogs/compute/aws-local-zones-and-aws-outposts-choosing-the-right-technology-for-your-edge-workload/)
+  [Placement groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/) 
+  [AWS Outposts](https://aws.amazon.com/outposts/) 
+  [AWS Wavelength](https://aws.amazon.com/wavelength/) 
+  [Amazon CloudFront](https://aws.amazon.com/cloudfront/) 
+  [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) 
+  [AWS Direct Connect](https://aws.amazon.com/directconnect/) 
+  [Site-to-Site VPN](https://aws.amazon.com/vpn/site-to-site-vpn/) 
+  [Amazon Route 53](https://aws.amazon.com/route53) 

 **Related videos:** 
+ [AWS Local Zones Explainer Video ](https://www.youtube.com/watch?v=JHt-D4_zh7w)
+ [AWS Outposts: Overview and How It Works ](https://www.youtube.com/watch?v=ppG2FFB0mMQ)
+ [AWS re:Invent 2021 - AWS Outposts: Bringing the AWS experience on premises ](https://www.youtube.com/watch?v=FxVF6A22498)
+ [AWS re:Invent 2020: AWS Wavelength: Run apps with ultra-low latency at 5G edge ](https://www.youtube.com/watch?v=AQ-GbAFDvpM)
+ [AWS re:Invent 2022 - AWS Local Zones: Building applications for a distributed edge ](https://www.youtube.com/watch?v=bDnh_d-slhw)
+ [AWS re:Invent 2021 - Building low-latency websites with Amazon CloudFront ](https://www.youtube.com/watch?v=9npcOZ1PP_c)
+ [AWS re:Invent 2022 - Improve performance and availability with AWS Global Accelerator](https://www.youtube.com/watch?v=s5sjsdDC0Lg)
+ [AWS re:Invent 2022 - Build your global wide area network using AWS](https://www.youtube.com/watch?v=flBieylTwvI)
+ [AWS re:Invent 2020: Global traffic management with Amazon Route 53 ](https://www.youtube.com/watch?v=E33dA6n9O7I)

 **Related examples:** 
+ [AWS Global Accelerator Workshop ](https://catalog.us-east-1.prod.workshops.aws/workshops/effb1517-b193-4c59-8da5-ce2abdb0b656/en-US)
+ [ Handling Rewrites and Redirects using Edge Functions ](https://catalog.us-east-1.prod.workshops.aws/workshops/814dcdac-c2ad-4386-98d5-27d37bb77766/en-US)

# PERF05-BP07 Optimize network configuration based on metrics
<a name="perf_select_network_optimize"></a>

Improper network configuration often affects network performance, efficiency, and cost. Network configurations are frequently chosen to get an early-stage deployment done quickly, without full consideration of their performance implications, and are then never revisited. To optimize your network configuration, you must first have visibility and data about your network environment.

To understand how your network resources are performing, collect and analyze data to make informed decisions about optimizing your network configuration. Measure the impact of those changes and use the impact measurements to make future decisions. 

 **Desired outcome:** Use metrics and network monitoring tools to optimize network configuration as workloads evolve. Cloud-based networks can be optimized quickly, so evolving your network architecture over time is necessary to maintain performance efficiency. 

 **Common anti-patterns:** 
+  You assume that all performance-related issues are application-related. 
+  You only test your network performance from a location close to where you have deployed the workload. 
+  You use default configurations for all network services. 
+  You overprovision the network resource to provide sufficient capacity. 

 **Benefits of establishing this best practice:** Collecting necessary metrics of your AWS network and implementing network monitoring tools allows you to understand network performance and optimize network configurations. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Monitoring traffic to and from VPCs, subnets, or network interfaces is crucial to understanding how your AWS network resources are used and how you can optimize network configurations. The following tools let you inspect traffic usage, network access, and logs. 

 **Implementation steps** 

1.  Use [Amazon VPC IP Address Manager](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html). You can use IPAM to plan, track, and monitor IP addresses for your AWS and on-premises workloads. This helps you optimize IP address usage and allocation. 

1.  Turn on [VPC Flow logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html). Use VPC Flow Logs to capture detailed information about traffic to and from network interfaces in your VPCs. With VPC Flow Logs, you can diagnose overly restrictive or permissive security group rules and determine the direction of the traffic to and from the network interfaces. Data ingestion and archival charges for vended logs apply when you publish flow logs. 

1.  Turn on [DNS query logging](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/query-logs.html). You can configure Amazon Route 53 to log information about public or private DNS queries Route 53 receives. With DNS logs, you can optimize DNS configurations by understanding the domain or subdomain that was requested or the Route 53 edge locations that responded to DNS queries. 

1.  Use [Reachability Analyzer](https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html) to analyze and debug network reachability. Reachability Analyzer is a configuration analysis tool that allows you to perform connectivity testing between a source resource and a destination resource in your VPCs. This tool helps you verify that your network configuration matches your intended connectivity. 

1.  Use [Network Access Analyzer](https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-network-access-analyzer.html) to understand network access to your resources. You can use Network Access Analyzer to specify your network access requirements and identify potential network paths that do not meet them. By optimizing the corresponding network configuration, you can understand and verify the state of your network and demonstrate whether your network on AWS meets your compliance requirements. 

1.  Use [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) and turn on the appropriate metrics for network options. Make sure to choose the right network metric for your workload. For example, you can turn on metrics for VPC Network Address Usage, VPC NAT Gateway, AWS Transit Gateway, VPN tunnel, AWS Network Firewall, Elastic Load Balancing, and AWS Direct Connect. Continually monitoring metrics is a good practice to observe and understand your network status and usage, and helps you optimize network configuration based on your observations. 
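As a minimal local sketch of the IP address planning that IPAM automates (step 1), the Python standard library's `ipaddress` module can confirm that subnet allocations stay inside a VPC CIDR without overlapping, and report utilization. The CIDR values below are made-up samples, not real allocations:

```python
import ipaddress

# Hypothetical VPC CIDR and the subnets already carved out of it.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [ipaddress.ip_network(c) for c in ("10.0.0.0/24", "10.0.1.0/24")]

# Check that every subnet falls inside the VPC and that no two overlap --
# the kind of bookkeeping IPAM automates across accounts and Regions.
assert all(s.subnet_of(vpc) for s in subnets)
assert all(not a.overlaps(b)
           for i, a in enumerate(subnets) for b in subnets[i + 1:])

# Fraction of the VPC's address space already allocated.
used = sum(s.num_addresses for s in subnets)
utilization = used / vpc.num_addresses
print(f"{utilization:.2%} of {vpc} allocated")  # 0.78% of 10.0.0.0/16 allocated
```

Running the same checks before provisioning a new subnet surfaces overlap mistakes early; IPAM performs this continuously at organization scale.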
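To illustrate working with the output of step 2, the sketch below parses two records in the default VPC Flow Logs format (the space-separated version 2 fields) and totals bytes per action; the sample records are illustrative, not real traffic:

```python
from collections import Counter

# Two records in the default VPC Flow Logs format (space-separated fields:
# version account-id interface-id srcaddr dstaddr srcport dstport protocol
# packets bytes start end action log-status). Sample values are made up.
records = [
    "2 123456789010 eni-1235b8ca 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]

FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()

def parse(record: str) -> dict:
    """Map one default-format flow log record to named fields."""
    return dict(zip(FIELDS, record.split()))

# Sum bytes per action -- a large REJECT total can point at an overly
# restrictive security group rule or unwanted traffic.
bytes_by_action = Counter()
for rec in map(parse, records):
    bytes_by_action[rec["action"]] += int(rec["bytes"])
print(bytes_by_action)  # Counter({'ACCEPT': 4249, 'REJECT': 4249})
```

In practice you would run this kind of aggregation in CloudWatch Logs Insights or Athena rather than locally, but the field mapping is the same.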
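For step 3, a query log line can likewise be split into named fields. The field order below is an assumption based on the layout shown in the Route 53 public query logging documentation (version, timestamp, hosted zone ID, query name, query type, response code, protocol, edge location, resolver IP, EDNS client subnet); the sample line is illustrative:

```python
# One sample public DNS query log line; values are made up.
line = ("1.0 2017-12-13T08:16:02.130Z Z123412341234 example.com "
        "A NOERROR UDP JFK5 192.168.1.1 -")

# Assumed field order, taken from the documented public query log layout.
FIELDS = ["version", "timestamp", "zone_id", "query_name", "query_type",
          "response_code", "protocol", "edge_location", "resolver_ip",
          "edns_client_subnet"]

query = dict(zip(FIELDS, line.split()))

# Which edge location answered, and with what response code -- useful for
# spotting latency outliers or unexpected NXDOMAIN traffic per location.
print(query["edge_location"], query["response_code"])  # JFK5 NOERROR
```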
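Reachability Analyzer (step 4) evaluates your actual VPC configuration end to end. As a local toy illustration of the kind of rule evaluation involved, the sketch below checks whether a hypothetical security-group-style rule set admits a given flow; the rule shape is invented for illustration and is not an AWS API:

```python
import ipaddress

# Hypothetical inbound rules in a security-group-like shape. Reachability
# Analyzer performs this class of evaluation against your real VPC config
# (route tables, NACLs, security groups); this is only a local sketch.
rules = [
    {"protocol": "tcp", "port_range": (443, 443), "cidr": "0.0.0.0/0"},
    {"protocol": "tcp", "port_range": (22, 22), "cidr": "10.0.0.0/16"},
]

def allows(rules, protocol: str, port: int, src_ip: str) -> bool:
    """Return True if any rule admits this protocol/port/source combination."""
    src = ipaddress.ip_address(src_ip)
    return any(
        r["protocol"] == protocol
        and r["port_range"][0] <= port <= r["port_range"][1]
        and src in ipaddress.ip_network(r["cidr"])
        for r in rules
    )

print(allows(rules, "tcp", 22, "10.0.5.9"))     # True: SSH from inside the VPC
print(allows(rules, "tcp", 22, "203.0.113.7"))  # False: SSH from the internet
```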
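Once metrics are flowing (step 6), the datapoints you retrieve can be post-processed to find peaks and threshold breaches. This sketch scans sample datapoints shaped like a CloudWatch `GetMetricStatistics` response for a NAT gateway's `BytesOutToDestination` metric; the values and the alerting threshold are made up:

```python
# Sample datapoints in the shape returned by CloudWatch GetMetricStatistics
# for a NAT gateway's BytesOutToDestination metric (values are made up).
datapoints = [
    {"Timestamp": "2023-05-01T10:00:00Z", "Sum": 1.2e9},
    {"Timestamp": "2023-05-01T11:00:00Z", "Sum": 4.8e9},
    {"Timestamp": "2023-05-01T12:00:00Z", "Sum": 0.9e9},
]

THRESHOLD_BYTES = 4e9  # hypothetical per-period alerting threshold

peak = max(datapoints, key=lambda d: d["Sum"])
breaches = [d for d in datapoints if d["Sum"] > THRESHOLD_BYTES]
print(f"peak {peak['Sum']:.1e} bytes at {peak['Timestamp']}; "
      f"{len(breaches)} period(s) over threshold")
```

For continuous monitoring you would express the same threshold as a CloudWatch alarm rather than scanning datapoints by hand.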

 **Level of effort for the implementation plan:** Medium 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html)
+ [Public DNS query logging](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/query-logs.html)
+ [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html)
+ [What is Reachability Analyzer?](https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html)
+ [What is Network Access Analyzer?](https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-network-access-analyzer.html)
+ [CloudWatch metrics for your VPCs](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-cloudwatch.html)
+ [Optimize performance and reduce costs for network analytics with VPC Flow Logs in Apache Parquet format](https://aws.amazon.com/blogs/big-data/optimize-performance-and-reduce-costs-for-network-analytics-with-vpc-flow-logs-in-apache-parquet-format/)
+ [Monitoring your global and core networks with Amazon CloudWatch metrics](https://docs.aws.amazon.com/vpc/latest/tgwnm/monitoring-cloudwatch-metrics.html)
+ [Continuously monitor network traffic and resources](https://docs.aws.amazon.com/whitepapers/latest/security-best-practices-for-manufacturing-ot/continuously-monitor-network-traffic-and-resources.html)

 **Related videos:** 
+ [Networking best practices and tips with the Well-Architected Framework](https://www.youtube.com/watch?v=wOMNpG49BeM)
+ [Monitoring and troubleshooting network traffic](https://www.youtube.com/watch?v=Ed09ReWRQXc)

 **Related examples:** 
+ [AWS Networking Workshops](https://networking.workshop.aws/)
+ [AWS Network Monitoring](https://github.com/aws-samples/monitor-vpc-network-patterns)