

# Overview of DevOps Guru for RDS
<a name="working-with-rds.overview"></a>

Following, you can find a summary of the key benefits and features of DevOps Guru for RDS. For background on insights and anomalies, see [DevOps Guru concepts](concepts.md).

**Topics**
+ [Benefits of DevOps Guru for RDS](working-with-rds.overview.benefits.md)
+ [Key concepts for database performance tuning](working-with-rds.overview.tuning.md)
+ [Key concepts for DevOps Guru for RDS](working-with-rds.overview.definitions.md)
+ [How DevOps Guru for RDS works](working-with-rds.overview.how-it-works.md)
+ [Supported database engines](working-with-rds.overview.supported-engines.md)

# Benefits of DevOps Guru for RDS
<a name="working-with-rds.overview.benefits"></a>

If you're responsible for an Amazon RDS database, you might not know when an event or regression is affecting that database. When you learn about the issue, you might not know why it's occurring or what to do about it. Rather than turning to a database administrator (DBA) for help or relying on third-party tools, you can follow recommendations from DevOps Guru for RDS.

You gain the following advantages from the detailed analysis of DevOps Guru for RDS:

**Fast diagnosis**  
DevOps Guru for RDS continuously monitors and analyzes database telemetry. Performance Insights, Enhanced Monitoring, and Amazon CloudWatch collect telemetry data for your database instances. DevOps Guru for RDS uses statistical and machine learning techniques to mine this data and detect anomalies. To learn more about telemetry data for Amazon Aurora databases, see [Monitoring DB load with Performance Insights on Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.html) and [Monitoring the OS by using Enhanced Monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_Monitoring.OS.html) in the *Amazon Aurora User Guide*. To learn more about telemetry data for other Amazon RDS databases, see [Monitoring DB load with Performance Insights on Amazon Relational Database Service](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) and [Monitoring OS metrics with Enhanced Monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.html) in the *Amazon RDS User Guide*.

**Fast resolution**  
Each anomaly identifies the performance issue and suggests avenues of investigation or corrective actions. For example, DevOps Guru for RDS might recommend that you investigate specific wait events. Or it might recommend that you tune your application pool settings to limit the number of database connections. Based on these recommendations, you can resolve performance issues more quickly than by troubleshooting manually.

**Proactive insights**  
DevOps Guru for RDS uses metrics from your resources to detect potentially problematic behavior before it becomes a bigger problem. For example, it can detect when sessions connected to the database are not performing active work and might be keeping database resources blocked. DevOps Guru then provides recommendations to help you address issues before they become bigger problems.

**Deep knowledge of Amazon engineers and machine learning**  
To detect performance issues and help you resolve bottlenecks, DevOps Guru for RDS relies on machine learning (ML) and advanced statistical analysis. Amazon database engineers contributed to the development of the DevOps Guru for RDS findings, which encapsulate many years of managing hundreds of thousands of databases. By drawing on this collective knowledge, DevOps Guru for RDS can teach you best practices.

# Key concepts for database performance tuning
<a name="working-with-rds.overview.tuning"></a>

DevOps Guru for RDS assumes that you're familiar with a few key performance concepts. To learn more about these concepts, see [Overview of Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Overview.html) in the *Amazon Aurora User Guide* or [Overview of Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.html) in the *Amazon RDS User Guide*.

**Topics**
+ [Metrics](#working-with-rds.overview.tuning.metrics)
+ [Problem detection](#working-with-rds.overview.tuning.problems)
+ [DB load](#working-with-rds.overview.tuning.db-load)
+ [Wait events](#working-with-rds.overview.tuning.waits)

## Metrics
<a name="working-with-rds.overview.tuning.metrics"></a>

 A metric represents a time-ordered set of data points. Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. Amazon RDS provides metrics in real time for the database and for the operating system (OS) that your DB instance runs on. You can view all the system metrics and process information for your Amazon RDS DB instances on the Amazon RDS console. DevOps Guru for RDS monitors and provides insights for some of these metrics. For more information, see [Monitoring metrics in an Amazon Aurora cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/MonitoringAurora.html) or [Monitoring metrics in an Amazon Relational Database Service instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html). 

## Problem detection
<a name="working-with-rds.overview.tuning.problems"></a>

DevOps Guru for RDS uses database and operating system (OS) metrics to detect critical database performance issues, whether those issues are impending or ongoing. DevOps Guru for RDS detects problems in two primary ways:
+ Using thresholds
+ Using anomalies

### Detecting problems with thresholds
<a name="working-with-rds.overview.tuning.threshold"></a>

 Thresholds are the bounding values against which the monitored metrics are evaluated. You can think of a threshold as a horizontal line on a metric chart that separates normal behavior from potentially problematic behavior. DevOps Guru for RDS monitors specific metrics and creates thresholds by analyzing what levels are considered potentially problematic for a specified resource. DevOps Guru for RDS then creates insights in the DevOps Guru console when new metric values cross a specified threshold over a given period of time on a consistent basis. The insights contain recommendations to prevent future database performance impact.

 For example, DevOps Guru for RDS might monitor the number of temporary tables using disk over a period of 15 minutes and create an insight when the rate of temporary tables using disk per second is abnormally high. Increased levels of on-disk temporary table usage might impact the database performance. By exposing this situation before it becomes critical, DevOps Guru for RDS helps you take corrective actions to prevent problems. 
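The idea of a metric crossing a threshold "on a consistent basis" over a period can be illustrated with a minimal sketch. This is not how DevOps Guru for RDS is implemented. The function name `sustained_breach`, the `min_fraction` parameter, the threshold value, and the sample data are all assumptions made for illustration.

```python
from typing import Sequence


def sustained_breach(
    values: Sequence[float], threshold: float, min_fraction: float = 0.8
) -> bool:
    """Return True when at least min_fraction of the samples exceed threshold.

    A single spike does not count as a sustained breach; most of the
    window has to sit above the line.
    """
    if not values:
        return False
    breaches = sum(1 for v in values if v > threshold)
    return breaches / len(values) >= min_fraction


# Hypothetical 15-minute window: one sample per minute of
# on-disk temporary tables created per second.
window = [3.1, 3.4, 2.9, 3.8, 4.0, 3.6, 3.3, 3.9, 4.2, 3.7, 3.5, 4.1, 3.8, 3.6, 4.0]

if sustained_breach(window, threshold=3.0):
    print("Metric stayed above the threshold for most of the window")
```

Requiring a fraction of the window rather than a single data point is one simple way to avoid reacting to momentary spikes. The actual evaluation rules used by the service are not described here.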

### Detecting problems with anomalies
<a name="working-with-rds.overview.tuning.anomaly"></a>

While thresholds provide a simple and effective way to detect database problems, in some situations they are not sufficient. Consider a case where metric values are spiking and crossing into potentially problematic behavior on a regular basis because of a known process, such as a daily reporting job. Since such spikes are expected, creating insights and notifications for each of them would be counterproductive and would likely lead to alert fatigue. 

However, it is still necessary to detect spikes that are highly unusual, since spikes that are much higher than the rest or that last much longer could represent real database performance issues. To address this concern, DevOps Guru for RDS monitors certain metrics to detect when a metric’s behavior becomes highly unusual or anomalous. DevOps Guru then reports these anomalies in insights.

For example, DevOps Guru for RDS might create an insight when DB load is not only high, but also significantly deviates from its usual behavior, which indicates a major unexpected slowdown of database operations. By recognizing only the anomalous DB load spikes, DevOps Guru for RDS lets you focus on the issues that are truly important. 
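To make the distinction between "high" and "high and unusual" concrete, the following sketch compares a current value against a historical baseline by using a simple z-score. DevOps Guru for RDS uses its own statistical and machine learning techniques, which this example does not reproduce. The function `is_anomalous`, the `high_level` and `z_limit` parameters, and the sample history are assumptions for illustration only.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(
    history: Sequence[float],
    current: float,
    high_level: float,
    z_limit: float = 3.0,
) -> bool:
    """Flag a value only if it is high AND far from the historical baseline."""
    if len(history) < 2 or current <= high_level:
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any value above the baseline is unusual.
        return current > baseline
    return (current - baseline) / spread > z_limit


# Hypothetical DB load history (AAS) that already contains regular
# spikes from a known job, such as a daily report.
history = [2.0, 2.5, 8.0, 2.2, 2.4, 8.2, 2.1, 2.3, 7.9, 2.2]

print(is_anomalous(history, current=8.1, high_level=4.0))   # expected spike
print(is_anomalous(history, current=20.0, high_level=4.0))  # unusual spike
```

With this sample history, a value of 8.1 is high but consistent with past spikes, so it is not flagged. A value of 20.0 deviates far from the baseline and is flagged.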

## DB load
<a name="working-with-rds.overview.tuning.db-load"></a>

The key concept for database tuning is the *database load (DB load)* metric. The DB load represents how busy your database is at any given time. An increase in DB load means an increase in database activity.

A *database session* represents an application's dialogue with a relational database. An *active session* is a session that is in the process of running a database request. A session is active when it's either running on CPU or waiting for a resource to become available so that it can proceed. For example, an active session might wait for a page to be read into memory, and then consume CPU while it reads data from the page.

The `DBLoad` metric in Performance Insights is measured in *average active sessions (AAS)*. To calculate AAS, Performance Insights samples the number of active sessions every second. For a specific time period, the AAS is the total number of active sessions divided by the total number of samples. An AAS value of 2 means that, on average, 2 sessions were active in requests at any given time.
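The AAS calculation described above can be sketched in a few lines. The function name `average_active_sessions` and the sample counts are hypothetical; each number stands for the active sessions observed in one per-second sample.

```python
from typing import Sequence


def average_active_sessions(samples: Sequence[int]) -> float:
    """Total active sessions across all samples divided by the number of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(samples) / len(samples)


# Ten hypothetical one-second samples of the active session count.
samples = [1, 3, 2, 2, 4, 0, 2, 3, 1, 2]

print(average_active_sessions(samples))  # 20 active sessions / 10 samples = 2.0
```

An AAS of 2.0 here means that, on average, 2 sessions were active at any given time during the sampled period, even though individual samples ranged from 0 to 4.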

An analogy for DB load is activity in a warehouse. Suppose that the warehouse employs 100 workers. If 1 order comes in, 1 worker fulfills the order while the other workers are idle. If 100 or more orders come in, all 100 workers fulfill orders simultaneously. If you periodically sample how many workers are active over a given time period, you can calculate the average number of active workers. The calculation shows that, on average, *N* workers are busy fulfilling orders at any given time. If the average was 50 workers yesterday and 75 workers today, the activity level in the warehouse increased. In the same way, DB load increases as session activity increases.

To learn more, see [Database load](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Overview.ActiveSessions.html) in the *Amazon Aurora User Guide* or [Database load](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.ActiveSessions.html) in the *Amazon RDS User Guide*.

## Wait events
<a name="working-with-rds.overview.tuning.waits"></a>

A *wait event* is a type of database instrumentation that tells you which resource a database session is waiting for so it can proceed. When Performance Insights counts active sessions to calculate database load, it also records the wait events that are causing the active sessions to wait. This technique allows Performance Insights to show you which wait events are contributing to DB load.

Every active session is either running on the CPU or waiting. For example, sessions consume CPU when they search memory, perform a calculation, or run procedural code. When sessions aren't consuming CPU, they might be waiting for a data file to be read or a log to be written to. The more time that a session waits for resources, the less time it runs on the CPU.

When you tune a database, you often try to find the resources that sessions are waiting for. For example, two or three wait events might account for 90% of DB load. This measure means that, on average, active sessions are spending most of their time waiting for a small number of resources. If you can find out the cause of these waits, you can try to remedy the problem.
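As a rough illustration of how a few wait events can account for most of DB load, the following sketch tallies what each sampled active session was doing and reports each state's share. The function `wait_event_shares` is hypothetical, and the sample data and event labels are made up for the example. This is not the Performance Insights API.

```python
from collections import Counter
from typing import Dict, Sequence


def wait_event_shares(samples: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of sampled active sessions in each state, largest first."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.most_common()}


# Hypothetical samples: one label per sampled active session, either
# "CPU" or the wait event the session was blocked on.
samples = ["IO:DataFileRead"] * 12 + ["Lock:transactionid"] * 6 + ["CPU"] * 2

for event, share in wait_event_shares(samples).items():
    print(f"{event}: {share:.0%}")
```

In this made-up data set, two wait events together account for 90% of the sampled activity, which is the kind of concentration that tells you where to start investigating.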

Consider the analogy of a warehouse worker. An order comes in for a book. The worker might be delayed in fulfilling the order. For example, a different worker might be currently restocking the shelves, or a trolley might not be available. Or the system used to enter the order status might be slow. The longer the worker waits, the longer the order takes to fulfill. Waiting is a natural part of the warehouse workflow, but if wait time becomes excessive, productivity decreases. In the same way, repeated or lengthy session waits can degrade database performance.

For more information about wait events in Amazon Aurora, see [Tuning with wait events for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Tuning.html) and [Tuning with wait events for Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Tuning.html) in the *Amazon Aurora User Guide*.

For more information about wait events in other Amazon RDS databases, see [Tuning with wait events for RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.html) in the *Amazon RDS User Guide*.

# Key concepts for DevOps Guru for RDS
<a name="working-with-rds.overview.definitions"></a>

An *insight* is generated by DevOps Guru when it detects anomalous or problematic behavior in your operational applications. An insight contains anomalies for one or more resources. An *anomaly* represents one or more related metrics detected by DevOps Guru that are unexpected or unusual. 

An insight has a severity of *high*, *medium*, or *low*. The insight severity is determined by the most severe anomaly that contributed to creating the insight. For example, if the insight **AWS-ECS_MemoryUtilization_and_others** includes one anomaly with low severity and another with high severity, the overall severity of the insight is high.
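The severity rule can be expressed as taking the maximum over the anomaly severities. The following is a minimal sketch of that rule, not the DevOps Guru API. The names `SEVERITY_ORDER` and `insight_severity` are hypothetical.

```python
from typing import Sequence

# Ranking assumed for illustration: a higher number is more severe.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def insight_severity(anomaly_severities: Sequence[str]) -> str:
    """Return the most severe level among the anomalies in an insight."""
    if not anomaly_severities:
        raise ValueError("an insight contains at least one anomaly")
    return max(anomaly_severities, key=SEVERITY_ORDER.__getitem__)


# One low-severity anomaly and one high-severity anomaly.
print(insight_severity(["low", "high"]))
```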

If Amazon RDS DB instances have Performance Insights turned on, DevOps Guru for RDS provides detailed analysis and recommendations in the anomalies for these instances. To identify an anomaly, DevOps Guru for RDS develops a baseline for database metric values. DevOps Guru for RDS then compares current metric values to the historical baseline.

**Topics**
+ [Proactive insights](#working-with-rds.overview.definitions.proactive)
+ [Reactive insights](#working-with-rds.overview.definitions.reactive)
+ [Recommendations](#working-with-rds.overview.definitions.finding.recommendations)

## Proactive insights
<a name="working-with-rds.overview.definitions.proactive"></a>

A proactive insight lets you know about problematic behavior before it occurs. It contains anomalies with recommendations and related metrics to help you address the issues before they become bigger problems.

Each proactive insight page provides details about one anomaly.

## Reactive insights
<a name="working-with-rds.overview.definitions.reactive"></a>

A reactive insight identifies anomalous behavior as it occurs. It contains anomalies with recommendations, related metrics, and events to help you understand and address the issues now.

### Causal anomalies
<a name="working-with-rds.overview.definitions.insight"></a>

A *causal anomaly* is a top-level anomaly within a reactive insight. It is shown as the **Primary metric** on the anomaly details page in the DevOps Guru console. **Database load (DB load)** is the causal anomaly for DevOps Guru for RDS. For example, the insight **AWS-ECS_MemoryUtilization_and_others** could have several metric anomalies, one of which is **Database load (DB load)** for the resource **AWS/RDS**.

Within an insight, the anomaly **Database load (DB load)** can occur for multiple Amazon RDS DB instances. The severity of the anomaly might be different for each DB instance. For example, the severity for one DB instance might be high while the severity for the others is low. The console defaults to the anomaly with the highest severity.

### Contextual anomalies
<a name="working-with-rds.overview.definitions.finding"></a>

A *contextual anomaly* is a finding within **Database load (DB load)** that is related to a reactive insight. It is displayed in the **Related metrics** section of the anomaly details page in the DevOps Guru console. Each contextual anomaly describes a specific Amazon RDS performance issue that requires investigation. For example, a causal anomaly can include the following contextual anomalies:
+ **CPU capacity exceeded** – The CPU run queue or CPU utilization is above normal.
+ **Database memory low** – Processes don't have enough memory.
+ **Database connections spiked** – The number of database connections is above normal.

## Recommendations
<a name="working-with-rds.overview.definitions.finding.recommendations"></a>

Each insight has at least one suggested action. The following examples are recommendations generated by DevOps Guru for RDS:
+ Tune SQL IDs *list_of_IDs* to reduce CPU usage, or upgrade the instance type to increase CPU capacity.
+ Review the associated spike of current database connections. Consider tuning the application pool settings to avoid frequent dynamic allocation of new database connections.
+ Look for SQL statements that perform excessive memory operations, such as in-memory sorting or large joins.
+ Investigate the heavy I/O usage for the following SQL IDs: *list_of_IDs*.
+ Check for statements that create large amounts of temporary data, for example those that perform large sorts or use large temporary tables. 
+ Check applications to see what is causing the increase in database workload.
+ Consider enabling the MySQL Performance Schema.
+ Check for long-running transactions and end them with a commit or rollback.
+ Configure the `idle_in_transaction_session_timeout` parameter to end any session that has been in the 'idle in transaction' state for longer than the specified time.

# How DevOps Guru for RDS works
<a name="working-with-rds.overview.how-it-works"></a>

DevOps Guru for RDS collects metric data, analyzes it, and then publishes anomalies in the dashboard.

**Topics**
+ [Data collection and analysis](#working-with-rds.overview.how-it-works.collects)
+ [Anomaly publication](#working-with-rds.overview.how-it-works.publishing)

## Data collection and analysis
<a name="working-with-rds.overview.how-it-works.collects"></a>

DevOps Guru for RDS collects data about your Amazon RDS databases from Amazon RDS Performance Insights. This feature monitors Amazon RDS DB instances, collects metrics, and makes it possible for you to explore the metrics in a chart. The most important performance metric is `DBLoad`. DevOps Guru for RDS consumes Performance Insights metrics and analyzes them to detect anomalies. For more information about Performance Insights, see [Monitoring DB load with Performance Insights on Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.html) in the *Amazon Aurora User Guide* or [Monitoring DB load with Performance Insights on Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) in the *Amazon RDS User Guide*. 

DevOps Guru for RDS uses machine learning and advanced statistical analysis to analyze the data that it collects from Performance Insights. If DevOps Guru for RDS finds performance issues, it publishes them as anomalies.

## Anomaly publication
<a name="working-with-rds.overview.how-it-works.publishing"></a>

A database performance issue such as high DB load can degrade the quality of service for your database. When DevOps Guru detects an issue in an RDS database, it publishes an insight in the dashboard. The insight contains an anomaly for the resource **AWS/RDS**.

If Performance Insights is turned on for your instances, the anomaly contains a detailed analysis of the problem. DevOps Guru for RDS also recommends that you perform an investigation or specific corrective action. For example, the recommendation might be to investigate a specific high-load SQL statement, to consider increasing CPU capacity, or to close idle-in-transaction sessions.

# Supported database engines
<a name="working-with-rds.overview.supported-engines"></a>

DevOps Guru for RDS is supported for the following database engines:

Amazon Aurora with MySQL compatibility  
To learn more about this engine, see [Working with Amazon Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.html) in the *Amazon Aurora User Guide*.

Amazon Aurora with PostgreSQL compatibility  
To learn more about this engine, see [Working with Amazon Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html) in the *Amazon Aurora User Guide*.

Amazon RDS for PostgreSQL  
To learn more about this engine, see [Amazon RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html) in the *Amazon RDS User Guide*.

DevOps Guru reports anomalies and gives basic analysis for other database engines. DevOps Guru for RDS gives detailed analysis and recommendations only for Amazon Aurora and RDS for PostgreSQL instances.