

# Working with anomalies in DevOps Guru for RDS
<a name="working-with-rds"></a>

DevOps Guru detects, analyzes, and provides recommendations for supported AWS resources, including Amazon RDS engines. For Amazon Aurora and RDS for PostgreSQL database instances with Performance Insights turned on, DevOps Guru for RDS provides detailed, database-specific analyses of performance issues and recommends corrective actions.

**Topics**
+ [Overview of DevOps Guru for RDS](working-with-rds.overview.md)
+ [Enabling DevOps Guru for RDS](working-with-rds.enabling.md)
+ [Analyzing anomalies in Amazon RDS](working-with-rds.analyzing.md)

# Overview of DevOps Guru for RDS
<a name="working-with-rds.overview"></a>

Following, you can find a summary of the key benefits and features of DevOps Guru for RDS. For background on insights and anomalies, see [DevOps Guru concepts](concepts.md).

**Topics**
+ [Benefits of DevOps Guru for RDS](working-with-rds.overview.benefits.md)
+ [Key concepts for database performance tuning](working-with-rds.overview.tuning.md)
+ [Key concepts for DevOps Guru for RDS](working-with-rds.overview.definitions.md)
+ [How DevOps Guru for RDS works](working-with-rds.overview.how-it-works.md)
+ [Supported database engines](working-with-rds.overview.supported-engines.md)

# Benefits of DevOps Guru for RDS
<a name="working-with-rds.overview.benefits"></a>

If you're responsible for an Amazon RDS database, you might not know when an event or regression is affecting it. When you learn about the issue, you might not know why it's occurring or what to do about it. Rather than turning to a database administrator (DBA) for help or relying on third-party tools, you can follow recommendations from DevOps Guru for RDS.

You gain the following advantages from the detailed analysis of DevOps Guru for RDS:

**Fast diagnosis**  
DevOps Guru for RDS continuously monitors and analyzes database telemetry. Performance Insights, Enhanced Monitoring, and Amazon CloudWatch collect telemetry data for your database instances. DevOps Guru for RDS uses statistical and machine learning techniques to mine this data and detect anomalies. To learn more about telemetry data for Amazon Aurora databases, see [Monitoring DB load with Performance Insights on Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.html) and [Monitoring the OS by using Enhanced Monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_Monitoring.OS.html) in the *Amazon Aurora User Guide*. To learn more about telemetry data for other Amazon RDS databases, see [Monitoring DB load with Performance Insights on Amazon Relational Database Service](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) and [Monitoring OS metrics with Enhanced Monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.html) in the *Amazon RDS User Guide*.

**Fast resolution**  
Each anomaly identifies the performance issue and suggests avenues of investigation or corrective actions. For example, DevOps Guru for RDS might recommend that you investigate specific wait events. Or it might recommend that you tune your application pool settings to limit the number of database connections. Based on these recommendations, you can resolve performance issues more quickly than by troubleshooting manually.

**Proactive insights**  
DevOps Guru for RDS uses metrics from your resources to detect potentially problematic behavior before it becomes a bigger problem. For example, it can detect when sessions connected to the database are not performing active work and might be keeping database resources blocked. DevOps Guru then provides recommendations to help you address issues before they become bigger problems.

**Deep knowledge of Amazon engineers and machine learning**  
To detect performance issues and help you resolve bottlenecks, DevOps Guru for RDS relies on machine learning (ML) and advanced statistical analysis. Amazon database engineers contributed to the development of the DevOps Guru for RDS findings, which encapsulate many years of managing hundreds of thousands of databases. By drawing on this collective knowledge, DevOps Guru for RDS can teach you best practices.

# Key concepts for database performance tuning
<a name="working-with-rds.overview.tuning"></a>

DevOps Guru for RDS assumes that you're familiar with a few key performance concepts. To learn more about these concepts, see [Overview of Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Overview.html) in the *Amazon Aurora User Guide* or [Overview of Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.html) in the *Amazon RDS User Guide*.

**Topics**
+ [Metrics](#working-with-rds.overview.tuning.metrics)
+ [Problem detection](#working-with-rds.overview.tuning.problems)
+ [DB load](#working-with-rds.overview.tuning.db-load)
+ [Wait events](#working-with-rds.overview.tuning.waits)

## Metrics
<a name="working-with-rds.overview.tuning.metrics"></a>

 A metric represents a time-ordered set of data points. Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. Amazon RDS provides metrics in real time for the database and for the operating system (OS) that your DB instance runs on. You can view all the system metrics and process information for your Amazon RDS DB instances on the Amazon RDS console. DevOps Guru for RDS monitors and provides insights for some of these metrics. For more information, see [Monitoring metrics in an Amazon Aurora cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/MonitoringAurora.html) or [Monitoring metrics in an Amazon Relational Database Service instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html). 

## Problem detection
<a name="working-with-rds.overview.tuning.problems"></a>

 DevOps Guru for RDS uses database and operating system (OS) metrics to detect critical database performance issues, whether those issues are impending or ongoing. DevOps Guru for RDS detects problems in two primary ways:
+ Using thresholds
+ Using anomalies

### Detecting problems with thresholds
<a name="working-with-rds.overview.tuning.threshold"></a>

 Thresholds are the bounding values against which the monitored metrics are evaluated. You can think of a threshold as a horizontal line on a metric chart that separates normal behavior from potentially problematic behavior. DevOps Guru for RDS monitors specific metrics and creates thresholds by analyzing what levels are considered potentially problematic for a specified resource. DevOps Guru for RDS then creates insights in the DevOps Guru console when new metric values cross a specified threshold over a given period of time on a consistent basis. The insights contain recommendations to prevent future database performance impact.

 For example, DevOps Guru for RDS might monitor the number of temporary tables using disk over a period of 15 minutes and create an insight when the rate of temporary tables using disk per second is abnormally high. Increased levels of on-disk temporary table usage might impact the database performance. By exposing this situation before it becomes critical, DevOps Guru for RDS helps you take corrective actions to prevent problems. 
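
The threshold idea can be illustrated with a minimal sketch. This is not the DevOps Guru implementation. The function name, the threshold value, the sample readings, and the `min_fraction` parameter (a stand-in for "on a consistent basis") are all assumptions made for illustration.

```python
from typing import Sequence


def breaches_threshold(
    values: Sequence[float], threshold: float, min_fraction: float = 0.8
) -> bool:
    """Return True when enough readings in the window exceed the threshold.

    min_fraction is a hypothetical stand-in for "on a consistent basis".
    """
    if not values:
        return False
    above = sum(1 for value in values if value > threshold)
    return above / len(values) >= min_fraction


# Hypothetical per-minute readings of on-disk temporary tables created per
# second over a 15-minute window, checked against a hypothetical threshold.
window = [2.4, 2.6, 2.5, 1.9, 2.8, 3.0, 2.7, 2.9, 3.1, 2.6, 2.5, 2.8, 3.2, 2.9, 3.0]
print(breaches_threshold(window, threshold=2.0))  # True: 14 of 15 readings exceed 2.0
```

Requiring most of the window to exceed the threshold, rather than a single reading, keeps one brief spike from producing an insight.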

### Detecting problems with anomalies
<a name="working-with-rds.overview.tuning.anomaly"></a>

While thresholds provide a simple and effective way to detect database problems, in some situations they are not sufficient. Consider a case where metric values are spiking and crossing into potentially problematic behavior on a regular basis because of a known process, such as a daily reporting job. Since such spikes are expected, creating insights and notifications for each of them would be counterproductive and would likely lead to alert fatigue. 

However, it is still necessary to detect spikes that are highly unusual, since metrics that are much higher than the rest or last much longer could represent real database performance issues. To address this concern, DevOps Guru for RDS monitors certain metrics to detect when a metric’s behavior becomes highly unusual or anomalous. DevOps Guru then reports these anomalies in insights.

For example, DevOps Guru for RDS might create an insight when DB load is not only high, but also significantly deviates from its usual behavior, which indicates a major unexpected slowdown of database operations. By recognizing only the anomalous DB load spikes, DevOps Guru for RDS lets you focus on the issues that are truly important. 
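
The difference between an expected spike and an anomalous one can be sketched as follows. DevOps Guru for RDS uses its own machine learning and statistical models, which this guide does not describe. The median-based rule below is a simple substitute technique chosen for illustration, and the function name, `factor`, `min_spread`, and the sample values are assumptions.

```python
import statistics
from typing import Sequence


def is_anomalous(
    history: Sequence[float],
    current: float,
    factor: float = 3.0,
    min_spread: float = 1.0,
) -> bool:
    """Flag current when it exceeds the baseline by more than factor * spread."""
    baseline = statistics.median(history)
    # Median absolute deviation: a spread measure that tolerates outliers.
    spread = statistics.median([abs(value - baseline) for value in history])
    return current > baseline + factor * max(spread, min_spread)


# Hypothetical peak DB load (in AAS) during a daily reporting job over one week.
daily_peaks = [10, 11, 9, 10, 12, 10, 11]
print(is_anomalous(daily_peaks, current=11))  # False: an expected spike
print(is_anomalous(daily_peaks, current=24))  # True: far above the usual peak
```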

## DB load
<a name="working-with-rds.overview.tuning.db-load"></a>

The key concept for database tuning is the *database load (DB load)* metric. The DB load represents how busy your database is at any given time. An increase in DB load means an increase in database activity.

A *database session* represents an application's dialogue with a relational database. An *active session* is a session that is in the process of running a database request. A session is active when it's either running on CPU or waiting for a resource to become available so that it can proceed. For example, an active session might wait for a page to be read into memory, and then consume CPU while it reads data from the page.

The `DBLoad` metric in Performance Insights is measured in *average active sessions (AAS)*. To calculate AAS, Performance Insights samples the number of active sessions every second. For a specific time period, the AAS is the total number of active sessions divided by the total number of samples. An AAS value of 2 means that, on average, 2 sessions were active in requests at any given time.
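
The AAS calculation described above is a plain average, as the following minimal sketch shows. The function name and the sample values are hypothetical. Performance Insights performs this sampling for you.

```python
from typing import Sequence


def average_active_sessions(samples: Sequence[int]) -> float:
    """Return AAS: the sum of sampled active-session counts / number of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(samples) / len(samples)


# Hypothetical one-second samples of the number of active sessions.
samples = [1, 3, 2, 2, 4, 0, 2, 2]
print(average_active_sessions(samples))  # 2.0
```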

An analogy for DB load is activity in a warehouse. Suppose that the warehouse employs 100 workers. If 1 order comes in, 1 worker fulfills the order while the other workers are idle. If 100 or more orders come in, all 100 workers fulfill orders simultaneously. If you periodically sample how many workers are active over a given time period, you can calculate the average number of active workers. The calculation shows that, on average, *N* workers are busy fulfilling orders at any given time. If the average was 50 workers yesterday and 75 workers today, the activity level in the warehouse increased. In the same way, DB load increases as session activity increases.

To learn more, see [Database load](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Overview.ActiveSessions.html) in the *Amazon Aurora User Guide* or [Database load](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.ActiveSessions.html) in the *Amazon RDS User Guide*.

## Wait events
<a name="working-with-rds.overview.tuning.waits"></a>

A *wait event* is a type of database instrumentation that tells you which resource a database session is waiting for so it can proceed. When Performance Insights counts active sessions to calculate database load, it also records the wait events that are causing the active sessions to wait. This technique allows Performance Insights to show you which wait events are contributing to DB load.

Every active session is either running on the CPU or waiting. For example, sessions consume CPU when they search memory, perform a calculation, or run procedural code. When sessions aren't consuming CPU, they might be waiting for a data file to be read or a log to be written to. The more time that a session waits for resources, the less time it runs on the CPU.

When you tune a database, you often try to find the resources that sessions are waiting for. For example, two or three wait events might account for 90% of DB load. This measure means that, on average, active sessions are spending most of their time waiting for a small number of resources. If you can find out the cause of these waits, you can try to remedy the problem.
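
To make the idea of wait events accounting for a share of DB load concrete, the following sketch ranks wait events by their fraction of total load. The function name and the load figures are hypothetical. The event names are used only as examples.

```python
from typing import Dict, List, Tuple


def wait_event_shares(load_by_event: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (wait event, fraction of total DB load) pairs, largest first."""
    total = sum(load_by_event.values())
    if total == 0:
        return []
    shares = [(event, load / total) for event, load in load_by_event.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical DB load in AAS, broken down by wait event.
load_by_event = {"CPU": 12.0, "IO:wait/io/sql/table/handler": 9.6, "other": 2.4}
for event, share in wait_event_shares(load_by_event):
    print(f"{event}: {share:.0%}")
```

In this made-up breakdown, the top two wait events account for about 90% of DB load, so they are the natural place to start tuning.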

Consider the analogy of a warehouse worker. An order comes in for a book. The worker might be delayed in fulfilling the order. For example, a different worker might be currently restocking the shelves, or a trolley might not be available. Or the system used to enter the order status might be slow. The longer the worker waits, the longer the order takes to fulfill. Waiting is a natural part of the warehouse workflow, but if wait time becomes excessive, productivity decreases. In the same way, repeated or lengthy session waits can degrade database performance.

For more information about wait events in Amazon Aurora, see [Tuning with wait events for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Tuning.html) and [Tuning with wait events for Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Tuning.html) in the *Amazon Aurora User Guide*.

For more information about wait events in other Amazon RDS databases, see [Tuning with wait events for RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.html) in the *Amazon RDS User Guide*.

# Key concepts for DevOps Guru for RDS
<a name="working-with-rds.overview.definitions"></a>

An *insight* is generated by DevOps Guru when it detects anomalous or problematic behavior in your operational applications. An insight contains anomalies for one or more resources. An *anomaly* represents one or more related metrics detected by DevOps Guru that are unexpected or unusual. 

An insight has a severity of *high*, *medium*, or *low*. The insight severity is determined by the most severe anomaly that contributed to creating the insight. For example, if the insight **AWS-ECS_MemoryUtilization_and_others** includes one anomaly with low severity and another with high severity, the overall severity of the insight is high.
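
The severity rule can be expressed as taking the maximum over the anomaly severities. The following is a minimal sketch. The function and the ranking table are illustrative names, not part of any DevOps Guru API.

```python
from typing import Sequence

# Higher rank means more severe.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def insight_severity(anomaly_severities: Sequence[str]) -> str:
    """Return the most severe value among the anomalies in an insight."""
    if not anomaly_severities:
        raise ValueError("an insight contains at least one anomaly")
    return max(anomaly_severities, key=lambda severity: SEVERITY_RANK[severity])


print(insight_severity(["low", "high"]))  # high
```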

If Amazon RDS DB instances have Performance Insights turned on, DevOps Guru for RDS provides detailed analysis and recommendations in the anomalies for these instances. To identify an anomaly, DevOps Guru for RDS develops a baseline for database metric values. DevOps Guru for RDS then compares current metric values to the historical baseline.

**Topics**
+ [Proactive insights](#working-with-rds.overview.definitions.proactive)
+ [Reactive insights](#working-with-rds.overview.definitions.reactive)
+ [Recommendations](#working-with-rds.overview.definitions.finding.recommendations)

## Proactive insights
<a name="working-with-rds.overview.definitions.proactive"></a>

A proactive insight lets you know about problematic behavior before it occurs. It contains anomalies with recommendations and related metrics to help you address the issues before they become bigger problems.

Each proactive insight page provides details about one anomaly.

## Reactive insights
<a name="working-with-rds.overview.definitions.reactive"></a>

A reactive insight identifies anomalous behavior as it occurs. It contains anomalies with recommendations, related metrics, and events to help you understand and address the issues now.

### Causal anomalies
<a name="working-with-rds.overview.definitions.insight"></a>

A *causal anomaly* is a top-level anomaly within a reactive insight. It is shown as the **Primary metric** on the anomaly details page in the DevOps Guru console. **Database load (DB load)** is the causal anomaly for DevOps Guru for RDS. For example, the insight **AWS-ECS_MemoryUtilization_and_others** could have several metric anomalies, one of which is **Database load (DB load)** for the resource **AWS/RDS**.

Within an insight, the anomaly **Database load (DB load)** can occur for multiple Amazon RDS DB instances. The severity of the anomaly might be different for each DB instance. For example, the severity for one DB instance might be high while the severity for the others is low. The console defaults to the anomaly with the highest severity.

### Contextual anomalies
<a name="working-with-rds.overview.definitions.finding"></a>

A *contextual anomaly* is a finding within **Database load (DB load)** that is related to a reactive insight. It is displayed in the **Related metrics** section of the anomaly details page in the DevOps Guru console. Each contextual anomaly describes a specific Amazon RDS performance issue that requires investigation. For example, a causal anomaly can include the following contextual anomalies:
+ **CPU capacity exceeded** – The CPU run queue or CPU utilization is above normal.
+ **Database memory low** – Processes don't have enough memory.
+ **Database connections spiked** – The number of database connections is above normal.

## Recommendations
<a name="working-with-rds.overview.definitions.finding.recommendations"></a>

Each insight has at least one suggested action. The following examples are recommendations generated by DevOps Guru for RDS:
+ Tune SQL IDs *list_of_IDs* to reduce CPU usage, or upgrade the instance type to increase CPU capacity.
+ Review the associated spike of current database connections. Consider tuning the application pool settings to avoid frequent dynamic allocation of new database connections.
+ Look for SQL statements that perform excessive memory operations, such as in-memory sorting or large joins.
+ Investigate the heavy I/O usage for the following SQL IDs: *list_of_IDs*.
+ Check for statements that create large amounts of temporary data, for example those that perform large sorts or use large temporary tables. 
+ Check applications to see what is causing the increase in database workload.
+ Consider enabling the MySQL Performance Schema.
+ Check for long-running transactions and end them with a commit or rollback.
+ Configure the `idle_in_transaction_session_timeout` parameter to end any session that has been in the 'idle in transaction' state for longer than the specified time.

# How DevOps Guru for RDS works
<a name="working-with-rds.overview.how-it-works"></a>

DevOps Guru for RDS collects metric data, analyzes it, and then publishes anomalies in the dashboard.

**Topics**
+ [Data collection and analysis](#working-with-rds.overview.how-it-works.collects)
+ [Anomaly publication](#working-with-rds.overview.how-it-works.publishing)

## Data collection and analysis
<a name="working-with-rds.overview.how-it-works.collects"></a>

DevOps Guru for RDS collects data about your Amazon RDS databases from Amazon RDS Performance Insights. This feature monitors Amazon RDS DB instances, collects metrics, and makes it possible for you to explore the metrics in a chart. The most important performance metric is `DBLoad`. DevOps Guru for RDS consumes Performance Insights metrics and analyzes them to detect anomalies. For more information about Performance Insights, see [Monitoring DB load with Performance Insights on Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.html) in the *Amazon Aurora User Guide* or [Monitoring DB load with Performance Insights on Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) in the *Amazon RDS User Guide*. 

DevOps Guru for RDS uses machine learning and advanced statistical analysis to analyze the data that it collects from Performance Insights. If DevOps Guru for RDS finds performance issues, it proceeds to the next step.
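
If you want to look at the same kind of data yourself, the Performance Insights API exposes DB load through `GetResourceMetrics`. The following is a minimal sketch that assumes a boto3 `pi` client is passed in. The function name and the one-hour window are assumptions, and the response parsing assumes the `MetricList`, `Key`, `Dimensions`, and `DataPoints` fields of that API. It is not a description of how DevOps Guru for RDS retrieves data internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def db_load_by_wait_event(
    pi_client: Any, resource_id: str, hours: int = 1
) -> Dict[str, List[float]]:
    """Return db.load.avg data points keyed by wait event name."""
    end = datetime.now(timezone.utc)
    response = pi_client.get_resource_metrics(
        ServiceType="RDS",
        Identifier=resource_id,  # the DbiResourceId of the DB instance
        MetricQueries=[
            {"Metric": "db.load.avg", "GroupBy": {"Group": "db.wait_event"}}
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        PeriodInSeconds=60,
    )
    series: Dict[str, List[float]] = {}
    for metric in response.get("MetricList", []):
        dimensions = metric["Key"].get("Dimensions", {})
        # An entry without dimensions is treated as the overall total.
        name = dimensions.get("db.wait_event.name", "total")
        series[name] = [
            point["Value"] for point in metric.get("DataPoints", []) if "Value" in point
        ]
    return series


# Usage (requires boto3 and AWS credentials; the resource ID is hypothetical):
#   import boto3
#   print(db_load_by_wait_event(boto3.client("pi"), "db-EXAMPLERESOURCEID"))
```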

## Anomaly publication
<a name="working-with-rds.overview.how-it-works.publishing"></a>

A database performance issue such as high DB load can degrade the quality of service for your database. When DevOps Guru detects an issue in an RDS database, it publishes an insight in the dashboard. The insight contains an anomaly for the resource **AWS/RDS**.

If Performance Insights is turned on for your instances, the anomaly contains a detailed analysis of the problem. DevOps Guru for RDS also recommends that you perform an investigation or specific corrective action. For example, the recommendation might be to investigate a specific high-load SQL statement, consider increasing CPU capacity, or close idle-in-transaction sessions.

# Supported database engines
<a name="working-with-rds.overview.supported-engines"></a>

DevOps Guru for RDS is supported for the following database engines:

Amazon Aurora with MySQL compatibility  
To learn more about this engine, see [Working with Amazon Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.html) in the *Amazon Aurora User Guide*.

Amazon Aurora with PostgreSQL compatibility  
To learn more about this engine, see [Working with Amazon Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html) in the *Amazon Aurora User Guide*.

Amazon RDS for PostgreSQL  
To learn more about this engine, see [Amazon RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html) in the *Amazon RDS User Guide*.

DevOps Guru reports anomalies and gives basic analysis for other database engines. DevOps Guru for RDS gives detailed analysis and recommendations only for Amazon Aurora and RDS for PostgreSQL instances.

# Enabling DevOps Guru for RDS
<a name="working-with-rds.enabling"></a>

When you enable DevOps Guru for RDS, you enable DevOps Guru to analyze anomalies in resources such as DB instances. Amazon RDS makes it easy to discover and enable recommended functionality for an RDS DB instance or DB cluster. To achieve this, RDS makes API calls to other services, such as Amazon EC2, DevOps Guru, and IAM. When the RDS console makes these API calls, AWS CloudTrail logs them for visibility.

To allow DevOps Guru to publish insights for an Amazon RDS database, complete the tasks in the following sections.

**Topics**
+ [Turning on Performance Insights for your Amazon RDS DB instances](#working-with-rds.enabling.pi)
+ [Configuring access policies for DevOps Guru for RDS](#working-with-rds.enabling.policy)
+ [Adding Amazon RDS DB instances to your DevOps Guru coverage](#working-with-rds.enabling.cf)

## Turning on Performance Insights for your Amazon RDS DB instances
<a name="working-with-rds.enabling.pi"></a>

For DevOps Guru for RDS to analyze anomalies on a DB instance, make sure that Performance Insights is turned on. If Performance Insights isn't turned on for a DB instance, DevOps Guru for RDS notifies you in the following places:

Dashboard  
If you view insights by resource type, the **RDS** tile alerts you that Performance Insights isn't turned on. Choose the link to turn on Performance Insights in the Amazon RDS console.

Insights  
In the **Recommendations** section at the bottom of the page, choose **Enable Amazon RDS Performance Insights**.

Settings  
In the **Service: Amazon RDS** section, choose the link to turn on Performance Insights in the Amazon RDS console.

For more information, see [Turning Performance Insights on and off](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Enabling.html) in the *Amazon Aurora User Guide*, or [Turning Performance Insights on and off](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Enabling.html) in the *Amazon RDS User Guide*.
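
As an alternative to the console links above, Performance Insights can also be turned on through the Amazon RDS API. The following is a minimal sketch that assumes a boto3 `rds` client is passed in. The function name, the instance identifier in the usage comment, and the 7-day retention value are assumptions to adapt to your environment.

```python
from typing import Any


def enable_performance_insights(
    rds_client: Any, db_instance_id: str, retention_days: int = 7
) -> Any:
    """Turn on Performance Insights for one DB instance."""
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        EnablePerformanceInsights=True,
        PerformanceInsightsRetentionPeriod=retention_days,
        ApplyImmediately=True,
    )


# Usage (requires boto3 and AWS credentials; the instance name is hypothetical):
#   import boto3
#   enable_performance_insights(boto3.client("rds"), "my-db-instance")
```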

## Configuring access policies for DevOps Guru for RDS
<a name="working-with-rds.enabling.policy"></a>

For a user to access DevOps Guru for RDS, they must have permissions from either of the following policies:
+ The AWS managed policy `AmazonRDSFullAccess`
+ A customer managed policy that allows the following actions:
  + `pi:GetResourceMetrics`
  + `pi:DescribeDimensionKeys`
  + `pi:GetDimensionKeyDetails`

For more information, see [Configuring access policies for Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PerfInsights.Enabling.html) in the *Amazon Aurora User Guide* or [Configuring access policies for Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.access-control.html) in the *Amazon RDS User Guide*.
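
A customer managed policy that allows the three actions listed above might look like the document that the following sketch generates. The `Resource` ARN pattern is an assumption based on the form commonly shown for Performance Insights policies. Verify it against the linked user guide and narrow it to your own account and Region before use.

```python
import json
from typing import Any, Dict

# The three actions listed in this section.
PI_ACTIONS = [
    "pi:GetResourceMetrics",
    "pi:DescribeDimensionKeys",
    "pi:GetDimensionKeyDetails",
]


def build_pi_policy(resource: str = "arn:aws:pi:*:*:metrics/rds/*") -> Dict[str, Any]:
    """Build an IAM policy document allowing the listed Performance Insights actions.

    The default resource pattern is an assumption; narrow it as needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": PI_ACTIONS, "Resource": resource}
        ],
    }


print(json.dumps(build_pi_policy(), indent=2))
```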

## Adding Amazon RDS DB instances to your DevOps Guru coverage
<a name="working-with-rds.enabling.cf"></a>

You can configure DevOps Guru to monitor your Amazon RDS databases either in the DevOps Guru console or the Amazon RDS console. 

 In the DevOps Guru console, you have the following options: 
+ Turn on DevOps Guru at the account level. This is the default. When you choose this option, DevOps Guru analyzes all supported AWS resources in your AWS Region and AWS account, including Amazon RDS databases.
+ Specify AWS CloudFormation stacks for DevOps Guru for RDS.

  For more information, see [Using CloudFormation stacks to identify resources in your DevOps Guru applications](working-with-cfn-stacks.md).
+ Tag your Amazon RDS resources.

  A *tag* is a custom attribute label that you assign to an AWS resource. Use tags to identify the AWS resources that make up your application. You can then filter your insights by tag to view only those created by your application. To view only insights generated by the Amazon RDS resources in your application, add a value such as `Devops-guru-rds` to your Amazon RDS resource tags. For more information, see [Using tags to identify resources in your DevOps Guru applications](working-with-resource-tags.md).
**Note**  
When you tag Amazon RDS resources, you must tag the database instance and not the cluster.

To enable DevOps Guru monitoring from the Amazon RDS console, see [Turning on DevOps Guru in the RDS console](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/devops-guru-for-rds.html#devops-guru-for-rds.configuring.coverage.rds-console). Note that to enable DevOps Guru from the Amazon RDS console you must use tags. For more information about tags, see [Using tags to identify resources in your DevOps Guru applications](working-with-resource-tags.md).

# Analyzing anomalies in Amazon RDS
<a name="working-with-rds.analyzing"></a>

When DevOps Guru for RDS publishes a performance anomaly in the dashboard, you typically perform the following steps:

1. View the insight in the DevOps Guru dashboard. DevOps Guru for RDS reports both reactive and proactive insights.

   For more information, see [Viewing insights](working-with-rds.analyzing.insights.md).

1. View anomalies for **AWS/RDS** resources.

   For more information, see [Viewing reactive anomalies](working-with-rds.analyzing.metrics.md) and [Viewing proactive anomalies](working-with-rds.analyzing.proactive.metrics.md).

1. Respond to DevOps Guru for RDS recommendations.

   For more information, see [Responding to recommendations](working-with-rds.analyzing.recommend.md).

1. Monitor the health of your DB instances to make sure that resolved performance problems don't recur.

   For more information, see [Monitoring metrics in an Amazon Aurora DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/MonitoringAurora.html) in the *Amazon Aurora User Guide* and [Monitoring metrics in an Amazon RDS instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) in the *Amazon RDS User Guide*.

# Viewing insights
<a name="working-with-rds.analyzing.insights"></a>

Access the **Insights** page in the DevOps Guru console to find reactive and proactive insights. From there, you can choose an insight from the list to view a detailed page of metrics, recommendations, and more information about the insight.

**To view an insight**

1. Open the Amazon DevOps Guru console at [https://console.aws.amazon.com/devops-guru/](https://console.aws.amazon.com/devops-guru/).

1. Open the navigation pane, and then choose **Insights**.

1. Choose the **Reactive** tab to view reactive insights, or choose **Proactive** to view proactive insights.

1. Choose the name of an insight, prioritizing by status and severity.

   The detailed insight page appears.
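
The same list is available outside the console through the DevOps Guru `ListInsights` API. The following is a minimal sketch that assumes a boto3 `devops-guru` client is passed in and reads only the first page of results. The function name and the sorting rule are illustrative choices.

```python
from typing import Any, Dict, List

# Lower number sorts first, so high-severity insights lead the list.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def ongoing_insights(client: Any, insight_type: str = "REACTIVE") -> List[Dict[str, Any]]:
    """Return ongoing insights of one type, most severe first (first page only)."""
    response = client.list_insights(StatusFilter={"Ongoing": {"Type": insight_type}})
    key = "ReactiveInsights" if insight_type == "REACTIVE" else "ProactiveInsights"
    insights = response.get(key, [])
    return sorted(
        insights,
        key=lambda insight: SEVERITY_ORDER.get(insight.get("Severity"), len(SEVERITY_ORDER)),
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for insight in ongoing_insights(boto3.client("devops-guru")):
#       print(insight["Severity"], insight["Name"])
```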

# Viewing reactive anomalies
<a name="working-with-rds.analyzing.metrics"></a>

Within an insight, you can view anomalies for Amazon RDS resources. On a reactive insight page, in the **Aggregated metrics** section, you can view a list of anomalies with corresponding timelines. There are also sections that display information about log groups and events related to the anomalies. Causal anomalies in a reactive insight each have a corresponding page with details about the anomaly. 

## Viewing the detailed analysis of an RDS reactive anomaly
<a name="working-with-rds.analyzing.details"></a>

In this stage, drill down in the anomaly to get the detailed analysis and recommendations for your Amazon RDS DB instances. 

The detailed analysis is only available for Amazon RDS DB instances that have Performance Insights turned on.

**To drill down to the anomaly details page**

1. On the insight page, find an aggregated metric with the resource type **AWS/RDS**.

1. Choose **View details**.

   The anomaly details page appears. The title begins with **Database performance anomaly** and names the affected resource. The console defaults to the anomaly with the highest severity, regardless of when the anomaly occurred.

1. (Optional) If multiple resources are affected, choose a different resource from the list at the top of the page.

Following, you can find descriptions for the components of the details page.

## Resource overview
<a name="working-with-rds.analyzing.details.overview"></a>

The top section of the details page is **Resource overview**. This section summarizes the performance anomaly experienced by your Amazon RDS DB instance.

![\[The overview of the anomaly details page\]](http://docs.aws.amazon.com/devops-guru/latest/userguide/images/rds-insight-overview.png)


This section has the following fields:
+ **Resource name** – The name of the DB instance that is experiencing the anomaly. In this example, the resource is named **prod_db_678**.
+ **DB engine** – The DB engine of the DB instance that is experiencing the anomaly. In this example, the engine is **Aurora MySQL**.
+ **Anomaly severity** – The measure of the negative impact of the anomaly on your instance. Possible severities are **High**, **Medium**, and **Low**.
+ **Anomaly summary** – A brief summary of the issue. A typical summary is **Unusually high DB load**.
+ **Start time** and **End time** – The time when the anomaly began and ended. If the end time is **Ongoing**, the anomaly is still occurring.
+ **Duration** – The duration of the anomalous behavior. In this example, the anomaly is ongoing and has been occurring for 3 hours and 2 minutes.

## Primary metric
<a name="working-with-rds.analyzing.details.what-we-found"></a>

The **Primary metric** section summarizes the causal anomaly, which is the top-level anomaly within the insight. You can think of the causal anomaly as the general problem experienced by your DB instance.

![\[The Primary metric section of the anomaly details page\]](http://docs.aws.amazon.com/devops-guru/latest/userguide/images/rds-primary-metric.png)


The left panel provides more details about the issue. In this example, the summary includes the following information:
+ **Database load (DB load)** – A categorization of the anomaly as a database load issue. The corresponding metric in Performance Insights is `DBLoad`. This metric is also published to Amazon CloudWatch.
+ **db.r5.4xlarge** – The DB instance class. The number of vCPUs, which is 16 in this example, corresponds to the dotted line in the **Average active sessions (AAS)** chart.
+ **24 (6x spike)** – The DB load, measured in average active sessions (AAS) during the time interval reported in the insight. Thus, at any given time during the period of the anomaly, an average of 24 sessions were active on the database. The DB load is 6 times the normal DB load for this instance.
+ **Typically: DB load up to 4** – The baseline of DB load, measured in AAS, during a typical workload. The value 4 means that, during normal operations, an average of 4 or fewer sessions are active on the database at any given time.
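
The figures in this example relate to each other through simple arithmetic, sketched below. The function name is hypothetical. The values 24, 4, and 16 are taken from the example above.

```python
def spike_factor(current_aas: float, baseline_aas: float) -> float:
    """Return how many times the current DB load exceeds the baseline."""
    if baseline_aas <= 0:
        raise ValueError("baseline must be positive")
    return current_aas / baseline_aas


current_aas, baseline_aas, vcpus = 24, 4, 16
print(spike_factor(current_aas, baseline_aas))  # 6.0, shown as "6x spike"
print(current_aas > vcpus)  # True: DB load is above the vCPU line in the chart
```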

By default, the load chart is sliced by wait events. This means that for each bar in the chart, the largest colored area represents the wait event that is contributing most to total DB load. The chart shows the time (in red) when the issue began. Focus your attention on the wait events that take up the most space in the bar:
+ `CPU`
+ `IO:wait/io/table/sql/handler`

The preceding wait events appear more than normal for this Aurora MySQL database. To learn how to tune performance using wait events in Amazon Aurora, see [Tuning with wait events for Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Tuning.html) and [Tuning with wait events for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Tuning.html) in the *Amazon Aurora User Guide*. To learn how to tune performance using wait events in RDS for PostgreSQL, see [Tuning with wait events for RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.html) in the *Amazon RDS User Guide*.
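
To decide which wait events deserve attention, you can rank them by their share of total DB load. The following is a minimal sketch of that ranking. The function name `top_wait_events` and the per-event AAS values are hypothetical and aren't taken from the chart.

```python
def top_wait_events(aas_by_wait_event: dict, limit: int = 2) -> list:
    """Return the wait events that contribute most to DB load.

    Each result is a (wait_event, share) pair, where share is the
    fraction of total average active sessions (AAS) for that event.
    """
    total = sum(aas_by_wait_event.values())
    if total <= 0:
        return []
    ranked = sorted(aas_by_wait_event.items(), key=lambda item: item[1], reverse=True)
    return [(name, aas / total) for name, aas in ranked[:limit]]


# Hypothetical breakdown of a DB load of 24 AAS by wait event.
sample = {"CPU": 12.0, "io/table/sql/handler": 9.0, "other": 3.0}
for name, share in top_wait_events(sample):
    print(f"{name}: {share:.1%} of DB load")
```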

## Related metrics
<a name="working-with-rds.analyzing.details.relevant-metrics"></a>

The **Related metrics** section lists the contextual anomalies, which are specific findings within the causal anomaly. These findings give additional information about the performance issues.

![\[The Related metrics section of the details page\]](http://docs.aws.amazon.com/devops-guru/latest/userguide/images/rds-related-metrics.png)


The **Related metrics** table has two columns: **Metrics name** and **Timeline (UTC)**. Every row in the table corresponds to a specific metric.

The first column of every row has the following pieces of information:
+ **Name** – The name of the metric. The first row identifies the metric as **CPU running tasks**.
+ **Currently** – The current value of the metric. In the first row, the current value is **162 processes (3x)**. 
+ **Normally** – The baseline of this metric for this database when it is functioning normally. DevOps Guru for RDS calculates the baseline as the 95th percentile value over 1 week of history. The first row indicates that 56 processes are typically running on the CPU.
+ **Contributing to** – The finding associated with this metric. In the first row, the **CPU running tasks** metric is associated with the **CPU capacity exceeded** anomaly.

The **Timeline** column shows a line chart for the metric. The shaded area shows the time interval when DevOps Guru for RDS designated the finding as high severity.
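
To illustrate how a 95th percentile baseline relates to the **Currently** and **Normally** values, the following sketch computes a percentile over a list of samples. It uses the nearest-rank method, which is an assumption: this guide doesn't state which percentile method DevOps Guru for RDS uses. The function name and the sample history are hypothetical.

```python
import math


def percentile_nearest_rank(values: list, percentile: int) -> float:
    """Return the given percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range 1-100")
    ordered = sorted(values)
    # Nearest rank: the smallest value such that at least `percentile`
    # percent of the samples are less than or equal to it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical history of the "CPU running tasks" metric (20 samples).
history = [40] * 10 + [50] * 8 + [56] + [90]
baseline = percentile_nearest_rank(history, 95)
current = 162

print(f"Normally: {baseline} processes")
print(f"Currently: {current} processes ({round(current / baseline)}x)")
```

With this sample history, the single outlier of 90 doesn't raise the baseline, which is one reason a high percentile can be preferable to a maximum.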

## Analysis and recommendations
<a name="working-with-rds.analyzing.details.findings"></a>

Whereas the causal anomaly describes the overall issue, a contextual anomaly describes a specific finding that requires investigation. Each finding corresponds to a set of related metrics.

In the following example of an **Analysis and recommendations** section, the high DB load anomaly has two findings.

![\[The Analysis and recommendations section of the details page\]](http://docs.aws.amazon.com/devops-guru/latest/userguide/images/rds-analysis-recs.png)


The table has the following columns:
+ **Anomaly** – A general description of this contextual anomaly. In this example, the first anomaly is high-load wait events, and the second is CPU capacity exceeded.
+ **Analysis** – A detailed explanation of the anomaly.

  In the first anomaly, three wait types contribute to 90% of DB load. In the second anomaly, the CPU run queue exceeded 150, which means that at any given time, more than 150 sessions were waiting for CPU time. CPU utilization was over 97%, which means that the CPU was busy more than 97% of the time for the duration of the issue. Thus, the CPU was almost continually occupied while more than 150 sessions waited to run on it.
+ **Recommendations** – The suggested user response to the anomaly.

  In the first anomaly, DevOps Guru for RDS recommends that you investigate the wait events `cpu` and `io/table/sql/handler`. To learn how to tune your database performance based on these events, see [cpu](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/ams-waits.cpu.html) and [io/table/sql/handler](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/ams-waits.waitio.html) in the *Amazon Aurora User Guide*.

  In the second anomaly, DevOps Guru for RDS recommends that you reduce CPU consumption by tuning three SQL statements. You can hover over the links to see the SQL text.
+ **Related metrics** – Metrics that give you specific measurements for the anomaly. For more information about these metrics, see [Metrics reference for Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/metrics-reference.html) in the *Amazon Aurora User Guide* or [Metrics reference for Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/metrics-reference.html) in the *Amazon RDS User Guide*.

  In the first anomaly, DevOps Guru for RDS recommends that you compare DB load to the maximum CPU for your instance. In the second anomaly, the recommendation is to look at CPU run queue, CPU utilization, and SQL execution rate.
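
The relationship between a causal anomaly and its contextual anomalies can be sketched as a simple grouping. In the following example, the dictionary keys (`Id`, `Type`, `Name`, `CausalAnomalyId`) are modeled on anomaly summaries as an assumption, and the IDs and the function name `group_findings` are hypothetical. Check the field names against the API response that you actually work with.

```python
def group_findings(anomalies: list) -> dict:
    """Group contextual anomalies (findings) under their causal anomaly."""
    grouped = {
        anomaly["Id"]: {"name": anomaly["Name"], "findings": []}
        for anomaly in anomalies
        if anomaly["Type"] == "CAUSAL"
    }
    for anomaly in anomalies:
        causal_id = anomaly.get("CausalAnomalyId")
        # Skip contextual anomalies whose causal anomaly isn't in the input.
        if anomaly["Type"] == "CONTEXTUAL" and causal_id in grouped:
            grouped[causal_id]["findings"].append(anomaly["Name"])
    return grouped


# Hypothetical anomaly records that mirror the example on this page.
anomalies = [
    {"Id": "a-1", "Type": "CAUSAL", "Name": "Unusually high DB load"},
    {"Id": "a-2", "Type": "CONTEXTUAL", "Name": "High-load wait events", "CausalAnomalyId": "a-1"},
    {"Id": "a-3", "Type": "CONTEXTUAL", "Name": "CPU capacity exceeded", "CausalAnomalyId": "a-1"},
]

for causal in group_findings(anomalies).values():
    print(causal["name"])
    for finding in causal["findings"]:
        print(f"  - {finding}")
```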

# Viewing proactive anomalies
<a name="working-with-rds.analyzing.proactive.metrics"></a>

Within insights, you can view anomalies for Amazon RDS resources. Each proactive insight provides details about one proactive anomaly. On a proactive insight page, you can view an insight overview, detailed metrics about the anomaly, and recommendations to prevent future issues. To view a proactive anomaly, [go to the proactive insight page](https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.analyzing.insights.html).

## Insight overview
<a name="working-with-rds.analyzing.proactive.overview"></a>

The **Insight overview** section provides details about why the insight was created. It displays the severity of the insight as well as a description of the anomaly and a timeframe for when the anomaly occurred. It also lists the number of affected services and applications detected by DevOps Guru.

## Metrics
<a name="working-with-rds.analyzing.proactive.metrics.details"></a>

The **Metrics** section provides graphs of the anomaly. Each graph displays a threshold determined by the resource's baseline behavior, as well as data of the metric reported from the time of the anomaly.

## Recommendations for aggregated resources
<a name="working-with-rds.analyzing.proactive.recommendations"></a>

This section suggests actions that you can take to mitigate the reported issues before they become a bigger problem. Actions that you can take are presented in the **Recommended custom change** column. The rationale behind the recommendations is presented in the **Why is DevOps Guru recommending this?** column. For more information about how to respond to recommendations, see [Responding to recommendations](working-with-rds.analyzing.recommend.md).

# Responding to recommendations
<a name="working-with-rds.analyzing.recommend"></a>

Recommendations are the most important part of the insight. In this stage of the analysis, you act to resolve the performance issue. Typically, you take the following steps:

1. Decide whether the reported performance issue indicates a real problem.

   In some cases, an issue might be expected and benign. For example, if you subject a test database to an extreme DB load, DevOps Guru for RDS reports the load as a performance anomaly. However, you don't need to remedy this anomaly because it's an expected result of your testing.

   If you determine that the issue needs a response, go to the next step.

1. Decide whether to implement the recommendation.

   In the table of recommendations, a column shows the recommended actions. For reactive insights, this is the **What we recommend** column on a reactive anomaly detail page. For proactive insights, this is the **Recommended custom change** column on a proactive insight page.

   DevOps Guru for RDS offers a list of recommendations that cover several potentially problematic scenarios. After reviewing this list, determine which recommendation is most relevant to your current situation and consider applying it. If a recommendation works for your situation, go to the next step. If not, skip the remaining step and troubleshoot the issue using manual techniques.

1. Perform the recommended actions.

   DevOps Guru for RDS recommends that you do either of the following:
   + Perform a specific corrective action.

     For example, DevOps Guru for RDS might recommend that you upgrade CPU capacity, tune application pool settings, or enable the Performance Schema.
   + Investigate the cause of the issue.

     Typically, DevOps Guru for RDS recommends that you investigate specific SQL statements or wait events. For example, a recommendation might be to investigate the wait event `io/table/sql/handler`. Look up the listed wait event in [Tuning with wait events for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Tuning.html) or [Tuning with wait events for Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Tuning.html) in the *Amazon Aurora User Guide*, or in [Tuning with wait events for RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.html) in the *Amazon RDS User Guide*. Then perform the recommended actions.

**Important**  
We recommend that you test any changes on a test instance before modifying a production instance. In this way, you understand the impact of the change.