# Troubleshooting internal server errors in Amazon DynamoDB
<a name="TroubleshootingInternalServerErrors"></a>

In DynamoDB, internal server errors (500 errors) indicate that the service is unable to serve the request. These errors can occur for various reasons, such as transient network issues in the fleet, infrastructure issues, or storage node issues.

You may encounter some internal server errors during the lifecycle of your DynamoDB table. This is expected due to the distributed nature of the service and usually shouldn't be a cause for concern. DynamoDB automatically repairs and heals any transient issues with the service in real time, without requiring any intervention from you. However, if you observe a consistently high number of internal server errors on requests to your table (as seen in the [SystemErrors](metrics-dimensions.md#SystemErrors) metric), you should investigate further.

**Topics**
+ [Investigating internal server errors](#ServerErrors-investigating)
+ [Minimizing the impact from internal server errors](#ServerErrors-minimizing-impact)
+ [Improving operational awareness](#ServerErrors-improving-operational-awareness)

## Investigating internal server errors
<a name="ServerErrors-investigating"></a>

If you encounter internal server errors in your DynamoDB table, consider these options:

1. **Check the AWS Health Dashboard.**

   To identify the issue, the first step is to check the [AWS Service Health Dashboard](https://health.aws.amazon.com/health/status) and your AWS Account Health Dashboard. These dashboards provide valuable information about any service-wide issues, impacted tables, ongoing problems, and the root cause once the issue has been resolved.

   Reviewing the details in these dashboards will give you a better understanding of the current status of the AWS services you're using and any potential problems affecting your account. This information can help you determine the next steps to address the issue and minimize any disruptions to your operations.

1. **Reach out to Support.**

   If you observe prolonged, sustained errors in your requests, it may indicate an issue with the service. As a general rule, if you see an overall failure rate of 1% or more over the last 15 minutes, it's an appropriate time to escalate the issue to the AWS Support team. To learn more, see the [DynamoDB Service Level Agreement](https://aws.amazon.com/dynamodb/sla/).

   When opening a case with the AWS Support team, provide the following details to help expedite the troubleshooting process:
   + Impacted DynamoDB tables or secondary indexes
   + Time window when the errors were observed
   + DynamoDB request IDs, such as `4KBNVRGD25RG1KEO9UT4V3FQDJVV4KQNSO5AEMVJF66Q9ASUAAJG`, which you can find in your application logs.

   Including these details in the support case will help the AWS team understand the problem and provide a faster resolution. If you don't have the request IDs, you should still log the case with the other available details.
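
The escalation rule and the request ID lookup described above can be sketched in a few lines of Python. This is a minimal sketch, not an official tool: it assumes your application uses the AWS SDK for Python (Boto3), where both successful responses and the `response` attribute of a `ClientError` carry a `ResponseMetadata` dictionary with a `RequestId` key. The function names `extract_request_id` and `should_escalate` are hypothetical, and the request counts are assumed to come from your own application metrics for the last 15 minutes.

```python
from typing import Optional


def extract_request_id(response: dict) -> Optional[str]:
    """Return the request ID from a Boto3-style response dict, or None.

    Assumption: `response` is either a successful Boto3 response or the
    `response` attribute of a botocore ClientError.
    """
    return response.get("ResponseMetadata", {}).get("RequestId")


def should_escalate(failed: int, total: int, threshold: float = 0.01) -> bool:
    """Return True when the failure rate for the window meets the threshold.

    The default threshold of 0.01 mirrors the 1% general rule; the caller
    is responsible for counting requests over the 15-minute window.
    """
    if total == 0:
        return False  # no traffic in the window, so nothing to escalate
    return failed / total >= threshold
```

Logging the value returned by `extract_request_id` alongside each failed request makes it easier to supply request IDs when you open a support case.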

## Minimizing the impact from internal server errors
<a name="ServerErrors-minimizing-impact"></a>

To minimize the impact of internal server errors on your application, consider the following best practices:
+ Use backoffs and retries – The default back-off and retry behavior of the AWS SDKs is designed to find the right balance for most applications. However, you can adjust these settings based on your application's tolerance for downtime and its performance requirements. Learn more about back-offs and retries to understand how you can fine-tune these retry settings.
+ Use eventually consistent reads – If your application doesn't require strongly consistent reads, consider using eventually consistent reads. These reads cost less and are less likely to be affected by transient internal server errors, because they can be served from any of the available storage nodes. For more information, see [DynamoDB read consistency](HowItWorks.ReadConsistency.md).
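
To illustrate the back-off and retry idea, the following Python sketch retries an operation with capped exponential backoff and random jitter. It is not the implementation used by the AWS SDKs, which already retry on your behalf. The names `InternalServerError` and `call_with_backoff`, and the default delays, are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InternalServerError(Exception):
    """Hypothetical stand-in for a retryable HTTP 500 from the service."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying InternalServerError with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except InternalServerError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error to the caller
            # Sleep a random time up to the capped exponential bound so that
            # many clients do not retry at the same instant.
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))
```

The `sleep` parameter exists only so that the delay can be replaced in tests. Raising `max_attempts` or `max_delay` trades longer worst-case latency for a better chance of riding out a transient error.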

## Improving operational awareness
<a name="ServerErrors-improving-operational-awareness"></a>

To maintain the availability and reliability of your applications, proactively monitor for internal server errors (ISEs) in your DynamoDB tables and global secondary indexes (GSIs). By creating CloudWatch alarms on these errors, you gain better operational awareness and can be alerted to potential issues before they impact your end users. This approach aligns with the Operational Excellence pillar of the AWS Well-Architected Framework.

**Creating CloudWatch alarms**

Set CloudWatch alarms on your DynamoDB tables so that you are notified of a consistently high number of internal server errors instead of having to watch the metrics manually. See [Using the DynamoDB Well-Architected Lens to optimize your DynamoDB workload](bp-wal.md) to learn more about applying the Well-Architected Framework to your DynamoDB tables.

When you create an alarm on the [SystemErrors](metrics-dimensions.md#SystemErrors) metric, specify both the `TableName` and `Operation` dimensions (or `TableName` and `GlobalSecondaryIndexName` for a global secondary index). DynamoDB emits `SystemErrors` per operation, not on `TableName` alone, so an alarm that specifies only `TableName` stays in the `INSUFFICIENT_DATA` state and never alerts.
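
As a sketch of the dimension requirement above, the following Python function builds the parameters for a `SystemErrors` alarm on one table and operation. The function name, alarm name format, threshold, period, and missing-data setting are assumptions to adapt to your workload, not recommended values. The parameter names match the CloudWatch `PutMetricAlarm` API.

```python
def system_errors_alarm_params(
    table_name: str,
    operation: str,
    threshold: float = 5.0,
    period_seconds: int = 60,
    evaluation_periods: int = 5,
) -> dict:
    """Build PutMetricAlarm parameters for SystemErrors on one operation."""
    return {
        # Hypothetical naming scheme; use whatever convention you prefer.
        "AlarmName": f"{table_name}-{operation}-SystemErrors",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "SystemErrors",
        # Both dimensions are specified, because the metric is emitted
        # per operation rather than on TableName alone.
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "Operation", "Value": operation},
        ],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: periods with no data points are treated as healthy.
        "TreatMissingData": "notBreaching",
    }
```

With Boto3, you could pass the result to `boto3.client("cloudwatch").put_metric_alarm(**params)`, for example with `params = system_errors_alarm_params("Orders", "GetItem")`, where `Orders` is a hypothetical table name. You would repeat this for each operation you want to monitor.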