

# App Mesh troubleshooting
<a name="troubleshooting"></a>

**Important**  
End of support notice: On September 30, 2026, AWS will discontinue support for AWS App Mesh. After September 30, 2026, you will no longer be able to access the AWS App Mesh console or AWS App Mesh resources. For more information, visit this blog post [Migrating from AWS App Mesh to Amazon ECS Service Connect](https://aws.amazon.com/blogs/containers/migrating-from-aws-app-mesh-to-amazon-ecs-service-connect). 

This chapter discusses troubleshooting best practices and common issues that you might encounter when using App Mesh. Select one of the following areas to review best practices and common issues for that area.

**Topics**
+ [App Mesh troubleshooting best practices](troubleshooting-best-practices.md)
+ [App Mesh setup troubleshooting](troubleshooting-setup.md)
+ [App Mesh connectivity troubleshooting](troubleshooting-connectivity.md)
+ [App Mesh scaling troubleshooting](troubleshooting-scaling.md)
+ [App Mesh observability troubleshooting](troubleshooting-observability.md)
+ [App Mesh security troubleshooting](troubleshooting-security.md)
+ [App Mesh Kubernetes troubleshooting](troubleshooting-kubernetes.md)

# App Mesh troubleshooting best practices
<a name="troubleshooting-best-practices"></a>

We recommend that you follow the best practices in this topic to troubleshoot issues when using App Mesh.

## Enable the Envoy proxy administration interface
<a name="ts-bp-enable-proxy-admin-interface"></a>

The Envoy proxy ships with an administration interface that you can use to discover configuration and statistics and to perform other administrative functions such as connection draining. For more information, see [Administration interface](https://www.envoyproxy.io/docs/envoy/latest/operations/admin) in the Envoy documentation.

If you use the managed [Envoy image](envoy.md), the administration endpoint is enabled by default on port 9901. Examples provided in [App Mesh setup troubleshooting](troubleshooting-setup.md) display the example administration endpoint URL as `http://my-app.default.svc.cluster.local:9901/`. 

**Note**  
The administration endpoint should never be exposed to the public internet. Additionally, we recommend monitoring the administration endpoint logs, which are set by the `ENVOY_ADMIN_ACCESS_LOG_FILE` environment variable to `/tmp/envoy_admin_access.log` by default. 

## Enable Envoy DogStatsD integration for metric offload
<a name="ts-bp-enable-envoy-statsd-integration"></a>

The Envoy proxy can be configured to offload statistics for OSI Layer 4 and Layer 7 traffic and for internal process health. Although this topic shows how to use these statistics without offloading them to sinks such as CloudWatch metrics and Prometheus, having the statistics in a centralized location for all of your applications can help you diagnose issues and confirm behavior more quickly. For more information, see [Using Amazon CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) and the [Prometheus documentation](https://prometheus.io/docs/introduction/overview/). 

You can configure DogStatsD metrics by setting the parameters defined in [DogStatsD variables](envoy-config.md#envoy-dogstatsd-config). For more information about DogStatsD, see the [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent) documentation. You can find a demonstration of metric offload to Amazon CloudWatch metrics in the [App Mesh with Amazon ECS basics walk-through](https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-ecs-basics) on GitHub.

## Enable access logs
<a name="ts-bp-enable-access-logs"></a>

We recommend enabling access logs on your [Virtual nodes](virtual_nodes.md) and [Virtual gateways](virtual_gateways.md) to discover details about traffic transiting between your applications. For more information, see [Access logging](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/observability/access_logging) in the Envoy documentation. The logs provide detailed information on OSI Layer 4 and Layer 7 traffic behavior. When you use Envoy’s default format, you can analyze the access logs with [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) using the following parse statement.

```
parse @message "[*] \"* * *\" * * * * * * * * * * *" as StartTime, Method, Path, Protocol, ResponseCode, ResponseFlags, BytesReceived, BytesSent, DurationMillis, UpstreamServiceTimeMillis, ForwardedFor, UserAgent, RequestId, Authority, UpstreamHost
```
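If you want to inspect access logs outside of CloudWatch Logs Insights, the same field order can be expressed as a regular expression. The following is a minimal sketch, assuming Envoy's default access log format; the sample log entry and the function name `parse_access_log` are made up for illustration. Unlike the parse statement above, this pattern strips the surrounding quotation marks from the last five fields.

```python
import re

# Field names and order mirror the parse statement above.
ACCESS_LOG_PATTERN = re.compile(
    r'\[(?P<StartTime>[^\]]*)\] '
    r'"(?P<Method>\S+) (?P<Path>\S+) (?P<Protocol>[^"]+)" '
    r'(?P<ResponseCode>\S+) (?P<ResponseFlags>\S+) '
    r'(?P<BytesReceived>\S+) (?P<BytesSent>\S+) '
    r'(?P<DurationMillis>\S+) (?P<UpstreamServiceTimeMillis>\S+) '
    r'"(?P<ForwardedFor>[^"]*)" "(?P<UserAgent>[^"]*)" '
    r'"(?P<RequestId>[^"]*)" "(?P<Authority>[^"]*)" "(?P<UpstreamHost>[^"]*)"'
)


def parse_access_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LOG_PATTERN.search(line)
    return match.groupdict() if match else None


# Hypothetical sample entry, not taken from a real deployment.
SAMPLE = ('[2024-01-01T00:00:00.000Z] "GET /ping HTTP/1.1" 503 UH 0 19 2 - '
          '"-" "curl/8.0.1" "11111111-2222-3333-4444-555555555555" '
          '"my-app.default.svc.cluster.local:8080" "-"')

entry = parse_access_log(SAMPLE)
print(entry["ResponseCode"], entry["ResponseFlags"])
```

A field that Envoy could not populate is logged as `-`, so callers should treat that value as "not available" rather than converting it to a number.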

## Enable Envoy debug logging in pre-production environments
<a name="ts-bp-enable-envoy-debug-logging"></a>

We recommend setting the Envoy proxy’s log level to `debug` in a pre-production environment. Debug logs can help you identify issues before you graduate the associated App Mesh configuration to your production environment. 

If you’re using the [Envoy image](envoy.md), you can set the log level to `debug` through the `ENVOY_LOG_LEVEL` environment variable. 

**Note**  
We do not recommend using the `debug` level in production environments. Setting the level to `debug` increases the logging and may affect performance and the overall cost of logs offloaded to solutions like [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html). 

When you use Envoy’s default format, you can analyze the process logs with [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) using the following parse statement: 

```
parse @message "[*][*][*][*] [*] *" as Time, Thread, Level, Name, Source, Message
```
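The same fields can be extracted locally, for example when you are reading container logs directly. The following is a minimal sketch, assuming Envoy's default process log format; the sample lines and the helper names `parse_process_log` and `count_by_level` are hypothetical.

```python
import re
from collections import Counter

# Field names mirror the parse statement above.
PROCESS_LOG_PATTERN = re.compile(
    r'\[(?P<Time>[^\]]*)\]\[(?P<Thread>[^\]]*)\]\[(?P<Level>[^\]]*)\]'
    r'\[(?P<Name>[^\]]*)\] \[(?P<Source>[^\]]*)\] (?P<Message>.*)'
)


def parse_process_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = PROCESS_LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_by_level(lines):
    """Count the log entries per log level, ignoring lines that do not match."""
    levels = Counter()
    for line in lines:
        entry = parse_process_log(line)
        if entry:
            levels[entry["Level"]] += 1
    return levels


# Hypothetical sample lines, made up for illustration.
SAMPLE_LINES = [
    "[2024-01-01 00:00:00.000][1][info][main] [source/server/server.cc:1] starting main dispatch loop",
    "[2024-01-01 00:00:01.000][1][warning][config] [source/common/config/grpc_stream.h:1] gRPC config stream closed: 16, Missing Authentication Token",
]

print(count_by_level(SAMPLE_LINES))
```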

## Monitor Envoy proxy connectivity with the App Mesh control plane
<a name="ts-bp-monitor-envoy-proxy-connectivity-state"></a>

We recommend that you monitor the Envoy metric `control_plane.connected_state` to make sure that the Envoy proxy is communicating with the App Mesh control plane and fetching its dynamic configuration resources. For more information, see [Management Server](https://www.envoyproxy.io/docs/envoy/latest/configuration/overview/mgmt_server.html).
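As a rough illustration, the metric can be read from the text returned by the `/stats` path of the Envoy administration interface, where each counter or gauge appears as a `name: value` line. The sketch below assumes that format and that a value of `1` means connected and `0` means disconnected, as described in the Envoy documentation. The function name and the sample text are hypothetical.

```python
def control_plane_connected(stats_text):
    """Return True or False from control_plane.connected_state, or None if absent."""
    for line in stats_text.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip() == "control_plane.connected_state":
            return value.strip() == "1"
    return None


# Hypothetical excerpt of /stats output.
SAMPLE_STATS = """control_plane.connected_state: 1
control_plane.pending_requests: 0
server.live: 1"""

print(control_plane_connected(SAMPLE_STATS))
```

Returning `None` when the metric is missing keeps "disconnected" distinct from "the stats output did not contain the metric", which usually points to a different problem.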

# App Mesh setup troubleshooting
<a name="troubleshooting-setup"></a>

This topic details common issues that you may experience with App Mesh setup.

## Cannot pull Envoy container image
<a name="ts-setup-cannot-pull-envoy"></a>

**Symptoms**  
You receive the following error message in an Amazon ECS task. The Amazon ECR *account ID* and *Region* in the message may be different, depending on which Amazon ECR repository you pulled the container image from.

```
CannotPullContainerError: Error response from daemon: pull access denied for 840364872350.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy, repository does not exist or may require 'docker login'
```

**Resolution**  
This error indicates that the task execution role being used does not have permission to communicate with Amazon ECR and cannot pull the Envoy container image from the repository. The task execution role assigned to your Amazon ECS task needs an IAM policy with the following statements:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:us-west-2:111122223333:repository/aws-appmesh-envoy",
      "Effect": "Allow"
    },
    {
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
```

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Cannot connect to App Mesh Envoy management service
<a name="ts-setup-cannot-connect-ems"></a>

**Symptoms**  
Your Envoy proxy is unable to connect to the App Mesh Envoy management service. You are seeing:
+ Connection refused errors
+ Connection timeouts
+ Errors resolving the App Mesh Envoy management service endpoint
+ gRPC errors

**Resolution**  
Make sure that your Envoy proxy has access to the internet or to a private [VPC endpoint](vpc-endpoints.md) and that your [security groups](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html) allow outbound traffic on port 443. App Mesh’s public Envoy management service endpoints use the following fully qualified domain name (FQDN) format.

```
# App Mesh Production Endpoint
appmesh-envoy-management.Region-code.amazonaws.com

# App Mesh Preview Endpoint
appmesh-preview-envoy-management.Region-code.amazonaws.com
```

You can debug your connection to the Envoy management service by using the following command, which sends a valid but empty gRPC request to the service.

```
curl -v -k -H 'Content-Type: application/grpc' -X POST https://appmesh-envoy-management.Region-code.amazonaws.com:443/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources
```

If you receive the following messages in response, your connection to the Envoy management service is functional. For debugging gRPC-related errors, see the errors in [Envoy disconnected from App Mesh Envoy management service with error text](https://docs.aws.amazon.com/app-mesh/latest/userguide/troubleshooting-setup.html#ts-setup-grpc-error-codes).

```
grpc-status: 16
grpc-message: Missing Authentication Token
```
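The endpoint format and the expected response can be captured in a few helper functions, for example to run the same check across several Regions. This is a minimal sketch that only builds the URL and interprets response headers that you have already collected (for instance from the `curl -v` output). It does not send the request itself, and the function names are hypothetical.

```python
GRPC_PATH = "/envoy.service.discovery.v3.AggregatedDiscoveryService/StreamAggregatedResources"


def ems_endpoint(region_code, preview=False):
    """Build the Envoy management service host name for a Region code."""
    prefix = "appmesh-preview-envoy-management" if preview else "appmesh-envoy-management"
    return f"{prefix}.{region_code}.amazonaws.com"


def ems_probe_url(region_code, preview=False):
    """Build the URL used by the curl command above."""
    return f"https://{ems_endpoint(region_code, preview)}:443{GRPC_PATH}"


def connection_is_functional(response_headers):
    """Return True if the unauthenticated probe received the expected gRPC rejection."""
    headers = {key.lower(): value.strip() for key, value in response_headers.items()}
    return (headers.get("grpc-status") == "16"
            and headers.get("grpc-message") == "Missing Authentication Token")


print(ems_probe_url("us-west-2"))
```

A rejection with status `16` is the expected outcome here because the probe carries no credentials. It shows that the network path works, not that the proxy is authorized.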

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Envoy disconnected from App Mesh Envoy management service with error text
<a name="ts-setup-grpc-error-codes"></a>

**Symptoms**  
Your Envoy proxy is unable to connect to the App Mesh Envoy management service and receive its configuration. Your Envoy proxy logs contain a log entry like the following.

```
gRPC config stream closed: gRPC status code, message
```

**Resolution**  
In most cases, the message portion of the log should indicate the problem. The following table lists the most common gRPC status codes that you might see, their causes, and their resolutions.


| gRPC status code | Cause | Resolution | 
| --- | --- | --- | 
| 0 | Graceful disconnect from the Envoy management service. | There is no issue. App Mesh occasionally disconnects Envoy proxies with this status code. Envoy will reconnect and continue receiving updates. | 
| 3 | The mesh endpoint (virtual node or virtual gateway), or one of its associated resources, could not be found. | Double check your Envoy configuration to make sure that it has the appropriate name of the App Mesh resource that it represents. If your App Mesh resource is integrated with other AWS resources, such as AWS Cloud Map namespaces or ACM certificates, then make sure that those resources exist. | 
| 7 | The Envoy proxy is unauthorized to perform an action, such as connect to the Envoy management service, or retrieve associated resources. | Make sure that you [create an IAM policy ](proxy-authorization.md#create-iam-policy) that has the appropriate policy statements for App Mesh and other services and attach that policy to the IAM user or role that your Envoy proxy is using to connect to the Envoy management service.  | 
| 8 | The number of Envoy proxies for a given App Mesh resource exceeds the account-level service quota. | See [App Mesh service quotas](service-quotas.md) for information on default account quotas and how to request a quota increase. | 
| 16 | The Envoy proxy does not have valid authentication credentials for AWS. | Make sure that the Envoy proxy has appropriate credentials to connect to AWS services through an IAM user or role. A known issue, [\#24136](https://github.com/envoyproxy/envoy/issues/24136), in Envoy version v1.24 and earlier causes credential fetching to fail if the Envoy process uses more than 1024 file descriptors. This can happen when Envoy is serving a high volume of traffic. You can confirm this issue by checking the Envoy logs at debug level for the text "A libcurl function was given a bad argument". To mitigate this issue, upgrade to Envoy version v1.25.1.0-prod or later. | 

You can observe the status codes and messages from your Envoy proxy with [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) by using the following query:

```
filter @message like /gRPC config stream closed/
| parse @message "gRPC config stream closed: *, *" as StatusCode, Message
```
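When you read the proxy logs directly, the status code can be extracted and matched against the table above with a short script. The following is a minimal sketch. The cause strings are abbreviated from the table, and the function name and sample log line are hypothetical.

```python
import re

# Causes abbreviated from the table above.
GRPC_CAUSES = {
    0: "Graceful disconnect; Envoy reconnects and continues receiving updates.",
    3: "The mesh endpoint or one of its associated resources could not be found.",
    7: "The Envoy proxy is not authorized to perform an action.",
    8: "The number of Envoy proxies for the resource exceeds the service quota.",
    16: "The Envoy proxy does not have valid AWS authentication credentials.",
}

STREAM_CLOSED = re.compile(r"gRPC config stream closed: (?P<code>\d+), (?P<message>.*)")


def explain_stream_closed(log_line):
    """Return (status code, message, likely cause), or None if the line does not match."""
    match = STREAM_CLOSED.search(log_line)
    if match is None:
        return None
    code = int(match.group("code"))
    cause = GRPC_CAUSES.get(code, "Status code not listed in the table.")
    return code, match.group("message"), cause


# Hypothetical sample line.
print(explain_stream_closed("gRPC config stream closed: 16, Missing Authentication Token"))
```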

If the provided error message was not helpful, or your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here).

## Envoy container health check, readiness probe, or liveness probe failing
<a name="ts-setup-envoy-container-checks"></a>

**Symptoms**  
Your Envoy proxy is failing health checks in an Amazon ECS task, Amazon EC2 instance, or Kubernetes pod. For example, you query the Envoy administration interface with the following command and receive a status other than `LIVE`.

```
curl -s http://my-app.default.svc.cluster.local:9901/server_info | jq '.state'
```

**Resolution**  
The following is a list of remediation steps depending on the status returned by the Envoy proxy.
+ `PRE_INITIALIZING` or `INITIALIZING` – The Envoy proxy has yet to receive configuration, or cannot connect to and retrieve configuration from the App Mesh Envoy management service. The Envoy proxy might be receiving an error from the Envoy management service when trying to connect. For more information, see the errors in [Envoy disconnected from App Mesh Envoy management service with error text](#ts-setup-grpc-error-codes).
+ `DRAINING` – The Envoy proxy has begun draining connections in response to a `/healthcheck/fail` or `/drain_listeners` request on the Envoy administration interface. We do not recommend invoking these paths on the administration interface unless you are about to terminate your Amazon ECS task, Amazon EC2 instance, or Kubernetes pod.
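The states above can be mapped to a next step in a small script, for example as part of a custom health check wrapper. This is a minimal sketch that takes the JSON body returned by `/server_info` as a string. The hint text is abbreviated from the list above, and the function name is hypothetical.

```python
import json

# Next steps abbreviated from the list above.
NEXT_STEP = {
    "LIVE": "No action needed.",
    "PRE_INITIALIZING": "Check connectivity to the Envoy management service and its gRPC errors.",
    "INITIALIZING": "Check connectivity to the Envoy management service and its gRPC errors.",
    "DRAINING": "Check whether /healthcheck/fail or /drain_listeners was invoked on the admin interface.",
}


def next_step_for(server_info_json):
    """Return (state, suggested next step) for a /server_info response body."""
    state = json.loads(server_info_json).get("state")
    return state, NEXT_STEP.get(state, "State not covered by the list above.")


# Hypothetical, heavily truncated /server_info response.
print(next_step_for('{"state": "INITIALIZING"}'))
```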

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Health check from the load balancer to the mesh endpoint is failing
<a name="ts-setup-lb-mesh-endpoint-health-check"></a>

**Symptoms**  
Your mesh endpoint is considered healthy by the container health check or readiness probe, but the health check from the load balancer to the mesh endpoint is failing.

**Resolution**  
To resolve the issue, complete the following tasks.
+ Make sure that the [security group](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html) associated with your mesh endpoint accepts inbound traffic on the port you've configured for your health check.
+ Make sure that the health check is succeeding consistently when requested manually; for example, from a [bastion host within your VPC](https://aws.amazon.com/quickstart/architecture/linux-bastion/).
+ If you are configuring a health check for a virtual node, then we recommend implementing a health check endpoint in your application; for example, `/ping` for HTTP. This ensures that both the Envoy proxy and your application are routable from the load balancer.
+ You can use any elastic load balancer type for the virtual node, depending on the features that you need. For more information, see [Elastic Load Balancing features](https://aws.amazon.com/elasticloadbalancing/features/#compare).
+ If you are configuring a health check for a [virtual gateway](virtual_gateways.md), then we recommend using a [network load balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html) with a TCP or TLS health check on the virtual gateway's listener port. This ensures that the virtual gateway listener is bootstrapped and ready to accept connections.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Virtual gateway not accepting traffic on ports 1024 or less
<a name="virtual-gateway-low-ports"></a>

**Symptoms**  
Your virtual gateway is not accepting traffic on port 1024 or less, but does accept traffic on a port number that is greater than 1024. For example, you query the Envoy stats with the following command and receive a value other than zero.

```
curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "update_rejected"
```

You might see text similar to the following text in your logs describing a failure to bind to a privileged port:

```
gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) lds_ingress_0.0.0.0_port_<port num>: cannot bind '0.0.0.0:<port num>': Permission denied
```
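To see which configuration type is being rejected, you can filter the `/stats` output for non-zero `update_rejected` counters instead of scanning the `grep` output by eye. The following is a minimal sketch, assuming the plain-text `name: value` stats format. The function name is hypothetical and the sample values are made up.

```python
def rejected_updates(stats_text):
    """Return a dict of update_rejected counters whose value is greater than zero."""
    rejected = {}
    for line in stats_text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        if separator and "update_rejected" in name and value.isdigit() and int(value) > 0:
            rejected[name.strip()] = int(value)
    return rejected


# Hypothetical excerpt of /stats output.
SAMPLE_STATS = """cluster_manager.cds.update_rejected: 0
listener_manager.lds.update_rejected: 2
listener_manager.lds.update_success: 5"""

print(rejected_updates(SAMPLE_STATS))
```

A non-zero listener counter, as in this sample, is consistent with the bind failure shown in the log message above.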

**Resolution**  
To resolve the issue, the user specified for the gateway needs to have the Linux capability `CAP_NET_BIND_SERVICE`. For more information, see [Capabilities](https://www.man7.org/linux/man-pages/man7/capabilities.7.html) in the Linux Programmer's Manual, [Linux parameters](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_linuxparameters) in the Amazon ECS task definition parameters, and [Set capabilities for a container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container) in the Kubernetes documentation.

**Important**  
Fargate must use a port value greater than 1024.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

# App Mesh connectivity troubleshooting
<a name="troubleshooting-connectivity"></a>

This topic details common issues that you may experience with App Mesh connectivity.

## Unable to resolve DNS name for a virtual service
<a name="ts-connectivity-dns-resolution-virtual-service"></a>

**Symptoms**  
Your application is unable to resolve the DNS name of a virtual service that it is attempting to connect to.

**Resolution**  
This is a known issue. For more information, see the [Name VirtualServices by any hostname or FQDN](https://github.com/aws/aws-app-mesh-roadmap/issues/65) GitHub issue. Virtual services in App Mesh can be named anything. As long as there is a DNS `A` record for the virtual service name and the application can resolve the virtual service name, the request will be proxied by Envoy and routed to its appropriate destination. To resolve the issue, add a DNS `A` record to any non-loopback IP address, such as `10.10.10.10`, for the virtual service name. The DNS `A` record can be added under the following conditions:
+ In Amazon Route 53, if the name is suffixed by your private hosted zone name
+ Within the application container's `/etc/hosts` file
+ In a third-party DNS server that you manage
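As an illustration of the `/etc/hosts` option, the following sketch validates that a placeholder address is a non-loopback IPv4 address and formats the corresponding hosts entry. It also includes a helper to confirm that the name resolves from inside the application container. The function names and the virtual service name are hypothetical.

```python
import ipaddress
import socket


def is_usable_placeholder(address):
    """Return True if the address is an IPv4 address that is not a loopback address."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and not ip.is_loopback


def hosts_entry(virtual_service_name, address="10.10.10.10"):
    """Format an /etc/hosts line that maps the virtual service name to the address."""
    if not is_usable_placeholder(address):
        raise ValueError(f"{address} is not a non-loopback IPv4 address")
    return f"{address}\t{virtual_service_name}"


def can_resolve(name):
    """Return True if the name resolves to an IPv4 address from this host."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


# Hypothetical virtual service name.
print(hosts_entry("my-service.default.svc.cluster.local"))
```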

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to connect to a virtual service backend
<a name="ts-connectivity-virtual-service-backend"></a>

**Symptoms**  
Your application is unable to establish a connection to a virtual service defined as a backend on your virtual node. When attempting to establish a connection, the connection may fail entirely, or the request from the application's perspective may fail with an `HTTP 503` response code.

**Resolution**  
If the application fails to connect at all (no `HTTP 503` response code returned), then do the following:
+ Make sure that your compute environment has been set up to work with App Mesh.
  + For Amazon ECS, make sure that you have the appropriate [proxy configuration](proxy-authorization.md) enabled. For an end-to-end walkthrough, see [Getting Started with App Mesh and Amazon ECS](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/appmesh-getting-started.html).
  + For Kubernetes, including Amazon EKS, make sure that you have the latest App Mesh controller installed via Helm. For more information, see [App Mesh Controller](https://hub.helm.sh/charts/aws/appmesh-controller) on Helm Hub or [Tutorial: Configure App Mesh integration with Kubernetes](https://docs.aws.amazon.com/app-mesh/latest/userguide/mesh-k8s-integration.html).
  + For Amazon EC2, make sure that you have set up your Amazon EC2 instance for proxying App Mesh traffic. For more information, see [Update services](https://docs.aws.amazon.com/app-mesh/latest/userguide/appmesh-getting-started.html#update-services).
+ Make sure that the Envoy container that is running on your compute service has successfully connected to the App Mesh Envoy management service. You can confirm this by checking Envoy stats for the field `control_plane.connected_state`. For more information on `control_plane.connected_state`, see [Monitor the Envoy Proxy Connectivity](https://docs.aws.amazon.com/app-mesh/latest/userguide/troubleshooting-best-practices.html#ts-bp-monitor-envoy-proxy-connectivity-state) in our **Troubleshooting Best Practices**.

  If the Envoy proxy was able to establish the connection initially, but later was disconnected and never reconnected, see [Envoy disconnected from App Mesh Envoy management service with error text](https://docs.aws.amazon.com/app-mesh/latest/userguide/troubleshooting-setup.html#ts-setup-grpc-error-codes) to troubleshoot why it was disconnected.

If the application connects but the request fails with an `HTTP 503` response code, try the following:
+ Make sure that the virtual service you're connecting to exists in the mesh.
+ Make sure that the virtual service has a provider (a virtual router or virtual node).
+ When using Envoy as an HTTP Proxy, if you're seeing egress traffic coming into `cluster.cds_egress_*_mesh-allow-all` instead of the correct destination through Envoy stats, most likely Envoy isn't routing requests properly through `filter_chains`. This can be a result of using an unqualified virtual service name. We recommend that you use the service discovery name of the actual service as the virtual service name, because Envoy proxy communicates with other virtual services through their names.

  For more information, see [virtual services](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_services.html).
+ Inspect the Envoy proxy logs for any of the following error messages:
  + `No healthy upstream` – The virtual node that the Envoy proxy is attempting to route to does not have any resolved endpoints, or it does not have any healthy endpoints. Make sure that the target virtual node has the correct service discovery and health check settings.

    If requests to the service are failing during a deployment or scaling of the backend virtual service, follow the guidance in [Some requests fail with HTTP status code `503` when a virtual service has a virtual node provider](#ts-connectivity-virtual-node-provider).
  + `No cluster match for URL` – This is most likely caused when a request is sent to a virtual service that does not match the criteria defined by any of the routes defined under a virtual router provider. Make sure that the requests from the application are sent to a supported route by ensuring the path and HTTP request headers are correct.
  + `No matching filter chain found` – This is most likely caused when a request is sent to a virtual service on an invalid port. Make sure that the requests from the application are using the same port specified on the virtual router.
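The three error messages above can be searched for in a batch of proxy log lines with a short script. This is a minimal sketch. The hints are abbreviated from the list above, the match is case-insensitive as a precaution, and the function name and sample lines are hypothetical.

```python
# Hints abbreviated from the list above, keyed by lowercase error text.
KNOWN_ERRORS = {
    "no healthy upstream": "Check service discovery and health check settings on the target virtual node.",
    "no cluster match for url": "Check that the request path and headers match a route on the virtual router.",
    "no matching filter chain found": "Check that the request uses the port specified on the virtual router.",
}


def diagnose(log_lines):
    """Return a list of (line number, error text, hint) for each known error found."""
    findings = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for error_text, hint in KNOWN_ERRORS.items():
            if error_text in lowered:
                findings.append((number, error_text, hint))
    return findings


# Hypothetical log lines.
SAMPLE_LINES = [
    "request accepted",
    "upstream reset: no healthy upstream",
    "No cluster match for URL '/unknown'",
]

for finding in diagnose(SAMPLE_LINES):
    print(finding)
```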

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to connect to an external service
<a name="ts-connectivity-external-service"></a>

**Symptoms**  
Your application is unable to connect to a service outside of the mesh, such as `amazon.com`.

**Resolution**  
By default, App Mesh does not allow outbound traffic from applications within the mesh to any destination outside of the mesh. To enable communication with an external service, there are two options:
+ Set the [outbound filter](https://docs.aws.amazon.com/app-mesh/latest/APIReference/API_EgressFilter.html) on the mesh resource to `ALLOW_ALL`. This setting will allow any application within the mesh to communicate with any destination IP address inside or outside of the mesh.
+ Model the external service in the mesh using a virtual service, virtual router, route, and virtual node. For example, to model the external service `example.com`, you can create a virtual service named `example.com` with a virtual router and route that sends all traffic to a virtual node with a DNS service discovery hostname of `example.com`.
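The two options can be sketched as the `spec` structures that the App Mesh API accepts for a mesh and a virtual node. The helpers below only build plain dictionaries. The field names are intended to follow the App Mesh API reference, and the port `80` and protocol `http` for `example.com` are assumptions chosen for illustration, so verify both against the API reference before passing the result to an SDK or the AWS CLI.

```python
VALID_EGRESS_FILTER_TYPES = ("ALLOW_ALL", "DROP_ALL")


def mesh_spec(egress_filter_type):
    """Build a mesh spec with the given outbound (egress) filter type."""
    if egress_filter_type not in VALID_EGRESS_FILTER_TYPES:
        raise ValueError(f"unsupported egress filter type: {egress_filter_type}")
    return {"egressFilter": {"type": egress_filter_type}}


def external_service_node_spec(hostname, port, protocol):
    """Build a virtual node spec that uses DNS service discovery for an external host."""
    return {
        "listeners": [{"portMapping": {"port": port, "protocol": protocol}}],
        "serviceDiscovery": {"dns": {"hostname": hostname}},
    }


# Option 1: allow all outbound traffic from the mesh.
print(mesh_spec("ALLOW_ALL"))

# Option 2: model example.com as a virtual node (port and protocol are assumptions).
print(external_service_node_spec("example.com", 80, "http"))
```

The second option keeps the default outbound filter in place, so only the destinations that you model explicitly become reachable.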

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to connect to a MySQL or SMTP server
<a name="ts-connectivity-troubleshooting-mysql-and-smtp"></a>

**Symptoms**  
When allowing outbound traffic to all destinations (Mesh `EgressFilter type`=`ALLOW_ALL`), such as an SMTP server or a MySQL database using a virtual node definition, the connection from your application fails. As an example, the following is an error message from attempting to connect to a MySQL server.

```
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
```

**Resolution**  
This is a known issue that is resolved by using App Mesh image version 1.15.0 or later. For more information, see the [Unable to connect to MySQL with App Mesh](https://github.com/aws/aws-app-mesh-roadmap/issues/62) GitHub issue. This error occurs because the outbound listener in Envoy configured by App Mesh adds the Envoy TLS Inspector listener filter. For more information, see [TLS Inspector](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector#config-listener-filters-tls-inspector) in the Envoy documentation. This filter evaluates whether or not a connection is using TLS by inspecting the first packet sent from the client. With MySQL and SMTP, however, the server sends the first packet after connection. For more information about MySQL, see [Initial Handshake](https://dev.mysql.com/doc/internals/en/initial-handshake.html) in the MySQL documentation. Because the server sends the first packet, inspection at the filter fails.

**To work around this issue depending on your version of Envoy:**
+ If your App Mesh image Envoy version is 1.15.0 or later, do not model external services such as **MySQL**, **SMTP**, **MSSQL**, etc. as a backend for your application's virtual node.
+ If your App Mesh image Envoy version is prior to 1.15.0, add port `3306` for **MySQL**, and the port that you are using for **SMTP**, to the list of values for `APPMESH_EGRESS_IGNORED_PORTS` in your services.

**Important**  
While the standard SMTP ports are `25`, `587`, and `465`, you should only add the port you are using to `APPMESH_EGRESS_IGNORED_PORTS` and not all three.
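When you update the variable, take care not to drop ports that are already listed. The following is a minimal sketch that appends a port to an existing value, assuming `APPMESH_EGRESS_IGNORED_PORTS` holds a comma-separated list of port numbers. The function name is hypothetical.

```python
def add_ignored_port(current_value, port):
    """Return the comma-separated port list with the port added if it is missing."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    ports = [item.strip() for item in current_value.split(",") if item.strip()]
    if str(port) not in ports:
        ports.append(str(port))
    return ",".join(ports)


# Add the MySQL port, then only the SMTP port that is actually in use.
value = add_ignored_port("", 3306)
value = add_ignored_port(value, 587)
print(value)
```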

For more information, see [Update services](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-kubernetes.html#create-update-services) for Kubernetes, [Update services](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ecs.html#update-services) for Amazon ECS, or [Update services](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ec2.html#update-services) for Amazon EC2. 

If your issue is still not resolved, then you can provide us with details on what you're experiencing using the existing [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/62) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to connect to a service modeled as a TCP virtual node or virtual router in App Mesh
<a name="ts-connectivity-virtual-node-router"></a>

**Symptoms**  
Your application is unable to connect to a backend that uses the TCP protocol setting in the App Mesh [PortMapping](https://docs.aws.amazon.com/app-mesh/latest/APIReference/API_PortMapping.html) definition.

**Resolution**  
This is a known issue. For more information, see [Routing to multiple TCP destinations on the same port](https://github.com/aws/aws-app-mesh-roadmap/issues/195) on GitHub. App Mesh does not currently allow multiple backend destinations modeled as TCP to share the same port due to restrictions in the information provided to the Envoy proxy at OSI Layer 4. To make sure that TCP traffic can be routed appropriately for all backend destinations, do the following: 
+ Make sure that all destinations are using a unique port. If you are using a virtual router provider for the backend virtual service, you can change the virtual router port without changing the port on the virtual nodes that it routes to. This allows the applications to open connections on the virtual router port while the Envoy proxy continues to use the port defined in the virtual node.
+ If the destination modeled as TCP is a MySQL server, or any other TCP-based protocol in which the server sends the first packets after connection, see [Unable to connect to a MySQL or SMTP server](#ts-connectivity-troubleshooting-mysql-and-smtp).
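To find backends that violate the unique-port requirement, you can group the TCP destinations by port and report any port that is shared. The following is a minimal sketch. The input structure (a mapping of backend name to protocol and port) and the backend names are hypothetical; in practice you would build the mapping from your own virtual router and virtual node definitions.

```python
from collections import defaultdict


def conflicting_tcp_ports(backends):
    """Return {port: [backend names]} for ports shared by more than one TCP backend."""
    names_by_port = defaultdict(list)
    for name, (protocol, port) in backends.items():
        if protocol == "tcp":
            names_by_port[port].append(name)
    return {port: sorted(names) for port, names in names_by_port.items() if len(names) > 1}


# Hypothetical backends: name -> (protocol, port).
BACKENDS = {
    "orders-db.example.internal": ("tcp", 5432),
    "audit-db.example.internal": ("tcp", 5432),
    "cache.example.internal": ("tcp", 6379),
    "web.example.internal": ("http", 5432),
}

print(conflicting_tcp_ports(BACKENDS))
```

Only backends with the `tcp` protocol are compared, because the restriction described above applies to destinations modeled as TCP.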

If your issue is still not resolved, then you can provide us with details on what you're experiencing using the existing [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/195) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Connectivity succeeds to service not listed as a virtual service backend for a virtual node
<a name="ts-connectivity-not-virtual-service"></a>

**Symptoms**  
Your application is able to connect and send traffic to a destination that is not specified as a virtual service backend on your virtual node.

**Resolution**  
If requests are succeeding to a destination that has not been modeled in the App Mesh APIs, then the most likely cause is that the mesh's [outbound filter](https://docs.aws.amazon.com/app-mesh/latest/APIReference/API_EgressFilter.html) type has been set to `ALLOW_ALL`. When the outbound filter is set to `ALLOW_ALL`, an outbound request from your application that does not match a modeled destination (backend) will be sent to the destination IP address set by the application. 

If you want to disallow traffic to destinations not modeled in the mesh, consider setting the outbound filter value to `DROP_ALL`.

**Note**  
Setting the mesh outbound filter value affects all virtual nodes within the mesh.  
Configuring `egress_filter` as `DROP_ALL` and enabling TLS isn't available for outbound traffic that isn't to an AWS domain.
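
As a rough illustration of changing the outbound filter, the following Python sketch builds the request parameters for the App Mesh `UpdateMesh` operation. The mesh name `my-mesh` and the `build_update_mesh_request` helper are assumptions made for this example. The sketch only builds the parameters; sending them requires the AWS SDK for Python (Boto3) and valid credentials.

```python
def build_update_mesh_request(mesh_name, egress_filter_type="DROP_ALL"):
    """Build keyword arguments for the App Mesh UpdateMesh operation."""
    if egress_filter_type not in ("ALLOW_ALL", "DROP_ALL"):
        raise ValueError("egress filter type must be ALLOW_ALL or DROP_ALL")
    return {
        "meshName": mesh_name,
        "spec": {"egressFilter": {"type": egress_filter_type}},
    }


# "my-mesh" is a hypothetical mesh name.
request = build_update_mesh_request("my-mesh")
print(request)

# With Boto3 installed and credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("appmesh").update_mesh(**request)
```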

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Some requests fail with HTTP status code `503` when a virtual service has a virtual node provider
<a name="ts-connectivity-virtual-node-provider"></a>

**Symptoms**  
A portion of your application's requests fail to a virtual service backend that is using a virtual node provider instead of a virtual router provider. When using a virtual router provider for the virtual service, requests do not fail.

**Resolution**  
This is a known issue. For more information, see [Retry policy on Virtual Node provider for a Virtual Service](https://github.com/aws/aws-app-mesh-roadmap/issues/194) on GitHub. When using a virtual node as a provider for a virtual service, you cannot specify the default retry policy that you want the clients of your virtual service to use. By comparison, virtual router providers allow retry policies to be specified because they are a property of the child route resources.

To reduce request failures to virtual node providers, use a virtual router provider instead, and specify a retry policy on its routes. For other ways to reduce request failures to your applications, see [App Mesh best practices](best-practices.md). 

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to connect to an Amazon EFS filesystem
<a name="ts-connectivity-efs"></a>

**Symptoms**  
When configuring an Amazon ECS task with an Amazon EFS filesystem as a volume, the task fails to start with the following error.

```
ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: mount.nfs4: Connection refused : unsuccessful EFS utils command execution; code: 32
```

**Resolution**  
This is a known issue. This error occurs because the NFS connection to Amazon EFS is made before any containers in your task are started. The proxy configuration routes this traffic to Envoy, which is not yet running at that point. Because of this startup ordering, the NFS client fails to connect to the Amazon EFS filesystem and the task fails to launch. To resolve the issue, add port `2049` to the list of values for the `EgressIgnoredPorts` setting in the proxy configuration of your Amazon ECS task definition. For more information, see [Proxy configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#proxyConfiguration).
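
The following Python sketch shows one way to add port `2049` to `EgressIgnoredPorts` in a task definition's proxy configuration, represented here as a dictionary. The container name, ports, and other property values are placeholder assumptions, and `add_egress_ignored_port` is an illustrative helper rather than part of an AWS SDK.

```python
def add_egress_ignored_port(proxy_configuration, port):
    """Return a copy of the proxy configuration with `port` in EgressIgnoredPorts."""
    properties = [dict(p) for p in proxy_configuration.get("properties", [])]
    for prop in properties:
        if prop["name"] == "EgressIgnoredPorts":
            # The value is a comma-separated list of ports.
            ports = [p for p in prop["value"].split(",") if p]
            if str(port) not in ports:
                ports.append(str(port))
            prop["value"] = ",".join(ports)
            break
    else:
        # No EgressIgnoredPorts property exists yet, so add one.
        properties.append({"name": "EgressIgnoredPorts", "value": str(port)})
    return {**proxy_configuration, "properties": properties}


# Placeholder proxy configuration; adjust the values to match your task definition.
proxy_configuration = {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
        {"name": "IgnoredUID", "value": "1337"},
        {"name": "ProxyIngressPort", "value": "15000"},
        {"name": "ProxyEgressPort", "value": "15001"},
        {"name": "AppPorts", "value": "8080"},
        {"name": "EgressIgnoredPorts", "value": "22"},
    ],
}

updated = add_egress_ignored_port(proxy_configuration, 2049)
print(updated["properties"][-1])
```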

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Connectivity succeeds to service, but the incoming request does not appear in access logs, traces, or metrics for Envoy
<a name="ts-connectivity-iptables"></a>

**Symptoms**  
Your application can connect and send requests to another application, but you can't see the incoming requests in the access logs or in the tracing information for the Envoy proxy.

**Resolution**  
This is a known issue. For more information, see the [iptables rules setup](https://github.com/aws/aws-app-mesh-roadmap/issues/166) issue on GitHub. The Envoy proxy only intercepts inbound traffic on the port that its corresponding virtual node is listening on. Requests to any other port bypass the Envoy proxy and reach the service behind it directly. For the Envoy proxy to intercept the inbound traffic for your service, set your virtual node and service to listen on the same port.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Setting the `HTTP_PROXY`/`HTTPS_PROXY` environment variables at container level doesn't work as expected
<a name="http-https-proxy"></a>

**Symptoms**  
When `HTTP_PROXY`/`HTTPS_PROXY` is set as an environment variable at the:
+ App container in the task definition with App Mesh enabled, requests sent to the namespace of the App Mesh services get `HTTP 500` error responses from the Envoy sidecar.
+ Envoy container in the task definition with App Mesh enabled, requests coming out of the Envoy sidecar do not go through the `HTTP`/`HTTPS` proxy server, and the environment variable has no effect.

**Resolution**  
For the app container:

App Mesh functions by having traffic within your task go through the Envoy proxy. The `HTTP_PROXY`/`HTTPS_PROXY` configuration overrides this behavior by configuring container traffic to go through a different external proxy. Envoy still intercepts the traffic, but Envoy doesn't support proxying mesh traffic through an external proxy.

If you want to proxy all non-mesh traffic, set `NO_PROXY` to include your mesh's CIDR/namespace, localhost, and the credentials endpoints, as in the following example.

```
NO_PROXY=localhost,127.0.0.1,169.254.169.254,169.254.170.2,10.0.0.0/16
```

For the Envoy container:

Envoy doesn't support a generic proxy. We do not recommend setting these variables.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Upstream request timeouts even after setting the timeout for routes
<a name="upstream-timeout-request"></a>

**Symptoms**  
You defined the timeout for:
+ The routes, but you are still getting an upstream request timeout error.
+ The virtual node listener, along with the timeout and the retry timeout for the routes, but you are still getting an upstream request timeout error.

**Resolution**  
For high-latency requests that take longer than 15 seconds to complete successfully, you must specify a timeout at both the route level and the virtual node listener level.

If you specify a route timeout that is greater than the default 15 seconds, make sure that the timeout is also specified for the listener for all participating virtual nodes. However, if you decrease the timeout to a value that is lower than the default, it's optional to update the timeouts at virtual nodes. For more information about options when setting up virtual nodes and routes, see [virtual nodes](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_nodes.html) and [routes](https://docs.aws.amazon.com/app-mesh/latest/userguide/routes.html).

If you specified a **retry policy**, the duration that you specify for the request timeout should always be greater than or equal to the *retry timeout* multiplied by the *max retries* that you defined in the **retry policy**. This allows your request with all the retries to complete successfully. For more information, see [routes](https://docs.aws.amazon.com/app-mesh/latest/userguide/routes.html).
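
The relationship between the request timeout and the retry policy is simple arithmetic. The following Python sketch illustrates it with made-up values; the function names are illustrative, and the arguments loosely correspond to the per-retry timeout and max retries of a route's retry policy.

```python
def minimum_request_timeout_ms(per_retry_timeout_ms, max_retries):
    """Smallest request timeout that leaves room for every retry to run."""
    return per_retry_timeout_ms * max_retries


def request_timeout_is_sufficient(request_timeout_ms, per_retry_timeout_ms, max_retries):
    """True if the request timeout is >= retry timeout multiplied by max retries."""
    return request_timeout_ms >= minimum_request_timeout_ms(per_retry_timeout_ms, max_retries)


# Example values (assumptions): 5-second retry timeout and 3 retries.
print(minimum_request_timeout_ms(5000, 3))
print(request_timeout_is_sufficient(20000, 5000, 3))
print(request_timeout_is_sufficient(10000, 5000, 3))
```

With these example values, a 20-second request timeout is sufficient, while a 10-second request timeout can expire before all three retries complete.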

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Envoy responds with HTTP Bad request
<a name="ts-http-bad-request"></a>

**Symptoms**  
Envoy responds with **HTTP 400 Bad request** for all requests sent through the Network Load Balancer (NLB). When you check the Envoy logs, you see the following:
+ Debug logs:

  ```
  dispatch error: http/1.1 protocol error: HPE_INVALID_METHOD
  ```
+ Access logs:

  ```
  "- - HTTP/1.1" 400 DPE 0 11 0 - "-" "-" "-" "-" "-"
  ```

**Resolution**  
The resolution is to disable the proxy protocol version 2 (PPv2) on your NLB's [target group attributes](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-group-attributes).

PPv2 is not currently supported by virtual gateway and virtual node Envoys that run using the App Mesh control plane. If you deploy the NLB using the AWS Load Balancer Controller on Kubernetes, then disable PPv2 by setting the `proxy_protocol_v2.enabled` attribute to `false` in the following annotation:

```
service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: proxy_protocol_v2.enabled=false
```

See [AWS Load Balancer Controller Annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/#resource-attributestrue) for more details about NLB resource attributes.
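
For an NLB that is not managed by the AWS Load Balancer Controller, the target group attribute can be changed through the Elastic Load Balancing API. The following Python sketch builds the parameters for the `ModifyTargetGroupAttributes` operation. The target group ARN is a placeholder and `build_disable_ppv2_request` is an illustrative helper. Sending the request requires the AWS SDK for Python (Boto3) and valid credentials.

```python
def build_disable_ppv2_request(target_group_arn):
    """Build keyword arguments that turn off proxy protocol v2 on a target group."""
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [{"Key": "proxy_protocol_v2.enabled", "Value": "false"}],
    }


# Placeholder ARN; replace it with the ARN of your NLB target group.
example_arn = (
    "arn:aws:elasticloadbalancing:us-west-2:111122223333:"
    "targetgroup/my-targets/0123456789abcdef"
)
ppv2_request = build_disable_ppv2_request(example_arn)
print(ppv2_request)

# With Boto3 installed and credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("elbv2").modify_target_group_attributes(**ppv2_request)
```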

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to configure timeout properly
<a name="ts-configure-timeout"></a>

**Symptoms**  
Your request times out within 15 seconds even after you configure the timeout on the virtual node listener and the timeout on the route to the virtual node backend.

**Resolution**  
Make sure that the correct virtual service is included in the virtual node's list of backends.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

# App Mesh scaling troubleshooting
<a name="troubleshooting-scaling"></a>

This topic details common issues that you may experience with App Mesh scaling.

## Connectivity fails and container health checks fail when scaling beyond 50 replicas for a virtual node/virtual gateway
<a name="ts-scaling-exceed-virtual-node-envoy-quota"></a>

**Symptoms**  
When you are scaling the number of replicas, such as Amazon ECS tasks, Kubernetes pods, or Amazon EC2 instances, for a virtual node/virtual gateway beyond 50, Envoy container health checks for new and currently running Envoys begin to fail. Downstream applications sending traffic to the virtual node/virtual gateway begin seeing request failures with HTTP status code `503`.

**Resolution**  
App Mesh's default quota for the number of Envoys per virtual node/virtual gateway is 50. When the number of running Envoys exceeds this quota, new and currently running Envoys fail to connect to App Mesh's Envoy management service with gRPC status code `8` (`RESOURCE_EXHAUSTED`). This quota can be raised. For more information, see [App Mesh service quotas](service-quotas.md).

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Requests fail with `503` when a virtual service backend horizontally scales out or in
<a name="ts-scaling-out-in"></a>

**Symptoms**  
When a backend virtual service is horizontally scaled out or in, requests from downstream applications fail with an `HTTP 503` status code.

**Resolution**  
App Mesh recommends several approaches to mitigate failure cases while scaling applications horizontally. For detailed information about how to prevent these failures, see [App Mesh best practices](best-practices.md).

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Envoy container crashes with segfault under increased load
<a name="ts-scaling-segfault"></a>

**Symptoms**  
Under a high traffic load, the Envoy proxy crashes due to a segmentation fault (Linux exit code `139`). The Envoy process logs contain a statement like the following.

```
Caught Segmentation fault, suspect faulting address 0x0
```

**Resolution**  
The Envoy proxy has likely breached the operating system's default nofile ulimit, the limit on the number of files a process can have open at a time. This breach is due to the traffic causing more connections, which consume additional operating system sockets. To resolve this issue, increase the ulimit nofile value on the host operating system. If you are using Amazon ECS, this limit can be changed through the [Ulimit settings](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_Ulimit.html) on the task definition's [resource limits settings](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_limits).
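
For Amazon ECS, the `nofile` limit is expressed as a `ulimits` entry on the container definition. The following Python sketch builds such an entry as a dictionary. The limit of `15000` and the container name are example assumptions, not recommended values for your workload, and `nofile_ulimit` is an illustrative helper.

```python
def nofile_ulimit(soft_limit, hard_limit):
    """Build an Amazon ECS Ulimit entry for the open-file (nofile) limit."""
    if soft_limit > hard_limit:
        raise ValueError("soft limit cannot exceed hard limit")
    return {"name": "nofile", "softLimit": soft_limit, "hardLimit": hard_limit}


# Fragment of a hypothetical Envoy container definition.
envoy_container = {
    "name": "envoy",
    "ulimits": [nofile_ulimit(15000, 15000)],
}
print(envoy_container["ulimits"])
```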

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Increase in default resources is not reflected in Service Limits
<a name="default-resources-increase"></a>

**Symptoms**  
After increasing the default limit of App Mesh resources, the new value is not reflected when you look at your service limits.

**Resolution**  
The new limits aren't currently shown in your service limits, but you can still use them.

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Application crashes due to a huge number of health check calls
<a name="crash-health-checks"></a>

**Symptoms**  
After enabling active health checks for a virtual node, there is an uptick in the number of health check calls. The application crashes due to the greatly increased volume of health check calls made to the application.

**Resolution**  
When active health checking is enabled, each Envoy endpoint of the downstream (client) sends health requests to each endpoint of the upstream cluster (server) in order to make routing decisions. As a result the total number of health check requests would be `number of client Envoys` \* `number of server Envoys` \* `active health check frequency`.

To resolve this issue, reduce the frequency of the health check probe, which reduces the total volume of health check probes. In addition to active health checks, App Mesh allows configuring [outlier detection](https://docs.aws.amazon.com/app-mesh/latest/APIReference/API_OutlierDetection.html) as a means of passive health checking. Use outlier detection to configure when to remove a particular host based on consecutive `5xx` responses.
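
To get a feel for how the health check volume grows, the following Python sketch evaluates the product of client Envoys, server Envoys, and check frequency for example values. The counts and intervals are assumptions for illustration, and the interval argument loosely corresponds to the `intervalMillis` setting of a virtual node health check.

```python
def health_checks_per_second(client_envoys, server_envoys, interval_millis):
    """Estimate the total health check requests per second across the upstream cluster."""
    # Each client Envoy checks each server Envoy once per interval.
    return client_envoys * server_envoys * 1000 / interval_millis


# Example: 50 client Envoys and 20 server Envoys.
print(health_checks_per_second(50, 20, 2000))
print(health_checks_per_second(50, 20, 10000))
```

In this example, lengthening the interval from 2 seconds to 10 seconds reduces the estimated load from 500 to 100 health check requests per second.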

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

# App Mesh observability troubleshooting
<a name="troubleshooting-observability"></a>

This topic details common issues that you may experience with App Mesh observability.

## Unable to see AWS X-Ray traces for my applications
<a name="ts-observability-x-ray-traces"></a>

**Symptoms**  
Your application in App Mesh is not displaying X-Ray tracing information in the X-Ray console or APIs.

**Resolution**  
To use X-Ray in App Mesh, you must correctly configure components to enable communication between your application, sidecar containers, and the X-Ray service. Take the following steps to confirm that X-Ray has been set up correctly:
+ Make sure that the App Mesh virtual node listener protocol is not set to `TCP`.
+ Make sure that the X-Ray container that is deployed with your application exposes UDP port `2000` and runs as user `1337`. For more information, see the [Amazon ECS X-Ray example](https://github.com/aws/aws-app-mesh-examples/blob/main/walkthroughs/howto-ecs-basics/deploy/2-meshify.yaml#L374-L386) on GitHub.
+ Make sure that the Envoy container has tracing enabled. If you are using the [App Mesh Envoy image](envoy.md), you can enable X-Ray by setting the `ENABLE_ENVOY_XRAY_TRACING` environment variable to a value of `1` and the `XRAY_DAEMON_PORT` environment variable to `2000`.
+ If you’ve instrumented X-Ray in your application code with one of the [language-specific SDKs](https://docs.aws.amazon.com/xray/index.html), then make sure that it is configured correctly by following the guides for your language.
+ If all of the previous items are configured correctly, then review the X-Ray container logs for errors and follow the guidance in [Troubleshooting AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/xray-troubleshooting.html). A more detailed explanation of X-Ray integration in App Mesh can be found in [Integrating X-Ray with App Mesh](https://aws.amazon.com/blogs/compute/integrating-aws-x-ray-with-aws-app-mesh/).
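
The Envoy environment variable check from the preceding list can be scripted. The following Python sketch assumes that you have already extracted the Envoy container's environment variables into a dictionary; `xray_env_problems` is an illustrative helper and not part of App Mesh or X-Ray.

```python
def xray_env_problems(envoy_env):
    """Return a list of X-Ray related problems found in the Envoy environment."""
    problems = []
    if envoy_env.get("ENABLE_ENVOY_XRAY_TRACING") != "1":
        problems.append("ENABLE_ENVOY_XRAY_TRACING should be set to 1")
    if envoy_env.get("XRAY_DAEMON_PORT") != "2000":
        problems.append("XRAY_DAEMON_PORT should be set to 2000")
    return problems


# Hypothetical environment taken from an Envoy container definition.
envoy_env = {"ENABLE_ENVOY_XRAY_TRACING": "1"}
print(xray_env_problems(envoy_env))
```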

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to see Envoy metrics for my applications in Amazon CloudWatch metrics
<a name="ts-observability-envoy-metrics"></a>

**Symptoms**  
Your application in App Mesh is not emitting metrics generated by the Envoy proxy to CloudWatch metrics.

**Resolution**  
When you use CloudWatch metrics in App Mesh, you must correctly configure several components to enable communication between your Envoy proxy, CloudWatch agent sidecar, and the CloudWatch metrics service. Take the following steps to confirm that CloudWatch metrics for Envoy proxy have been set up correctly:
+ Make sure that you are using the CloudWatch agent image for App Mesh. For more information, see [App Mesh CloudWatch agent](https://github.com/aws-samples/aws-app-mesh-cloudwatch-agent) on GitHub.
+ Make sure that you have configured the CloudWatch agent for App Mesh appropriately by following the platform-specific usage instructions. For more information, see [App Mesh CloudWatch agent](https://github.com/aws-samples/aws-app-mesh-cloudwatch-agent#usage) on GitHub.
+ If all of the previous items are configured correctly, then review the CloudWatch agent container logs for errors and follow the guidance provided in [Troubleshooting the CloudWatch agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/troubleshooting-CloudWatch-Agent.html).

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

## Unable to configure custom sampling rules for AWS X-Ray traces
<a name="ts-observability-custom-sampling"></a>

**Symptoms**  
Your application is using X-Ray tracing, but you are unable to configure sampling rules for your traces.

**Resolution**  
Since App Mesh Envoy currently does not support **Dynamic X-Ray sampling configuration**, the following workarounds are available.

If your Envoy version is `1.19.1` or later, you have the following options.
+ To only set the sampling rate, use the `XRAY_SAMPLING_RATE` environment variable on the Envoy container. The value should be specified as a decimal between `0` and `1.00` (100%). For more information, see [AWS X-Ray variables](envoy-config.md#envoy-xray-config).
+ To configure localized custom sampling rules for the X-Ray tracer, use the `XRAY_SAMPLING_RULE_MANIFEST` environment variable to specify a file path in the Envoy container file system. For more information, see [Sampling rules](https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-go-configuration.html#xray-sdk-go-configuration-sampling) in the *AWS X-Ray Developer Guide*.
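
As an illustration of a sampling rule manifest, the following Python sketch generates a JSON document in the version 2 local sampling rule format described in the *AWS X-Ray Developer Guide*. The rule description, URL path, and rates are example assumptions. Verify the format against the linked sampling rules documentation before you use it with Envoy.

```python
import json

# Example local sampling rules; the path and rates are placeholders.
sampling_rules = {
    "version": 2,
    "rules": [
        {
            "description": "Sample 5% of requests to a hypothetical /api path",
            "host": "*",
            "http_method": "*",
            "url_path": "/api/*",
            "fixed_target": 1,
            "rate": 0.05,
        }
    ],
    # Applied to requests that match no rule above.
    "default": {"fixed_target": 1, "rate": 0.1},
}

manifest_json = json.dumps(sampling_rules, indent=2)
print(manifest_json)
```

The resulting JSON could be written to a file in the Envoy container file system and referenced by path.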

If your Envoy version is prior to `1.19.1`, then do the following.
+ Use the `ENVOY_TRACING_CFG_FILE` environment variable to change your sampling rate. For more information, see [Envoy configuration variables](envoy-config.md). Specify a custom tracing configuration and define local sampling rules. For more information, see [Envoy X-Ray config](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/xray.proto.html#config-trace-v3-xrayconfig).
+ The following is an example of a custom tracing configuration for the `ENVOY_TRACING_CFG_FILE` environment variable:

  ```
  tracing:
    http:
      name: envoy.tracers.xray
      typedConfig:
        "@type": type.googleapis.com/envoy.config.trace.v3.XRayConfig
        segmentName: foo/bar
        segmentFields:
          origin: AWS::AppMesh::Proxy
          aws:
            app_mesh:
              mesh_name: foo
              virtual_node_name: bar
        daemonEndpoint:
          protocol: UDP
          address: 127.0.0.1
          portValue: 2000
        samplingRuleManifest:
          filename: /tmp/sampling-rules.json
  ```
+ For details on configuration for the sampling rule manifest in the `samplingRuleManifest` property, see [Configuring the X-Ray SDK for Go](https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-go-configuration.html#xray-sdk-go-configuration-sampling).

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/).

# App Mesh security troubleshooting
<a name="troubleshooting-security"></a>

This topic details common issues that you may experience with App Mesh security.

## Unable to connect to a backend virtual service with a TLS client policy
<a name="ts-security-tls-client-policy"></a>

**Symptoms**  
When adding a TLS client policy to a virtual service backend in a virtual node, connectivity to that backend fails. When attempting to send traffic to the backend service, the requests fail with an `HTTP 503` response code and the error message: `upstream connect error or disconnect/reset before headers. reset reason: connection failure`.

**Resolution**  
To determine the root cause of the issue, we recommend using the Envoy proxy process logs. For more information, see [Enable Envoy debug logging in pre-production environments](troubleshooting-best-practices.md#ts-bp-enable-envoy-debug-logging). Use the following list to determine the cause of the connection failure:
+ Make sure connectivity to the backend is succeeding by ruling out the errors mentioned in [Unable to connect to a virtual service backend](troubleshooting-connectivity.md#ts-connectivity-virtual-service-backend).
+ In the Envoy process logs, look for the following errors (logged at debug level).

  ```
  TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
  ```

  This error is caused by one or more of the following reasons:
  + The certificate was not signed by one of the certificate authorities defined in the TLS client policy trust bundle.
  + The certificate is no longer valid (expired).
  + The Subject Alternative Name (SAN) does not match the requested DNS hostname.
  + Make sure that the certificate offered by the backend service is valid, that it is signed by one of the certificate authorities in your TLS client policy's trust bundle, and that it meets the criteria defined in [Transport Layer Security (TLS)](tls.md).
  + If the error you receive is like the following one, then the request is bypassing the Envoy proxy and reaching the application directly. When you send traffic, the stats on Envoy don't change, which indicates that Envoy isn't on the path to decrypt the traffic. In the proxy configuration of the virtual node, make sure that `AppPorts` contains the correct value that the application is listening on.

    ```
    upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
    ```

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue-bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/). If you believe that you’ve found a security vulnerability or have questions about App Mesh’s security, then see the [AWS vulnerability reporting guidelines](https://aws.amazon.com/security/vulnerability-reporting/).

## Unable to connect to a backend virtual service when application is originating TLS
<a name="ts-security-originating-tls"></a>

**Symptoms**  
When originating a TLS session from an application, instead of from the Envoy proxy, connectivity to a backend virtual service fails.

**Resolution**  
This is a known issue. For more information, see the [Feature Request: TLS negotiation between the downstream application and upstream proxy](https://github.com/aws/aws-app-mesh-roadmap/issues/162) GitHub issue. In App Mesh, TLS origination is currently supported from the Envoy proxy but not from the application. To use TLS origination support at the Envoy, disable TLS origination in the application. This allows the Envoy to read the outbound request headers and forward the request to the appropriate destination through a TLS session. For more information, see [Transport Layer Security (TLS)](tls.md). 

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contact [AWS Support](https://aws.amazon.com/premiumsupport/). If you believe that you’ve found a security vulnerability or have questions about App Mesh’s security, then see the [AWS vulnerability reporting guidelines](https://aws.amazon.com/security/vulnerability-reporting/).

## Unable to assert that connectivity between Envoy proxies is using TLS
<a name="ts-security-tls-between-proxies"></a>

**Symptoms**  
Your application has enabled TLS termination on the virtual node or virtual gateway listener, or TLS origination on the backend TLS client policy, but you are unable to assert that connectivity between Envoy proxies is occurring over a TLS-negotiated session.

**Resolution**  
Steps defined in this resolution make use of the Envoy administration interface and Envoy statistics. For help configuring these, see [Enable the Envoy proxy administration interface](troubleshooting-best-practices.md#ts-bp-enable-proxy-admin-interface) and [Enable Envoy DogStatsD integration for metric offload](troubleshooting-best-practices.md#ts-bp-enable-envoy-statsd-integration). The following statistics examples use the administration interface for simplicity.
+ For the Envoy proxy performing TLS termination:
  + Make sure that the TLS certificate has been bootstrapped in the Envoy configuration with the following command.

    ```
    curl http://my-app.default.svc.cluster.local:9901/certs
    ```

    In the returned output, you should see at least one entry under `certificates[].cert_chain` for the certificate used in TLS termination.
  + Make sure that the number of successful inbound connections to the proxy’s listener is exactly the same as the number of SSL handshakes plus the number of SSL sessions re-used, as shown by the following example commands and output.

    ```
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "listener.0.0.0.0_15000" | grep downstream_cx_total
    listener.0.0.0.0_15000.downstream_cx_total: 11
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "listener.0.0.0.0_15000" | grep ssl.connection_error
    listener.0.0.0.0_15000.ssl.connection_error: 1
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "listener.0.0.0.0_15000" | grep ssl.handshake
    listener.0.0.0.0_15000.ssl.handshake: 9
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "listener.0.0.0.0_15000" | grep ssl.session_reused
    listener.0.0.0.0_15000.ssl.session_reused: 1
    # Total CX (11) - SSL Connection Errors (1) == SSL Handshakes (9) + SSL Sessions Re-used (1)
    ```
+ For the Envoy proxy performing TLS origination:
  + Make sure that the TLS trust store has been bootstrapped in the Envoy configuration with the following command.

    ```
    curl http://my-app.default.svc.cluster.local:9901/certs
    ```

    You should see at least one entry under `certificates[].ca_certs` for the certificates used in validating the backend’s certificate during TLS origination.
  + Make sure that the number of successful outbound connections to the backend cluster is exactly the same as the number of SSL handshakes plus the number of SSL sessions re-used, as shown by the following example commands and output.

    ```
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "virtual-node-name" | grep upstream_cx_total
    cluster.cds_egress_mesh-name_virtual-node-name_protocol_port.upstream_cx_total: 11
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "virtual-node-name" | grep ssl.connection_error
    cluster.cds_egress_mesh-name_virtual-node-name_protocol_port.ssl.connection_error: 1
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "virtual-node-name" | grep ssl.handshake
    cluster.cds_egress_mesh-name_virtual-node-name_protocol_port.ssl.handshake: 9
    curl -s http://my-app.default.svc.cluster.local:9901/stats | grep "virtual-node-name" | grep ssl.session_reused
    cluster.cds_egress_mesh-name_virtual-node-name_protocol_port.ssl.session_reused: 1
    # Total CX (11) - SSL Connection Errors (1) == SSL Handshakes (9) + SSL Sessions Re-used (1)
    ```
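
The two counter checks above can be scripted. The following is a minimal sketch, not part of App Mesh or Envoy. `check_tls_counters` is a hypothetical helper name, and the sketch assumes that the stats text uses the `name: value` format shown in the example output. It reads stats from standard input and compares total connections minus SSL connection errors against SSL handshakes plus re-used sessions.

```shell
# check_tls_counters PREFIX TOTAL_STAT
#   PREFIX     stat name prefix, for example "listener.0.0.0.0_15000."
#   TOTAL_STAT "downstream_cx_total" (termination) or "upstream_cx_total" (origination)
# Exit status: 0 if the counters agree, 1 on mismatch, 2 if no stat matched PREFIX.
check_tls_counters() {
    awk -v prefix="$1" -v total_stat="$2" '
        # Only consider stats whose name starts with the given prefix.
        index($1, prefix) == 1 {
            seen = 1
            name = substr($1, length(prefix) + 1)
            sub(/:$/, "", name)
            if (name == total_stat)             total  = $2
            if (name == "ssl.connection_error") errors = $2
            if (name == "ssl.handshake")        shakes = $2
            if (name == "ssl.session_reused")   reused = $2
        }
        END {
            if (!seen) { print "no stats matched prefix " prefix; exit 2 }
            ok = (total - errors == shakes + reused)
            status = ok ? "OK" : "MISMATCH"
            printf "total=%d errors=%d handshakes=%d reused=%d %s\n", total, errors, shakes, reused, status
            exit (ok ? 0 : 1)
        }'
}

# Example usage against the admin endpoint from the commands above (not run here):
#   curl -s http://my-app.default.svc.cluster.local:9901/stats \
#       | check_tls_counters "listener.0.0.0.0_15000." downstream_cx_total
```

Because the function returns a nonzero status on a mismatch, you can call it from a health script or a CI step.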

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/). If you believe that you’ve found a security vulnerability or have questions about App Mesh’s security, then see the [AWS vulnerability reporting guidelines](https://aws.amazon.com/security/vulnerability-reporting/).

## Troubleshooting TLS with Elastic Load Balancing
<a name="ts-security-tls-elb"></a>

**Symptoms**  
When attempting to configure an Application Load Balancer or Network Load Balancer to encrypt traffic to a virtual node, connectivity and load balancer health checks can fail.

**Resolution**  
To determine the root cause of the issue, check the following:
+ For the Envoy proxy performing TLS termination, rule out any misconfiguration. Follow the steps in the [Unable to connect to a backend virtual service with a TLS client policy](#ts-security-tls-client-policy) section above.
+ For the load balancer, review the configuration of the `TargetGroup`:
  + Make sure that the `TargetGroup` port matches the virtual node’s defined listener port.
  + For Application Load Balancers that are originating TLS connections over HTTP to your service, make sure that the `TargetGroup` protocol is set to `HTTPS`. If health checks are being utilized, make sure that `HealthCheckProtocol` is set to `HTTPS`. 
  + For Network Load Balancers that are originating TLS connections over TCP to your service, make sure that the `TargetGroup` protocol is set to `TLS`. If health checks are being utilized, make sure that `HealthCheckProtocol` is set to `TCP`.

**Note**  
Any updates to `TargetGroup` require changing the `TargetGroup` name.

With this configured properly, your load balancer should provide a secure connection to your service using the certificate provided to the Envoy proxy.
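
The target group checks above can be scripted with the AWS CLI. This is a minimal sketch under stated assumptions: `check_target_group` is a hypothetical helper name, `my-target-group` is a placeholder, and the load balancer type is passed as `alb` or `nlb`. It reads the port, protocol, and health check protocol with `aws elbv2 describe-target-groups` and compares them with the values recommended in this section.

```shell
# check_target_group NAME LB_TYPE LISTENER_PORT
#   NAME           target group name (placeholder in the example below)
#   LB_TYPE        "alb" or "nlb"
#   LISTENER_PORT  the listener port defined on the virtual node
check_target_group() {
    name=$1 lb_type=$2 listener_port=$3
    case $lb_type in
        alb) want_proto=HTTPS want_hc=HTTPS ;;
        nlb) want_proto=TLS   want_hc=TCP ;;
        *)   echo "unknown load balancer type: $lb_type" >&2; return 2 ;;
    esac
    # Port, Protocol, and HealthCheckProtocol come back as one tab-separated line.
    set -- $(aws elbv2 describe-target-groups --names "$name" \
        --query 'TargetGroups[0].[Port,Protocol,HealthCheckProtocol]' --output text)
    rc=0
    [ "$1" = "$listener_port" ] || { echo "port is $1, expected $listener_port"; rc=1; }
    [ "$2" = "$want_proto" ]    || { echo "protocol is $2, expected $want_proto"; rc=1; }
    [ "$3" = "$want_hc" ]       || { echo "health check protocol is $3, expected $want_hc"; rc=1; }
    return $rc
}

# Example usage (not run here):
#   check_target_group my-target-group nlb 15000
```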

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/). If you believe that you’ve found a security vulnerability or have questions about App Mesh’s security, then see the [AWS vulnerability reporting guidelines](https://aws.amazon.com/security/vulnerability-reporting/).

# App Mesh Kubernetes troubleshooting
<a name="troubleshooting-kubernetes"></a>

**Important**  
End of support notice: On September 30, 2026, AWS will discontinue support for AWS App Mesh. After September 30, 2026, you will no longer be able to access the AWS App Mesh console or AWS App Mesh resources. For more information, visit this blog post [Migrating from AWS App Mesh to Amazon ECS Service Connect](https://aws.amazon.com/blogs/containers/migrating-from-aws-app-mesh-to-amazon-ecs-service-connect). 

This topic details common issues that you may experience when you use App Mesh with Kubernetes.

## App Mesh resources created in Kubernetes cannot be found in App Mesh
<a name="ts-kubernetes-missing-resources"></a>

**Symptoms**  
You have created the App Mesh resources using the Kubernetes custom resource definition (CRD), but the resources that you created are not visible in App Mesh when you use the AWS Management Console or APIs.

**Resolution**  
The likely cause is an error in the Kubernetes controller for App Mesh. For more information, see [Troubleshooting](https://github.com/aws/aws-app-mesh-controller-for-k8s/blob/master/docs/guide/troubleshooting.md) on GitHub. Check the controller logs for any errors or warnings indicating that the controller could not create any resources. 

```
kubectl logs -n appmesh-system -f \
    $(kubectl get pods -n appmesh-system -o name | grep controller)
```

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## Pods are failing readiness and liveness checks after Envoy sidecar is injected
<a name="ts-kubernetes-pods-after-injection"></a>

**Symptoms**  
Pods for your application were previously running successfully, but after the Envoy sidecar is injected into a pod, readiness and liveness checks begin failing.

**Resolution**  
Make sure that the Envoy container that was injected into the pod has bootstrapped with App Mesh’s Envoy management service. You can verify any errors by referencing the error codes in [Envoy disconnected from App Mesh Envoy management service with error text](troubleshooting-setup.md#ts-setup-grpc-error-codes). You can use the following command to inspect Envoy logs for the relevant pod.

```
kubectl logs -f pod-name -n namespace -c envoy \
    | grep "gRPC config stream closed"
```

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## Pods not registering or deregistering as AWS Cloud Map instances
<a name="ts-kubernetes-pods-cmap"></a>

**Symptoms**  
Your Kubernetes pods are not being registered in or de-registered from AWS Cloud Map as part of their life cycle. A pod may start successfully and be ready to serve traffic, but not receive any. When a pod is terminated, clients may still retain its IP address and attempt to send traffic to it, and those requests fail.

**Resolution**  
This is a known issue. For more information, see the [Pods don't get auto registered/deregistered in Kubernetes with AWS Cloud Map](https://github.com/aws/aws-app-mesh-controller-for-k8s/issues/159) GitHub issue. Due to the relationship between pods, App Mesh virtual nodes, and AWS Cloud Map resources, the [App Mesh controller for Kubernetes](https://github.com/aws/aws-app-mesh-controller-for-k8s) may become desynchronized and lose resources. For example, this can happen if a virtual node resource is deleted from Kubernetes before terminating its associated pods. 

To mitigate this issue:
+ Make sure that you are running the latest version of the App Mesh controller for Kubernetes.
+ Make sure that the AWS Cloud Map `namespaceName` and `serviceName` are correct in your virtual node definition.
+ Make sure that you delete any associated pods prior to deleting your virtual node definition. If you need help identifying which pods are associated with a virtual node, see [Cannot determine where a pod for an App Mesh resource is running](#ts-kubernetes-where-pod-running).
+ If your issue persists, run the following command to inspect your controller logs for errors that may help reveal the underlying issue.

  ```
  kubectl logs -n appmesh-system \
      $(kubectl get pods -n appmesh-system -o name | grep appmesh-controller)
  ```
+ Consider using the following command to restart your controller pods. This may fix synchronization issues.

  ```
  kubectl delete -n appmesh-system \
      $(kubectl get pods -n appmesh-system -o name | grep appmesh-controller)
  ```
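
To confirm whether registrations are out of sync, you can compare pod IP addresses with the instances registered in AWS Cloud Map. This is a minimal sketch under stated assumptions: `compare_registrations` is a hypothetical helper name, the namespace, label selector, and service ID in the comments are placeholders, and instances are assumed to be registered with the `AWS_INSTANCE_IPV4` attribute.

```shell
# compare_registrations POD_IPS_FILE CLOUD_MAP_IPS_FILE
# Each file holds one IP address per line. Prints the addresses that appear
# in only one of the two files.
compare_registrations() {
    sorted_pods=$(mktemp)
    sorted_cmap=$(mktemp)
    sort -u "$1" > "$sorted_pods"
    sort -u "$2" > "$sorted_cmap"
    # Pods that Cloud Map does not know about.
    comm -23 "$sorted_pods" "$sorted_cmap" | sed 's/^/not registered: /'
    # Cloud Map instances with no matching pod.
    comm -13 "$sorted_pods" "$sorted_cmap" | sed 's/^/stale registration: /'
    rm -f "$sorted_pods" "$sorted_cmap"
}

# Example of producing the two input files (not run here, names are placeholders):
#   kubectl get pods -n my-namespace -l app=my-app \
#       -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' > pod-ips.txt
#   aws servicediscovery list-instances --service-id srv-example \
#       --query 'Instances[].Attributes.AWS_INSTANCE_IPV4' --output text \
#       | tr '\t' '\n' > cloudmap-ips.txt
#   compare_registrations pod-ips.txt cloudmap-ips.txt
```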

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## Cannot determine where a pod for an App Mesh resource is running
<a name="ts-kubernetes-where-pod-running"></a>

**Symptoms**  
When you run App Mesh on a Kubernetes cluster, an operator cannot determine where a workload, or pod, is running for a given App Mesh resource.

**Resolution**  
Kubernetes pod resources are annotated with the mesh and virtual node that they are associated with. You can query which pods are running for a given virtual node name with the following command.

```
kubectl get pods --all-namespaces -o json | \
    jq '.items[] | { metadata } | select(.metadata.annotations."appmesh.k8s.aws/virtualNode" == "virtual-node-name")'
```
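
If you also want to know which worker node each matching pod is scheduled on, a variation of the query can print the pod's `.spec.nodeName`. This is a minimal sketch: `pods_for_virtual_node` is a hypothetical helper name, and it assumes the same `appmesh.k8s.aws/virtualNode` annotation that the previous command uses.

```shell
# pods_for_virtual_node VIRTUAL_NODE_NAME
# Reads the JSON output of "kubectl get pods -o json" on standard input and
# prints one tab-separated line per matching pod: namespace, pod name, node name.
pods_for_virtual_node() {
    jq -r --arg vn "$1" '
        .items[]
        | select(.metadata.annotations["appmesh.k8s.aws/virtualNode"] == $vn)
        | [.metadata.namespace, .metadata.name, .spec.nodeName]
        | @tsv'
}

# Example usage (not run here):
#   kubectl get pods --all-namespaces -o json | pods_for_virtual_node virtual-node-name
```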

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## Cannot determine what App Mesh resource a pod is running as
<a name="ts-kubernetes-pod-running-as"></a>

**Symptoms**  
When running App Mesh on a Kubernetes cluster, an operator cannot determine what App Mesh resource a given pod is running as.

**Resolution**  
Kubernetes pod resources are annotated with the mesh and virtual node that they are associated with. You can output the mesh and virtual node names by querying the pod directly using the following command.

```
kubectl get pod pod-name -n namespace -o json | \
    jq '{ "mesh": .metadata.annotations."appmesh.k8s.aws/mesh", "virtualNode": .metadata.annotations."appmesh.k8s.aws/virtualNode" }'
```

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## Client Envoys are not able to communicate with App Mesh Envoy Management Service with IMDSv1 disabled
<a name="ts-kubernetes-imdsv1-disabled"></a>

**Symptoms**  
When `IMDSv1` is disabled, client Envoys aren't able to communicate with the App Mesh control plane (Envoy Management Service). `IMDSv2` support is not available in App Mesh Envoy versions earlier than `v1.24.0.0-prod`.

**Resolution**  
To resolve this issue, you can do one of these three things.
+ Upgrade to App Mesh Envoy version `v1.24.0.0-prod` or later, which has `IMDSv2` support.
+ Re-enable `IMDSv1` on the instance where Envoy is running. For instructions on restoring `IMDSv1`, see [Configure the instance metadata options](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html).
+ If your services are running on Amazon EKS, we recommend that you use IAM roles for service accounts (IRSA) to fetch credentials. For instructions on enabling IRSA, see [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).
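
To decide which option applies, it can help to check the Envoy image tag and the instance metadata settings. This is a minimal sketch: `envoy_supports_imdsv2` is a hypothetical helper name, it assumes image tags in the `v1.24.0.0-prod` format used above, and the instance ID in the comment is a placeholder.

```shell
# envoy_supports_imdsv2 TAG
# Succeeds if an App Mesh Envoy image tag such as v1.24.0.0-prod is at or
# above v1.24.0.0, the first version with IMDSv2 support.
envoy_supports_imdsv2() {
    ver=${1#v}        # drop the leading "v"
    ver=${ver%%-*}    # drop the "-prod" suffix
    printf '%s\n' "$ver" | awk -F. '{
        split("1.24.0.0", min, ".")
        # Compare the four numeric components from left to right.
        for (i = 1; i <= 4; i++) {
            if ($i + 0 > min[i] + 0) exit 0
            if ($i + 0 < min[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example (not run here): check whether an instance still accepts IMDSv1 requests.
# An HttpTokens value of "required" means that only IMDSv2 is accepted.
#   aws ec2 describe-instances --instance-ids i-1234567890abcdef0 \
#       --query 'Reservations[].Instances[].MetadataOptions.HttpTokens' --output text
```

For example, `envoy_supports_imdsv2 v1.23.1.0-prod || echo "upgrade Envoy or re-enable IMDSv1"` prints the hint because that tag is earlier than `v1.24.0.0-prod`.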

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).

## IRSA does not work on application container when App Mesh is enabled and Envoy is injected
<a name="ts-kubernetes-irsa-not-working"></a>

**Symptoms**  
When App Mesh is enabled on an Amazon EKS cluster with the help of the App Mesh controller for Amazon EKS, Envoy and `proxyinit` containers are injected into the application pod. The application is not able to assume `IRSA` and instead assumes the `node role`. When you describe the pod, you see that the `AWS_WEB_IDENTITY_TOKEN_FILE` or `AWS_ROLE_ARN` environment variable is missing from the application container.

**Resolution**  
If either the `AWS_WEB_IDENTITY_TOKEN_FILE` or the `AWS_ROLE_ARN` environment variable is already defined, then the webhook skips the pod. Don't define either variable yourself; the webhook injects both for you. The following excerpt shows the check that the webhook performs.

```
// Abridged excerpt: a container that already defines a reserved key is flagged.
reservedKeys := map[string]string{
    "AWS_ROLE_ARN":                "",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "",
}
...
for _, env := range container.Env {
    if _, ok := reservedKeys[env.Name]; ok {
        reservedKeysDefined = true
    }
}
```
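
To find out whether a workload defines one of the reserved variables itself, you can inspect its pod template. This is a minimal sketch: `reserved_env_defined` is a hypothetical helper name, `my-app` and `namespace` are placeholders, and the sketch assumes a resource with a `.spec.template` pod template, such as a Deployment.

```shell
# reserved_env_defined
# Reads workload JSON (for example, a Deployment) on standard input and prints
# "container: VARIABLE" for every reserved variable that the pod template defines.
# No output means that neither variable is set by hand.
reserved_env_defined() {
    jq -r '
        .spec.template.spec.containers[]
        | .name as $c
        | (.env // [])[]
        | select(.name == "AWS_ROLE_ARN" or .name == "AWS_WEB_IDENTITY_TOKEN_FILE")
        | "\($c): \(.name)"'
}

# Example usage (not run here):
#   kubectl get deployment my-app -n namespace -o json | reserved_env_defined
```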

If your issue is still not resolved, then consider opening a [GitHub issue](https://github.com/aws/aws-app-mesh-roadmap/issues/new?assignees=&labels=Bug&template=issue--bug-report.md&title=Bug%3A+describe+bug+here) or contacting [AWS Support](https://aws.amazon.com/premiumsupport/).