Troubleshooting Amazon ECS Express Mode services

This section helps you identify and resolve common issues when deploying and managing Express Mode services.

Deployment issues

Service stuck in ACTIVE or DRAINING status

Symptoms: DescribeServiceRevisions shows that resources are still provisioning or deprovisioning, and DescribeServices shows that the deployment has not stabilized.

Possible causes and solutions:

Insufficient IAM permissions - Verify that the task execution role and infrastructure role have the necessary permissions as shown in their respective managed policies.
```
# Check if the role has the required managed policy
aws iam list-attached-role-policies --role-name ecsTaskExecutionRole
```

Image pull failures - Ensure the container image exists and is accessible.



# Test image pull manually
docker pull 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:latest

Network connectivity issues - Check that subnets have internet access or Amazon VPC endpoints for AWS services.
Resource limits - Verify that your account has sufficient Fargate capacity and hasn't reached service quotas.

Diagnostic steps:

Use DescribeExpressGatewayService to get your current service revision, then call DescribeServiceRevisions for that service revision to get the status of provisioning or deprovisioning.
Check the service events in the Amazon ECS console for detailed error messages.
Check that the container port is set correctly.
Check AWS service quotas for Amazon ECS and Fargate.
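
As a rough sketch of two of the diagnostic steps above, the following shell functions wrap AWS CLI calls that print recent service events and list the quotas applied to your account. The function names, the default of 10 events, and the cluster and service names in the usage comments are illustrative assumptions, not part of Express Mode.

```shell
# Print the most recent service events for a service.
# Usage (hypothetical names): recent_service_events my-cluster my-express-service 5
recent_service_events() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query "services[0].events[:${3:-10}].[createdAt,message]" \
    --output text
}

# List the quotas applied to a service code such as "ecs" or "fargate".
# Usage: list_quotas fargate
list_quotas() {
  aws service-quotas list-service-quotas \
    --service-code "$1" \
    --query 'Quotas[].[QuotaName,Value]' \
    --output text
}
```

Event messages often name the failing resource or permission directly, so reading them first can narrow down which of the causes above applies.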

Task startup failures

Symptoms: Tasks fail to start or immediately stop after starting.

Common causes:

Application errors - The container application exits due to configuration or runtime errors.
Health check failures - The application doesn't respond to health checks on the expected port or path.
Resource constraints - Insufficient CPU or memory allocation for the application.
Missing environment variables or secrets - Required configuration is not available to the application.
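
To tell which of these causes applies, it can help to look at the stop reason that Amazon ECS records for stopped tasks. The following is a minimal sketch, assuming the service runs in a cluster you can name; the function name and the example cluster and service names are hypothetical. Amazon ECS keeps stopped tasks visible only for a limited time, so run it soon after a failure.

```shell
# Show why recently stopped tasks for a service were stopped.
# Usage (hypothetical names): stopped_task_reasons my-cluster my-express-service
stopped_task_reasons() {
  task_arns=$(aws ecs list-tasks \
    --cluster "$1" \
    --service-name "$2" \
    --desired-status STOPPED \
    --query 'taskArns' \
    --output text)

  # With text output, an empty result is printed as nothing or as "None".
  if [ -z "$task_arns" ] || [ "$task_arns" = "None" ]; then
    echo "No stopped tasks found for $2"
    return 0
  fi

  # Word splitting on $task_arns is intentional: one argument per task ARN.
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks $task_arns \
    --query 'tasks[].[taskArn,stopCode,stoppedReason,containers[0].exitCode]' \
    --output text
}
```

A non-zero container exit code usually points to an application error, while a stop reason that mentions health checks or resources points to the other causes listed above.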

Resolution steps:

Check application logs in CloudWatch Logs. You can obtain the log group name from DescribeServiceRevisions:



aws logs describe-log-streams --log-group-name /ecs/express-service-my-app
aws logs get-log-events --log-group-name /ecs/express-service-my-app --log-stream-name stream-name

Verify that the health check path returns HTTP 200 status.
Test the container image locally to ensure it starts correctly.
Review and adjust CPU and memory allocations if needed.
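
The health check step can be tried locally before redeploying. The sketch below assumes the container is already running on your machine and that curl is installed; the function name, port 8080, and the /health path in the usage comment are placeholders for your own values.

```shell
# Succeed only if the URL answers with HTTP 200.
# Usage (hypothetical port and path): check_health http://localhost:8080/health
check_health() {
  # curl reports 000 as the status code when no HTTP response is received.
  status=$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$1")
  if [ "$status" = "200" ]; then
    echo "healthy: $1 returned 200"
  else
    echo "unhealthy: $1 returned ${status:-no response}"
    return 1
  fi
}
```

Because the function returns a non-zero status on failure, it can also be used in a script that stops before deployment when the check fails.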

Connectivity issues

Application unreachable via load balancer

Symptoms: The application URL returns timeouts or connection errors.

Troubleshooting steps:

Verify that your resources have finished provisioning.

Verify that tasks are running and healthy:



aws ecs describe-services --cluster my-cluster --services my-express-service

Check Application Load Balancer target group health:



aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/name/id

Ensure the application is listening on the correct port inside the container.
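
When a target group has many targets, filtering the output to the targets that are not healthy makes the failure reason easier to spot. This is a minimal sketch; the function name is hypothetical, and the target group ARN is whatever your own service uses.

```shell
# List targets that are not healthy, with the reason reported by the load balancer.
# Usage: unhealthy_targets <target-group-arn>
unhealthy_targets() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]" \
    --output text
}
```

The port printed for each target should match the port that the application listens on inside the container.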

Performance issues

Slow response times

Symptoms: Application responses are slower than expected.

Diagnostic approach:

Monitor CPU and memory utilization:



# Check CloudWatch metrics for the service
aws cloudwatch get-metric-statistics \
    --namespace AWS/ECS \
    --metric-name CPUUtilization \
    --dimensions Name=ServiceName,Value=my-express-service Name=ClusterName,Value=my-cluster \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-01T01:00:00Z \
    --period 300 \
    --statistics Average

Review application logs for errors or performance warnings.
Check if auto scaling is responding appropriately to load.
Analyze load balancer metrics for request distribution.
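
The fixed timestamps in the command above can be replaced with a rolling window. The sketch below assumes that your AWS CLI accepts Unix epoch seconds for timestamp parameters and that your date command supports +%s; the function name and the one-hour window are illustrative choices.

```shell
# Print average and maximum utilization for the last hour, oldest datapoint first.
# Usage (hypothetical names): service_utilization my-cluster my-express-service MemoryUtilization
service_utilization() {
  end_time=$(date -u +%s)
  start_time=$((end_time - 3600))   # one hour before now
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ECS \
    --metric-name "$3" \
    --dimensions "Name=ServiceName,Value=$2" "Name=ClusterName,Value=$1" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period 300 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average,Maximum]' \
    --output text
}
```

Pass CPUUtilization or MemoryUtilization as the third argument. A maximum that stays close to 100 while the average is moderate can indicate short spikes that the average alone hides.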

Optimization strategies:

Increase CPU or memory allocation if resources are constrained.
Adjust auto scaling thresholds to scale out earlier.
Optimize application code and database queries.

Auto scaling not working as expected

Symptoms: The service doesn't scale out during high load or doesn't scale in during low load.

Troubleshooting steps:

Check auto scaling policies and their configuration:



aws application-autoscaling describe-scaling-policies \
    --service-namespace ecs \
    --resource-id service/my-cluster/my-express-service

Review CloudWatch metrics to ensure scaling triggers are being met.
Verify that the service has permission to scale (check IAM roles).
Check for any scaling activities and their outcomes.
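
To check scaling activities and the capacity limits that bound them, a sketch along these lines may help. The function names and the default of 10 activities are assumptions made for illustration; the resource ID follows the same service/cluster/service-name form used in the command above.

```shell
# Show recent scaling activities and their outcomes.
# Usage (hypothetical names): scaling_activities my-cluster my-express-service 5
scaling_activities() {
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ecs \
    --resource-id "service/$1/$2" \
    --query "ScalingActivities[:${3:-10}].[StartTime,StatusCode,Description,Cause]" \
    --output text
}

# Show the minimum and maximum task counts registered for the service.
# Usage (hypothetical names): scalable_target my-cluster my-express-service
scalable_target() {
  aws application-autoscaling describe-scalable-targets \
    --service-namespace ecs \
    --resource-ids "service/$1/$2" \
    --query 'ScalableTargets[].[ResourceId,MinCapacity,MaxCapacity]' \
    --output text
}
```

If the service is already at its maximum capacity, scale-out policies have no effect even when their triggers are met.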

Monitoring and debugging tools

Using CloudWatch Container Insights

Enable Container Insights for comprehensive monitoring:



aws ecs put-account-setting --name containerInsights --value enabled

Container Insights provides:

CPU, memory, disk, and network metrics
Performance monitoring dashboards
Log correlation and analysis
Anomaly detection
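
The account setting applies to clusters created after it is enabled, so an existing cluster may need the setting turned on directly. The following sketch shows one way to confirm the effective account setting and to enable Container Insights on a specific cluster; the function names and the cluster name in the usage comment are hypothetical.

```shell
# Show the effective Container Insights account setting for the caller.
container_insights_setting() {
  aws ecs list-account-settings \
    --name containerInsights \
    --effective-settings \
    --query 'settings[].[name,value,principalArn]' \
    --output text
}

# Enable Container Insights on an existing cluster.
# Usage (hypothetical name): enable_insights_on_cluster my-cluster
enable_insights_on_cluster() {
  aws ecs update-cluster-settings \
    --cluster "$1" \
    --settings name=containerInsights,value=enabled
}
```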
