Troubleshooting Amazon ECS Express Mode services
This section helps you identify and resolve common issues when deploying and managing Express Mode services.
Deployment issues
Service stuck in ACTIVE or DRAINING status
Symptoms: DescribeServiceRevisions shows resources are still provisioning or deprovisioning. DescribeServices shows deployment not stabilized
Possible causes and solutions:
-
Insufficient IAM permissions - Verify that the task execution role and infrastructure role have the necessary permissions as shown in their respective managed policies.
# Check if the role has the required managed policy aws iam list-attached-role-policies --role-name ecsTaskExecutionRole -
Image pull failures - Ensure the container image exists and is accessible.
# Test image pull manually docker pull 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:latest -
Network connectivity issues - Check that subnets have internet access or Amazon VPC endpoints for AWS services.
-
Resource limits - Verify that your account has sufficient Fargate capacity and hasn't reached service quotas.
Diagnostic steps:
Use DescribeExpressGatewayService to get your current Service Revisionfollowed by DescribeServiceRevisions for the ServiceRevision to get status on the provisioning or deprovisioning
Check the service events in the Amazon ECS console for detailed error messages.
Check the container port was set correctly
Check AWS service quotas for Amazon ECS and Fargate.
Task startup failures
Symptoms: Tasks fail to start or immediately stop after starting.
Common causes:
-
Application errors - The container application exits due to configuration or runtime errors.
-
Health check failures - The application doesn't respond to health checks on the expected port or path.
-
Resource constraints - Insufficient CPU or memory allocation for the application.
-
Missing environment variables or secrets - Required configuration is not available to the application.
Resolution steps:
-
Check application logs in CloudWatch Logs, obtain the log group name from DescribeServiceRevisions:
aws logs describe-log-streams --log-group-name /ecs/express-service-my-app aws logs get-log-events --log-group-name /ecs/express-service-my-app --log-stream-name stream-name Verify that the health check path returns HTTP 200 status.
Test the container image locally to ensure it starts correctly.
Review and adjust CPU and memory allocations if needed.
Connectivity issues
Application unreachable via load balancer
Symptoms: The application URL returns timeouts or connection errors.
Troubleshooting steps:
-
Validate your resources have finished provisioning
-
Verify that tasks are running and healthy:
aws ecs describe-services --cluster my-cluster --services my-express-service -
Check Application Load Balancer target group health:
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/name/id -
Ensure the application is listening on the correct port inside the container.
Performance issues
Slow response times
Symptoms: Application responses are slower than expected.
Diagnostic approach:
-
Monitor CPU and memory utilization:
# Check CloudWatch metrics for the service aws cloudwatch get-metric-statistics \ --namespace AWS/ECS \ --metric-name CPUUtilization \ --dimensions Name=ServiceName,Value=my-express-service Name=ClusterName,Value=my-cluster \ --start-time 2024-01-01T00:00:00Z \ --end-time 2024-01-01T01:00:00Z \ --period 300 \ --statistics Average Review application logs for errors or performance warnings.
Check if auto scaling is responding appropriately to load.
Analyze load balancer metrics for request distribution.
Optimization strategies:
Increase CPU or memory allocation if resources are constrained.
Adjust auto scaling thresholds to scale out earlier.
Optimize application code and database queries.
Auto scaling not working as expected
Symptoms: The service doesn't scale up during high load or doesn't scale down during low load.
Troubleshooting steps:
-
Check auto scaling policies and their configuration:
aws application-autoscaling describe-scaling-policies \ --service-namespace ecs \ --resource-id service/my-cluster/my-express-service Review CloudWatch metrics to ensure scaling triggers are being met.
Verify that the service has permission to scale (check IAM roles).
Check for any scaling activities and their outcomes.
Monitoring and debugging tools
Using CloudWatch Container Insights
Enable Container Insights for comprehensive monitoring:
aws ecs put-account-setting --name containerInsights --value enabled
Container Insights provides:
CPU, memory, disk, and network metrics
Performance monitoring dashboards
Log correlation and analysis
Anomaly detection