Troubleshoot unhealthy instances in Amazon EC2 Auto Scaling
The following are error messages returned by Amazon EC2 Auto Scaling, the potential causes, and the steps you can take to resolve the issues.
To retrieve an error message, see View the reason for health check failures.
Error messages
An instance was taken out of service in response to an EC2 instance status check failure
Problem: Amazon EC2 Auto Scaling instances fail the Amazon EC2 status checks.
Cause 1: If there are issues that cause Amazon EC2 to consider the instances in your Amazon EC2 Auto Scaling group impaired, Amazon EC2 Auto Scaling automatically replaces the instances as part of its health checks.
Solution 1: When an instance status check fails, you typically must address the problem yourself by making instance configuration changes until your application is no longer exhibiting any problems. To resolve this issue, follow these steps:
-
Manually create an Amazon EC2 instance that is not part of the Amazon EC2 Auto Scaling group and investigate the problem. For general help with investigating impaired instances, see Troubleshoot instances with failed status checks in the Amazon EC2 User Guide.
-
After you confirm that your instance launched successfully and is healthy, deploy a new, error-free instance configuration to the Amazon EC2 Auto Scaling group.
-
Delete the instance that you created to avoid ongoing charges to your AWS account.
An instance was taken out of service in response to an EC2 health check that indicated it had been terminated or stopped
Problem: Amazon EC2 Auto Scaling instances that have been stopped, rebooted, or terminated are replaced.
Cause 1: A user manually stopped, rebooted, or terminated the instance.
Solution 1: If you need to stop or reboot the instances in your Amazon EC2 Auto Scaling group, we recommend that you put the instances on standby first. For more information, see Temporarily remove instances from your Amazon EC2 Auto Scaling group.
Cause 2: Amazon EC2 Auto Scaling attempts to replace Spot Instances after the Amazon EC2 Spot service interrupts the instances, because the Spot price increases above your maximum price or capacity is no longer available.
Solution 2: There is no guarantee that a Spot Instance exists to fulfill the request at any given point in time. However, you can try the following:
-
Use a higher Spot maximum price (possibly the On-Demand price). By setting your maximum price higher, it gives the Amazon EC2 Spot service a better chance of launching and maintaining your required amount of capacity.
-
Increase the number of different capacity pools that you can launch instances from by running multiple instance types in multiple Availability Zones. For more information, see Amazon EC2 Auto Scaling groups with multiple instance types and purchase options.
-
If you use multiple instance types, consider enabling the Capacity Rebalancing feature. This is useful if you want the Amazon EC2 Spot service to attempt to launch a new Spot Instance before a running instance is terminated. For more information, see Capacity Rebalancing in Amazon EC2 Auto Scaling to replace at-risk Spot Instances.
Cause 3: With Capacity Blocks, Amazon EC2 terminates any instances that are still running 30 minutes before the end time of the Capacity Block. This abrupt termination causes your Amazon EC2 Auto Scaling group to try to launch new instances to maintain its desired capacity, even as the Capacity Block is ending.
Solution 3: To resolve this issue, try the following:
-
Decrease the desired capacity of the Amazon EC2 Auto Scaling group to prevent it from trying to launch new instances. For more information, see Manual scaling for Amazon EC2 Auto Scaling.
-
Make sure you scale in your Amazon EC2 Auto Scaling group 30 minutes before the Capacity Block end time so that you do not encounter this error frequently. Make sure any lifecycle hooks have completed 30 minutes before the Capacity Block end time. For more information, see Use Capacity Blocks for machine learning workloads.
An instance was taken out of service in response to an ELB system health check failure
Problem: Amazon EC2 Auto Scaling instances might pass the EC2 status checks. But they might fail the ELB health checks for the target groups or Classic Load Balancers with which the Amazon EC2 Auto Scaling group is registered.
Cause 1: If your Amazon EC2 Auto Scaling group relies on health checks provided by ELB, Amazon EC2 Auto Scaling determines the health status of your instances by checking the results of both the EC2 status checks and the ELB health checks. The load balancer performs health checks by sending a request to each instance and waiting for the correct response, or by establishing a connection with the instance. An instance might fail the ELB health check because an application running on the instance has issues that cause the load balancer to consider the instance out of service.
Solution 1: To pass the ELB health checks:
-
Verify that the health check settings of your target groups are correctly configured. You define health check settings for your load balancer per target group. For more information, see Configure health checks for targets.
-
Make note of the success codes that the load balancer is expecting, and verify that your application is configured correctly to return these codes on success.
-
Verify that the security groups for your load balancer and Amazon EC2 Auto Scaling group are correctly configured.
-
Verify that the load balancer is configured in the same Availability Zones as your Amazon EC2 Auto Scaling group.
Solution 2: Update the Amazon EC2 Auto Scaling group to turn off ELB health checks. For instructions for how to turn off these health checks, see Detach a target group or Classic Load Balancer.
Cause 2: There is a mismatch between the health check grace period and the instance startup time.
Solution 3: Edit the health check grace period for your Amazon EC2 Auto Scaling group. Set the grace period to a long enough time period to support the number of consecutive successful health checks required before ELB considers a newly launched instance healthy. For more information, see Set the health check grace period for an Amazon EC2 Auto Scaling group.
Additional resources
If you have a different issue, see the following AWS re:Post articles for additional troubleshooting help: