Troubleshooting a cluster update timeout when cfn-hup isn't
running
The cfn-hup helper is a daemon that detects changes in resource metadata and runs user-specified actions when a change is detected. This is how you make configuration updates
on your running Amazon EC2 instances through the UpdateStack API action.
Currently the cfn-hup daemon is launched by the supervisord. But after launch,
the cfn-hup process is detached from supervisord control. If the cfn-hup demon is killed by an external actor, it's not restarted automatically.
If cfn-hup isn't running, during a cluster update, the CloudFormation stack starts the update process as expected but the update procedure isn't activated on the head node and the stack
eventually goes into timeout. From the cluster logs /var/log/chef-client, you can see that the update recipe is never invoked.
Check and restart cfn-hup in case of failures
-
On the head node, check if
cfn-hupis running:$ps aux | grep cfn-hup -
Check
cfn-huplog/var/log/cfn-hup.logand/var/log/supervisord.logon the head node. -
If
cfn-hupisn't running, try restarting it by running:$sudo /opt/parallelcluster/pyenv/versions/cookbook_virtualenv/bin/supervisorctl start cfn-hup