Rotate a cluster secret in AWS PCS
Rotate your cluster secret to comply with security requirements and address potential compromises. This process requires putting your cluster into maintenance mode.
Prerequisites
- IAM role with the secretsmanager:RotateSecret permission
- Cluster in ACTIVE or UPDATE_FAILED state
Procedure
- Notify cluster users of the upcoming maintenance window.
- Put the cluster into maintenance mode by scaling all compute node groups to 0 capacity.
  - Use the UpdateComputeNodeGroup API to set both minInstanceCount and maxInstanceCount to 0 for all compute node groups.
  - Wait until all nodes stop.
  - Optional: For graceful job handling, drain scheduler queues with Slurm commands before you scale capacity to 0.
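The scale-down step can be sketched in Python as follows. This is a minimal sketch, not an official procedure: it assumes pcs is a boto3 client created with boto3.client("pcs"), that the caller may list, read, and update compute node groups, and that cluster_id is your own cluster identifier. The helper names scale_to_zero and restore_capacity are hypothetical. The original scaling values are saved so they can be reapplied after rotation.

```python
def scale_to_zero(pcs, cluster_id):
    """Set min/max instance count to 0 on every compute node group.

    `pcs` is assumed to be a boto3 "pcs" client (or any object with the
    same three methods). Returns {group_id: original scalingConfiguration}.
    """
    saved = {}
    token = None
    while True:
        kwargs = {"clusterIdentifier": cluster_id}
        if token:
            kwargs["nextToken"] = token
        page = pcs.list_compute_node_groups(**kwargs)
        for summary in page.get("computeNodeGroups", []):
            group_id = summary["id"]
            # Record the current capacity before changing it.
            detail = pcs.get_compute_node_group(
                clusterIdentifier=cluster_id,
                computeNodeGroupIdentifier=group_id,
            )
            saved[group_id] = dict(detail["computeNodeGroup"]["scalingConfiguration"])
            pcs.update_compute_node_group(
                clusterIdentifier=cluster_id,
                computeNodeGroupIdentifier=group_id,
                scalingConfiguration={"minInstanceCount": 0, "maxInstanceCount": 0},
            )
        token = page.get("nextToken")
        if not token:
            return saved


def restore_capacity(pcs, cluster_id, saved):
    """Reapply the scaling values captured by scale_to_zero."""
    for group_id, scaling in saved.items():
        pcs.update_compute_node_group(
            clusterIdentifier=cluster_id,
            computeNodeGroupIdentifier=group_id,
            scalingConfiguration=scaling,
        )
```

The sketch does not wait for nodes to stop; confirm that separately before you start the rotation. Keep the returned dictionary somewhere durable, because it is the only record of the previous capacity.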
- Initiate rotation through Secrets Manager.
  - Console method: Navigate to Secrets Manager, select your cluster secret, and choose Rotate secret.
  - API method: Use the Secrets Manager rotate-secret API.
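For the API method, a minimal Python sketch might look like the following. It assumes sm is a boto3 client created with boto3.client("secretsmanager") and that secret_id is the name or ARN of your cluster secret. The helper name start_rotation is hypothetical. It records the rotation timestamp before the request so that a later check has something to compare against.

```python
def start_rotation(sm, secret_id):
    """Request a rotation and return (baseline, version_id).

    `sm` is assumed to be a boto3 "secretsmanager" client. `baseline` is
    the LastRotatedDate seen before the request, or None if the secret
    has never been rotated.
    """
    baseline = sm.describe_secret(SecretId=secret_id).get("LastRotatedDate")
    response = sm.rotate_secret(SecretId=secret_id)
    return baseline, response.get("VersionId")
```

Only run this after every compute node group has reached 0 capacity.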
- Monitor rotation progress.
  - Track progress through CloudTrail events.
  - Check LastRotatedDate through either the Secrets Manager console or the secretsmanager:DescribeSecret API.
  - Wait for the RotationSucceeded or RotationFailed CloudTrail event.
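One way to automate the timestamp check is to poll DescribeSecret until the rotation date moves past the value seen before rotation. The sketch below assumes sm is a boto3 "secretsmanager" client and baseline is the previous rotation timestamp (or None). The function name wait_for_rotation and the timeout and interval defaults are hypothetical. A failed rotation does not advance the timestamp, so the function simply times out in that case; the RotationFailed CloudTrail event remains the way to confirm a failure.

```python
import time


def wait_for_rotation(sm, secret_id, baseline, timeout_s=900, interval_s=15,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until LastRotatedDate is newer than `baseline`.

    Returns the new timestamp, or raises TimeoutError if it does not
    change within `timeout_s` seconds. `sleep` and `clock` are injectable
    so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        last = sm.describe_secret(SecretId=secret_id).get("LastRotatedDate")
        if last is not None and (baseline is None or last > baseline):
            return last
        if clock() >= deadline:
            raise TimeoutError(f"rotation of {secret_id} not observed in {timeout_s}s")
        sleep(interval_s)
```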
- After successful rotation, restore cluster capacity.
  - Use the UpdateComputeNodeGroup API to reset node groups to desired min/max capacity.
  - For AWS PCS-managed login nodes: No additional action required.
  - For bring-your-own (BYO) login nodes:
    - Connect to login nodes.
    - Update /etc/slurm/slurm.key with the new secret from Secrets Manager.
    - Restart the Slurm Auth and Cred Kiosk Daemon (sackd).
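For BYO login nodes, the key update could be scripted along these lines. This is a sketch under stated assumptions, not a confirmed procedure: it assumes sm is a boto3 "secretsmanager" client, that the secret string holds the Slurm key in base64 form, and that the script runs on the login node with permission to write the key file. The helper name write_slurm_key is hypothetical. Verify the secret encoding for your cluster before relying on it.

```python
import base64
import os


def write_slurm_key(sm, secret_id, key_path="/etc/slurm/slurm.key"):
    """Fetch the current secret and write it to `key_path` with mode 0600.

    ASSUMPTION: SecretString is the base64-encoded key. If your secret
    is stored differently, replace the decode step accordingly.
    Returns the number of key bytes written.
    """
    secret = sm.get_secret_value(SecretId=secret_id, VersionStage="AWSCURRENT")
    key_bytes = base64.b64decode(secret["SecretString"])

    # Write to a temporary file first so a partial key never replaces
    # the existing one, then swap it into place atomically.
    tmp_path = key_path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key_bytes)
    os.chmod(tmp_path, 0o600)
    os.replace(tmp_path, key_path)
    return len(key_bytes)
```

After the file is in place, restart sackd on the node. If sackd runs as a systemd unit named sackd on your image (an assumption), that would be systemctl restart sackd. File ownership may also need to match the account that sackd runs as.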