Slurm metrics in AWS PCS
AWS PCS supports Slurm's metrics feature, which exposes real-time cluster data through HTTP
endpoints compatible with Prometheus and other monitoring systems. For details, including
performance impact and security considerations, see the Metrics Guide.
Prerequisites
Before enabling Slurm metrics, ensure you have:
- Cluster version: Slurm version 25.11 or higher.
- Security group: Rules allowing HTTP traffic on port 6817 from your desired sources.
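As one way to satisfy the security group prerequisite, the following minimal sketch builds an inbound rule for TCP port 6817 limited to a single IPv4 range. It assumes you manage the rule with boto3 (a third-party package). The helper names metrics_ingress_rule and allow_metrics_access are hypothetical, and the group ID and CIDR in the usage note are placeholders you would replace with your own values.

```python
import ipaddress

METRICS_PORT = 6817  # port named in the prerequisites above


def metrics_ingress_rule(source_cidr, description="Slurm metrics scraper"):
    """Build one IpPermissions entry allowing TCP 6817 from a single IPv4 CIDR.

    Hypothetical helper: it only builds the rule and makes no AWS calls.
    """
    # Raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.IPv4Network(source_cidr)
    if network.prefixlen == 0:
        # The endpoint is unauthenticated, so refuse to open it to every address.
        raise ValueError("refusing to open the metrics port to 0.0.0.0/0")
    return {
        "IpProtocol": "tcp",
        "FromPort": METRICS_PORT,
        "ToPort": METRICS_PORT,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }


def allow_metrics_access(group_id, source_cidr):
    """Apply the rule to a security group (assumes boto3 and AWS credentials)."""
    import boto3  # imported here so the rule builder works without boto3 installed

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[metrics_ingress_rule(source_cidr)],
    )
```

For example, allow_metrics_access("sg-EXAMPLE", "10.0.0.0/24") would admit only hosts in that placeholder subnet. Keeping the rule builder separate from the AWS call lets you review the rule before applying it.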
Enable the metrics endpoint
Set the following cluster-level custom Slurm settings:
- MetricsType – Must specify a supported metrics plugin, such as metrics/openmetrics.
- CommunicationParameters – Must include enable_http.
  Important: Enabling enable_http exposes an unauthenticated HTTP endpoint. Anyone with network access to port 6817 can read cluster, job, and node metrics. Use security group rules to restrict access to trusted sources only.
- PrivateData – Must not be set.
For additional information on custom Slurm settings, see Configuring custom Slurm settings in AWS PCS.
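The three requirements above can be checked before you apply the settings. The following is a minimal sketch, not an AWS-provided tool: check_metrics_settings and to_custom_settings are hypothetical helpers. The parameterName/parameterValue shape is an assumption about how AWS PCS represents custom Slurm settings, so verify it against the AWS PCS API reference before relying on it.

```python
def check_metrics_settings(settings):
    """Return a list of problems; an empty list means the three rules are met.

    `settings` maps Slurm parameter names to string values. This checks only
    that MetricsType is present, not that the named plugin is supported.
    """
    problems = []
    if not settings.get("MetricsType", "").strip():
        problems.append("MetricsType must name a metrics plugin")
    # CommunicationParameters is a comma-separated list of options.
    comm = settings.get("CommunicationParameters", "")
    flags = {flag.strip() for flag in comm.split(",") if flag.strip()}
    if "enable_http" not in flags:
        problems.append("CommunicationParameters must include enable_http")
    if "PrivateData" in settings:
        problems.append("PrivateData must not be set")
    return problems


def to_custom_settings(settings):
    """Convert the mapping to name/value pairs (assumed AWS PCS shape)."""
    return [
        {"parameterName": name, "parameterValue": value}
        for name, value in settings.items()
    ]
```

For example, a mapping with MetricsType set to metrics/openmetrics and CommunicationParameters set to enable_http yields no problems, while adding any PrivateData value yields one.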
Use the metrics endpoint
Query the metrics endpoint from a host with network access to the controller:
curl http://controller-ip:6817/metrics
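The same request can be made from a script. The following minimal sketch uses only the Python standard library to fetch the endpoint and split the response into samples. It assumes the response is in the Prometheus/OpenMetrics text format, which is what a plugin such as metrics/openmetrics suggests. fetch_metrics and parse_samples are hypothetical helper names, and the parser handles only simple sample lines, so use a full client library for production scraping.

```python
import urllib.request


def fetch_metrics(controller_ip, port=6817, timeout=5.0):
    """Return the raw text served at http://<controller_ip>:<port>/metrics."""
    url = f"http://{controller_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Parse sample lines into (series, value) pairs.

    Comment lines (# HELP, # TYPE, # EOF) are skipped and any trailing
    timestamp is ignored.
    """
    samples = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rfind("}")
            series, rest = line[: end + 1], line[end + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue  # no value present; skip the malformed line
        samples.append((series, float(rest[0])))
    return samples
```

For example, parse_samples(fetch_metrics("controller-ip")) returns a list of series names with their current values, where controller-ip is the same placeholder used in the curl command.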
For additional information on available metrics and scraping configuration, see the Metrics Guide.