

# Slurm versions in AWS PCS
<a name="slurm-versions"></a>

SchedMD continually enhances Slurm with new capabilities, optimizations, and security patches. SchedMD releases a new major version at [regular intervals](https://slurm.schedmd.com/upgrades.html#release_cycle) and plans to support up to 3 versions at any given time. AWS PCS is designed to automatically update the Slurm controller with patch versions. 

When SchedMD ends [support](https://slurm.schedmd.com/upgrades.html#compatibility_window) for a particular major version, AWS PCS designates that version as End of Life (EOL). After EOL, no new clusters can be created with that version, though existing clusters can continue running for up to 12 months without guaranteed support. AWS PCS sends advance notice if a Slurm major version is close to EOL, to help customers know when to upgrade their clusters to a newer supported version.

We recommend you use the latest supported Slurm version to deploy your cluster, to access the most recent advancements and improvements. 

## Supported Slurm versions in AWS PCS
<a name="slurm-versions_releases"></a>

The following table shows the supported Slurm versions and important dates and information for each version.


| Slurm version | SchedMD release date | AWS PCS release date | AWS PCS EOL date | Minimum compatible AWS PCS agent version | Supported AWS PCS sample AMIs | 
| --- | --- | --- | --- | --- | --- | 
| 25.05 | 5/29/2025 | 10/16/2025 | 11/30/2026 | 1.0.0-1 |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/pcs/latest/userguide/slurm-versions.html)  | 
| 24.11 | 11/29/2024 | 5/14/2025 | 5/31/2026 | 1.0.0-1 |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/pcs/latest/userguide/slurm-versions.html)  | 

## Unsupported Slurm versions in AWS PCS
<a name="slurm-versions_unsupported"></a>

The following table shows Slurm versions that aren't supported in AWS PCS.


| Slurm version | SchedMD release date | AWS PCS release date | AWS PCS EOL date | 
| --- | --- | --- | --- | 
| 24.05 | 5/30/2024 | 12/18/2024 | 11/30/2025 | 
| 23.11 | 11/21/2023 | 8/28/2024 | 5/31/2025 | 

# Release notes for Slurm versions in AWS PCS
<a name="slurm-versions_release-notes"></a>

This topic describes important changes for each Slurm version currently supported in AWS PCS. We recommend you review the changes between the old and new versions when you upgrade your cluster.

## Slurm 25.05
<a name="slurm-versions_release-notes_25.05"></a>

**Changes implemented in AWS PCS**
+ The Slurm `requeue_on_resume_failure` SchedulerParameter is now enabled by default.
+ The `stderr` option was removed from `LogTimeFormat` because it was disabled in Slurm 25.05.
+ AWS PCS supports a multi-cluster `sackd` configuration, which allows a login node to access multiple clusters.
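To confirm how these defaults appear on a running cluster, you can inspect the live configuration from a node with the Slurm client tools installed. This is a general Slurm technique, not an AWS PCS-specific command; the values shown depend on your cluster's configuration:

```shell
# Show the scheduler parameters the controller is currently using,
# including requeue_on_resume_failure if it is enabled
scontrol show config | grep -i SchedulerParameters

# Show the configured log timestamp format
scontrol show config | grep -i LogTimeFormat
```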

For more information about Slurm 25.05, see the following publications:
+ [SchedMD release announcement](https://www.schedmd.com/slurm-version-25-05-0-is-now-available/)
+ [SchedMD release notes](https://github.com/SchedMD/slurm/blob/slurm-25-05-0-1/RELEASE_NOTES.md)

## Slurm 24.11
<a name="slurm-versions_release-notes_24.11"></a>

**Changes implemented in AWS PCS**
+ AWS PCS supports Slurm accounting. For more information, see [Slurm accounting in AWS PCS](slurm-accounting.md).

For more information about Slurm 24.11, see the following publications:
+ [SchedMD release announcement](https://www.schedmd.com/slurm-version-24-11-0-is-now-available/)
+ [SchedMD release notes](https://github.com/SchedMD/slurm/blob/slurm-24-11-0-1/RELEASE_NOTES)

## Slurm 24.05
<a name="slurm-versions_release-notes_24.05"></a>

**Changes implemented in AWS PCS**
+ The new Slurm Step Manager module is now enabled by default in AWS PCS. This module provides significant benefits by offloading step management from the central controller to compute nodes, substantially improving system concurrency in environments with heavy step usage. To support this configuration and better isolate `Prolog` and `Epilog` process execution, new prolog flags (`Contain`, `Alloc`) are enabled. 
+ Hierarchical communication from controller to compute nodes is enabled to optimize Slurm intra-node communication, which improves scalability and performance. Additionally, the routing configuration now uses partition node lists for communications from the controller, instead of the plugin's default routing algorithm, enhancing system resiliency. 
+ A new hash plugin, `HashPlugin=hash/sha3`, replaces the previous `hash/k12` plugin. It is enabled by default in AWS PCS clusters. 
+ Slurm controller logs now include enhanced auditing capabilities for all inbound remote procedure calls (RPC) to `slurmctld`. The logs include the source address, authenticated user, and RPC type before connection processing. 

For more information about Slurm 24.05, see the following publications:
+ [SchedMD release announcement](https://www.schedmd.com/slurm-version-24-05-0-is-now-available/)
+ [SchedMD release notes](https://github.com/SchedMD/slurm/blob/slurm-24-05-0-1/RELEASE_NOTES)

## Slurm 23.11
<a name="slurm-versions_release-notes_23.11"></a>

**Slurm settings you can change in AWS PCS**
+ The `SuspendTime` setting defaults to `60`. Use the AWS PCS `scaleDownIdleTimeInSeconds` configuration parameter to set it. For more information, see the [scaleDownIdleTimeInSeconds](https://docs.aws.amazon.com/pcs/latest/APIReference/API_ClusterSlurmConfiguration.html#PCS-Type-ClusterSlurmConfiguration-scaleDownIdleTimeInSeconds) parameter of the `ClusterSlurmConfiguration` data type in the *AWS PCS API Reference*.
+ The `MaxJobCount` and `MaxArraySize` settings are based on the size you choose for the cluster. For more information, see the [size](https://docs.aws.amazon.com/pcs/latest/APIReference/API_CreateCluster.html#PCS-CreateCluster-request-size) parameter of the `CreateCluster` API action in the *AWS PCS API Reference*.
+ The `SelectTypeParameters` Slurm setting defaults to `CR_CPU`. You can set it by providing it as a value for `slurmCustomSettings` when you create a cluster. For more information, see the [slurmCustomSettings](https://docs.aws.amazon.com/pcs/latest/APIReference/API_ClusterSlurmConfigurationRequest.html#PCS-Type-ClusterSlurmConfigurationRequest-slurmCustomSettings) parameter of the `CreateCluster` API action and [SlurmCustomSetting](https://docs.aws.amazon.com/pcs/latest/APIReference/API_SlurmCustomSetting.html) in the *AWS PCS API Reference*.
+ You can set `Prolog` and `Epilog` at the cluster level by providing them as values for `slurmCustomSettings` when you create a cluster. For more information, see [CreateCluster](https://docs.aws.amazon.com/pcs/latest/APIReference/API_CreateCluster.html) and [SlurmCustomSetting](https://docs.aws.amazon.com/pcs/latest/APIReference/API_SlurmCustomSetting.html) in the *AWS PCS API Reference*.
+ You can set `Weight` and `RealMemory` at the compute node group level by providing them as values for `slurmCustomSettings` when you create a compute node group. For more information, see [CreateComputeNodeGroup](https://docs.aws.amazon.com/pcs/latest/APIReference/API_CreateComputeNodeGroup.html) and [SlurmCustomSetting](https://docs.aws.amazon.com/pcs/latest/APIReference/API_SlurmCustomSetting.html) in the *AWS PCS API Reference*.
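As an illustration of how the custom settings above are passed at cluster creation, the following AWS CLI sketch creates a cluster with a `scaleDownIdleTimeInSeconds` value and several `slurmCustomSettings` entries. The cluster name, Slurm version, subnet, security group, and `Prolog`/`Epilog` paths are all placeholders to replace with your own values:

```shell
# Hypothetical example: create a cluster with custom Slurm settings.
# All identifiers and paths below are placeholders.
aws pcs create-cluster \
    --cluster-name my-cluster \
    --scheduler type=SLURM,version=25.05 \
    --size SMALL \
    --networking subnetIds=subnet-0123456789abcdef0,securityGroupIds=sg-0123456789abcdef0 \
    --slurm-configuration '{
        "scaleDownIdleTimeInSeconds": 120,
        "slurmCustomSettings": [
            {"parameterName": "SelectTypeParameters", "parameterValue": "CR_CPU_Memory"},
            {"parameterName": "Prolog", "parameterValue": "/opt/slurm/etc/prolog.d/"},
            {"parameterName": "Epilog", "parameterValue": "/opt/slurm/etc/epilog.d/"}
        ]
    }'
```

See the `CreateCluster` action in the *AWS PCS API Reference* for the authoritative parameter shapes.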

# Frequently asked questions about Slurm versions in AWS PCS
<a name="slurm-versions_faq"></a>

AWS PCS maintains support for multiple Slurm versions. When a new Slurm version is introduced, AWS PCS provides technical support and security patches until that version reaches its end of support (EOS) from SchedMD. AWS PCS refers to the EOS date for a Slurm version as end of life (EOL) to be consistent with AWS terminology.

**How long does AWS PCS support a Slurm version?**  
AWS PCS support for Slurm versions aligns with SchedMD’s support cycles for major versions. AWS PCS supports the current version and the 2 most recent previous major versions. When SchedMD releases a new major version, AWS PCS ends support for the oldest supported version. AWS PCS releases new major versions of Slurm as soon as possible, but there might be a delay between SchedMD's release and its availability in AWS PCS.

**How do my clusters get new Slurm patch version releases?**  
 To address bugs and security fixes, AWS PCS is designed to automatically apply patches to cluster controllers that run in internal service-owned accounts. To install patches on EC2 instances in your AWS account, update the Amazon Machine Image (AMI) for your compute node groups and update the compute node groups to use the updated AMI. For more information, see [Custom Amazon Machine Images (AMIs) for AWS PCS](working-with_ami_custom.md).
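A minimal sketch of the second step, pointing an existing compute node group at a patched AMI with the AWS CLI. The cluster, node group, and AMI identifiers are placeholders:

```shell
# Hypothetical example: update a compute node group to use a patched AMI.
# Replace the identifiers with your own values.
aws pcs update-compute-node-group \
    --cluster-identifier my-cluster \
    --compute-node-group-identifier my-node-group \
    --ami-id ami-0123456789abcdef0
```

New EC2 instances launched by the node group use the updated AMI; see the linked AMI topic for how existing instances are replaced.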

**Note**  
 Slurm controllers are unavailable while we update them. Running jobs aren't affected. Jobs submitted before the cluster's controller became unavailable are held until the controller is available. 

**How am I informed about an upcoming Slurm version EOL event?**  
We send you an email message 6 months before the EOL date, another message each month after that, and a final message 1 week before the EOL date. After the EOL date, we send monthly email messages for 12 months to customers running AWS PCS clusters with EOL Slurm versions. We might suspend a cluster with an EOL Slurm version if security vulnerabilities are identified for that version.

**How can I determine if my cluster is running an EOL Slurm version?**  
We send you an email message to notify you that you have a running cluster with an EOL Slurm version. We post an alert to the AWS Health Dashboard that contains the details of your clusters with EOL Slurm versions. You can also use the AWS PCS console to identify the clusters with EOL Slurm versions. 
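You can also check programmatically. The following AWS CLI sketch lists your clusters and prints the Slurm version each one runs, assuming the `scheduler.version` field in the `GetCluster` response; compare the output against the EOL dates in the tables above:

```shell
# Hypothetical sketch: print the Slurm version of each AWS PCS cluster.
for name in $(aws pcs list-clusters --query 'clusters[].name' --output text); do
    version=$(aws pcs get-cluster --cluster-identifier "$name" \
        --query 'cluster.scheduler.version' --output text)
    echo "$name: Slurm $version"
done
```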

**What do I have to do if my Slurm version is near or beyond EOL?**  
Create a new cluster with a newer supported version of Slurm and update the Slurm version in your compute node group AMIs. The Slurm version in your AMIs and running EC2 instances can’t be more than 2 versions behind the cluster’s Slurm version. For more information, see [Custom Amazon Machine Images (AMIs) for AWS PCS](working-with_ami_custom.md). 

**What will happen if I don’t switch to a newer version of Slurm by the EOL date?**  
You can’t create new clusters with an EOL Slurm version. Existing clusters can operate for up to 12 months without AWS support, and no immediate action is required to maintain their operation. After the EOL date, support, security updates, and availability are not guaranteed. We might suspend a cluster for security reasons. We strongly recommend you use a supported Slurm version to maintain security and support for your AWS PCS clusters. 

**What are the risks of operating a cluster with EOL Slurm versions?**  
Clusters with EOL Slurm versions present significant security and operational risks. Without SchedMD's active monitoring, security vulnerabilities might remain undetected or unaddressed. If critical vulnerabilities are discovered, we might suspend your clusters immediately.

**What happens to my jobs, cluster compute, storage and networking resources when my cluster is suspended?**  
 All resources managed by AWS PCS are terminated. This includes the Slurm controller, compute node groups, and EC2 instances. Any jobs running on compute instances are immediately terminated, and the cluster enters a suspended state. Customer-managed resources, such as external file systems, remain intact. You can use the AWS PCS console and API actions to access the cluster's configuration.

**Can I restart a suspended cluster to resume its remaining jobs?**  
No, you can’t restart a suspended cluster. You can use your suspended cluster’s configuration to create a new cluster with a supported Slurm version. You can run your remaining jobs if you saved them in an external file system.

**Can I request an extension beyond the 12-month grace period?**  
No, you can’t request an extension to run your cluster beyond the 12-month grace period. We provide the extended time to help you switch to a supported Slurm version. To avoid disruption to your cluster operations, we recommend you switch before your Slurm version reaches EOL.