

# Schedulers supported by AWS ParallelCluster


AWS ParallelCluster supports the Slurm and AWS Batch schedulers, which are set using the [`Scheduler`](Scheduling-v3.md#yaml-Scheduling-Scheduler) setting. The following topics describe each scheduler and how to use it.

**Topics**
+ [Slurm Workload Manager (`slurm`)](slurm-workload-manager-v3.md)
+ [Using AWS Batch (`awsbatch`) scheduler with AWS ParallelCluster](awsbatchcli-v3.md)

# Slurm Workload Manager (`slurm`)

## Cluster capacity size and update


The capacity of the cluster is defined by the number of compute nodes the cluster can scale to. Compute nodes are backed by Amazon EC2 instances defined within compute resources in the AWS ParallelCluster configuration (`Scheduling/SlurmQueues/ComputeResources`), and are organized into queues (`Scheduling/SlurmQueues`) that map 1:1 to Slurm partitions.

Within a compute resource, it's possible to configure the minimum number of compute nodes (instances) that must always be kept running in the cluster ([`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount)), and the maximum number of instances the compute resource can scale to ([`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount)).

At cluster creation time, or upon a cluster update, AWS ParallelCluster launches as many Amazon EC2 instances as are configured in `MinCount` for each compute resource (`Scheduling/SlurmQueues/ComputeResources`) defined in the cluster. The instances launched to cover the minimum number of nodes for a compute resource are called ***static nodes***. Once started, static nodes are meant to be persistent in the cluster, and they are not terminated by the system unless a particular event or condition occurs. Such events include, for example, the failure of Slurm or Amazon EC2 health checks and the change of the Slurm node status to `DRAIN` or `DOWN`.

The Amazon EC2 instances, in the range of `1` to `MaxCount - MinCount`, launched on demand to deal with the increased load of the cluster, are referred to as ***dynamic nodes***. They are ephemeral by nature: they are launched to serve pending jobs and are terminated once they stay idle for the period of time defined by `Scheduling/SlurmSettings/ScaledownIdletime` in the cluster configuration (default: 10 minutes).
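For example, the idle timeout can be shortened in the cluster configuration with a minimal fragment like the following (the 5-minute value is illustrative):

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: 5
```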

Static and dynamic nodes comply with the following naming schema:
+ Static nodes: `<Queue/Name>-st-<ComputeResource/Name>-<num>` where `<num> = 1..ComputeResource/MinCount`
+ Dynamic nodes: `<Queue/Name>-dy-<ComputeResource/Name>-<num>` where `<num> = 1..(ComputeResource/MaxCount - ComputeResource/MinCount)`

For example, given the following AWS ParallelCluster configuration:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: c5xlarge
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 100
          MaxCount: 150
```

The following nodes are defined in Slurm:

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
```
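The counts shown by `sinfo` follow directly from `MinCount` and `MaxCount`. As an illustration only (the helper below is not part of AWS ParallelCluster), the naming schema can be expanded like this:

```python
def node_names(queue, resource, min_count, max_count):
    """Expand the node naming schema into static and dynamic name lists.

    Static nodes are numbered 1..MinCount; dynamic nodes are numbered
    1..(MaxCount - MinCount).
    """
    static = [f"{queue}-st-{resource}-{n}" for n in range(1, min_count + 1)]
    dynamic = [f"{queue}-dy-{resource}-{n}"
               for n in range(1, max_count - min_count + 1)]
    return static, dynamic

static, dynamic = node_names("queue1", "c5xlarge", 100, 150)
print(len(static), len(dynamic))  # 100 50
print(static[0])                  # queue1-st-c5xlarge-1
print(dynamic[-1])                # queue1-dy-c5xlarge-50
```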

When a compute resource has `MinCount == MaxCount`, all the corresponding compute nodes are static, and all the instances are launched at cluster creation/update time and kept up and running. For example:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: c5xlarge
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 100
          MaxCount: 100
```

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
```

## Cluster capacity update


A cluster capacity update includes adding or removing queues or compute resources, or changing the `MinCount`/`MaxCount` of a compute resource. Starting from AWS ParallelCluster version 3.9.0, reducing the size of a queue requires the compute fleet to be stopped, or [QueueUpdateStrategy](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-QueueUpdateStrategy) to be set to `TERMINATE`, before a cluster update takes place. It isn't required to stop the compute fleet or to set [QueueUpdateStrategy](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-QueueUpdateStrategy) to `TERMINATE` when:
+ Adding new queues to `Scheduling/`[`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues)
+ Adding new compute resources (`Scheduling/SlurmQueues/ComputeResources`) to a queue
+ Increasing the `MaxCount` of a compute resource
+ Increasing the `MinCount` of a compute resource and increasing the `MaxCount` of the same compute resource by at least the same amount
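For the cases that do require it, [QueueUpdateStrategy](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-QueueUpdateStrategy) can be set in the cluster configuration with a minimal fragment like the following:

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    QueueUpdateStrategy: TERMINATE
```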

## Considerations and limitations


This section outlines the important factors, constraints, and limitations that you should take into account when resizing the cluster capacity.
+ When removing a queue from `Scheduling/SlurmQueues`, all the compute nodes with names matching `<Queue/Name>-*`, both static and dynamic, are removed from the Slurm configuration, and the corresponding Amazon EC2 instances are terminated.
+ When removing a compute resource (`Scheduling/SlurmQueues/ComputeResources`) from a queue, all the compute nodes with names matching `<Queue/Name>-*-<ComputeResource/Name>-*`, both static and dynamic, are removed from the Slurm configuration, and the corresponding Amazon EC2 instances are terminated.

When changing the `MinCount` parameter of a compute resource, we can distinguish two scenarios: `MaxCount` kept equal to `MinCount` (static capacity only), and `MaxCount` greater than `MinCount` (mixed static and dynamic capacity).

### Capacity changes with static nodes only

+ If `MinCount == MaxCount`, when increasing `MinCount` (and `MaxCount`), the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`<Queue/Name>-st-<ComputeResource/Name>-[1..<new_MinCount>]`), and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity.
+ If `MinCount == MaxCount`, when decreasing `MinCount` (and `MaxCount`) by an amount N, the cluster is configured by removing the last N static nodes (`<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N + 1>...<old_MinCount>]`), and the system terminates the corresponding Amazon EC2 instances.
  + Initial state: `MinCount = MaxCount = 100`

    ```
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
    ```
  + Update `-30` on `MinCount` and `MaxCount`: `MinCount = MaxCount = 70`

    ```
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
    ```

### Capacity changes with mixed nodes


If `MinCount < MaxCount`, when increasing `MinCount` by an amount N (assuming `MaxCount` is kept unchanged), the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`old_MinCount + N`): `<Queue/Name>-st-<ComputeResource/Name>-[1..<old_MinCount + N>]`, and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity. Moreover, to honor the `MaxCount` capacity of the compute resource, the cluster configuration is updated by *removing the last N dynamic nodes*: `<Queue/Name>-dy-<ComputeResource/Name>-[<MaxCount - old_MinCount - N + 1>...<MaxCount - old_MinCount>]`, and the system terminates the corresponding Amazon EC2 instances.
+ Initial state: `MinCount = 100; MaxCount = 150`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update `+30` on `MinCount`: `MinCount = 130 (MaxCount = 150)`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     20  idle~ queue1-dy-c5xlarge-[1-20]
  queue1*      up   infinite    130   idle queue1-st-c5xlarge-[1-130]
  ```

If `MinCount < MaxCount`, when increasing `MinCount` and `MaxCount` by the same amount N, the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`old_MinCount + N`): `<Queue/Name>-st-<ComputeResource/Name>-[1..<old_MinCount + N>]`, and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity. Moreover, no changes are made to the number of dynamic nodes, to honor the new `MaxCount` value.
+ Initial state: `MinCount = 100; MaxCount = 150`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update `+30` on `MinCount` and `MaxCount`: `MinCount = 130 (MaxCount = 180)`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    130   idle queue1-st-c5xlarge-[1-130]
  ```

If `MinCount < MaxCount`, when decreasing `MinCount` by an amount N (assuming `MaxCount` is kept unchanged), the cluster is configured by removing the last N static nodes (`<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N + 1>...<old_MinCount>]`), and the system terminates the corresponding Amazon EC2 instances. Moreover, to honor the `MaxCount` capacity of the compute resource, the cluster configuration is updated by extending the number of dynamic nodes to fill the gap `MaxCount - new_MinCount`: `<Queue/Name>-dy-<ComputeResource/Name>-[1..<MaxCount - new_MinCount>]`. In this case, since those are dynamic nodes, no new Amazon EC2 instances are launched unless the scheduler has jobs pending on the new nodes.
+ Initial state: `MinCount = 100; MaxCount = 150`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update `-30` on `MinCount`: `MinCount = 70 (MaxCount = 150)`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     80  idle~ queue1-dy-c5xlarge-[1-80]
  queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
  ```

If `MinCount < MaxCount`, when decreasing `MinCount` and `MaxCount` by the same amount N, the cluster is configured by removing the last N static nodes (`<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N + 1>...<old_MinCount>]`), and the system terminates the corresponding Amazon EC2 instances. Moreover, no changes are made to the number of dynamic nodes, to honor the new `MaxCount` value.
+ Initial state: `MinCount = 100; MaxCount = 150`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update `-30` on `MinCount` and `MaxCount`: `MinCount = 70 (MaxCount = 120)`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
  ```

If `MinCount < MaxCount`, when decreasing `MaxCount` by an amount N (assuming `MinCount` is kept unchanged), the cluster is configured by removing the last N dynamic nodes (`<Queue/Name>-dy-<ComputeResource/Name>-[<old_MaxCount - old_MinCount - N + 1>...<old_MaxCount - old_MinCount>]`), and the system terminates the corresponding Amazon EC2 instances if they were running. No impact is expected on the static nodes.
+ Initial state: `MinCount = 100; MaxCount = 150`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update `-30` on `MaxCount`: `MinCount = 100 (MaxCount = 120)`

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     20  idle~ queue1-dy-c5xlarge-[1-20]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
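Across all of the scenarios above, the resulting node layout reduces to one rule: after an update, the static node count equals the new `MinCount`, and the dynamic node count equals the new `MaxCount - MinCount`. A quick illustrative sketch (not AWS ParallelCluster code):

```python
def counts_after_update(new_min_count, new_max_count):
    """Static and dynamic node counts after a MinCount/MaxCount update."""
    return new_min_count, new_max_count - new_min_count

# Starting from MinCount = 100, MaxCount = 150:
print(counts_after_update(130, 150))  # (130, 20)  +30 on MinCount only
print(counts_after_update(130, 180))  # (130, 50)  +30 on both
print(counts_after_update(70, 150))   # (70, 80)   -30 on MinCount only
print(counts_after_update(70, 120))   # (70, 50)   -30 on both
print(counts_after_update(100, 120))  # (100, 20)  -30 on MaxCount only
```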

## Impacts on the Jobs


In all the cases where nodes are removed and Amazon EC2 instances terminated, an `sbatch` job running on the removed nodes is requeued, unless there are no other nodes satisfying the job requirements. In this last case, the job fails with status `NODE_FAIL`, disappears from the queue, and must be re-submitted manually.

If you are planning to perform a cluster resize update, you can prevent jobs from starting on the nodes that are going to be removed during the planned update. To do this, set the nodes to be removed in maintenance. Be aware that setting a node in maintenance doesn't impact jobs that are already running on the node.

Suppose that with the planned cluster resize update you are going to remove the nodes `queue-st-computeresource-[9-10]`. You can create a Slurm reservation with the following command:

```
sudo -i scontrol create reservation ReservationName=maint_for_update user=root starttime=now duration=infinite flags=maint,ignore_jobs nodes=queue-st-computeresource-[9-10]
```

This creates a Slurm reservation named `maint_for_update` on the nodes `queue-st-computeresource-[9-10]`. From the time the reservation is created, no new jobs can start running on the nodes `queue-st-computeresource-[9-10]`. Be aware that the reservation doesn't prevent jobs from being eventually allocated to the nodes `queue-st-computeresource-[9-10]`.

After the cluster resize update, if the Slurm reservation was set only on nodes that were removed during the resize update, the maintenance reservation is automatically deleted. If instead you created a Slurm reservation on nodes that are still present after the cluster resize update, you may want to remove the maintenance reservation after the resize update is performed, by using the following command:

```
sudo -i scontrol delete ReservationName=maint_for_update
```

For additional details on Slurm reservations, see the [official SchedMD documentation](https://slurm.schedmd.com/reservations.html).

## Cluster update process on capacity changes


Upon a scheduler configuration change, the following steps are executed during the cluster update process:
+ Stop AWS ParallelCluster `clustermgtd` (`supervisorctl stop clustermgtd`)
+ Generate the updated Slurm partitions configuration from the AWS ParallelCluster configuration
+ Restart `slurmctld` (done through the Chef service recipe)
+ Check the `slurmctld` status (`systemctl is-active --quiet slurmctld.service`)
+ Reload the Slurm configuration (`scontrol reconfigure`)
+ Start `clustermgtd` (`supervisorctl start clustermgtd`)

For information about Slurm, see [https://slurm.schedmd.com](https://slurm.schedmd.com). For downloads, see [https://github.com/SchedMD/slurm/tags](https://github.com/SchedMD/slurm/tags). For the source code, see [https://github.com/SchedMD/slurm](https://github.com/SchedMD/slurm).

## Supported cluster and Slurm versions


The following table lists the AWS ParallelCluster and Slurm versions that AWS supports.


| AWS ParallelCluster version(s) | Supported Slurm version | 
| --- | --- | 
|  3.13.0  |  24.05.07  | 
|  3.12.0  |  23.11.10  | 
|  3.11.0  |  23.11.10  | 
|  3.9.2, 3.9.3, 3.10.0  |  23.11.7  | 
|  3.9.0, 3.9.1  |  23.11.4  | 
|  3.8.0  |  23.02.7  | 
|  3.7.2  |  23.02.6  | 
|  3.7.1  |  23.02.5  | 
|  3.7.0  |  23.02.4  | 
|  3.6.0, 3.6.1  |  23.02.2  | 
|  3.5.0, 3.5.1  |  22.05.8  | 
|  3.4.0, 3.4.1  |  22.05.7  | 
|  3.3.0, 3.3.1  |  22.05.5  | 
|  3.1.4, 3.1.5, 3.2.0, 3.2.1  |  21.08.8-2  | 
|  3.1.2, 3.1.3  |  21.08.6  | 
|  3.1.1  |  21.08.5  | 
|  3.0.0  |  20.11.8  | 

**Topics**
+ [Cluster capacity size and update](#cluster-capacity-size-and-update)
+ [Cluster capacity update](#cluster-capacity-update)
+ [Considerations and limitations](#considerations-limitations)
+ [Impacts on the Jobs](#impacts-on-jobs)
+ [Cluster update process on capacity changes](#cluster-update-process)
+ [Supported cluster and Slurm versions](#cluster-slurm-version-table)
+ [Configuration of multiple queues](configuration-of-multiple-queues-v3.md)
+ [Slurm guide for multiple queue mode](multiple-queue-mode-slurm-user-guide-v3.md)
+ [Slurm cluster protected mode](slurm-protected-mode-v3.md)
+ [Slurm cluster fast insufficient capacity fail-over](slurm-short-capacity-fail-mode-v3.md)
+ [Slurm memory-based scheduling](slurm-mem-based-scheduling-v3.md)
+ [Multiple instance type allocation with Slurm](slurm-multiple-instance-allocation-v3.md)
+ [Cluster scaling for dynamic nodes](scheduler-node-allocation-v3.md)
+ [Slurm accounting with AWS ParallelCluster](slurm-accounting-v3.md)
+ [Slurm configuration customization](slurm-configuration-settings-v3.md)
+ [Slurm `prolog` and `epilog`](slurm-prolog-epilog-v3.md)
+ [Cluster capacity size and update](slurm-cluster-capacity-size-and-update.md)

# Configuration of multiple queues


With AWS ParallelCluster version 3, you can configure multiple queues by setting the [`Scheduler`](Scheduling-v3.md#yaml-Scheduling-Scheduler) to `slurm` and specifying more than one queue for [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) in the configuration file. In this mode, different instance types coexist in the compute nodes that are specified in the [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) section of the configuration file. [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) with different instance types are scaled up or down as needed for the [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues).

Multiple *queues* within a single cluster are generally preferred over multiple clusters when the workloads share the same underlying infrastructure and resources (like shared storage, networking, or login nodes). If workloads have similar compute, storage, and networking needs, using multiple queues within a single cluster is more efficient because it allows for resource sharing and avoids unnecessary duplication. This approach simplifies management and reduces overhead, while still allowing for efficient job scheduling and resource allocation. On the other hand, multiple *clusters* should be used when there are strong security, data, or operational isolation requirements between workloads. For example, if you need to manage and operate workloads independently, with different schedules, update cycles, or access policies, multiple clusters are more appropriate.


**Cluster queue and compute resource quotas**  

| Resource | Quota | 
| --- | --- | 
|  [`Slurm queues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues)  |  50 queues per cluster  | 
|  [`Compute resources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources)  |  50 compute resources per queue; 50 compute resources per cluster  | 

**Node Counts**

Each compute resource in [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) for a queue must have a unique [`Name`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Name), [`InstanceType`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-InstanceType), [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount), and [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount). [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount) and [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount) have default values that define the range of instances for a compute resource in [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) for a queue. You can also specify your own values for [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount) and [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount). Each compute resource in [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) is composed of static nodes numbered from 1 to the value of [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount) and dynamic nodes numbered from 1 to the value of [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount) minus [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount).

**Example Configuration**

The following is an example of a [Scheduling](Scheduling-v3.md) section for a cluster configuration file. In this configuration there are two queues named `queue1` and `queue2` and each of the queues has [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) with a specified [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount).

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: queue1
    ComputeResources:
    - InstanceType: c5.xlarge
      MaxCount: 5
      Name: c5xlarge
    - InstanceType: c4.xlarge
      MaxCount: 5
      Name: c4xlarge
  - Name: queue2
    ComputeResources:
    - InstanceType: c5.xlarge
      MaxCount: 5
      Name: c5xlarge
```

**Hostnames**

The instances that are launched into the compute fleet are assigned dynamically, and a hostname is generated for each node. By default, AWS ParallelCluster uses the following hostname format:

`$HOSTNAME=$QUEUE-$STATDYN-$COMPUTE_RESOURCE-$NODENUM`
+ `$QUEUE` is the name of the queue. For example, if the [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) section has an entry with [`Name`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Name) set to `queue-name`, then `$QUEUE` is `queue-name`.
+ `$STATDYN` is `st` for static nodes or `dy` for dynamic nodes.
+ `$COMPUTE_RESOURCE` is the [`Name`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Name) of the [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) compute resource corresponding to this node.
+ `$NODENUM` is the number of the node. `$NODENUM` is between one (1) and the value of [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount) for static nodes, and between one (1) and [`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount) minus [`MinCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MinCount) for dynamic nodes.

From the example configuration file above, a given node from `queue1` and compute resource `c5xlarge` has the hostname `queue1-dy-c5xlarge-1`.

Both hostnames and fully-qualified domain names (FQDN) are created using Amazon Route 53 hosted zones. The FQDN is `$HOSTNAME.$CLUSTERNAME.pcluster`, where `$CLUSTERNAME` is the name of the cluster.

Note that the same format will be used for the Slurm node names as well.
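Putting the pieces together, the hostname and FQDN for a node can be sketched as follows (illustrative helper, not part of AWS ParallelCluster; the cluster name `mycluster` is an assumption):

```python
def node_fqdn(queue, statdyn, compute_resource, nodenum, cluster_name):
    """Build the hostname and Route 53 FQDN per the scheme described above."""
    hostname = f"{queue}-{statdyn}-{compute_resource}-{nodenum}"
    return f"{hostname}.{cluster_name}.pcluster"

print(node_fqdn("queue1", "dy", "c5xlarge", 1, "mycluster"))
# queue1-dy-c5xlarge-1.mycluster.pcluster
```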

Users can choose to use the default Amazon EC2 hostname of the instance backing the compute node instead of the default hostname format used by AWS ParallelCluster. To do so, set the [`UseEc2Hostnames`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Dns-UseEc2Hostnames) parameter to `true`. However, Slurm node names continue to use the default AWS ParallelCluster format.
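A minimal configuration fragment enabling this behavior:

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      UseEc2Hostnames: true
```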

# Slurm guide for multiple queue mode


Here you can learn how AWS ParallelCluster and Slurm manage queue (partition) nodes and how you can monitor the queue and node states.

## Overview


The scaling architecture is based on Slurm's [Cloud Scheduling Guide](https://slurm.schedmd.com/elastic_computing.html) and power saving plugin. For more information about the power saving plugin, see the [Slurm Power Saving Guide](https://slurm.schedmd.com/power_save.html). In this architecture, resources that can potentially be made available for a cluster are typically predefined in the Slurm configuration as cloud nodes.

## Cloud node lifecycle


Throughout their lifecycle, cloud nodes enter several if not all of the following states: `POWER_SAVING`, `POWER_UP` (`pow_up`), `ALLOCATED` (`alloc`), and `POWER_DOWN` (`pow_dn`). In some cases, a cloud node might enter the `OFFLINE` state. The following list details several aspects of these states in the cloud node lifecycle.
+ **A node in a `POWER_SAVING` state** appears with a `~` suffix (for example `idle~`) in `sinfo`. In this state, no EC2 instances are backing the node. However, Slurm can still allocate jobs to the node.
+ **A node transitioning to a `POWER_UP` state** appears with a `#` suffix (for example `idle#`) in `sinfo`. A node automatically transitions to a `POWER_UP` state, when Slurm allocates a job to a node in a `POWER_SAVING` state.

  Alternatively, you can transition the nodes to the `POWER_UP` state manually as an `su` root user with the command:

  ```
  $ scontrol update nodename=nodename state=power_up
  ```

  In this stage, the `ResumeProgram` is invoked, EC2 instances are launched and configured, and the node transitions to the `POWER_UP` state.
+ **A node that is currently available for use** appears without a suffix (for example `idle`) in `sinfo`. After the node is set up and has joined the cluster, it becomes available to run jobs. In this stage, the node is properly configured and ready for use.

  As a general rule, we recommend that the number of Amazon EC2 instances be the same as the number of available nodes. In most cases, static nodes are available after the cluster is created.
+ **A node that is transitioning to a `POWER_DOWN` state** appears with a `%` suffix (for example `idle%`) in `sinfo`. Dynamic nodes automatically enter the `POWER_DOWN` state after [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime). In contrast, static nodes in most cases aren't powered down. However, you can place the nodes in the `POWER_DOWN` state manually as an `su` root user with the command:

  ```
  $ scontrol update nodename=nodename state=down reason="manual draining"
  ```

  In this state, the instances associated with a node are terminated, and the node is set back to the `POWER_SAVING` state and available for use after [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime).

  The [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime) setting is saved to the Slurm configuration `SuspendTimeout` setting.
+ **A node that is offline** appears with a `*` suffix (for example `down*`) in `sinfo`. A node goes offline if the Slurm controller can't contact the node or if the static nodes are disabled and the backing instances are terminated.

Consider the node states shown in the following `sinfo` example.

```
$ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  efa          up   infinite      4  idle~ efa-dy-efacompute1-[1-4]
  efa          up   infinite      1   idle efa-st-efacompute1-1
  gpu          up   infinite      1  idle% gpu-dy-gpucompute1-1
  gpu          up   infinite      9  idle~ gpu-dy-gpucompute1-[2-10]
  ondemand     up   infinite      2   mix# ondemand-dy-ondemandcompute1-[1-2]
  ondemand     up   infinite     18  idle~ ondemand-dy-ondemandcompute1-[3-10],ondemand-dy-ondemandcompute2-[1-10]
  spot*        up   infinite     13  idle~ spot-dy-spotcompute1-[1-10],spot-dy-spotcompute2-[1-3]
  spot*        up   infinite      2   idle spot-st-spotcompute2-[1-2]
```

The `spot-st-spotcompute2-[1-2]` and `efa-st-efacompute1-1` nodes already have backing instances set up and are available for use. The `ondemand-dy-ondemandcompute1-[1-2]` nodes are in the `POWER_UP` state and should be available within a few minutes. The `gpu-dy-gpucompute1-1` node is in the `POWER_DOWN` state, and it transitions into `POWER_SAVING` state after [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime) (defaults to 10 minutes).

All of the other nodes are in `POWER_SAVING` state with no EC2 instances backing them.

## Working with an available node


An available node is backed by an Amazon EC2 instance. By default, the node name can be used to directly SSH into the instance (for example `ssh efa-st-efacompute1-1`). The private IP address of the instance can be retrieved using the command:

```
$ scontrol show nodes nodename
```

Check the `NodeAddr` field in the output for the IP address.

For nodes that aren't available, the `NodeAddr` field shouldn't point to a running Amazon EC2 instance. Rather, it should be the same as the node name.

## Job states and submission


In most cases, submitted jobs are immediately allocated to nodes in the system, or placed in a pending state if all of the nodes are allocated.

If nodes allocated for a job include any nodes in a `POWER_SAVING` state, the job starts out with a `CF`, or `CONFIGURING` state. At this time, the job waits for the nodes in the `POWER_SAVING` state to transition to the `POWER_UP` state and become available.

After all nodes allocated for a job are available, the job enters the `RUNNING` (`R`) state.

By default, all jobs are submitted to the default queue (known as a partition in Slurm). This is signified by a `*` suffix after the queue name. You can select a queue using the `-p` job submission option.

All nodes are configured with the following features, which can be used in job submission commands:
+ An instance type (for example `c5.xlarge`)
+ A node type (This is either `dynamic` or `static`.)

You can see the features for a particular node by using the command:

```
$ scontrol show nodes nodename
```

In the return, check the `AvailableFeatures` list.

Consider the initial state of the cluster, which you can view by running the `sinfo` command.

```
$ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  efa          up   infinite      4  idle~ efa-dy-efacompute1-[1-4]
  efa          up   infinite      1   idle efa-st-efacompute1-1
  gpu          up   infinite     10  idle~ gpu-dy-gpucompute1-[1-10]
  ondemand     up   infinite     20  idle~ ondemand-dy-ondemandcompute1-[1-10],ondemand-dy-ondemandcompute2-[1-10]
  spot*        up   infinite     13  idle~ spot-dy-spotcompute1-[1-10],spot-dy-spotcompute2-[1-3]
  spot*        up   infinite      2   idle spot-st-spotcompute2-[1-2]
```

Note that `spot` is the default queue. It is indicated by the `*` suffix.

Submit a job to one static node in the default queue (`spot`).

```
$ sbatch --wrap "sleep 300" -N 1 -C static
```

Submit a job to one dynamic node in the `EFA` queue.

```
$ sbatch --wrap "sleep 300" -p efa -C dynamic
```

Submit a job to eight (8) `c5.2xlarge` nodes and two (2) `t2.xlarge` nodes in the `ondemand` queue.

```
$ sbatch --wrap "sleep 300" -p ondemand -N 10 -C "[c5.2xlarge*8&t2.xlarge*2]"
```

Submit a job to one GPU node in the `gpu` queue.

```
$ sbatch --wrap "sleep 300" -p gpu -G 1
```

Consider the state of the jobs using the `squeue` command.

```
$ squeue
 JOBID PARTITION    NAME   USER   ST       TIME  NODES NODELIST(REASON)
  12   ondemand     wrap   ubuntu CF       0:36     10 ondemand-dy-ondemandcompute1-[1-8],ondemand-dy-ondemandcompute2-[1-2]
  13        gpu     wrap   ubuntu CF       0:05      1 gpu-dy-gpucompute1-1
   7       spot     wrap   ubuntu  R       2:48      1 spot-st-spotcompute2-1
   8        efa     wrap   ubuntu  R       0:39      1 efa-dy-efacompute1-1
```

Jobs 7 and 8 (in the `spot` and `efa` queues) are already running (`R`). Jobs 12 and 13 are still configuring (`CF`), probably waiting for instances to become available.

```
# Node states correspond to the states of the running jobs
$ sinfo
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 efa          up   infinite      3  idle~ efa-dy-efacompute1-[2-4]
 efa          up   infinite      1    mix efa-dy-efacompute1-1
 efa          up   infinite      1   idle efa-st-efacompute1-1
 gpu          up   infinite      1   mix~ gpu-dy-gpucompute1-1
 gpu          up   infinite      9  idle~ gpu-dy-gpucompute1-[2-10]
 ondemand     up   infinite     10   mix# ondemand-dy-ondemandcompute1-[1-8],ondemand-dy-ondemandcompute2-[1-2]
 ondemand     up   infinite     10  idle~ ondemand-dy-ondemandcompute1-[9-10],ondemand-dy-ondemandcompute2-[3-10]
 spot*        up   infinite     13  idle~ spot-dy-spotcompute1-[1-10],spot-dy-spotcompute2-[1-3]
 spot*        up   infinite      1    mix spot-st-spotcompute2-1
 spot*        up   infinite      1   idle spot-st-spotcompute2-2
```

## Node state and features


In most cases, node states are fully managed by AWS ParallelCluster according to the specific processes in the cloud node lifecycle described earlier in this topic.

However, AWS ParallelCluster also replaces or terminates unhealthy nodes in `DOWN` and `DRAINED` states and nodes that have unhealthy backing instances. For more information, see [`clustermgtd`](processes-v3.md#clustermgtd-v3).
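
To see which nodes are candidates for this kind of replacement, you can filter node-oriented `sinfo` output for `down` and `drain` states. A minimal sketch over sample `sinfo -N --noheader` output with hypothetical node names (on a live cluster, pipe the real output into the same program):

```
# Print nodes whose state starts with "down" or "drain" (any suffix such as %, ~, #).
# On a live cluster: sinfo -N --noheader | awk '...'
awk '$4 ~ /^(down|drain)/ { print $1 }' <<'EOF'
queue1-dy-c5xlarge-1    1 queue1* down%
queue1-dy-c5xlarge-2    1 queue1* idle~
queue1-st-c5xlarge-1    1 queue1* drained
EOF
```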

## Partition states


AWS ParallelCluster supports the following partition states. A Slurm partition is a queue in AWS ParallelCluster.
+ `UP`: Indicates that the partition is in an active state. This is the default state of a partition. In this state, all nodes in the partition are active and available for use.
+ `INACTIVE`: Indicates that the partition is in the inactive state. In this state, all instances backing nodes of an inactive partition are terminated. New instances aren't launched for nodes in an inactive partition.
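
On the head node, you can read the current state of each partition from `scontrol show partitions`. The following is a minimal sketch that extracts the name and state pairs from sample output (the surrounding fields are hypothetical; on a live cluster, pipe the real `scontrol show partitions` output into the same filter):

```
# Extract PartitionName and State fields from `scontrol show partitions` output.
scontrol_output='PartitionName=efa
   AllowGroups=ALL State=UP TotalCPUs=8
PartitionName=spot
   AllowGroups=ALL State=INACTIVE TotalCPUs=0'
printf '%s\n' "$scontrol_output" | grep -oE 'PartitionName=[^ ]+|State=[^ ]+'
```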

## pcluster update-compute-fleet

+ **Stopping the compute fleet** - When the following command is executed, all partitions transition to the `INACTIVE` state, and AWS ParallelCluster processes keep the partitions in the `INACTIVE` state.

  ```
  $ pcluster update-compute-fleet --cluster-name testSlurm \
     --region eu-west-1 --status STOP_REQUESTED
  ```
+ **Starting the compute fleet** - When the following command is executed, all partitions initially transition to the `UP` state. However, AWS ParallelCluster processes don't keep the partition in an `UP` state. You need to change partition states manually. All static nodes become available after a few minutes. Note that setting a partition to `UP` doesn't power up any dynamic capacity.

  ```
  $ pcluster update-compute-fleet --cluster-name testSlurm \
     --region eu-west-1 --status START_REQUESTED
  ```

When `update-compute-fleet` is run, you can check the state of the cluster by running the `pcluster describe-compute-fleet` command and checking the `Status`. The following lists possible states:
+ `STOP_REQUESTED`: The stop compute fleet request is sent to the cluster.
+ `STOPPING`: The `pcluster` process is currently stopping the compute fleet.
+ `STOPPED`: The `pcluster` process finished the stopping process, all partitions are in `INACTIVE` state, and all compute instances are terminated.
+ `START_REQUESTED`: The start compute fleet request is sent to the cluster.
+ `STARTING`: The `pcluster` process is currently starting the cluster.
+ `RUNNING`: The `pcluster` process finished the starting process, all partitions are in the `UP` state, and static nodes are available after a few minutes.
+ `PROTECTED`: This status indicates that some partitions have consistent bootstrap failures. Affected partitions are inactive. Investigate the issue, and then run `update-compute-fleet` to re-enable the fleet.
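
When scripting around these states, you can extract the `status` field from the JSON that `pcluster describe-compute-fleet` returns. The following sketch parses a sample response with `sed`; on a live cluster, you would pipe the command's real output instead, and a JSON-aware tool such as `jq` would be more robust:

```
# Pull the "status" value out of the describe-compute-fleet JSON response.
# On a live cluster:
#   pcluster describe-compute-fleet --cluster-name testSlurm --region eu-west-1 | sed -n '...'
fleet_json='{
   "status": "STOPPING",
   "lastStatusUpdatedTime": "2022-04-22T00:31:24.000Z"
}'
printf '%s\n' "$fleet_json" | sed -n 's/.*"status": *"\([A-Z_]*\)".*/\1/p'   # STOPPING
```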

## Manual control of queues


In some cases, you might want to have some manual control over the nodes or queue (known as a partition in Slurm) in a cluster. You can manage nodes in a cluster through the following common procedures using the `scontrol` command.
+ **Power up dynamic nodes in `POWER_SAVING` state**

  Run the following command as the root user:

  ```
  $ scontrol update nodename=nodename state=power_up
  ```

  You can also submit a placeholder `sleep 1` job requesting a certain number of nodes and then rely on Slurm to power up the required number of nodes.
+ **Power down dynamic nodes before [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime)**

  We recommend that you set dynamic nodes to `DOWN` as the root user with the following command:

  ```
  $ scontrol update nodename=nodename state=down reason="manually draining"
  ```

  AWS ParallelCluster automatically terminates and resets the downed dynamic nodes.

  In general, we don't recommend that you set nodes to `POWER_DOWN` directly using the `scontrol update nodename=nodename state=power_down` command. This is because AWS ParallelCluster automatically handles the power down process.
+ **Disable a queue (partition) or stop all static nodes in a specific partition**

  Set a specific queue to `INACTIVE` as the root user with the following command:

  ```
  $ scontrol update partition=queuename state=inactive
  ```

  Doing this terminates all instances backing nodes in the partition.
+ **Enable a queue (partition)**

  Set a specific queue to `UP` as the root user with the following command:

  ```
  $ scontrol update partition=queuename state=up
  ```

## Scaling behavior and adjustments


**Here is an example of the normal scaling workflow:**
+ The scheduler receives a job that requires two nodes.
+ The scheduler transitions two nodes to a `POWER_UP` state, and calls `ResumeProgram` with the node names (for example `queue1-dy-spotcompute1-[1-2]`).
+ `ResumeProgram` launches two Amazon EC2 instances, assigns them the private IP addresses and hostnames of `queue1-dy-spotcompute1-[1-2]`, and waits up to `ResumeTimeout` (the default period is 30 minutes) before resetting the nodes.
+ Instances are configured and join the cluster. A job starts running on instances.
+ The job completes and stops running.
+ After the configured `SuspendTime` has elapsed (which is set to [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime)), the scheduler sets the instances to the `POWER_SAVING` state. The scheduler then sets `queue1-dy-spotcompute1-[1-2]` to the `POWER_DOWN` state and calls `SuspendProgram` with the node names.
+ `SuspendProgram` is called for the two nodes. The nodes remain in the `POWER_DOWN` state, shown as `idle%`, for `SuspendTimeout` (the default period is 120 seconds (2 minutes)). After `clustermgtd` detects that the nodes are powering down, it terminates the backing instances. Then, it transitions `queue1-dy-spotcompute1-[1-2]` to the `idle` state and resets the private IP addresses and hostnames so that the nodes are ready to power up for future jobs.

**If things go wrong and an instance for a particular node can't be launched for some reason, then the following happens:**
+ The scheduler receives a job that requires two nodes.
+ The scheduler transitions two cloud bursting nodes to the `POWER_UP` state and calls `ResumeProgram` with the node names (for example `queue1-dy-spotcompute1-[1-2]`).
+ `ResumeProgram` launches only one Amazon EC2 instance and configures `queue1-dy-spotcompute1-1`; the instance for `queue1-dy-spotcompute1-2` fails to launch.
+ `queue1-dy-spotcompute1-1` isn't impacted and comes online after reaching the `POWER_UP` state.
+ `queue1-dy-spotcompute1-2` transitions to the `POWER_DOWN` state, and the job is requeued automatically because Slurm detects a node failure.
+ `queue1-dy-spotcompute1-2` becomes available after `SuspendTimeout` (the default is 120 seconds (2 minutes)). In the meantime, the job is requeued and can start running on another node.
+ This process repeats until the job can run on an available node without a failure occurring.

**There are two timing parameters that can be adjusted if needed:**
+ **`ResumeTimeout` (the default is 30 minutes)**: `ResumeTimeout` controls the time Slurm waits before transitioning the node to the down state.
  + It might be useful to extend `ResumeTimeout` if your pre/post installation process takes nearly that long.
  + `ResumeTimeout` is also the maximum time that AWS ParallelCluster waits before replacing or resetting a node if there is an issue. Compute nodes self-terminate if any error occurs during launch or setup. AWS ParallelCluster processes replace a node upon detection of a terminated instance.
+ **`SuspendTimeout` (the default is 120 seconds (2 minutes))**: `SuspendTimeout` controls how quickly nodes get placed back into the system and are ready for use again.
  + A shorter `SuspendTimeout` means that nodes are reset more quickly, and Slurm can try to launch instances more frequently.
  + A longer `SuspendTimeout` means that failed nodes are reset more slowly. In the meantime, Slurm tries to use other nodes. If `SuspendTimeout` is more than a few minutes, Slurm tries to cycle through all nodes in the system. A longer `SuspendTimeout` might be beneficial for large-scale systems (over 1,000 nodes) to reduce stress on Slurm when it tries to frequently requeue failing jobs.
  + Note that `SuspendTimeout` doesn't refer to the time AWS ParallelCluster waits to terminate a backing instance for a node. Backing instances for `POWER_DOWN` nodes are immediately terminated. The terminate process usually is finished in a few minutes. However, during this time, the node remains in the `POWER_DOWN` state and isn't available for the scheduler's use.

## Logs for the architecture


The following list contains the key logs. The log stream name used with Amazon CloudWatch Logs has the format `{hostname}.{instance_id}.{logIdentifier}`, where *logIdentifier* is shown in parentheses after each log path in the following list.
+ `ResumeProgram`: `/var/log/parallelcluster/slurm_resume.log` (`slurm_resume`)
+ `SuspendProgram`: `/var/log/parallelcluster/slurm_suspend.log` (`slurm_suspend`)
+ `clustermgtd`: `/var/log/parallelcluster/clustermgtd.log` (`clustermgtd`)
+ `computemgtd`: `/var/log/parallelcluster/computemgtd.log` (`computemgtd`)
+ `slurmctld`: `/var/log/slurmctld.log` (`slurmctld`)
+ `slurmd`: `/var/log/slurmd.log` (`slurmd`)
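
Given a hostname and instance ID (for example, taken from the `ResumeProgram` or `clustermgtd` log), you can compose the corresponding log stream name directly. A minimal sketch with hypothetical values:

```
# Compose the CloudWatch Logs stream name: {hostname}.{instance_id}.{logIdentifier}
hostname=queue1-st-c5large-1       # hypothetical node hostname
instance_id=i-0123456789abcdef0    # hypothetical EC2 instance ID
log_identifier=slurm_resume        # the ResumeProgram log identifier from the list above
echo "${hostname}.${instance_id}.${log_identifier}"
```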

## Common issues and how to debug


**Nodes that failed to launch, power up, or join the cluster**
+ Dynamic nodes:
  + Check the `ResumeProgram` log to see if `ResumeProgram` was called with the node. If not, check the `slurmctld` log to determine if Slurm tried to call `ResumeProgram` with the node. Note that incorrect permissions on `ResumeProgram` might cause it to fail silently.
  + If `ResumeProgram` is called, check to see if an instance was launched for the node. If the instance didn't launch, there should be a clear error message explaining why the instance failed to launch.
  + If an instance was launched, there may have been some problem during the bootstrap process. Find the corresponding private IP address and instance ID from the `ResumeProgram` log and look at corresponding bootstrap logs for the specific instance in CloudWatch Logs.
+ Static nodes:
  + Check the `clustermgtd` log to see if instances were launched for the node. If instances didn't launch, there should be clear errors on why the instances failed to launch.
  + If an instance was launched, there may have been some problem during the bootstrap process. Find the corresponding private IP address and instance ID from the `clustermgtd` log and look at the corresponding bootstrap logs for the specific instance in CloudWatch Logs.

**Nodes replaced or terminated unexpectedly, and node failures**
+ Nodes replaced/terminated unexpectedly:
  + In most cases, `clustermgtd` handles all node maintenance actions. To check if `clustermgtd` replaced or terminated a node, check the `clustermgtd` log.
  + If `clustermgtd` replaced or terminated the node, there should be a message indicating the reason for the action. If the reason is scheduler related (for example, the node was `DOWN`), check in the `slurmctld` log for more details. If the reason is Amazon EC2 related, use tools such as Amazon CloudWatch or the Amazon EC2 console, CLI, or SDKs, to check status or logs for that instance. For example, you can check if the instance had scheduled events or failed Amazon EC2 health status checks.
  + If `clustermgtd` didn't terminate the node, check if `computemgtd` terminated the node or if EC2 terminated the instance to reclaim a Spot Instance.
+ Node failures:
  + In most cases, jobs are automatically requeued if a node failed. Look in the `slurmctld` log to see why a job or a node failed and assess the situation from there.

**Failure when replacing or terminating instances, failure when powering down nodes**
+ In general, `clustermgtd` handles all expected instance termination actions. Look in the `clustermgtd` log to see why it failed to replace or terminate a node.
+ For dynamic nodes failing [`ScaledownIdletime`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ScaledownIdletime), look in the `SuspendProgram` log to see if `slurmctld` processes made calls with the specific node as argument. Note `SuspendProgram` doesn't actually perform any specific action. Rather, it only logs when it’s called. All instance termination and `NodeAddr` resets are completed by `clustermgtd`. Slurm transitions nodes to `IDLE` after `SuspendTimeout`.

**Other issues**
+ AWS ParallelCluster doesn't make job allocation or scaling decisions. It only tries to launch, terminate, and maintain resources according to Slurm’s instructions.

  For issues regarding job allocation, node allocation, and scaling decisions, look at the `slurmctld` log for errors.

# Slurm cluster protected mode


When a cluster runs with protected mode enabled, AWS ParallelCluster monitors and tracks compute node bootstrap failures as the compute nodes are being launched. It does this to detect whether these failures are occurring continuously.

If the following is detected in a queue (partition), the cluster enters protected status:

1. Consecutive compute node bootstrap failures occur continuously with no successful compute node launches.

1. The failure count reaches a predefined threshold.

After the cluster enters protected status, AWS ParallelCluster disables queues with failures at or above the predefined threshold.

Slurm cluster protected mode was added in AWS ParallelCluster version 3.0.0.

You can use protected mode to reduce the time and resources spent on compute node bootstrap failure cycling.

## Protected mode parameter


**`protected_failure_count`**

`protected_failure_count` specifies the number of consecutive failures in a queue (partition) that activate cluster protected status.

The default `protected_failure_count` is 10 and protected mode is enabled.

If `protected_failure_count` is greater than zero, protected mode is enabled.

If `protected_failure_count` is less than or equal to zero, protected mode is disabled.

You can change the `protected_failure_count` value by adding the parameter in the `clustermgtd` config file that's located at `/etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf` in the `HeadNode`.

You can update this parameter anytime and you don't need to stop the compute fleet to do so. If a launch succeeds in a queue before the failure count reaches `protected_failure_count`, the failure count is reset to zero.
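
The change itself is a one-line edit of the key in that file. The following sketch rehearses the edit on a throwaway copy with hypothetical contents; on a real head node, you would run the same `sed` command as root against `/etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf`:

```
# Rehearse lowering protected_failure_count to 5 on a temporary copy of the config.
conf=$(mktemp)
printf 'protected_failure_count = 10\n' > "$conf"
sed -i 's/^protected_failure_count *=.*/protected_failure_count = 5/' "$conf"
cat "$conf"      # protected_failure_count = 5
rm -f "$conf"
```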

## Check cluster status in protected status


When a cluster is in protected status, you can check the compute fleet status and node states.

### Compute fleet status


The status of the compute fleet is `PROTECTED` in a cluster running in protected status.

```
$ pcluster describe-compute-fleet --cluster-name <cluster-name> --region <region-id>
{
   "status": "PROTECTED",
   "lastStatusUpdatedTime": "2022-04-22T00:31:24.000Z"
}
```

### Node status


To learn which queues (partitions) have bootstrap failures that have activated protected status, log in to the cluster and run the `sinfo` command. Partitions with bootstrap failures at or above `protected_failure_count` are in the `INACTIVE` state. Partitions without bootstrap failures at or above `protected_failure_count` are in the `UP` state and work as expected.

`PROTECTED` status doesn't impact running jobs. If jobs are running on a partition with bootstrap failures at or above `protected_failure_count`, the partition is set to `INACTIVE` after the running jobs complete.

Consider the node states shown in the following example.

```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
queue1* inact infinite 10 down% queue1-dy-c5xlarge-[1-10]
queue1* inact infinite 3490 idle~ queue1-dy-c5xlarge-[11-3500]
queue2 up infinite 10 idle~ queue2-dy-c5xlarge-[1-10]
```

Partition `queue1` is `INACTIVE` because 10 consecutive compute node bootstrap failures were detected.

Instances behind nodes `queue1-dy-c5xlarge-[1-10]` launched but failed to join the cluster because of an unhealthy status.

The cluster is in protected status.

Partition `queue2` isn't impacted by the bootstrap failures in `queue1`. It's in the `UP` state and can still run jobs.

## How to deactivate protected status


After the bootstrap error has been resolved, you can run the following command to take the cluster out of protected status.

```
$ pcluster update-compute-fleet --cluster-name <cluster-name> \
  --region <region-id> \
  --status START_REQUESTED
```

## Bootstrap failures that activate protected status


Bootstrap errors that activate protected status are subdivided into the following three types. To identify the type and issue, you can check if AWS ParallelCluster generated logs. If logs were generated, you can check them for error details. For more information, see [Retrieving and preserving logs](troubleshooting-v3-get-logs.md).

1. **Bootstrap error that causes an instance to self-terminate**.

   An instance fails early in the bootstrap process, such as an instance that self-terminates because of errors in the [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`CustomActions`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-CustomActions) / [`OnNodeStart`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CustomActions-OnNodeStart) or [`OnNodeConfigured`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CustomActions-OnNodeConfigured) script.

   For dynamic nodes, look for errors similar to the following:

   ```
   Node bootstrap error: Node ... is in power up state without valid backing instance
   ```

   For static nodes, look in the `clustermgtd` log (`/var/log/parallelcluster/clustermgtd`) for errors similar to the following:

   ```
   Node bootstrap error: Node ... is in power up state without valid backing instance
   ```

1. **Nodes `resume_timeout` or `node_replacement_timeout` expires**.

   An instance can't join the cluster within the `resume_timeout` (for dynamic nodes) or `node_replacement_timeout` (for static nodes). It doesn't self-terminate before the timeout. For example, networking isn't set up correctly for the cluster and the node is set to the `DOWN` state by Slurm after the timeout expires.

   For dynamic nodes, look for errors similar to the following:

   ```
   Node bootstrap error: Resume timeout expires for node
   ```

   For static nodes, look in the `clustermgtd` log (`/var/log/parallelcluster/clustermgtd`) for errors similar to the following:

   ```
   Node bootstrap error: Replacement timeout expires for node ... in replacement.
   ```

1. **Nodes fail health check**.

   An instance behind the node fails an Amazon EC2 health check or scheduled event health check, and the nodes are treated as bootstrap failure nodes. In this case, the instance terminates for a reason outside the control of AWS ParallelCluster.

   Look in the `clustermgtd` log (`/var/log/parallelcluster/clustermgtd`) for errors similar to the following:

   ```
   Node bootstrap error: Node %s failed during bootstrap when performing health check.
   ```

1. **Compute nodes fail Slurm registration**.

   The registration of the `slurmd` daemon with the Slurm control daemon (`slurmctld`) fails and causes the compute node state to change to the `INVALID_REG` state. Incorrectly configured Slurm compute nodes can cause this error, such as compute nodes configured with errors in the [`CustomSlurmSettings`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-CustomSlurmSettings) compute node specification.

   Look in the `slurmctld` log file (`/var/log/slurmctld.log`) on the head node, or look in the `slurmd` log file (`/var/log/slurmd.log`) of the failed compute node for errors similar to the following:

   ```
   Setting node %s to INVAL with reason: ...
   ```

## How to debug protected mode


If your cluster is in protected status, and if AWS ParallelCluster generated `clustermgtd` logs from the `HeadNode` and the `cloud-init-output` logs from problematic compute nodes, then you can check the logs for error details. For more information about how to retrieve logs, see [Retrieving and preserving logs](troubleshooting-v3-get-logs.md).

**`clustermgtd` log (`/var/log/parallelcluster/clustermgtd`) on the head node**

Log messages show which partitions have bootstrap failures and the corresponding bootstrap failure count.

```
[slurm_plugin.clustermgtd:_handle_protected_mode_process] - INFO - Partitions  
bootstrap failure count: {'queue1': 2}, cluster will be set into protected mode if protected failure count reach threshold.
```

In the `clustermgtd` log, search for `Found the following bootstrap failure nodes` to find which node failed to bootstrap.

```
[slurm_plugin.clustermgtd:_handle_protected_mode_process] - WARNING - 
Found the following bootstrap failure nodes: (x2)  ['queue1-st-c5large-1(192.168.110.155)',  'broken-st-c5large-2(192.168.65.215)']
```

In the `clustermgtd` log, search for `Node bootstrap error` to find the reason for the failure.

```
[slurm_plugin.clustermgtd:_is_node_bootstrap_failure] - WARNING - Node bootstrap error: 
Node broken-st-c5large-2(192.168.65.215) is currently in  replacement and no backing instance
```

**`cloud-init-output` log (`/var/log/cloud-init-output.log`) on the compute nodes**

After obtaining the bootstrap failure node private IP address in the `clustermgtd` log, you can find the corresponding compute node log by either logging into the compute node or by following the guidance in [Retrieving and preserving logs](troubleshooting-v3-get-logs.md) to retrieve logs. In most cases, the `/var/log/cloud-init-output` log from the problematic node shows the step that caused the compute node bootstrap failure.
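
The private IP addresses can be pulled out of the `Found the following bootstrap failure nodes` message with a small filter. A sketch using the sample log message shown earlier:

```
# Extract node private IP addresses from a "bootstrap failure nodes" log message.
log_line="Found the following bootstrap failure nodes: (x2) ['queue1-st-c5large-1(192.168.110.155)', 'broken-st-c5large-2(192.168.65.215)']"
printf '%s\n' "$log_line" | grep -oE '\(([0-9]+\.){3}[0-9]+\)' | tr -d '()'
```

This prints `192.168.110.155` and `192.168.65.215`, one per line.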

# Slurm cluster fast insufficient capacity fail-over


Starting with AWS ParallelCluster version 3.2.0, clusters run with fast insufficient capacity fail-over mode enabled by default. This mode minimizes the time spent retrying a job when Amazon EC2 insufficient capacity errors are detected. It is particularly effective when you configure your queue with multiple compute resources that use different instance types.

**Amazon EC2 detected insufficient capacity failures:**
+ `InsufficientInstanceCapacity`
+ `InsufficientHostCapacity`
+ `InsufficientReservedInstanceCapacity`
+ `MaxSpotInstanceCountExceeded`
+ `SpotMaxPriceTooLow`: Activated if the Spot request price is lower than the minimum required Spot request fulfillment price.
+ `Unsupported`: Activated with the use of an instance type that isn't supported in a specific AWS Region.

In fast insufficient capacity fail-over mode, if an insufficient capacity error is detected when a job is assigned to a [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`compute resource`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources), AWS ParallelCluster does the following:

1. It sets the compute resource to a disabled (`DOWN`) state for a predefined period of time.

1. It uses `POWER_DOWN_FORCE` to cancel the compute resource failing node jobs and to suspend the failing node. It sets the failing node to the `IDLE` and `POWER_DOWN (!)` state, and then to `POWERING_DOWN (%)`.

1. It requeues the job to another compute resource.

The static and powered up nodes of the disabled compute resource aren't impacted. Jobs can complete on these nodes.

This cycle repeats until the job is successfully assigned to a compute resource node or nodes. For information about node states, see the [Slurm guide for multiple queue mode](multiple-queue-mode-slurm-user-guide-v3.md).

If no compute resources are found to run the job, the job is set to the `PENDING` state until the predefined period of time elapses. In this case, you can modify the predefined period of time as described in the following section.

## Insufficient capacity timeout parameter


**`insufficient_capacity_timeout`**

`insufficient_capacity_timeout` specifies the period of time (in seconds) that the compute resource is kept in the disabled (`down`) state when an insufficient capacity error is detected.

By default, `insufficient_capacity_timeout` is enabled.

The default `insufficient_capacity_timeout` is 600 seconds (10 minutes).

If the `insufficient_capacity_timeout` value is less than or equal to zero, fast insufficient capacity fail-over mode is disabled.

You can change the `insufficient_capacity_timeout` value by adding the parameter in the `clustermgtd` config file located at `/etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf` in the `HeadNode`.

The parameter can be updated at any time without stopping the compute fleet.

For example:
+ `insufficient_capacity_timeout=600`:

  If an insufficient capacity error is detected, the compute resource is set to a disabled (`DOWN`) state. After 10 minutes, its failed node is set to the `idle~` (`POWER_SAVING`) state.
+ `insufficient_capacity_timeout=60`:

  If an insufficient capacity error is detected, the compute resource is set to a disabled (`DOWN`) state. After 1 minute, its failed node is set to the `idle~` state.
+ `insufficient_capacity_timeout=0`:

  Fast insufficient capacity fail-over mode is disabled. The compute resource isn't disabled.

**Note**  
There might be a delay of up to one minute between the time when nodes fail with insufficient capacity errors and the time when the cluster management daemon detects the node failures. This is because the cluster management daemon checks for node insufficient capacity failures and sets the compute resources to the `down` state at one minute intervals.

## Fast insufficient capacity fail-over mode status


When a cluster is in fast insufficient capacity fail-over mode, you can check its status and node states.

### Node states


When a job is submitted to a compute resource dynamic node and an insufficient capacity error is detected, the node is placed in the `down#` state with the following reason:

```
(Code:InsufficientInstanceCapacity)Failure when resuming nodes.
```

Then, powered-off nodes (nodes in the `idle~` state) are set to `down~` with the following reason:

```
(Code:InsufficientInstanceCapacity)Temporarily disabling node due to insufficient capacity.
```

The job is requeued to other compute resources in the queue.

The compute resource static nodes and nodes that are `UP` aren't impacted by fast insufficient capacity fail-over mode.

Consider the node states shown in the following example.

```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
queue1*   up    infinite    30  idle~ queue1-dy-c-1-[1-15],queue1-dy-c-2-[1-15]
queue2    up    infinite    30  idle~ queue2-dy-c-1-[1-15],queue2-dy-c-2-[1-15]
```

We submit a job to queue1 that requires one node.

```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
queue1*   up   infinite  1   down# queue1-dy-c-1-1
queue1*   up   infinite  15  idle~ queue1-dy-c-2-[1-15]
queue1*   up   infinite  14  down~ queue1-dy-c-1-[2-15]
queue2    up   infinite  30  idle~ queue2-dy-c-1-[1-15],queue2-dy-c-2-[1-15]
```

Node `queue1-dy-c-1-1` is launched to run the job. However, the instance failed to launch because of an insufficient capacity error. Node `queue1-dy-c-1-1` is set to `down#`. The remaining powered-off dynamic nodes within the compute resource (`queue1-dy-c-1-[2-15]`) are set to `down~`.

You can check the node reason with `scontrol show nodes`.

```
$ scontrol show nodes queue1-dy-c-1-1
NodeName=queue1-dy-c-1-1 Arch=x86_64 CoresPerSocket=1 
CPUAlloc=0 CPUTot=96 CPULoad=0.00
...
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=(Code:InsufficientInstanceCapacity)Failure when resuming nodes [root@2022-03-10T22:17:50]
   
$ scontrol show nodes queue1-dy-c-1-2
NodeName=queue1-dy-c-1-2 Arch=x86_64 CoresPerSocket=1 
CPUAlloc=0 CPUTot=96 CPULoad=0.00
...
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=(Code:InsufficientInstanceCapacity)Temporarily disabling node due to insufficient capacity [root@2022-03-10T22:17:50]
```

The job is requeued to another instance type within the queue's compute resources.

After the `insufficient_capacity_timeout` elapses, nodes in the compute resource are reset to the `idle~` state.

```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
queue1*   up    infinite    30  idle~ queue1-dy-c-1-[1-15],queue1-dy-c-2-[1-15]
queue2    up    infinite    30  idle~ queue2-dy-c-1-[1-15],queue2-dy-c-2-[1-15]
```

After the `insufficient_capacity_timeout` elapses and nodes in the compute resource are reset to the `idle~` state, the Slurm scheduler gives the nodes lower priority. The scheduler keeps selecting nodes from other queue compute resources with higher weights unless one of the following occurs:
+ A job's submission requirements match the recovered compute resource.
+ No other compute resources are available because they are at capacity.
+ `slurmctld` is restarted.
+ The AWS ParallelCluster compute fleet is stopped and started to power down and power up all nodes.

### Related logs


Logs related to insufficient capacity errors and fast insufficient capacity fail-over mode can be found in Slurm's `resume` log and `clustermgtd` log in the head node.

**Slurm `resume` (`/var/log/parallelcluster/slurm_resume.log`)**  
Error messages when a node fails to launch because of insufficient capacity.  

```
[slurm_plugin.instance_manager:_launch_ec2_instances] - ERROR - Failed RunInstances request: dcd0c252-90d4-44a7-9c79-ef740f7ecd87
[slurm_plugin.instance_manager:add_instances_for_nodes] - ERROR - Encountered exception when launching instances for nodes (x1) ['queue1-dy-c-1-1']: An error occurred 
(InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 1): We currently do not have sufficient p4d.24xlarge capacity in the 
Availability Zone you requested (us-west-2b). Our system will be working on provisioning additional capacity. You can currently get p4d.24xlarge capacity by not 
specifying an Availability Zone in your request or choosing us-west-2a, us-west-2c.
```

**Slurm `clustermgtd` (`/var/log/parallelcluster/clustermgtd`)**  
Compute resource c-1 in queue1 is disabled because of insufficient capacity.  

```
[slurm_plugin.clustermgtd:_reset_timeout_expired_compute_resources] - INFO - The following compute resources are in down state 
due to insufficient capacity: {'queue1': {'c-1': ComputeResourceFailureEvent(timestamp=datetime.datetime(2022, 4, 14, 23, 0, 4, 769380, tzinfo=datetime.timezone.utc), 
error_code='InsufficientInstanceCapacity')}}, compute resources are reset after insufficient capacity timeout (600 seconds) expired
```
After the insufficient capacity timeout expires, the compute resource is reset, nodes within the compute resources are set to `idle~`.  

```
[root:_reset_insufficient_capacity_timeout_expired_nodes] - INFO - Reset the following compute resources because insufficient capacity 
timeout expired: {'queue1': ['c-1']}
```

# Slurm memory-based scheduling


Starting with version 3.2.0, AWS ParallelCluster supports Slurm memory-based scheduling with the [`SlurmSettings`](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [`EnableMemoryBasedScheduling`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-EnableMemoryBasedScheduling) cluster configuration parameter.

**Note**  
Starting with AWS ParallelCluster version 3.7.0, `EnableMemoryBasedScheduling` can be enabled if you configure multiple instance types in [Instances](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances).  
For AWS ParallelCluster versions 3.2.0 to 3.6.*x*, `EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances).

**Warning**  
When you specify multiple instances types in a Slurm queue compute resource with `EnableMemoryBasedScheduling` enabled, the `RealMemory` value is the minimum amount of memory made available to all instance types. This might lead to significant amounts of unused memory if you specify instance types with very different memory capacities.

With `EnableMemoryBasedScheduling: true`, the Slurm scheduler tracks the amount of memory that each job requires on each node. Then, the Slurm scheduler uses this information to schedule multiple jobs on the same compute node. The total amount of memory that jobs require on a node can't be larger than the available node memory. The scheduler prevents a job from using more memory than what was requested when the job was submitted.

With `EnableMemoryBasedScheduling: false`, jobs might compete for memory on a shared node and cause job failures and `out-of-memory` events.

**Warning**  
Slurm uses a power of 2 notation for its labels, such as MB or GB. Read these labels as MiB and GiB, respectively.
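As a minimal sketch, memory-based scheduling is enabled through the `Scheduling/SlurmSettings` section of the cluster configuration (other required sections, such as queues, are elided here):

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    EnableMemoryBasedScheduling: true
```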

## Slurm configuration and memory-based scheduling


With `EnableMemoryBasedScheduling: true`, Slurm sets the following Slurm configuration parameters:
+ [https://slurm.schedmd.com/slurm.conf.html#OPT_CR_CPU_Memory](https://slurm.schedmd.com/slurm.conf.html#OPT_CR_CPU_Memory) in the `slurm.conf`. This option configures node memory to be a consumable resource in Slurm.
+ [https://slurm.schedmd.com/cgroup.conf.html#OPT_ConstrainRAMSpace](https://slurm.schedmd.com/cgroup.conf.html#OPT_ConstrainRAMSpace) in the Slurm `cgroup.conf`. With this option, a job's access to memory is limited to the amount of memory that the job requested when submitted.
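For illustration only, the lines these two options correspond to in the Slurm configuration files look like the following. ParallelCluster manages these files; don't edit them directly:

```
# slurm.conf
SelectTypeParameters=CR_CPU_Memory

# cgroup.conf
ConstrainRAMSpace=yes
```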

**Note**  
Several other Slurm configuration parameters can impact the behavior of the Slurm scheduler and resource manager when these two options are set. For more information, see the [Slurm Documentation](https://slurm.schedmd.com/documentation.html).

## Slurm scheduler and memory-based scheduling


**`EnableMemoryBasedScheduling: false` (default)**

By default, `EnableMemoryBasedScheduling` is set to false. When false, Slurm doesn't include memory as a resource in its scheduling algorithm and doesn't track the memory that jobs use. Users can specify the `--mem MEM_PER_NODE` option to set the minimum amount of memory per node that a job requires. This forces the scheduler to choose nodes with a `RealMemory` value of at least `MEM_PER_NODE` when scheduling the job.

For example, suppose that a user submits two jobs with `--mem=5GB`. If requested resources such as CPUs or GPUs are available, the jobs can run at the same time on a node with 8 GiB of memory. The two jobs aren't scheduled on compute nodes with less than 5 GiB of `RealMemory`.
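The node filtering in this example can be sketched as follows. This is illustrative Python, not ParallelCluster code; the node names and memory values are hypothetical:

```python
MIB_PER_GIB = 1024

def eligible_nodes(nodes, mem_request_mib):
    """Mimic Slurm's filter: keep nodes whose RealMemory covers --mem."""
    return [name for name, real_memory in nodes.items()
            if real_memory >= mem_request_mib]

# Hypothetical nodes with RealMemory in MiB
nodes = {
    "queue1-st-small-1": 4 * MIB_PER_GIB,  # 4 GiB
    "queue1-st-large-1": 8 * MIB_PER_GIB,  # 8 GiB
}

# A job submitted with --mem=5GB (read as 5 GiB)
print(eligible_nodes(nodes, 5 * MIB_PER_GIB))  # only the 8 GiB node qualifies
```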

**Warning**  
When memory-based scheduling is disabled, Slurm doesn't track the amount of memory that jobs use. Jobs that run on the same node might compete for memory resources and cause the other job to fail.  
When memory-based scheduling is disabled, we recommend that users don't specify the [https://slurm.schedmd.com/srun.html#OPT_mem-per-cpu](https://slurm.schedmd.com/srun.html#OPT_mem-per-cpu) or [https://slurm.schedmd.com/srun.html#OPT_mem-per-gpu](https://slurm.schedmd.com/srun.html#OPT_mem-per-gpu) options. These options might cause behavior that differs from what's described in the [Slurm Documentation](https://slurm.schedmd.com/documentation.html).

**`EnableMemoryBasedScheduling: true`**

When `EnableMemoryBasedScheduling` is set to true, Slurm tracks the memory usage of each job and prevents jobs from using more memory than requested with the `--mem` submission options.

Using the previous example, a user submits two jobs with `--mem=5GB`. The jobs can't run at the same time on a node with 8 GiB of memory. This is because the total amount of memory that's required is greater than the memory that's available on the node.

With memory-based scheduling enabled, `--mem-per-cpu` and `--mem-per-gpu` behave consistently with what's described in the Slurm documentation. For example, a job is submitted with `--ntasks-per-node=2 -c 1 --mem-per-cpu=2GB`. In this case, Slurm assigns the job a total of 4 GiB for each node.
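The arithmetic behind this example can be sketched as follows (illustrative only; memory values are in MiB):

```python
def mem_per_node_mib(ntasks_per_node, cpus_per_task, mem_per_cpu_mib):
    """Total memory Slurm allocates on each node for a --mem-per-cpu job."""
    return ntasks_per_node * cpus_per_task * mem_per_cpu_mib

# --ntasks-per-node=2 -c 1 --mem-per-cpu=2GB  =>  2 * 1 * 2 GiB = 4 GiB per node
print(mem_per_node_mib(2, 1, 2 * 1024))  # 4096 MiB
```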

**Warning**  
When memory-based scheduling is enabled, we recommend that users include a `--mem` specification when submitting a job. With the default Slurm configuration that's included with AWS ParallelCluster, if no memory option is included (`--mem`, `--mem-per-cpu`, or `--mem-per-gpu`), Slurm assigns the entire memory of the allocated nodes to the job, even if the job requests only a portion of the other resources, such as CPUs or GPUs. This effectively prevents node sharing until the job is finished because no memory is available to other jobs. This happens because Slurm sets the memory per node for the job to [https://slurm.schedmd.com/slurm.conf.html#OPT_DefMemPerNode](https://slurm.schedmd.com/slurm.conf.html#OPT_DefMemPerNode) when no memory specifications are provided at job submission time. The default value for this parameter is 0, which specifies unlimited access to a node's memory.  
If multiple types of compute resources with different amounts of memory are available in the same queue, a job submitted without memory options might be assigned different amounts of memory on different nodes. This depends on which nodes the scheduler makes available to the job. Users can define a custom value for options, such as `DefMemPerNode` or [https://slurm.schedmd.com/slurm.conf.html#OPT_DefMemPerCPU](https://slurm.schedmd.com/slurm.conf.html#OPT_DefMemPerCPU), at the cluster or partition level in the Slurm configuration files to prevent this behavior.

## Slurm RealMemory and AWS ParallelCluster SchedulableMemory


With the Slurm configuration that's shipped with AWS ParallelCluster, Slurm interprets [RealMemory](https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory) to be the amount of memory per node that's available to jobs. Starting with version 3.2.0, by default, AWS ParallelCluster sets `RealMemory` to 95 percent of the memory listed in [Amazon EC2 Instance Types](https://aws.amazon.com/ec2/instance-types) and returned by the Amazon EC2 API [DescribeInstanceTypes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html).
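The default can be sketched as follows. This is illustrative only; the exact rounding that ParallelCluster applies isn't specified here, so integer truncation is assumed:

```python
def default_real_memory_mib(ec2_memory_mib, fraction=0.95):
    """Approximate ParallelCluster's default: RealMemory is 95% of EC2 memory."""
    return int(ec2_memory_mib * fraction)

# Hypothetical instance type with 8 GiB (8192 MiB) of memory
print(default_real_memory_mib(8192))  # 7782 MiB
```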

When memory-based scheduling is disabled, the Slurm scheduler uses `RealMemory` to filter nodes when users submit a job with `--mem` specified.

When memory-based scheduling is enabled, the Slurm scheduler interprets `RealMemory` to be the maximum amount of memory that's available to jobs that are running on the compute node.

The default setting might not be optimal for all instance types:
+ This setting might be higher than the amount of memory that nodes can actually access. This can happen when compute nodes are small instance types.
+ This setting might be lower than the amount of memory that nodes can actually access. This can happen when compute nodes are large instance types and can lead to a significant amount of unused memory.

You can use [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) / [`SchedulableMemory`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-SchedulableMemory) to fine-tune the value of `RealMemory` configured by AWS ParallelCluster for compute nodes. To override the default, define a custom value for `SchedulableMemory` specifically for your cluster configuration.

To check a compute node's actual available memory, run the `/opt/slurm/sbin/slurmd -C` command on the node. This command returns the hardware configuration of the node, including the [https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory](https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory) value. For more information, see [https://slurm.schedmd.com/slurmd.html#OPT_-C](https://slurm.schedmd.com/slurmd.html#OPT_-C).

Make sure that the compute node's operating system processes have sufficient memory. To do this, limit the memory available to jobs by setting `SchedulableMemory` to a value lower than the `RealMemory` value that the `slurmd -C` command returns.
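For example, a compute resource might override the default as shown in the following snippet. The queue name, instance type, and memory value are hypothetical, and `SchedulableMemory` is assumed to be expressed in MiB:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: compute1
          InstanceType: c5.xlarge
          SchedulableMemory: 7000
          MinCount: 0
          MaxCount: 10
```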

# Multiple instance type allocation with Slurm


Starting with AWS ParallelCluster version 3.3.0, you can configure your cluster to allocate instances from a set of instance types that's defined for a compute resource. Allocation can be based on the Amazon EC2 Fleet lowest price or capacity optimized strategies.

All instance types in this set must have either the same number of vCPUs or, if multithreading is disabled, the same number of cores. Moreover, these instance types must have the same number of accelerators, from the same manufacturer. If [`Efa`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Efa) / [`Enabled`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Efa-Enabled) is set to `true`, the instance types must support EFA. For more information and requirements, see [`Scheduling`](Scheduling-v3.md) / [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`AllocationStrategy`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-AllocationStrategy) and [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) / [`Instances`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances).

You can set [`AllocationStrategy`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-AllocationStrategy) to `lowest-price` or `capacity-optimized` depending on your [CapacityType](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CapacityType) configuration.

In [`Instances`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances), you can configure a set of instance types.

**Note**  
Starting with AWS ParallelCluster version 3.7.0, `EnableMemoryBasedScheduling` can be enabled if you configure multiple instance types in [Instances](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances).  
For AWS ParallelCluster versions 3.2.0 to 3.6.*x*, `EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Instances).

The following examples show how you can query instance types for vCPUs, EFA support, and architecture.

Query instance types with 96 vCPUs and x86_64 architecture.

```
$ aws ec2 describe-instance-types --region region-id \
  --filters "Name=vcpu-info.default-vcpus,Values=96" "Name=processor-info.supported-architecture,Values=x86_64" \
  --query "sort_by(InstanceTypes[*].{InstanceType:InstanceType,MemoryMiB:MemoryInfo.SizeInMiB,CurrentGeneration:CurrentGeneration,VCpus:VCpuInfo.DefaultVCpus,Cores:VCpuInfo.DefaultCores,Architecture:ProcessorInfo.SupportedArchitectures[0],MaxNetworkCards:NetworkInfo.MaximumNetworkCards,EfaSupported:NetworkInfo.EfaSupported,GpuCount:GpuInfo.Gpus[0].Count,GpuManufacturer:GpuInfo.Gpus[0].Manufacturer}, &InstanceType)" \
  --output table
```

Query instance types with 64 cores, EFA support, and arm64 architecture.

```
$ aws ec2 describe-instance-types --region region-id \
  --filters "Name=vcpu-info.default-cores,Values=64" "Name=processor-info.supported-architecture,Values=arm64" "Name=network-info.efa-supported,Values=true" --query "sort_by(InstanceTypes[*].{InstanceType:InstanceType,MemoryMiB:MemoryInfo.SizeInMiB,CurrentGeneration:CurrentGeneration,VCpus:VCpuInfo.DefaultVCpus,Cores:VCpuInfo.DefaultCores,Architecture:ProcessorInfo.SupportedArchitectures[0],MaxNetworkCards:NetworkInfo.MaximumNetworkCards,EfaSupported:NetworkInfo.EfaSupported,GpuCount:GpuInfo.Gpus[0].Count,GpuManufacturer:GpuInfo.Gpus[0].Manufacturer}, &InstanceType)" \
  --output table
```

The next example cluster configuration snippet shows how you can use these InstanceType and AllocationStrategy properties.

```
...
 Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue-1
      CapacityType: ONDEMAND
      AllocationStrategy: lowest-price
      ...
      ComputeResources:
        - Name: computeresource1
          Instances:
            - InstanceType: r6g.2xlarge
            - InstanceType: m6g.2xlarge
            - InstanceType: c6g.2xlarge
          MinCount: 0
          MaxCount: 500
        - Name: computeresource2
          Instances:
            - InstanceType: m6g.12xlarge
            - InstanceType: x2gd.12xlarge
          MinCount: 0
          MaxCount: 500
...
```

# Cluster scaling for dynamic nodes


ParallelCluster supports Slurm's methods to dynamically scale clusters by using Slurm's power saver plugin. For more information, see the [Cloud Scheduling Guide](https://slurm.schedmd.com/elastic_computing.html) and the [Slurm Power Saving Guide](https://slurm.schedmd.com/power_save.html) in the Slurm documentation. The following topics describe the Slurm strategies for each version.

**Topics**
+ [

# Slurm dynamic node allocation strategies in version 3.8.0
](scheduler-node-allocation-v3-3.8.0.md)
+ [

# Slurm dynamic node allocation strategies in version 3.7.x
](scheduler-dynamic-node-allocation-v3-3.7.x.md)
+ [

# Slurm dynamic node allocation strategies in version 3.6.x and previous
](scheduler-dynamic-node-allocation-v3-3.6.x.md)

# Slurm dynamic node allocation strategies in version 3.8.0
Version 3.8.0

Starting with ParallelCluster version 3.8.0, ParallelCluster uses **job-level resume**, or **job-level scaling**, as the default dynamic node allocation strategy to scale the cluster: ParallelCluster scales up the cluster based on the requirements of each job, the number of nodes allocated to the job, and which nodes need to be resumed. ParallelCluster gets this information from the `SLURM_RESUME_FILE` environment variable.

Scaling dynamic nodes is a two-step process: launching the Amazon EC2 instances, and then assigning the launched instances to Slurm nodes. Each of these two steps can use either **all-or-nothing** or **best-effort** logic.

For the launch of the Amazon EC2 instances:
+ **all-or-nothing** calls the Amazon EC2 launch API with the minimum target equal to the total target capacity
+ **best-effort** calls the Amazon EC2 launch API with the minimum target equal to 1 and the total target capacity equal to the requested capacity

For the assignment of the Amazon EC2 instances to Slurm nodes:
+ **all-or-nothing** assigns Amazon EC2 instances to Slurm nodes only if it's possible to assign an Amazon EC2 instance to every requested node
+ **best-effort** assigns Amazon EC2 instances to Slurm nodes even if the Amazon EC2 instance capacity doesn't cover all of the requested nodes

The possible combinations of these strategies translate into the ParallelCluster launch strategies.
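The launch-side difference between the two logics can be sketched as the minimum and total target capacity passed to the Amazon EC2 launch call. This is illustrative Python, not ParallelCluster code:

```python
def launch_targets(strategy, requested_capacity):
    """Return (minimum target, total target capacity) for the EC2 launch call."""
    if strategy == "all-or-nothing":
        # Succeed only if the full requested capacity can be launched.
        return requested_capacity, requested_capacity
    if strategy == "best-effort":
        # Launch whatever is available, up to the requested capacity.
        return 1, requested_capacity
    raise ValueError(f"unknown strategy: {strategy}")

print(launch_targets("all-or-nothing", 20))  # (20, 20)
print(launch_targets("best-effort", 20))     # (1, 20)
```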

You configure the launch strategy with the [`ScalingStrategy`](Scheduling-v3.md#yaml-Scheduling-ScalingStrategy) cluster configuration parameter. The supported strategies are the following:

**all-or-nothing** scaling:

With this strategy, AWS ParallelCluster initiates an Amazon EC2 launch instance API call for each job that requires all of the instances necessary for the requested compute nodes to launch successfully. This ensures that the cluster scales only when the required capacity per job is available, avoiding idle instances left over at the end of the scaling process.

The strategy uses **all-or-nothing** logic for the launch of the Amazon EC2 instances for each job, plus **all-or-nothing** logic for the assignment of the Amazon EC2 instances to Slurm nodes.

The strategy groups launch requests into batches, one for each compute resource requested and up to 500 nodes each. For requests spanning multiple compute resources or exceeding 500 nodes, ParallelCluster sequentially processes multiple batches.
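The batching rule described above can be sketched as follows (illustrative only):

```python
def batch_sizes(node_count, batch_limit=500):
    """Split a launch request into sequential batches of at most batch_limit nodes."""
    batches = []
    while node_count > 0:
        size = min(node_count, batch_limit)
        batches.append(size)
        node_count -= size
    return batches

print(batch_sizes(1200))  # [500, 500, 200]
```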

The failure of any single resource's batch results in the termination of all associated unused capacity, ensuring that no idle instances will be left at the end of the scaling process.

Limitations
+ The time taken for scaling is directly proportional to the number of jobs submitted per execution of the Slurm resume program.
+ The scaling operation is limited by the `RunInstances` resource account limit, which is set to 1,000 instances by default. This limitation is in accordance with Amazon EC2 API throttling policies. For more details, see the [Amazon EC2 API throttling documentation](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html).
+ When you submit a job in a compute resource with a single instance type, in a queue that spans multiple Availability Zones, the **all-or-nothing** EC2 launch API call only succeeds if all of the capacity can be provided in a single Availability Zone.
+ When you submit a job in a compute resource with multiple instance types, in a queue with a single Availability Zone, the **all-or-nothing** Amazon EC2 launch API call only succeeds if all of the capacity can be provided by a single instance type.
+ When you submit a job in a compute resource with multiple instance types, in a queue spanning multiple Availability Zones, the **all-or-nothing** Amazon EC2 launch API call isn't supported and ParallelCluster performs **best-effort** scaling instead.

**greedy-all-or-nothing** scaling:

This variant of the all-or-nothing strategy still ensures that the cluster scales only when the required capacity per job is available, avoiding idle instances at the end of the scaling process. However, ParallelCluster initiates an Amazon EC2 launch instance API call that targets a minimum capacity of 1 and attempts to maximize the number of nodes launched, up to the requested capacity. The strategy uses **best-effort** logic for the launch of the Amazon EC2 instances for all of the jobs, plus **all-or-nothing** logic for the assignment of the Amazon EC2 instances to Slurm nodes for each job.

The strategy groups launch requests into batches, one for each compute resource requested and up to 500 nodes each. For requests spanning multiple compute resources or exceeding 500 nodes, ParallelCluster sequentially processes multiple batches.

It ensures that no idle instances are left at the end of the scaling process, maximizing throughput at the cost of temporary over-scaling during the scaling process.

Limitations
+ Temporary over-scaling is possible, leading to additional costs for instances that transition to a running state before scaling completion.
+ The same instance limit as in the all-or-nothing strategy applies, subject to AWS's RunInstances resource account limit.

**best-effort** scaling:

This strategy calls the Amazon EC2 launch instance API with a minimum target capacity of 1, aiming to achieve the total requested capacity, at the cost of leaving idle instances after the scaling process if not all of the requested capacity is available. The strategy uses **best-effort** logic for the launch of the Amazon EC2 instances for all of the jobs, plus **best-effort** logic for the assignment of the Amazon EC2 instances to Slurm nodes for each job.

The strategy groups launch requests into batches, one for each compute resource requested and up to 500 nodes each. For requests spanning multiple compute resources or exceeding 500 nodes, ParallelCluster sequentially processes multiple batches.

This strategy allows for scaling far beyond the default limit of 1,000 instances over multiple scaling process executions, at the cost of having idle instances across the different scaling processes.

Limitations
+ Instances might be left running idle at the end of the scaling process when it isn't possible to allocate all of the nodes requested by the jobs.

The following example shows how the scaling of dynamic nodes behaves under the different **ParallelCluster launch strategies**. Suppose that you submit two jobs requesting 20 nodes each, for a total of 40 nodes of the same type, but only 30 Amazon EC2 instances are available to cover the requested capacity.

**all-or-nothing** scaling:
+ For the first job, an **all-or-nothing** Amazon EC2 launch instance API call is made, requesting 20 instances. The call succeeds and results in the launch of 20 instances.
+ The **all-or-nothing** assignment of the 20 launched instances to Slurm nodes for the first job succeeds.
+ Another **all-or-nothing** Amazon EC2 launch instance API call is made, requesting 20 instances for the second job. The call fails because there's only capacity for another 10 instances. No instances are launched at this time.

**greedy-all-or-nothing** scaling:
+ A **best-effort** Amazon EC2 launch instance API call is made, requesting 40 instances, which is the total capacity requested by all of the jobs. This results in the launch of 30 instances.
+ The **all-or-nothing** assignment of 20 of the launched instances to Slurm nodes for the first job succeeds.
+ Another **all-or-nothing** assignment of the remaining launched instances to Slurm nodes for the second job is attempted. Because only 10 instances are available out of the 20 that the job requested, the assignment fails.
+ The 10 unassigned launched instances are terminated.

**best-effort** scaling:
+ A **best-effort** Amazon EC2 launch instance API call is made, requesting 40 instances, which is the total capacity requested by all of the jobs. This results in the launch of 30 instances.
+ The **best-effort** assignment of 20 of the launched instances to Slurm nodes for the first job succeeds.
+ Another **best-effort** assignment of the remaining 10 launched instances to Slurm nodes for the second job succeeds, even though the total requested capacity was 20. Because the job requested 20 nodes and Amazon EC2 instances could be assigned to only 10 of them, the job can't start. The instances are left running idle until enough capacity is found to start the missing 10 instances in a later run of the scaling process, or until the scheduler places the job on other compute nodes that are already running.
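The scenario above can be condensed into a toy simulation. This is illustrative Python, not ParallelCluster code, returning how many instances end up launched and kept, and how many Slurm nodes get an instance assigned, under each strategy:

```python
def simulate(strategy, jobs, available):
    """Simulate launching/assigning for a list of per-job node counts."""
    launched = assigned = 0
    if strategy == "all-or-nothing":
        # One all-or-nothing launch call per job.
        for need in jobs:
            if available - launched >= need:
                launched += need
                assigned += need  # full assignment is always possible here
    else:
        # One best-effort launch call for the total requested capacity.
        launched = min(sum(jobs), available)
        pool = launched
        for need in jobs:
            if strategy == "greedy-all-or-nothing":
                if pool >= need:  # all-or-nothing assignment per job
                    assigned += need
                    pool -= need
            else:  # best-effort assignment, partial coverage allowed
                take = min(need, pool)
                assigned += take
                pool -= take
        if strategy == "greedy-all-or-nothing":
            launched = assigned  # unassigned instances are terminated

    return launched, assigned

for s in ("all-or-nothing", "greedy-all-or-nothing", "best-effort"):
    print(s, simulate(s, jobs=[20, 20], available=30))
```

Under **best-effort**, all 30 instances stay assigned, but the second job holds 10 of them idle because it can't start with partial capacity.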

# Slurm dynamic node allocation strategies in version 3.7.x
Version 3.7.x

ParallelCluster uses two types of dynamic node allocation strategies to scale the cluster:
+ 

**Allocation based on available requested node information:**
  + **All-nodes resume** or **node-list** scaling:

    ParallelCluster scales up the cluster based only on Slurm's requested node list names when Slurm's `ResumeProgram` runs. It allocates compute resources to nodes only by node name. The list of node names can span multiple jobs.
  + **Job-level resume** or **job-level** scaling:

    ParallelCluster scales up the cluster based on the requirements of each job, the current number of nodes that are allocated to the job, and which nodes need to be resumed. ParallelCluster gets this information from the `SLURM_RESUME_FILE` environment variable.
+ 

**Allocation with an Amazon EC2 launch strategy:**
  + **Best-effort** scaling:

    ParallelCluster scales up the cluster by using an Amazon EC2 launch instance API call with the minimum target capacity equal to 1, to launch some, but not necessarily all, of the instances needed to support the requested nodes.
  + **All-or-nothing** scaling:

    ParallelCluster scales up the cluster by using an Amazon EC2 launch instance API call that only succeeds if all of the instances needed to support the requested nodes are launched. In this case, it calls the Amazon EC2 launch instance API with the minimum target capacity equal to the total requested capacity.

By default, ParallelCluster uses **node-list** scaling with a **best-effort** Amazon EC2 launch strategy to launch some, but not necessarily all, of the instances needed to support the requested nodes. It tries to provision as much capacity as possible to serve the submitted workload.

Starting with ParallelCluster version 3.7.0, ParallelCluster uses **job-level** scaling with an **all-or-nothing** EC2 launch strategy for jobs submitted in **exclusive mode**. When you submit a job in exclusive mode, the job has exclusive access to its allocated nodes. For more information, see [EXCLUSIVE](https://slurm.schedmd.com/slurm.conf.html#OPT_EXCLUSIVE) in the Slurm documentation.

To submit a job in exclusive mode:
+ Pass the exclusive flag when submitting a Slurm job to the cluster. For example, `sbatch ... --exclusive`.

  OR
+ Submit a job to a cluster queue that has been configured with [`JobExclusiveAllocation`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-JobExclusiveAllocation) set to `true`.
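For the second option, a queue configured for exclusive allocation might look like the following snippet (the queue name is hypothetical, and other required settings are elided):

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: exclusive-queue
      JobExclusiveAllocation: true
```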

When submitting a job in exclusive mode:
+ ParallelCluster currently batches launch requests to include up to 500 nodes. If a job requests more than 500 nodes, ParallelCluster makes an **all-or-nothing** launch request for each set of 500 nodes and an additional launch request for the remainder of nodes.
+ If node allocation is in a single compute resource, ParallelCluster makes an **all-or-nothing** launch request for each set of 500 nodes and an additional launch request for the remainder of nodes. If a launch request fails, ParallelCluster terminates the unused capacity created by all of the launch requests.
+ If node allocation spans multiple compute resources, ParallelCluster needs to make an **all-or-nothing** launch request for each compute resource. These requests are also batched. If a launch request fails for one of the compute resources, ParallelCluster terminates the unused capacity created by all of the compute resource launch requests.

**job-level** scaling with **all-or-nothing** launch strategy known limitations:
+ When you submit a job in a compute resource with a single instance type, in a queue that spans multiple Availability Zones, the **all-or-nothing** EC2 launch API call only succeeds if all of the capacity can be provided in a single Availability Zone.
+ When you submit a job in a compute resource with multiple instance types, in a queue with a single Availability Zone, the **all-or-nothing** Amazon EC2 launch API call only succeeds if all of the capacity can be provided by a single instance type.
+ When you submit a job in a compute resource with multiple instance types, in a queue spanning multiple Availability Zones, the **all-or-nothing** Amazon EC2 launch API call isn't supported and ParallelCluster performs **best-effort** scaling instead.

# Slurm dynamic node allocation strategies in version 3.6.x and previous
Version 3.6.x and previous

AWS ParallelCluster uses only one type of dynamic node allocation strategy to scale the cluster:
+ Allocation based on available requested node information:
  + **All-nodes resume** or **node-list** scaling: ParallelCluster scales up the cluster based only on Slurm's requested node list names when Slurm's `ResumeProgram` runs. It allocates compute resources to nodes only by node name. The list of node names can span multiple jobs.
+ Allocation with an Amazon EC2 launch strategy:
  + **Best-effort** scaling: ParallelCluster scales up the cluster by using an Amazon EC2 launch instance API call with the minimum target capacity equal to 1, to launch some, but not necessarily all, of the instances needed to support the requested nodes.

ParallelCluster uses **node-list** scaling with a **best-effort** Amazon EC2 launch strategy to launch some, but not necessarily all, of the instances needed to support the requested nodes. It tries to provision as much capacity as possible to serve the submitted workload.

Limitations
+ Instances might be left running idle at the end of the scaling process when it isn't possible to allocate all of the nodes requested by the jobs.

# Slurm accounting with AWS ParallelCluster


Starting with version 3.3.0, AWS ParallelCluster supports Slurm accounting with the cluster configuration parameter [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database).

Starting with version 3.10.0, AWS ParallelCluster supports Slurm accounting with an external Slurmdbd with the cluster configuration parameter [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [ExternalSlurmdbd](Scheduling-v3.md#Scheduling-v3-SlurmSettings-ExternalSlurmdbd). Using an external Slurmdbd is recommended if multiple clusters share the same database.

With Slurm accounting, you can integrate an external accounting database to do the following:
+ Manage cluster users or groups of users and other entities. With this capability, you can use Slurm's more advanced features, such as resource limit enforcement, fair-share, and QOSs.
+ Collect and save job data, such as the user that ran the job, the job's duration, and the resources it uses. You can view the saved data with the `sacct` utility.

**Note**  
AWS ParallelCluster supports Slurm accounting for [Slurm supported MySQL database servers](https://slurm.schedmd.com/accounting.html#mysql-configuration).

## Working with Slurm accounting using external Slurmdbd in AWS ParallelCluster v3.10.0 and later


Before you configure Slurm accounting, you must have an existing external Slurmdbd database server, which connects to an existing external database server.

To configure this, define the following:
+ The address of the external Slurmdbd server in [ExternalSlurmdbd](Scheduling-v3.md#Scheduling-v3-SlurmSettings-ExternalSlurmdbd) / [Host](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ExternalSlurmdbd-Host). The server must exist and be reachable from the head node.
+ The munge key to communicate with the external Slurmdbd server in [MungeKeySecretArn](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-MungeKeySecretArn).

To step through a tutorial, see [Creating a cluster with an external Slurmdbd accounting](external-slurmdb-accounting.md).

**Note**  
You are responsible for managing the Slurm accounting entities in the database.

The architecture of the AWS ParallelCluster external Slurmdbd support feature enables multiple clusters to share the same Slurmdbd instance and the same database.

 ![\[\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/External_Slurmdbd_Architecture_ASG.png)

**Warning**  
Traffic between AWS ParallelCluster and the external Slurmdbd is not encrypted. We recommend running the cluster and the external Slurmdbd in a trusted network.

## Working with Slurm accounting using head node Slurmdbd in AWS ParallelCluster v3.3.0 and later


Before you configure Slurm accounting, you must have an existing external database server and database that uses `mysql` protocol.

To configure Slurm accounting with AWS ParallelCluster, you must define the following:
+ The URI for the external database server in [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [Uri](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-Uri). The server must exist and be reachable from the head node.
+ Credentials to access the external database that are defined in [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [PasswordSecretArn](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-PasswordSecretArn) and [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [UserName](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-UserName). AWS ParallelCluster uses this information to configure accounting at the Slurm level and the `slurmdbd` service on the head node. `slurmdbd` is the daemon that manages communication between the cluster and the database server.

To step through a tutorial, see [Creating a cluster with Slurm accounting](tutorials_07_slurm-accounting-v3.md).

**Note**  
AWS ParallelCluster performs a basic bootstrap of the Slurm accounting database by setting the default cluster user as database admin in the Slurm database. AWS ParallelCluster doesn't add any other user to the accounting database. The customer is responsible for managing the accounting entities in the Slurm database.

AWS ParallelCluster configures [`slurmdbd`](https://slurm.schedmd.com/slurmdbd.html) to ensure that each cluster has its own Slurm database on the database server. The same database server can be used across multiple clusters, but each cluster has its own separate database. AWS ParallelCluster uses the cluster name to define the name of the database in the `slurmdbd` configuration file [`StorageLoc`](https://slurm.schedmd.com/slurmdbd.conf.html#OPT_StorageLoc) parameter. If a database that's present on the database server has a name that doesn't map to an active cluster name, you can create a new cluster with that cluster name to map to that database. Slurm reuses the database for the new cluster.

**Warning**  
We don't recommend setting up more than one cluster to use the same database at once. Doing so can lead to performance issues or even database deadlock situations.
If Slurm accounting is enabled on the head node of a cluster, we recommend using an instance type with a powerful CPU, more memory, and higher network bandwidth. Slurm accounting can add strain on the head node of the cluster.

In the current architecture of the AWS ParallelCluster Slurm accounting feature, each cluster has its own instance of the `slurmdbd` daemon, as shown in the following example configuration diagram.

 ![\[\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/slurm-acct-arch.png)

If you're adding custom Slurm multi-cluster or federation functionalities to your cluster environment, all clusters must reference the same `slurmdbd` instance. For this alternative, we recommend that you enable AWS ParallelCluster Slurm accounting on one cluster and manually configure the other clusters to connect to the `slurmdbd` instance that's hosted on the first cluster.

If you're using AWS ParallelCluster versions prior to version 3.3.0, refer to the alternative method to implement Slurm accounting that's described in this [HPC Blog Post](https://aws.amazon.com/blogs/compute/enabling-job-accounting-for-hpc-with-aws-parallelcluster-and-amazon-rds/).

## Slurm accounting considerations


### Database and cluster on different VPCs


To enable Slurm accounting, a database server is needed to serve as a backend for the read and write operations that the `slurmdbd` daemon performs. Before the cluster is created or updated to enable Slurm accounting, the head node must be able to reach the database server.

If you need to deploy the database server on a VPC other than the one that the cluster uses, consider the following:
+ To enable communication between the `slurmdbd` on the cluster side and the database server, you must set up connectivity between the two VPCs. For more information, see [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) in the *Amazon Virtual Private Cloud User Guide*.
+ You must create the security group that you want to attach to the head node on the VPC of the cluster. After the two VPCs have been peered, cross-linking between the database side and the cluster side security groups is available. For more information, see [Security Group Rules](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#SecurityGroupRules) in the *Amazon Virtual Private Cloud User Guide*.

### Configuring TLS encryption between `slurmdbd` and the database server


With the default Slurm accounting configuration that AWS ParallelCluster provides, `slurmdbd` establishes a TLS encrypted connection to the database server, if the server supports TLS encryption. AWS database services such as Amazon RDS and Amazon Aurora support TLS encryption by default.

You can require secure connections on the server side by setting the `require_secure_transport` parameter on the database server. This is configured in the provided CloudFormation template.

Following security best practice, we recommend that you also enable server identity verification on the `slurmdbd` client. To do this, configure the [StorageParameters](https://slurm.schedmd.com/slurmdbd.conf.html#OPT_StorageParameters) in the `slurmdbd.conf`. Upload the server CA certificate to the head node of the cluster. Next, set the [SSL\_CA](https://slurm.schedmd.com/slurmdbd.conf.html#OPT_SSL_CA) option of `StorageParameters` in `slurmdbd.conf` to the path of the server CA certificate on the head node. Doing this enables server identity verification on the `slurmdbd` side. After you make these changes, restart the `slurmdbd` service to re-establish connectivity to the database server with identity verification enabled.
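As a sketch of the step above, assuming the server CA certificate has been uploaded to a hypothetical path on the head node, the relevant line in `slurmdbd.conf` would look like the following:

```
StorageParameters=SSL_CA=/etc/ssl/certs/db-server-ca.pem
```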

### Updating the database credentials


To update the values for [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [UserName](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-UserName) or [PasswordSecretArn](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-PasswordSecretArn), you must first stop the compute fleet. Suppose that the secret value that's stored in the AWS Secrets Manager secret is changed and its ARN isn't changed. In this situation, the cluster doesn't automatically update the database password to the new value. To update the cluster for the new secret value, run the following command from the head node.

```
$ sudo /opt/parallelcluster/scripts/slurm/update_slurm_database_password.sh
```

**Warning**  
To avoid losing accounting data, we recommend that you only change the database password when the compute fleet is stopped.

### Database monitoring


We recommend that you enable the monitoring features of the AWS database services. For more information, see [Amazon RDS monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) or [Amazon Aurora monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/MonitoringAurora.html) documentation. 

# Slurm configuration customization


Starting with AWS ParallelCluster version 3.6.0, you can customize the `slurm.conf` Slurm configuration in an AWS ParallelCluster cluster configuration.

In the cluster configuration, you can customize Slurm configuration parameters by using the following cluster configuration settings:
+ Customize Slurm parameters for the entire cluster by using either the [`SlurmSettings`](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [`CustomSlurmSettings`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-CustomSlurmSettings) or the [`CustomSlurmSettingsIncludeFile`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-CustomSlurmSettingsIncludeFile) parameter. AWS ParallelCluster fails if you specify both.
+ Customize Slurm parameters for a queue by using [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`CustomSlurmSettings`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CustomSlurmSettings) (mapped to Slurm partitions).
+ Customize Slurm parameters for a compute resource by using [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`ComputeResources`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-ComputeResources) / [`CustomSlurmSettings`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-CustomSlurmSettings) (mapped to Slurm nodes).
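For illustration, a configuration fragment that sets custom Slurm parameters at both the cluster and queue level might look like the following sketch. The parameter choices (`MaxJobCount`, `MaxTime`) are examples only; validate any parameter you use against the Slurm documentation:

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    CustomSlurmSettings:
      - MaxJobCount: 50000
  SlurmQueues:
    - Name: queue1
      CustomSlurmSettings:
        MaxTime: "1-00:00:00"
      ComputeResources:
        - Name: c5xlarge
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 0
          MaxCount: 10
```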

## Slurm configuration customization limits and considerations when using AWS ParallelCluster



+ For `CustomSlurmSettings` and `CustomSlurmSettingsIncludeFile` settings, you can only specify and update `slurm.conf` parameters that are included in the [Slurm version](slurm-workload-manager-v3.md) that's supported by the AWS ParallelCluster version that you are using to configure a cluster.
+ If you specify custom Slurm configurations in any of the `CustomSlurmSettings` parameters, AWS ParallelCluster performs validation checks and prevents setting or updating Slurm configuration parameters that conflict with AWS ParallelCluster logic. The Slurm configuration parameters that are known to conflict with AWS ParallelCluster are identified in deny lists. The deny lists can change in future AWS ParallelCluster versions if other Slurm features are added. For more information, see [Deny-listed Slurm configuration parameters for `CustomSlurmSettings`](#slurm-configuration-denylists-v3).
+ AWS ParallelCluster only checks whether a parameter is in a deny list. AWS ParallelCluster doesn't validate your custom Slurm configuration parameter syntax or semantics. You are responsible for validating your custom Slurm configuration parameters. Invalid custom Slurm configuration parameters can cause Slurm daemon failures that can lead to cluster create and update failures.
+ If you specify custom Slurm configurations in `CustomSlurmSettingsIncludeFile`, AWS ParallelCluster doesn't perform any validation.
+ You can update `CustomSlurmSettings` and `CustomSlurmSettingsIncludeFile` without stopping and starting the compute fleet. In this case, AWS ParallelCluster restarts the `slurmctld` daemon and runs the `scontrol reconfigure` command.

  Some Slurm configuration parameters might require different operations before a change is registered in the entire cluster. For example, they might require a restart of all daemons in the cluster. You are responsible for verifying whether AWS ParallelCluster operations are sufficient for propagating your custom Slurm configuration parameter settings during updates. If you find that AWS ParallelCluster operations aren't sufficient, it's your responsibility to provide the additional actions required to propagate the updated settings as recommended in the [Slurm documentation](https://slurm.schedmd.com/documentation.html).

## Deny-listed Slurm configuration parameters for `CustomSlurmSettings`


The following tables list the parameters with the AWS ParallelCluster versions that deny their use, starting with version 3.6.0. `CustomSlurmSettings` isn't supported for AWS ParallelCluster versions earlier than version 3.6.0.


**Deny-listed parameters at cluster level:**  

| Slurm parameter | Deny-listed in AWS ParallelCluster versions | 
| --- | --- | 
|  CommunicationParameters  |  3.6.0  | 
|  Epilog  |  3.6.0  | 
|  GresTypes  |  3.6.0  | 
|  LaunchParameters  |  3.6.0  | 
|  Prolog  |  3.6.0  | 
|  ReconfigFlags  |  3.6.0  | 
|  ResumeFailProgram  |  3.6.0  | 
|  ResumeProgram  |  3.6.0  | 
|  ResumeTimeout  |  3.6.0  | 
|  SlurmctldHost  |  3.6.0  | 
|  SlurmctldLogFile  |  3.6.0  | 
|  SlurmctldParameters  |  3.6.0  | 
|  SlurmdLogfile  |  3.6.0  | 
|  SlurmUser  |  3.6.0  | 
|  SuspendExcNodes  |  3.6.0  | 
|  SuspendProgram  |  3.6.0  | 
|  SuspendTime  |  3.6.0  | 
|  TaskPlugin  |  3.6.0  | 
|  TreeWidth  |  3.6.0  | 


**Deny-listed parameters at cluster level when the [native Slurm accounting integration](slurm-accounting-v3.md) is configured in the cluster configuration:**  

| Slurm parameter | Deny-listed in AWS ParallelCluster versions | 
| --- | --- | 
|  AccountingStorageType  |  3.6.0  | 
|  AccountingStorageHost  |  3.6.0  | 
|  AccountingStoragePort  |  3.6.0  | 
|  AccountingStorageUser  |  3.6.0  | 
|  JobAcctGatherType  |  3.6.0  | 


**Deny-listed parameters at the queue (partition) level for queues managed by AWS ParallelCluster:**  

| Slurm parameter | Deny-listed in AWS ParallelCluster versions | 
| --- | --- | 
|  Nodes  |  3.6.0  | 
|  PartitionName  |  3.6.0  | 
|  ResumeTimeout  |  3.6.0  | 
|  State  |  3.6.0  | 
|  SuspendTime  |  3.6.0  | 


**Deny-listed parameters at the compute resource (node) level for compute resources managed by AWS ParallelCluster:**  

| Slurm parameter | Deny-listed in AWS ParallelCluster versions | 
| --- | --- | 
|  CPUs  |  3.6.0  | 
|  Features  |  3.6.0  | 
|  Gres  |  3.6.0  | 
|  NodeAddr  |  3.6.0  | 
|  NodeHostname  |  3.6.0  | 
|  NodeName  |  3.6.0  | 
|  Weight  |  3.7.0  | 

# Slurm `prolog` and `epilog`


Starting with AWS ParallelCluster version 3.6.0, the Slurm configuration that's deployed with AWS ParallelCluster includes the `Prolog` and `Epilog` configuration parameters:

```
# PROLOG AND EPILOG
Prolog=/opt/slurm/etc/scripts/prolog.d/*
Epilog=/opt/slurm/etc/scripts/epilog.d/*
SchedulerParameters=nohold_on_prolog_fail
BatchStartTimeout=180
```

For more information, see the [Prolog and Epilog Guide](https://slurm.schedmd.com/prolog_epilog.html) in the Slurm documentation.

AWS ParallelCluster includes the following prolog and epilog scripts:
+ `90_pcluster_health_check_manager` (in the `Prolog` folder)
+ `90_pcluster_noop` (in the `Epilog` folder)

**Note**  
Both the `Prolog` and `Epilog` folders must contain at least one file.

You can use your own custom `prolog` or `epilog` scripts by adding them to the corresponding `Prolog` and `Epilog` folders.
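As an illustrative sketch of a custom `prolog` script, the following could be placed in `/opt/slurm/etc/scripts/prolog.d/` to log basic job information when each job starts. The script name, log path, and behavior are hypothetical, not part of AWS ParallelCluster:

```
#!/bin/bash
# Hypothetical example: /opt/slurm/etc/scripts/prolog.d/10_log_job_start.sh
# Appends the job id and user to a log file when a job starts.
# SLURM_JOB_ID and SLURM_JOB_USER are set by Slurm before prolog scripts run.
# The log path is illustrative; choose a location writable by the slurm user.
LOGFILE="${PROLOG_LOGFILE:-/tmp/job_prolog.log}"

log_job_start() {
    printf '%s job=%s user=%s\n' "$(date -u +%FT%TZ)" \
        "${SLURM_JOB_ID:-unknown}" "${SLURM_JOB_USER:-unknown}" >> "$LOGFILE"
}

# Slurm runs this script once per job start; a non-zero exit would fail the job.
log_job_start
```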

**Warning**  
Slurm runs every script in the folders, in reverse alphabetical order.

The run time of the `prolog` and `epilog` scripts impacts the time needed to run a job. Update the `BatchStartTimeout` configuration setting if you run multiple or long-running `prolog` scripts. The default is 3 minutes.

If you are using custom `prolog` and `epilog` scripts, place them in the respective `Prolog` and `Epilog` folders. We recommend that you keep the `90_pcluster_health_check_manager` script, which runs before every custom script. For more information, see [Slurm configuration customization](slurm-configuration-settings-v3.md).

# Cluster capacity size and update


The capacity of the cluster is defined by the number of compute nodes the cluster can scale to. Compute nodes are backed by Amazon EC2 instances defined within compute resources in the AWS ParallelCluster configuration (`Scheduling/SlurmQueues/ComputeResources`), and are organized into queues (`Scheduling/SlurmQueues`) that map 1:1 to Slurm partitions. 

Within a compute resource, you can configure the minimum number of compute nodes (instances) that must always be kept running in the cluster (`MinCount`), and the maximum number of instances the compute resource can scale to ([`MaxCount`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-MaxCount)).

At cluster creation time, or upon a cluster update, AWS ParallelCluster launches as many Amazon EC2 instances as configured in `MinCount` for each compute resource (`Scheduling/SlurmQueues/ComputeResources`) defined in the cluster. The instances launched to cover the minimum number of nodes for a compute resource in the cluster are called ***static nodes***. Once started, static nodes are meant to be persistent in the cluster, and they are not terminated by the system unless a particular event or condition occurs. Such events include, for example, the failure of Slurm or Amazon EC2 health checks and the change of the Slurm node status to DRAIN or DOWN. 

The Amazon EC2 instances, in the range of `1` to `MaxCount - MinCount` (`MaxCount` *minus* `MinCount`), launched on demand to deal with the increased load of the cluster, are referred to as ***dynamic nodes***. Their nature is ephemeral: they are launched to serve pending jobs and are terminated once they stay idle for the period of time defined by `Scheduling/SlurmSettings/ScaledownIdletime` in the cluster configuration (default: 10 minutes).

Static and dynamic nodes follow this naming schema:
+ Static nodes `<Queue/Name>-st-<ComputeResource/Name>-<num>` where `<num> = 1..ComputeResource/MinCount`
+ Dynamic nodes `<Queue/Name>-dy-<ComputeResource/Name>-<num>` where `<num> = 1..(ComputeResource/MaxCount - ComputeResource/MinCount)`

For example, given the following AWS ParallelCluster configuration:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: c5xlarge
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 100
          MaxCount: 150
```

The following nodes are defined in Slurm:

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
```

When a compute resource has `MinCount == MaxCount`, all the corresponding compute nodes are static, and all the instances are launched at cluster creation/update time and kept up and running. For example:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: c5xlarge
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 100
          MaxCount: 100
```

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
```

## Cluster capacity update


A cluster capacity update includes adding or removing queues or compute resources, or changing the `MinCount`/`MaxCount` of a compute resource. Starting with AWS ParallelCluster version 3.9.0, reducing the size of a queue requires the compute fleet to be stopped, or [QueueUpdateStrategy](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-QueueUpdateStrategy) set to TERMINATE, before a cluster update takes place. It's not required to stop the compute fleet or to set [QueueUpdateStrategy](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-QueueUpdateStrategy) to TERMINATE when: 
+ Adding new queues to Scheduling/[`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues)
+ Adding new compute resources `Scheduling/SlurmQueues/ComputeResources` to a queue
+ Increasing the `MaxCount` of a compute resource
+ Increasing the `MinCount` of a compute resource and increasing the `MaxCount` of the same compute resource by at least the same amount
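For example, when a planned change reduces the size of a queue, one approach is to stop the compute fleet before applying the update and restart it afterwards (the cluster name and configuration file name below are placeholders):

```
$ pcluster update-compute-fleet --cluster-name mycluster --status STOP_REQUESTED
$ pcluster update-cluster --cluster-name mycluster --cluster-configuration updated-config.yaml
$ pcluster update-compute-fleet --cluster-name mycluster --status START_REQUESTED
```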

## Considerations and limitations


This section outlines important factors, constraints, and limitations that should be taken into account when resizing the cluster capacity.
+ When removing a queue from `Scheduling/SlurmQueues`, all the compute nodes with name `<Queue/Name>-*`, both static and dynamic, are removed from the Slurm configuration and the corresponding Amazon EC2 instances are terminated.
+ When removing a compute resource `Scheduling/SlurmQueues/ComputeResources` from a queue, all the compute nodes with name `<Queue/Name>-*-<ComputeResource/Name>-*`, both static and dynamic, are removed from the Slurm configuration and the corresponding Amazon EC2 instances are terminated.

When changing the `MinCount` parameter of a compute resource, two scenarios can be distinguished: `MaxCount` kept equal to `MinCount` (static capacity only), and `MaxCount` greater than `MinCount` (mixed static and dynamic capacity).

### Capacity changes with static nodes only

+ If `MinCount == MaxCount`, when increasing `MinCount` (and `MaxCount`), the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`<Queue/Name>-st-<ComputeResource/Name>-<new_MinCount>`), and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity.
+ If `MinCount == MaxCount`, when decreasing `MinCount` (and `MaxCount`) by an amount N, the cluster is configured by removing the last N static nodes (`<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N>...<old_MinCount>]`), and the system terminates the corresponding Amazon EC2 instances.
  + Initial state: `MinCount = MaxCount = 100`
  + 

    ```
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
    ```
  + Update: `-30` on `MinCount` and `MaxCount`: `MinCount = MaxCount = 70`
  + 

    ```
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
    ```

### Capacity changes with mixed nodes


If `MinCount < MaxCount`, when increasing `MinCount` by an amount N (assuming `MaxCount` is kept unchanged), the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`old_MinCount + N`): `<Queue/Name>-st-<ComputeResource/Name>-<old_MinCount + N>`, and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity. Moreover, to honor the `MaxCount` capacity of the compute resource, the cluster configuration is updated by *removing the last N dynamic nodes*: `<Queue/Name>-dy-<ComputeResource/Name>-[<MaxCount - old_MinCount - N>...<MaxCount - old_MinCount>]`, and the system terminates the corresponding Amazon EC2 instances.
+ Initial state: `MinCount = 100; MaxCount = 150`
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update: `+30` on `MinCount`: `MinCount = 130` (`MaxCount = 150`)
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     20  idle~ queue1-dy-c5xlarge-[1-20]
  queue1*      up   infinite    130   idle queue1-st-c5xlarge-[1-130]
  ```

If `MinCount < MaxCount`, when increasing `MinCount` and `MaxCount` by the same amount N, the cluster is configured by extending the number of static nodes to the new value of `MinCount` (`old_MinCount + N`): `<Queue/Name>-st-<ComputeResource/Name>-<old_MinCount + N>`, and the system keeps trying to launch Amazon EC2 instances to fulfill the new required static capacity. Moreover, no changes are made to the number of dynamic nodes to honor the new `MaxCount` value.
+ Initial state: `MinCount = 100; MaxCount = 150`
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update: `+30` on `MinCount` and `MaxCount`: `MinCount = 130` (`MaxCount = 180`)
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    130   idle queue1-st-c5xlarge-[1-130]
  ```

If `MinCount < MaxCount`, when decreasing `MinCount` by an amount N (assuming `MaxCount` is kept unchanged), the cluster is configured by removing the last N static nodes `<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N>...<old_MinCount>]`, and the system terminates the corresponding Amazon EC2 instances. Moreover, to honor the `MaxCount` capacity of the compute resource, the cluster configuration is updated by extending the number of dynamic nodes to fill the gap `MaxCount - new_MinCount`: `<Queue/Name>-dy-<ComputeResource/Name>-[1..<MaxCount - new_MinCount>]`. In this case, because these are dynamic nodes, no new Amazon EC2 instances are launched unless the scheduler has jobs pending on the new nodes.
+ Initial state: `MinCount = 100; MaxCount = 150`
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update: `-30` on `MinCount`: `MinCount = 70` (`MaxCount = 150`)
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     80  idle~ queue1-dy-c5xlarge-[1-80]
  queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
  ```

If `MinCount < MaxCount`, when decreasing `MinCount` and `MaxCount` by the same amount N, the cluster is configured by removing the last N static nodes `<Queue/Name>-st-<ComputeResource/Name>-[<old_MinCount - N>...<old_MinCount>]`, and the system terminates the corresponding Amazon EC2 instances. Moreover, no changes are made to the number of dynamic nodes to honor the new `MaxCount` value.
+ Initial state: `MinCount = 100; MaxCount = 150`
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update: `-30` on `MinCount` and `MaxCount`: `MinCount = 70` (`MaxCount = 120`)
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite     70   idle queue1-st-c5xlarge-[1-70]
  ```

If `MinCount < MaxCount`, when decreasing `MaxCount` by an amount N (assuming `MinCount` is kept unchanged), the cluster is configured by removing the last N dynamic nodes `<Queue/Name>-dy-<ComputeResource/Name>-[<old_MaxCount - N>...<old_MaxCount>]`, and the system terminates the corresponding Amazon EC2 instances if they were running. No impact is expected on the static nodes.
+ Initial state: `MinCount = 100; MaxCount = 150`
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     50  idle~ queue1-dy-c5xlarge-[1-50]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```
+ Update: `-30` on `MaxCount`: `MinCount = 100` (`MaxCount = 120`)
+ 

  ```
  $ sinfo
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  queue1*      up   infinite     20  idle~ queue1-dy-c5xlarge-[1-20]
  queue1*      up   infinite    100   idle queue1-st-c5xlarge-[1-100]
  ```

## Impact on jobs


In all the cases where nodes are removed and Amazon EC2 instances terminated, an sbatch job running on the removed nodes is requeued, unless no other nodes satisfy the job requirements. In that case the job fails with status NODE\_FAIL and disappears from the queue, and it must be resubmitted manually.

If you are planning to perform a cluster resize update, you can prevent jobs from running on the nodes that are going to be removed during the planned update. To do this, place the nodes to be removed under a maintenance reservation. Be aware that placing a node in maintenance doesn't impact jobs that are already running on the node.

Suppose that the planned cluster resize update will remove the nodes `queue-st-computeresource-[9-10]`. You can create a Slurm reservation with the following command:

```
sudo -i scontrol create reservation ReservationName=maint_for_update user=root starttime=now duration=infinite flags=maint,ignore_jobs nodes=queue-st-computeresource-[9-10]
```

This creates a Slurm reservation named `maint_for_update` on the nodes `queue-st-computeresource-[9-10]`. From the time the reservation is created, no new jobs can start on the nodes `queue-st-computeresource-[9-10]`. Be aware that the reservation doesn't prevent jobs from eventually being allocated on the nodes `queue-st-computeresource-[9-10]`.

After the cluster resize update, if the Slurm reservation was set only on nodes that were removed during the resize update, the maintenance reservation is deleted automatically. If instead you created a Slurm reservation on nodes that are still present after the cluster resize update, you may want to remove the maintenance reservation after the resize update is performed, by using the following command:

```
sudo -i scontrol delete ReservationName=maint_for_update
```

For additional details on Slurm reservations, see the [official SchedMD documentation](https://slurm.schedmd.com/reservations.html).

## Cluster update process on capacity changes


Upon a scheduler configuration change, the following steps are executed during the cluster update process:
+ Stop AWS ParallelCluster `clustermgtd` (`supervisorctl stop clustermgtd`)
+ Generate the updated Slurm partitions configuration from the AWS ParallelCluster configuration
+ Restart `slurmctld` (done through the Chef service recipe)
+ Check `slurmctld` status (`systemctl is-active --quiet slurmctld.service`)
+ Reload the Slurm configuration (`scontrol reconfigure`)
+ Start `clustermgtd` (`supervisorctl start clustermgtd`)

# Using AWS Batch (`awsbatch`) scheduler with AWS ParallelCluster
AWS Batch

**Warning**  
AWS CodeBuild is not supported in Asia Pacific (Malaysia) (`ap-southeast-5`) and Asia Pacific (Thailand) (`ap-southeast-7`) regions. Therefore, ParallelCluster AWS Batch integration is not supported in those regions.

AWS ParallelCluster also supports the AWS Batch scheduler. The following topics describe how to use AWS Batch. For information about AWS Batch, see [AWS Batch](https://aws.amazon.com/batch/). For documentation, see the [AWS Batch User Guide](https://docs.aws.amazon.com/batch/latest/userguide/).

**AWS ParallelCluster CLI commands for AWS Batch**

When you use the `awsbatch` scheduler, the AWS ParallelCluster CLI commands for AWS Batch are automatically installed in the AWS ParallelCluster head node. The CLI uses AWS Batch API operations and permits the following operations:
+ Submit and manage jobs.
+ Monitor jobs, queues, and hosts.
+ Mirror traditional scheduler commands.

**Important**  
AWS ParallelCluster doesn't support GPU jobs for AWS Batch. For more information, see [GPU jobs](https://docs.aws.amazon.com/batch/latest/userguide/gpu-jobs.html).

This CLI is distributed as a separate package. For more information, see [Scheduler support](moving-from-v2-to-v3.md#scheduler_support).

**Topics**
+ [

# `awsbsub`
](awsbatchcli.awsbsub-v3.md)
+ [

# `awsbstat`
](awsbatchcli.awsbstat-v3.md)
+ [

# `awsbout`
](awsbatchcli.awsbout-v3.md)
+ [

# `awsbkill`
](awsbatchcli.awsbkill-v3.md)
+ [

# `awsbqueues`
](awsbatchcli.awsbqueues-v3.md)
+ [

# `awsbhosts`
](awsbatchcli.awsbhosts-v3.md)

# `awsbsub`


Submits jobs to the job queue of the cluster.

```
awsbsub [-h] [-jn JOB_NAME] [-c CLUSTER] [-cf] [-w WORKING_DIR]
        [-pw PARENT_WORKING_DIR] [-if INPUT_FILE] [-p VCPUS] [-m MEMORY]
        [-e ENV] [-eb ENV_DENYLIST] [-r RETRY_ATTEMPTS] [-t TIMEOUT]
        [-n NODES] [-a ARRAY_SIZE] [-d DEPENDS_ON]
        [command] [arguments [arguments ...]]
```

**Important**  
AWS ParallelCluster doesn't support GPU jobs for AWS Batch. For more information, see [GPU jobs](https://docs.aws.amazon.com/batch/latest/userguide/gpu-jobs.html).
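For example, an illustrative submission from the head node (the job name and command are arbitrary) could look like the following:

```
$ awsbsub -jn hello-job -p 2 -m 1024 "sleep 30 && echo done"
```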

## Positional Arguments


***command***  
Submits the job (the command specified must be available on the compute instances) or the file name to be transferred. See also `--command-file`.

**arguments**  
(Optional) Specifies arguments for the command or the command-file.

## Named Arguments


**-jn *JOB_NAME*, --job-name *JOB_NAME***  
Names the job. The first character must be either a letter or number. The job name can contain letters (both uppercase and lowercase), numbers, hyphens, and underscores, and be up to 128 characters in length. 

**-c *CLUSTER*, --cluster *CLUSTER***  
Specifies the cluster to use.

**-cf, --command-file**  
Indicates that the command is a file to be transferred to the compute instances.  
Default: False

**-w *WORKING_DIR*, --working-dir *WORKING_DIR***  
Specifies the folder to use as the job's working directory. If a working directory isn't specified, the job is run in the `job-<AWS_BATCH_JOB_ID>` subfolder of the user’s home directory. You can use either this parameter or the `--parent-working-dir` parameter.

**-pw *PARENT_WORKING_DIR*, --parent-working-dir *PARENT_WORKING_DIR***  
Specifies the parent folder of the job's working directory. If a parent working directory isn't specified, it defaults to the user’s home directory. A subfolder named `job-<AWS_BATCH_JOB_ID>` is created in the parent working directory. You can use either this parameter or the `--working-dir` parameter.

**-if *INPUT_FILE*, --input-file *INPUT_FILE***  
Specifies the file to be transferred to the compute instances, in the job's working directory. You can specify multiple input file parameters.

**-p *VCPUS*, --vcpus *VCPUS***  
Specifies the number of vCPUs to reserve for the container. When used together with `--nodes`, it identifies the number of vCPUs for each node.  
Default: 1

**-m *MEMORY*, --memory *MEMORY***  
Specifies the hard limit of memory (in MiB) to provide for the job. If your job attempts to exceed the memory limit specified here, the job is ended.  
Default: 128

**-e *ENV*, --env *ENV***  
Specifies a comma-separated list of environment variable names to export to the job environment. To export all environment variables, specify `all`. Note that `all` doesn't include the variables listed in the `--env-blacklist` parameter, or variables starting with the `PCLUSTER_*` or `AWS_*` prefix.

**-eb *ENV\_DENYLIST*, --env-blacklist *ENV\_DENYLIST***  
Specifies a comma-separated list of environment variable names to **not** export to the job environment. By default, `HOME`, `PWD`, `USER`, `PATH`, `LD_LIBRARY_PATH`, `TERM`, and `TERMCAP` are not exported.

**-r *RETRY\_ATTEMPTS*, --retry-attempts *RETRY\_ATTEMPTS***  
Specifies the number of times to move a job to the `RUNNABLE` status. You can specify from 1 to 10 attempts. If the number of attempts is greater than 1 and the job fails, it is retried until it has moved to the `RUNNABLE` status the specified number of times.  
Default: 1

**-t *TIMEOUT*, --timeout *TIMEOUT***  
Specifies the time duration in seconds (measured from the job attempt’s `startedAt` timestamp) after which AWS Batch terminates your job if it hasn't finished. The timeout value must be at least 60 seconds.

**-n *NODES*, --nodes *NODES***  
Specifies the number of nodes to reserve for the job. Specify a value for this parameter to enable multi-node parallel submission.  
When the [`Scheduler`](Scheduling-v3.md#yaml-Scheduling-Scheduler) / [`AwsBatchQueues`](Scheduling-v3.md#Scheduling-v3-AwsBatchQueues) / [`CapacityType`](Scheduling-v3.md#yaml-Scheduling-AwsBatchQueues-CapacityType) parameter is set to `SPOT`, multi-node parallel jobs *aren't* supported. Additionally, there must be an `AWSServiceRoleForEC2Spot` service-linked role in your account. You can create this role with the following AWS CLI command:  

```
$ aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
```
For more information, see [Service-linked role for Spot Instance requests](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#service-linked-roles-spot-instance-requests) in the *Amazon Elastic Compute Cloud User Guide for Linux Instances*.

**-a *ARRAY\_SIZE*, --array-size *ARRAY\_SIZE***  
Indicates the size of the array. You can specify a value between 2 and 10,000. If you specify array properties for a job, it becomes an array job.

**-d *DEPENDS\_ON*, --depends-on *DEPENDS\_ON***  
Specifies a semicolon-separated list of dependencies for a job. A job can depend upon a maximum of 20 jobs. You can specify a `SEQUENTIAL` type dependency without specifying a job ID for array jobs. A sequential dependency allows each child array job to complete sequentially, starting at index 0. You can also specify an `N_TO_N` type dependency with a job ID for array jobs. An `N_TO_N` dependency means that each index child of this job must wait for the corresponding index child of each dependency to complete before it can begin. The syntax for this parameter is "jobId=*<string>*,type=*<string>*;...".
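As an illustration, several of the options above can be combined into a single submission. The cluster name `mycluster` and job name `hello-job` below are hypothetical:

```
$ awsbsub -c mycluster -jn hello-job -p 2 -m 256 -r 3 "sleep 30 && echo done"
```

The returned job ID can then be passed to `awsbstat`, `awsbout`, and `awsbkill`.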

# `awsbstat`


Shows the jobs that are submitted in the cluster’s job queue.

```
awsbstat [-h] [-c CLUSTER] [-s STATUS] [-e] [-d] [job_ids [job_ids ...]]
```

## Positional Arguments


***job\_ids***  
Specifies the space-separated list of job IDs to show in the output. If the job is a job array, all of the child jobs are displayed. If a single job is requested, it is shown in a detailed version.

## Named Arguments


**-c *CLUSTER*, --cluster *CLUSTER***  
Indicates the cluster to use.

**-s *STATUS*, --status *STATUS***  
Specifies a comma-separated list of job statuses to include. The default job status is "active". Accepted values are: `SUBMITTED`, `PENDING`, `RUNNABLE`, `STARTING`, `RUNNING`, `SUCCEEDED`, `FAILED`, and `ALL`.  
Default: "`SUBMITTED`,`PENDING`,`RUNNABLE`,`STARTING`,`RUNNING`"

**-e, --expand-children**  
Expands jobs with children (both array and multi-node parallel).  
Default: False

**-d, --details**  
Shows jobs details.  
Default: False
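For example, to list jobs in every status on a hypothetical cluster named `mycluster`, or to show the details of a single job (the job ID is a placeholder):

```
$ awsbstat -c mycluster -s ALL
$ awsbstat -c mycluster -d <job_id>
```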

# `awsbout`


Shows the output of a given job.

```
awsbout [-h] [-c CLUSTER] [-hd HEAD] [-t TAIL] [-s] [-sp STREAM_PERIOD] job_id
```

## Positional Arguments


***job\_id***  
Specifies the job ID.

## Named Arguments


**-c *CLUSTER*, --cluster *CLUSTER***  
Indicates the cluster to use.

**-hd *HEAD*, --head *HEAD***  
Gets the first *HEAD* lines of the job output.

**-t *TAIL*, --tail *TAIL***  
Gets the last *TAIL* lines of the job output.

**-s, --stream**  
Gets the job output, and then waits for additional output to be produced. This argument can be used together with `--tail` to start from the latest *TAIL* lines of the job output.  
Default: False

**-sp *STREAM\_PERIOD*, --stream-period *STREAM\_PERIOD***  
Sets the streaming period.  
Default: 5
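For example, to follow a job's output as it's produced, starting from the last 30 lines (the cluster name is hypothetical and the job ID is a placeholder):

```
$ awsbout -c mycluster -s -t 30 <job_id>
```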

# `awsbkill`


Cancels or terminates jobs submitted in the cluster.

```
awsbkill [-h] [-c CLUSTER] [-r REASON] job_ids [job_ids ... ]
```

## Positional Arguments


***job\_ids***  
Specifies the space-separated list of job IDs to cancel or terminate.

## Named Arguments


**-c *CLUSTER*, --cluster *CLUSTER***  
Indicates the name of the cluster to use.

**-r *REASON*, --reason *REASON***  
Indicates the message to attach to a job, explaining the reason for canceling it.  
Default: "Terminated by the user"
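For example, to cancel two jobs with a custom reason (the cluster name and job IDs are placeholders):

```
$ awsbkill -c mycluster -r "No longer needed" <job_id_1> <job_id_2>
```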

# `awsbqueues`


Shows the job queue that is associated with the cluster.

```
awsbqueues [-h] [-c CLUSTER] [-d] [job_queues [job_queues ... ]]
```

## Positional arguments


***job\_queues***  
Specifies the space-separated list of queue names to show. If a single queue is requested, it is shown in a detailed version.

## Named arguments


**-c *CLUSTER*, --cluster *CLUSTER***  
Specifies the name of the cluster to use.

**-d, --details**  
Indicates whether to show the details of the queues.  
Default: False
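For example, to show the details of the job queue associated with a hypothetical cluster named `mycluster`:

```
$ awsbqueues -c mycluster -d
```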

# `awsbhosts`


Shows the hosts that belong to the cluster’s compute environment.

```
awsbhosts [-h] [-c CLUSTER] [-d] [instance_ids [instance_ids ... ]]
```

## Positional Arguments


***instance\_ids***  
Specifies a space-separated list of instance IDs. If a single instance is requested, it is shown in a detailed version.

## Named Arguments


**-c *CLUSTER*, --cluster *CLUSTER***  
Specifies the name of the cluster to use.

**-d, --details**  
Indicates whether to show the details of the hosts.  
Default: False
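For example, to show the details of all hosts in the compute environment of a hypothetical cluster named `mycluster`:

```
$ awsbhosts -c mycluster -d
```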