

# Using Amazon EC2 launch templates with AWS PCS
<a name="working-with_launch-templates"></a>

In Amazon EC2, a launch template can store a set of preferences so that you don't have to specify them individually when you launch instances. AWS PCS incorporates launch templates as a flexible way to configure compute node groups. When you create a node group, you provide a launch template. AWS PCS creates a derived launch template from it that includes transformations to help ensure it works with the service. 

Understanding the options and considerations for writing a custom launch template can help you write one for use with AWS PCS. For more information about launch templates, see [Launch an instance from a launch template](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html) in the *Amazon EC2 User Guide*.

**Topics**
+ [Overview of launch templates in AWS PCS](working-with_launch-templates_overview.md)
+ [Create a basic launch template](working-with_launch-templates_create.md)
+ [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md)
+ [Capacity Reservations in AWS PCS](working-with_capacity-reservations.md)
+ [Useful launch template parameters](working-with_launch-templates_parameters.md)

# Overview of launch templates in AWS PCS
<a name="working-with_launch-templates_overview"></a>

There are [over 30 parameters available](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RequestLaunchTemplateData.html) you can include in an EC2 launch template, controlling many aspects of how instances are configured. Most are fully compatible with AWS PCS, but there are some exceptions.

AWS PCS ignores the following launch template parameters because the service must manage these properties directly:
+ **Instance type/Specify instance type attributes** (`InstanceRequirements`) – AWS PCS does not support attribute-based instance selection.
+ **Instance type** (`InstanceType`) – Specify instance types when you create a node group.
+ **Advanced details/IAM instance profile** (`IamInstanceProfile`) – You provide this when you create or update the node group.
+ **Advanced details/Disable API termination** (`DisableApiTermination`) – AWS PCS must control the lifecycle of node group instances it launches.
+ **Advanced details/Disable API stop** (`DisableApiStop`) – AWS PCS must control the lifecycle of node group instances it launches.
+ **Advanced details/Stop – Hibernate behavior** (`HibernationOptions`) – AWS PCS does not support instance hibernation.
+ **Advanced details/Elastic GPU** (`ElasticGpuSpecifications`) – Amazon Elastic Graphics reached end of life on January 8, 2024.
+ **Advanced details/Elastic inference** (`ElasticInferenceAccelerators`) – Amazon Elastic Inference is no longer available to new customers.
+ **Advanced details/Specify CPU options/Threads per core** (`ThreadsPerCore`) – AWS PCS sets the number of threads per core to 1.

These parameters have special requirements that support compatibility with AWS PCS:
+ **User data** (`UserData`) – This must be multi-part MIME encoded. See [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md).
+ **Application and OS Images** (`ImageId`) – You can include this. However, if you specify an AMI ID when you create or update the node group, it overrides the value in the launch template. The AMI you provide must be compatible with AWS PCS. For more information, see [Amazon Machine Images (AMIs) for AWS PCS](working-with_ami.md).
+ **Network settings/Firewall (security groups)** (`SecurityGroups`) – A list of security group names can’t be set in an AWS PCS launch template. You can set a list of security group IDs (`SecurityGroupIds`), unless you define network interfaces in the launch template. In that case, you must specify security group IDs for each interface. For more information, see [Security groups in AWS PCS](working-with_networking_sg.md).
+ **Network settings/Advanced network configuration** (`NetworkInterfaces`) – If you use EC2 instances with a single network card, and don't require any specialized networking configuration, AWS PCS can configure instance networking for you. To configure multiple network cards or to enable Elastic Fabric Adapter on your instances, use `NetworkInterfaces`. Each network interface must have a list of security group IDs under `Groups`. For more information, see [Multiple network interfaces in AWS PCS](working-with_networking_multi-nic.md).
+ **Advanced details/Capacity reservation** (`CapacityReservationSpecification`) – This can be set, but cannot reference a specific `CapacityReservationId` when working with AWS PCS. You can, however, reference a capacity reservation group that contains one or more capacity reservations. For more information, see [Capacity Reservations in AWS PCS](working-with_capacity-reservations.md).
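Taken together, a launch template body that stays within these constraints might look like the following sketch. The AMI ID, key name, security group IDs, and Capacity Reservation group ARN are placeholder values, not real resources:

```
{
  "ImageId": "ami-0123456789abcdef0",
  "KeyName": "my-ssh-key-name",
  "SecurityGroupIds": ["sg-ExampleID1", "sg-ExampleID2"],
  "CapacityReservationSpecification": {
    "CapacityReservationTarget": {
      "CapacityReservationResourceGroupArn": "arn:aws:resource-groups:us-east-2:123456789012:group/EXAMPLE-CR-GROUP"
    }
  }
}
```

Note that `InstanceType` and `IamInstanceProfile` are intentionally absent; you supply those values when you create or update the node group.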

# Create a basic launch template
<a name="working-with_launch-templates_create"></a>

You can create a launch template using the AWS Management Console or the AWS CLI.

------
#### [ AWS Management Console ]

**To create a launch template**

1. Open the [Amazon EC2 console](https://console.aws.amazon.com/ec2/home) and select **Launch templates**.

1. Choose **Create launch template**.

1. Under **Launch template name and description**, enter a unique, descriptive name for **Launch template name**.

1. Under **Key pair (login)**, for **Key pair name**, select the SSH key pair to use to log in to EC2 instances managed by AWS PCS. This is optional, but recommended.

1. Under **Network settings**, then **Firewall (security groups)**, choose security groups to attach to the network interface. All security groups in the launch template must be from your AWS PCS cluster VPC. At minimum, choose:
   + A security group that allows communication with the AWS PCS cluster
   + A security group that allows communication between EC2 instances launched by AWS PCS
   + (Optional) A security group that allows inbound SSH access to interactive instances
   + (Optional) A security group that allows compute nodes to make outgoing connections to the Internet
   + (Optional) Security group(s) that allow access to networked resources such as shared file systems or a database server. 

1. Choose **Create launch template**. Your new launch template ID will be available in the Amazon EC2 console under **Launch templates**. The launch template ID has the form `lt-0123456789abcdef01`.

**Recommended next step**
+ Use the new launch template to create or update an AWS PCS compute node group.

------
#### [ AWS CLI ]

**To create a launch template**

Create your launch template with the command that follows.
+ Before running the command, make the following replacements:

  1. Replace *region-code* with the AWS Region where you are working with AWS PCS

  1. Replace *my-launch-template-name* with a name for your template. It must be unique to the AWS account and AWS Region you are using. 

  1. Replace *my-ssh-key-name* with the name of your preferred SSH key.

  1. Replace *sg-ExampleID1* and *sg-ExampleID2* with security group IDs that allow communication between your EC2 instances and the scheduler and communication between EC2 instances. If you only have one security group that enables all this traffic, you can remove `sg-ExampleID2` and its preceding comma character. You can also add more security group IDs. All security groups you include in the launch template must be from your AWS PCS cluster VPC.

  ```
  aws ec2 create-launch-template --region region-code \
      --launch-template-name my-launch-template-name \
      --launch-template-data '{"KeyName":"my-ssh-key-name","SecurityGroupIds": ["sg-ExampleID1","sg-ExampleID2"]}'
  ```

The AWS CLI will output text resembling the following. The launch template ID is found in `LaunchTemplateId`.

```
{
    "LaunchTemplate": {
        "LatestVersionNumber": 1,
        "LaunchTemplateId": "lt-0123456789abcdef01",
        "LaunchTemplateName": "my-launch-template-name",
        "DefaultVersionNumber": 1,
        "CreatedBy": "arn:aws:iam::123456789012:user/Bob",
        "CreateTime": "2019-04-30T18:16:06.000Z"
    }
}
```

**Recommended next step**
+ Use the new launch template to create or update an AWS PCS compute node group.

------

# Working with Amazon EC2 user data for AWS PCS
<a name="working-with_ec2-user-data"></a>

You can supply EC2 user data in your launch template that `cloud-init` runs when your instances launch. User data blocks with the content type `text/cloud-config` run before the instance registers with the AWS PCS API, while user data blocks with the content type `text/x-shellscript` run after registration completes, but before the Slurm daemon starts. For more information about content types, see the [cloud-init documentation](https://cloudinit.readthedocs.io/en/latest/explanation/format.html). 

Your user data can perform common configuration scenarios, including but not limited to the following:
+  [ Including users or groups ](https://cloudinit.readthedocs.io/en/latest/topics/examples.html#including-users-and-groups) 
+  [ Installing packages ](https://cloudinit.readthedocs.io/en/latest/topics/examples.html#install-arbitrary-packages) 
+  [ Creating partitions and file systems ](https://cloudinit.readthedocs.io/en/latest/topics/examples.html#create-partitions-and-filesystems) 
+  Mounting network file systems 

 User data in launch templates must be in the [MIME multi-part archive](https://cloudinit.readthedocs.io/en/latest/topics/format.html#mime-multi-part-archive) format. This is because your user data is merged with other AWS PCS user data that is required to configure nodes in your node group. You can combine multiple user data blocks together into a single MIME multi-part file. 

 A MIME multi-part file consists of the following components: 
+  The content type and part boundary declaration: `Content-Type: multipart/mixed; boundary="==BOUNDARY=="` 
+  The MIME version declaration: `MIME-Version: 1.0` 
+  One or more user data blocks that contain the following components: 
  +  The opening boundary that signals the beginning of a user data block: `--==BOUNDARY==`. You must keep the line before this boundary blank. 
  +  The content type declaration for the block: `Content-Type: text/cloud-config; charset="us-ascii"` or `Content-Type: text/x-shellscript; charset="us-ascii"`. You must keep the line after the content type declaration blank. 
  +  The content of the user data, such as a list of shell commands or `cloud-config` directives. 
+  The closing boundary that signals the end of the MIME multi-part file: `--==BOUNDARY==--`. You must keep the line before the closing boundary blank. 
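Putting these components together, a minimal skeleton looks like the following. The boundary string `==MYBOUNDARY==` is arbitrary, but it must match the boundary declared in the `Content-Type` header throughout the file:

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

# cloud-config directives go here

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# shell commands go here

--==MYBOUNDARY==--
```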

**Note**  
 If you add user data to a launch template in the Amazon EC2 console, you can paste it in as plain text. Or, you can upload it from a file. If you use the AWS CLI or an AWS SDK, you must first base64 encode the user data and submit that string as the value of the `UserData` parameter when you call [CreateLaunchTemplate](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateLaunchTemplate.html), as shown in this JSON file. 

```
{
    "LaunchTemplateName": "base64-user-data",
    "LaunchTemplateData": {
        "UserData": "ewogICAgIkxhdW5jaFRlbXBsYXRlTmFtZSI6ICJpbmNyZWFzZS1jb250YWluZXItdm9sdW..."
    }
}
```
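One way to produce that base64 string is to encode a MIME user-data file with the `base64` utility. The following is a sketch with a hypothetical file name (`user-data.mime`); on macOS, use `base64 -b 0` instead of `-w 0`:

```shell
# Write a minimal MIME multi-part user-data file (content is illustrative)
cat > user-data.mime <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
echo "hello from user data"

--==MYBOUNDARY==--
EOF

# Base64-encode the file onto a single line for the UserData field
base64 -w 0 user-data.mime > user-data.b64
```

You can then pass the contents of `user-data.b64` as the `UserData` value in `--launch-template-data`.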

**Examples**
+ [Example: Install software from a package repository](working-with_ec2-user-data_repo.md)
+ [Example: Run scripts from an S3 bucket](working-with_ec2-user-data_s3.md)
+ [Example: Set global environment variables](working-with_ec2-user-data_env.md)
+ [Using network file systems with AWS PCS](working-with_file-systems.md)
+ [Example: Use an EFS file system as a shared home directory](working-with_ec2-user-data_efs.md)

# Example: Install software for AWS PCS from a package repository
<a name="working-with_ec2-user-data_repo"></a>

 Provide this script as the value of `"userData"` in your launch template. For more information, see [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md). 

This script uses **cloud-config** to install software packages on node group instances at launch. For more information, see [User data formats](https://cloudinit.readthedocs.io/en/latest/explanation/format.html) in the *cloud-init documentation*. This example installs `python3-devel`, `rust`, and `golang`.

**Note**  
Your instances must be able to connect to their configured package repositories.

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

packages:
- python3-devel
- rust
- golang

--==MYBOUNDARY==--
```

# Example: Run additional scripts for AWS PCS from an S3 bucket
<a name="working-with_ec2-user-data_s3"></a>

 Provide this script as the value of `"userData"` in your launch template. For more information, see [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md). 

The following user data script uses **cloud-config** to import a script from an S3 bucket and run it on node group instances at launch. For more information, see [User data formats](https://cloudinit.readthedocs.io/en/latest/explanation/format.html) in the *cloud-init documentation*.

Replace the following values with your own details:
+ *amzn-s3-demo-bucket* – The name of an S3 bucket your account can read from.
+ *object-key* – The S3 object key of the script to import. This includes the name of the script and its location in the folder structure of the bucket. For example, `scripts/script.sh`. For more information, see [Organizing objects in the Amazon S3 console by using folders](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) in the *Amazon Simple Storage Service User Guide*.
+ *shell* – The Linux shell to use to run the script, such as `bash`.

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

runcmd:
- aws s3 cp s3://amzn-s3-demo-bucket/object-key /tmp/script.sh
- /usr/bin/shell /tmp/script.sh

--==MYBOUNDARY==--
```

The IAM instance profile for the node group must have access to the bucket. The following IAM policy is an example for the bucket in the user data script above.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket",
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        }
    ]
}
```

------

# Example: Set global environment variables for AWS PCS
<a name="working-with_ec2-user-data_env"></a>

 Provide this script as the value of `"userData"` in your launch template. For more information, see [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md). 

The following example uses `/etc/profile.d` to set global variables on node group instances.

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
touch /etc/profile.d/awspcs-userdata-vars.sh
echo MY_GLOBAL_VAR1=100 >> /etc/profile.d/awspcs-userdata-vars.sh
echo MY_GLOBAL_VAR2=abc >> /etc/profile.d/awspcs-userdata-vars.sh

--==MYBOUNDARY==--
```
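The effect of the script above can be sketched locally: files in `/etc/profile.d` are sourced by login shells, which makes the variables visible in each user's environment. This sketch uses a scratch directory instead of `/etc/profile.d`:

```shell
# Simulate the profile.d mechanism in a scratch directory
mkdir -p ./profile.d
echo 'MY_GLOBAL_VAR1=100' >> ./profile.d/awspcs-userdata-vars.sh
echo 'MY_GLOBAL_VAR2=abc' >> ./profile.d/awspcs-userdata-vars.sh

# A login shell would source this file automatically; here we source it by hand
. ./profile.d/awspcs-userdata-vars.sh
echo "$MY_GLOBAL_VAR1"   # prints 100
```

Note that, as written, the variables are set for the shell itself but not exported to child processes; add `export` in front of each assignment if programs launched from the shell also need them.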

# Example: Use an EFS file system as a shared home directory for AWS PCS
<a name="working-with_ec2-user-data_efs"></a>

 Provide this script as the value of `"userData"` in your launch template. For more information, see [Working with Amazon EC2 user data for AWS PCS](working-with_ec2-user-data.md). 

This example extends the example EFS mount in [Using network file systems with AWS PCS](working-with_file-systems.md) to implement a shared home directory. The contents of `/home` are backed up before the EFS file system is mounted, then copied into place on the shared storage after the mount completes.

Replace the following values in this script with your own details:
+ */mount-point-directory* – The path on an instance where you want to mount the EFS file system.
+ *filesystem-id* – The file system ID for the EFS file system.

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

packages:
  - amazon-efs-utils

runcmd:
  - mkdir -p /tmp/home
  - rsync -a /home/ /tmp/home
  - echo "filesystem-id:/ /mount-point-directory efs tls,_netdev" >> /etc/fstab
  - mount -a -t efs defaults
  - rsync -a --ignore-existing /tmp/home/ /home
  - rm -rf /tmp/home/

--==MYBOUNDARY==--
```

# Example: Enabling passwordless SSH
<a name="working-with_ec2-user-data_efs_ssh"></a>

You can build on the shared home directory example to implement SSH connections between cluster instances using SSH keys. For each user using the shared home file system, run a script that resembles the following: 

```
#!/bin/bash

mkdir -p $HOME/.ssh && chmod 700 $HOME/.ssh
touch $HOME/.ssh/authorized_keys
chmod 600 $HOME/.ssh/authorized_keys

if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -b 4096 -f $HOME/.ssh/id_rsa -N ""
    cat ~/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
fi
```

**Note**  
The instances must use a security group that allows SSH connections between cluster nodes.

# Capacity Reservations in AWS PCS
<a name="working-with_capacity-reservations"></a>

 You can reserve Amazon EC2 capacity in a specific Availability Zone and for a specific duration using On-Demand Capacity Reservations or Amazon EC2 Capacity Blocks for ML to make sure that you have the necessary compute capacity available when you need it. 

 **On-Demand Capacity Reservations (ODCRs)** let you reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration. You can create and cancel reservations at any time, with no long-term commitments or upfront payments. ODCRs are ideal when you need flexible capacity reservations that you can modify as your requirements change. For more information, see [On-Demand Capacity Reservations](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) in the *Amazon Elastic Compute Cloud User Guide*. 

 **Amazon EC2 Capacity Blocks for ML** let you reserve GPU-based accelerated computing instances for future use, up to 8 weeks in advance. You can reserve blocks of 1-64 instances for durations from 1 day to 6 months. Capacity Blocks are ideal for machine learning workloads that require guaranteed access to GPU capacity at specific times. For more information, see [Capacity Blocks for ML](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html) in the *Amazon Elastic Compute Cloud User Guide*. 

**Topics**
+ [Using ODCRs with AWS PCS](capacity-reservations-odcr.md)
+ [Using Amazon EC2 Capacity Blocks for ML with AWS PCS](capacity-blocks.md)

# Using ODCRs with AWS PCS
<a name="capacity-reservations-odcr"></a>

 You can choose how AWS PCS consumes your reserved instances. If you create an **open** ODCR, any matching instances launched by AWS PCS or other processes in your account count against the reservation. With a **targeted** ODCR, only instances launched with the specific reservation ID count against the reservation. For time-sensitive workloads, targeted ODCRs are more common. 

 You can configure an AWS PCS compute node group to use a targeted ODCR by adding it to a launch template. Here are the steps to do so: 

1.  Create a targeted On-Demand Capacity Reservation (ODCR). For more information, see [Create a Capacity Reservation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-reservations-create.html) in the *Amazon EC2 User Guide*. 

1.  Associate the ODCR with a launch template. There are two ways to do that: 

   1.  **Direct ODCR association:** Reference the ODCR ID directly in the launch template. This approach provides strict capacity control and does not support instance backfilling. If the compute node group requests more instances than are available in the ODCR, no additional instances are launched. 

   1.  **Capacity Reservation group association:** Add the ODCR to a Capacity Reservation group and reference the group in the launch template. This approach supports instance backfilling, allowing AWS PCS to launch additional On-Demand instances if the reservation capacity is exceeded. 

1.  Create or update an AWS PCS compute node group to use the launch template. For more information, see [Compute node groups in AWS PCS](https://docs.aws.amazon.com/pcs/latest/userguide/working-with_cng.html). 

   1. Set the `purchaseOption` of the compute node group to `ONDEMAND`.
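For the last step, the create request might resemble the following JSON sketch of `CreateComputeNodeGroup` request data. The cluster name, subnet ID, instance profile ARN, launch template ID, and instance counts are placeholders, and the exact field set should be checked against the AWS PCS API reference:

```
{
  "clusterIdentifier": "my-cluster",
  "computeNodeGroupName": "odcr-node-group",
  "subnetIds": ["subnet-0123456789abcdef0"],
  "customLaunchTemplate": {"id": "lt-0123456789abcdef01", "version": "1"},
  "iamInstanceProfileArn": "arn:aws:iam::123456789012:instance-profile/AWSPCS-example-profile",
  "scalingConfiguration": {"minInstanceCount": 0, "maxInstanceCount": 32},
  "instanceConfigs": [{"instanceType": "hpc6a.48xlarge"}],
  "purchaseOption": "ONDEMAND"
}
```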

## Example: Reserve and use hpc6a.48xlarge instances with a targeted ODCR
<a name="capacity-reservations-odcr-example"></a>

 This example command creates a targeted ODCR for 32 hpc6a.48xlarge instances. To launch the reserved instances in a placement group, add `--placement-group-arn` to the command. You can define an end date with `--end-date` and `--end-date-type`; otherwise, the reservation remains active until you cancel it. 

```
aws ec2 create-capacity-reservation \
    --instance-type hpc6a.48xlarge \
    --instance-platform Linux/UNIX \
    --availability-zone us-east-2a \
    --instance-count 32 \
    --instance-match-criteria targeted
```

 The result from this command includes the ARN of the new ODCR. You can retrieve the ODCR ID from the ARN (`"arn:aws:ec2:us-east-2:123456789012:capacity-reservation/ODCR-ID"`) or by calling the Amazon EC2 [DescribeCapacityReservations](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeCapacityReservations.html) API. 

 **Direct ODCR association:** Add the ODCR ID to the launch template. Here is an example launch template that references the ODCR ID. 

```
{
  "CapacityReservationSpecification": {
    "CapacityReservationTarget": {
      "CapacityReservationId": "cr-1234567890abcdef1"
    }
  }
}
```

 **Capacity Reservation group association:** Create a Capacity Reservation group and add the group to the launch template. The following command creates a Capacity Reservation group named `EXAMPLE-CR-GROUP`. 

```
aws resource-groups create-group \
    --name EXAMPLE-CR-GROUP \
    --configuration \
        '{"Type": "AWS::EC2::CapacityReservationPool"}' \
        '{"Type": "AWS::ResourceGroups::Generic", "Parameters": [{"Name": "allowed-resource-types", "Values": ["AWS::EC2::CapacityReservation"]}]}'
```

 The following command adds the ODCR to the Capacity Reservation group. 

```
aws resource-groups group-resources --group EXAMPLE-CR-GROUP \
    --resource-arns arn:aws:ec2:us-east-2:123456789012:capacity-reservation/cr-1234567890abcdef1
```

 With the ODCR created and added to a Capacity Reservation group, it can now be connected to an AWS PCS compute node group by adding it to a launch template. Here is an example launch template that references the Capacity Reservation group. 

```
{
  "CapacityReservationSpecification": {
    "CapacityReservationResourceGroupArn": "arn:aws:resource-groups:us-east-2:123456789012:group/EXAMPLE-CR-GROUP"
  }
}
```

 Finally, create or update an AWS PCS compute node group to use hpc6a.48xlarge instances and use the launch template that references the ODCR. For a static node group, set minimum and maximum instances to the size of the reservation (32). For a dynamic node group, set the minimum instances to 0 and the maximum to your desired instance size. 

 This example is a simple implementation of a single ODCR that is provisioned for one compute node group. But, AWS PCS supports many other designs. For example, you can subdivide a large ODCR or Capacity Reservation group among multiple compute node groups. Or, you can use ODCRs that another AWS account has created and shared with yours. 

 For more information, see [On-Demand Capacity Reservations and Capacity Blocks for ML](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-reservation-overview.html) in the *Amazon Elastic Compute Cloud User Guide*. 

# Using Amazon EC2 Capacity Blocks for ML with AWS PCS
<a name="capacity-blocks"></a>

Amazon EC2 Capacity Blocks for ML is an Amazon EC2 purchasing option that enables you to pay in advance to reserve GPU-based accelerated computing instances within a specific date and time range to support short duration workloads. Instances that run inside a Capacity Block are automatically placed close together inside Amazon EC2 UltraClusters, for low-latency, petabit-scale, non-blocking networking. For more information, see [Capacity Blocks for ML](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html) in the *Amazon Elastic Compute Cloud User Guide*.

You can use a launch template to have AWS PCS use a Capacity Block when it launches instances for a compute node group.

**Note**  
AWS PCS supports Capacity Blocks starting with Slurm version 24.05.

## Limitations
<a name="capacity-blocks-limitations"></a>
+ AWS PCS only supports Capacity Blocks with P6-B300, P6-B200, P5en, P5e, P5, and P4d instance families.
+ You can only associate a compute node group with 1 Capacity Block at a time.
+ You can't associate a compute node group with a capacity reservation group that combines multiple Capacity Blocks.
+ Capacity Blocks must be in a `scheduled` or `active` state to use with AWS PCS. You can't use Capacity Blocks in other states, such as `payment-failed`. For more information, see [View Capacity Blocks](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-view.html) in the *Amazon Elastic Compute Cloud User Guide*.
+ For P6 and P5 instance types, see the relevant AWS documentation: [Software requirements for P6 instances](https://docs.aws.amazon.com/dlami/latest/devguide/p6-support-dlami.html#dlami-support-p6) and [Maximize network bandwidth on Amazon EC2 instances with multiple network cards](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-acc-inst-types.html).

## Capacity Block expiration
<a name="capacity-blocks-expiration"></a>

Capacity Blocks are limited to a specific date and time range. When a Capacity Block expires:
+ The compute node group associated with that Capacity Block continues to exist and remains associated with the same queues.
+ All instances in the compute node group are terminated and active jobs might fail, based on your Slurm settings.
+ AWS PCS can't launch new instances in the compute node group.
+ All queued or newly submitted jobs remain in pending state until another compute node group is attached to the queue or you update the compute node group to use a new launch template that specifies a new Capacity Block.

# Configure an AWS PCS compute node group to use a Capacity Block
<a name="capacity-blocks-configure-cng"></a>

**To associate a Capacity Block with a compute node group**

1. Create an Amazon EC2 launch template for AWS PCS that specifies your Capacity Block. For more information about creating a launch template for AWS PCS, see [Using Amazon EC2 launch templates with AWS PCS](working-with_launch-templates.md).

   Your launch template must include:
   + The `MarketType` value in `InstanceMarketOptions` must be set to `capacity-block`.
   + A `CapacityReservationSpecification` with a valid `CapacityReservationId`.
   + A valid `InstanceType` that matches the instance type of the Capacity Block you purchased.
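These requirements can be combined into launch template data like the following sketch; the Capacity Block reservation ID and instance type are placeholders:

```
{
  "InstanceType": "p5.48xlarge",
  "InstanceMarketOptions": {
    "MarketType": "capacity-block"
  },
  "CapacityReservationSpecification": {
    "CapacityReservationTarget": {
      "CapacityReservationId": "cr-1234567890abcdef1"
    }
  }
}
```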

1. Create a compute node group that uses the launch template. For more information, see [Creating a compute node group in AWS PCS](working-with_cng_create.md). You can also update an existing compute node group to use the launch template. For more information, see [Updating an AWS PCS compute node group](working-with_cng_update.md).

   When you create or update the compute node group:
   + The IAM identity you use to create or update the compute node group must have the following permission:

     ```
     ec2:DescribeCapacityReservations
     ```

     For more information, see [Minimum permissions for AWS PCS](security-min-permissions.md).
   + The Capacity Block must be in a `scheduled` or `active` state.
   + Set the `purchaseOption` of the compute node group to `CAPACITY_BLOCK`.
   + The `maxInstanceCount` of the compute node group must not exceed the size of the Capacity Block.
   + The Availability Zone of the Capacity Block must match one of the compute node group's subnet Availability Zones.

**Important**  
You can't change the instance type of a compute node group when you update it. You can only use a Capacity Block with the same instance type as the compute node group. If you want to use a Capacity Block with a different instance type, you must create a new compute node group.

# Frequently asked questions about using Capacity Blocks with AWS PCS
<a name="capacity-blocks-faq"></a>

**I just paid for a Capacity Block and immediately attempted to use it with AWS PCS but compute node group creation failed. What happened?**  
Your Capacity Block might not be in a `scheduled` or `active` state. Try again after the Capacity Block is `scheduled` or `active`.

**I am using a Capacity Block in AWS PCS and I purchased an extension before it expired. How do I continue using it in AWS PCS?**  
You don't have to do anything to continue using the Capacity Block in AWS PCS. The end date of your Capacity Block updates after your extension payment succeeds. As long as your Capacity Block doesn't expire, the compute node group continues to operate. If your extension payment fails, your Capacity Block remains `active` and the compute node group operates until the Capacity Block expires on its original end date.

**What happens to my queued and running jobs if my Capacity Block expires?**  
Queued jobs that didn't start before the Capacity Block expired remain pending until you attach another compute node group to the queue or you update the compute node group with a new Capacity Block. You can still submit jobs to the queue. Your Slurm settings affect active jobs. By default, active jobs are automatically re-queued, but might have errors or fail.

**My Capacity Block expired. Should I do something?**  
You don't have to do anything. You can check the Amazon EC2 console for the status of your EC2 capacity reservations. When a Capacity Block expires, the compute node group associated with that Capacity Block continues to exist and handle the same queues. The compute node group doesn't have any instances to run jobs. You can delete the compute node group or disassociate it from the queues to prevent users from submitting jobs that won't run.

**I want to use a new Capacity Block with my AWS PCS compute node group. What should I do?**  
We recommend you create a new compute node group to use the new Capacity Block. For more information, see [Configure an AWS PCS compute node group to use a Capacity Block](capacity-blocks-configure-cng.md).

**How can I share 1 Capacity Block across clusters and services?**  
You can split a Capacity Block across multiple clusters and services. For example, to split a Capacity Block with 64 `p5.48xlarge` instances with 20 nodes on PCS-Cluster-1, 16 nodes on PCS-Cluster-2, and the remaining nodes for other services, set both `minInstanceCount` and `maxInstanceCount` to 20 for PCS-Cluster-1 and 16 for PCS-Cluster-2.

**Can I use more than 1 Capacity Block or combined capacity with 1 compute node group?**  
No. Only 1 Capacity Block can be associated with a single compute node group. AWS PCS doesn't support capacity reservation groups that combine multiple Capacity Blocks.

**How do I know when my Capacity Blocks start or expire?**  
Independent from AWS PCS, Amazon EC2 sends a `Capacity Block Reservation Delivered` event through EventBridge when a Capacity Block reservation starts and a `Capacity Block Reservation Expiration Warning` event 40 minutes before the Capacity Block reservation expires. For more information, see [Monitor Capacity Blocks using EventBridge](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-monitor.html) in the *Amazon Elastic Compute Cloud User Guide*.
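To act on these notifications, you can match them with an EventBridge rule. The following event pattern is a sketch that matches both event types; the `detail-type` values come from the events described above:

```
{
    "source": ["aws.ec2"],
    "detail-type": [
        "Capacity Block Reservation Delivered",
        "Capacity Block Reservation Expiration Warning"
    ]
}
```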

**How does Slurm track the state of my Capacity Block?**  
You can run `sinfo` to understand how AWS PCS uses the Capacity Block. In the following example output, a queue is associated with a compute node group that runs 4 instances from an `active` Capacity Block. The nodes are in the `idle` Slurm state (available for use and not yet allocated to any jobs).  

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
fanout       up   infinite      4   idle node-fanout-[1-4]
```
If the nodes are instead in `maint` state, you can run `scontrol show res` to see details about the Slurm reservation that controls this state. In the following example output, the Capacity Block is `scheduled` with a future start date.  

```
$ scontrol show res                                                                                                  
ReservationName=node-fanout-scheduled StartTime=2025-10-14T13:09:17 EndTime=2025-10-14T13:11:17 Duration=00:02:00    
   Nodes=node-fanout-[1-4] NodeCnt=4 CoreCnt=16 Features=(null) PartitionName=(null) Flags=MAINT,SPEC_NODES          
   TRES=cpu=16                                                                                                       
   Users=root Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null)                          
   MaxStartDelay=(null)                                                                                              
   Comment=node-fanout Scheduled
```

**How can I tell if the errors I'm getting while launching capacity are because my Capacity Block is shared?**  
Check **Capacity Reservations** in the Amazon EC2 console to find how many instances from the Capacity Block are actively provisioned. Check the tags of each instance to find which service or cluster uses it. For example, all instances for AWS PCS have AWS PCS tags such as `aws:pcs:cluster-id = pcs_l0mizqyk5o | aws:pcs:compute-node-group-id = pcs_ic7onkmfqk` that indicate which clusters and compute node groups the instance belongs to. You can then check if the Capacity Block is at maximum capacity.  
You can use `scontrol show nodes` to check whether a Capacity Block node in an AWS PCS cluster is reporting `ReservationCapacityExceeded`:  

```
[root@ip-172-16-10-54 ~]# scontrol show nodes test-8-gamma-cb-2  
NodeName=test-8-gamma-cb-2 CoresPerSocket=1  
   CPUAlloc=0 CPUEfctv=8 CPUTot=8 CPULoad=0.00  
   AvailableFeatures=test-8-gamma-cb,gpu  
   ActiveFeatures=test-8-gamma-cb,gpu  
   Gres=gpu:H100:1  
   NodeAddr=test-8-gamma-cb-2 NodeHostName=test-8-gamma-cb-2  
   RealMemory=249036 AllocMem=0 FreeMem=N/A Sockets=8 Boards=1  
   State=IDLE+CLOUD+POWERING_DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A  
   Partitions=my-q  
   BootTime=None SlurmdStartTime=None  
   LastBusyTime=Unknown ResumeAfterTime=None  
   CfgTRES=cpu=8,mem=249036M,billing=8  
   AllocTRES=  
   CurrentWatts=0 AveWatts=0  
   Reason=Failed to launch backing instance (Error Code: ReservationCapacityExceeded) [root@2025-08-28T15:15:33]
```
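To check for this condition programmatically, you can filter the node's `Reason` field for the error code. The following sketch simulates the `scontrol` output with `printf`; on a cluster, pipe the output of `scontrol show nodes <node-name>` instead:

```shell
# Extract the EC2 launch error code from a Slurm node's Reason field.
# printf stands in here for `scontrol show nodes <node-name>`.
printf 'Reason=Failed to launch backing instance (Error Code: ReservationCapacityExceeded) [root@2025-08-28T15:15:33]\n' \
  | grep -o 'Error Code: [A-Za-z]*'
# Prints: Error Code: ReservationCapacityExceeded
```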

**When multiple compute node groups are attached to the same queue, how can I force a job to run on Capacity Block-backed instances?**  
You can use Slurm features and constraints to lock a job to a certain set of nodes. We recommend that you don't set Slurm weights for each compute node group because that only works with nodes that aren't in the `maint` state.
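For example, in the `scontrol show nodes` output earlier, the node advertises the feature `test-8-gamma-cb`. A job can be pinned to nodes with that feature using `--constraint`. This is a sketch: the feature name is taken from that example, and `job.sh` is a placeholder job script.

```
# Pin the job to nodes that advertise the Capacity Block node
# group's feature tag (job.sh is a placeholder job script).
sbatch --constraint=test-8-gamma-cb job.sh
```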

# Useful launch template parameters
<a name="working-with_launch-templates_parameters"></a>

This section describes some launch template parameters that may be broadly useful with AWS PCS.

## Turn on detailed CloudWatch monitoring
<a name="working-with_launch-templates_parameters_cw"></a>

You can use a launch template parameter to turn on detailed monitoring, which publishes instance metrics to CloudWatch every 1 minute instead of the default 5 minutes.

------
#### [ AWS Management Console ]

On the console pages for creating or editing launch templates, this option is found under the **Advanced details** section. Set **Detailed CloudWatch monitoring** to *Enable.*

------
#### [ YAML ]

```
Monitoring:
  Enabled: true
```

------
#### [ JSON ]

```
{
    "Monitoring": {
        "Enabled": true
    }
}
```

------

For more information, see [Enable or turn off detailed monitoring for your instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html) in the *Amazon EC2 User Guide*.

## Instance Metadata Service Version 2 (IMDSv2)
<a name="working-with_launch-templates_parameters_imds2"></a>

Requiring IMDSv2 on your EC2 instances adds a session-oriented, token-based layer of protection to instance metadata requests, which helps mitigate risks such as credential exposure through server-side request forgery (SSRF).

------
#### [ AWS Management Console ]

On the console pages for creating or editing launch templates, this option is found under the **Advanced details** section. Set **Metadata accessible** to *Enabled*, **Metadata version** to *V2 only (token required)*, and **Metadata response hop limit** to *4*.

------
#### [ YAML ]

```
MetadataOptions:
  HttpEndpoint: enabled
  HttpTokens: required
  HttpPutResponseHopLimit: 4
```

------
#### [ JSON ]

```
{
    "MetadataOptions": {
        "HttpEndpoint": "enabled",
        "HttpPutResponseHopLimit": 4,
        "HttpTokens": "required"
    }
}
```

------
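With IMDSv2 required, every metadata request must carry a session token. The following sketch shows the token flow with `curl`; it only works when run on an EC2 instance:

```
# Request a session token (valid here for 6 hours), then use it
# to read instance metadata. Only works on an EC2 instance.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/instance-id"
```

The hop limit of 4 in the settings above allows the token to be used from containers running on the instance, where each network hop decrements the limit.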
