

# Slurm REST API in AWS PCS
<a name="slurm-rest-api"></a>

AWS PCS provides managed support for Slurm's native REST API through `slurmrestd`, delivering an HTTP interface for programmatic cluster interaction. You can submit jobs, monitor cluster status, and manage resources through standard HTTP requests without requiring direct shell access to your cluster.

## Common use cases
<a name="slurm-rest-api-use-cases"></a>

The Slurm REST API supports various integration scenarios:
+ **Web Application Integration**: Build custom frontends and web applications that submit and manage jobs directly.
+ **Jupyter Notebook Integration**: Submit jobs from notebook environments so researchers can stay in their development workflow.
+ **Partner Solution Integration**: Connect third-party HPC tools and workflow managers to your AWS PCS clusters.
+ **Programmatic Cluster Management**: Automate job submission, monitoring, and resource management workflows.
+ **Research Computing Workflows**: Support academic and enterprise research environments that require API-driven job management.

## Requirements and limitations
<a name="slurm-rest-api-requirements"></a>

Before using the Slurm REST API, review these details:
+ Your cluster must use Slurm version 25.05 or higher.
+ The API endpoint is accessible only through a private IP address within your cluster's VPC.
+ Your cluster security group must allow HTTP traffic on port 6820.
+ Authentication requires JWT tokens with specific user identity claims.

Current limitations include:
+ Tokens generated by `scontrol token` are not supported.
+ `X-SLURM-USER-NAME` header impersonation is not available.
+ Some functionality requires Slurm accounting to be enabled.
+ Not compatible with the Slurm CLI filter plugin mechanism.
+ Connections to the REST API endpoint are not encrypted with TLS.

**Topics**
+ [Common use cases](#slurm-rest-api-use-cases)
+ [Requirements and limitations](#slurm-rest-api-requirements)
+ [Enabling Slurm REST API in AWS PCS](slurm-rest-api-enable.md)
+ [Authenticating with Slurm REST API in AWS PCS](slurm-rest-api-authenticate.md)
+ [Using Slurm REST API for job management in AWS PCS](slurm-rest-api-use.md)
+ [Slurm REST API frequently asked questions in AWS PCS](slurm-rest-api-faq.md)

# Enabling Slurm REST API in AWS PCS
<a name="slurm-rest-api-enable"></a>

Enable the Slurm REST API to access your cluster's HTTP interface for programmatic job management and monitoring. You can enable this feature during cluster creation or update an existing cluster that meets the requirements.

## Prerequisites
<a name="slurm-rest-api-enable-prerequisites"></a>

Before enabling the Slurm REST API, ensure you have:
+ **Cluster version**: Slurm version 25.05 or higher.
+ **Security group**: Rules allowing HTTP traffic on port 6820 from your desired sources.

## Procedure
<a name="slurm-rest-api-enable-procedure"></a>

**To enable Slurm REST API on a new cluster**

------
#### [ AWS Management Console ]

1. Open the AWS PCS console at [https://console.aws.amazon.com/pcs/](https://console.aws.amazon.com/pcs/).

1. Choose **Create cluster**.

1. Under **Cluster details**, choose Slurm version 25.05 or higher.

1. Configure the other cluster settings as needed.

1. In the **Scheduler configuration** section, set **REST API** to **Enabled**.

1. Configure your cluster security group to allow HTTP traffic on port 6820 from your desired sources.

1. Complete the cluster creation process.

------
#### [ AWS CLI ]

1. Add a Slurm REST configuration when creating your cluster.

   ```
   aws pcs create-cluster --region region \
       --cluster-name my-cluster \
       --scheduler type=SLURM,version=25.05 \
       --size SMALL \
       --networking subnetIds=subnet-ExampleId1,securityGroupIds=sg-ExampleId1 \
       --slurm-configuration slurmRest='{mode=STANDARD}'
   ```

1. Configure your cluster security group to allow HTTP traffic on port 6820 from your desired sources.

------

**To enable Slurm REST API on an existing cluster**

------
#### [ AWS Management Console ]

1. Open the AWS PCS console at [https://console.aws.amazon.com/pcs/](https://console.aws.amazon.com/pcs/).

1. Choose your cluster from the list.

1. Verify that your cluster uses Slurm version 25.05 or higher in the cluster details.

1. Choose **Edit cluster**.

1. In the **Scheduler configuration** section, set **REST API** to **Enabled**.

1. Choose **Update cluster** to apply the changes.

1. Configure your cluster security group to allow HTTP traffic on port 6820 from your desired sources.

------
#### [ AWS CLI ]

1. Update your cluster with a Slurm REST configuration, as in this example.

   ```
   aws pcs update-cluster --cluster-identifier my-cluster \
       --slurm-configuration 'slurmRest={mode=STANDARD}'
   ```

1. Configure your cluster security group to allow HTTP traffic on port 6820 from your desired sources.

------
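Each procedure ends by opening port 6820 in the cluster security group. If you script that step, the following minimal sketch builds the inbound rule and adds it with the EC2 `authorize_security_group_ingress` call in boto3. It assumes boto3 is installed and configured; the security group ID, CIDR range, and rule description are placeholders, and restricting the source to the clients that actually call the API is up to you.

```python
"""Sketch: allow the Slurm REST API port (6820) from one CIDR range.

Assumptions: boto3 is installed and configured; the group ID and CIDR are
placeholders that you replace with your own values.
"""

SLURMRESTD_PORT = 6820


def slurmrestd_ingress_rule(cidr: str, description: str = "Slurm REST API") -> dict:
    """Build one EC2 IpPermissions entry that opens TCP 6820 to `cidr`."""
    return {
        "IpProtocol": "tcp",
        "FromPort": SLURMRESTD_PORT,
        "ToPort": SLURMRESTD_PORT,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


def allow_slurmrestd(group_id: str, cidr: str, region: str) -> None:
    """Add the rule to the cluster security group (makes an AWS API call)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[slurmrestd_ingress_rule(cidr)],
    )


if __name__ == "__main__":
    # Hypothetical client range; print the rule instead of applying it.
    print(slurmrestd_ingress_rule("10.0.0.0/16"))
```

Keeping the rule builder separate from the AWS call lets you review or test the rule before changing the security group.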

## What happens after enabling
<a name="slurm-rest-api-enable-results"></a>

When you enable the REST API, AWS PCS automatically:
+ Generates a JWT signing key and stores it in AWS Secrets Manager.
+ Exposes the API endpoint at `http://<clusterPrivateIpAddress>:6820` within your VPC.
+ Updates your cluster configuration to show the REST API endpoint details.

You can now authenticate and use the REST API for job management and cluster operations.

# Authenticating with Slurm REST API in AWS PCS
<a name="slurm-rest-api-authenticate"></a>

The Slurm REST API in AWS PCS uses JSON Web Token (JWT) authentication to ensure secure access to your cluster resources. AWS PCS provides a managed signing key stored in AWS Secrets Manager, which you use to generate JWT tokens containing specific user identity claims.

## Prerequisites
<a name="slurm-rest-api-authenticate-prerequisites"></a>

Before authenticating with the Slurm REST API, ensure you have:
+ **Cluster configuration**: AWS PCS cluster with Slurm 25.05 or higher and REST API enabled.
+ **AWS permissions**: Access to AWS Secrets Manager for the JWT signing key.
+ **User information**: Username, POSIX user ID, and one or more POSIX group IDs for your cluster account.
+ **Network access**: Connectivity within your cluster's VPC, with a security group that allows traffic on port 6820.

## Procedure
<a name="slurm-rest-api-authenticate-procedure"></a>

**To retrieve the Slurm REST API endpoint address**

------
#### [ AWS Management Console ]

1. Open the AWS PCS console at [https://console.aws.amazon.com/pcs/](https://console.aws.amazon.com/pcs/).

1. Choose your cluster from the list.

1. In the cluster configuration details, locate the **Endpoints** section.

1. Note the private IP address and port for **Slurm REST API (slurmrestd)**.

1. You can make API calls by sending properly formatted HTTP requests to this address.

------
#### [ AWS CLI ]

1. Describe your cluster with `aws pcs get-cluster --cluster-identifier my-cluster`. In the response, look for the `SLURMRESTD` entry in the `endpoints` field. Here is an example:

   ```
   "endpoints": [
       {
           "type": "SLURMCTLD",
           "privateIpAddress": "192.0.2.1",
           "port": "6817"
       },
       {
           "type": "SLURMRESTD",
           "privateIpAddress": "192.0.2.1",
           "port": "6820"
       }
   ]
   ```

1. You can make API calls by sending properly formatted HTTP requests to `http://<privateIpAddress>:<port>/`.

------
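To use the endpoint from a script, you can extract it from saved `get-cluster` output. This minimal sketch assumes the JSON shape shown above and accepts either the full CLI response (assumed here to nest the endpoint list under a top-level `cluster` key) or the cluster object alone. The file name `cluster.json` is hypothetical.

```python
"""Sketch: find the slurmrestd base URL in `aws pcs get-cluster` output.

Assumes the endpoint list shape shown above. Save the CLI output first, e.g.
  aws pcs get-cluster --cluster-identifier my-cluster > cluster.json
"""
import json
import sys


def slurmrestd_base_url(response: dict) -> str:
    """Return http://<privateIpAddress>:<port> for the SLURMRESTD endpoint."""
    cluster = response.get("cluster", response)  # accept either nesting level
    for endpoint in cluster.get("endpoints", []):
        if endpoint.get("type") == "SLURMRESTD":
            # The REST API is plain HTTP inside the VPC (no TLS).
            return f"http://{endpoint['privateIpAddress']}:{endpoint['port']}"
    raise LookupError("No SLURMRESTD endpoint found; is the REST API enabled?")


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        print(slurmrestd_base_url(json.load(f)))
```

For example, `python3 find_endpoint.py cluster.json` would print `http://192.0.2.1:6820` for the sample response above.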

**To retrieve the JWT signing key**

1. Open the AWS PCS console at [https://console.aws.amazon.com/pcs/](https://console.aws.amazon.com/pcs/).

1. Choose your cluster from the list.

1. In the cluster configuration details, locate the **Scheduler Authentication** section.

1. Note the **JSON Web Token (JWT) key** ARN and version.

1. Use the AWS CLI to retrieve the signing key from Secrets Manager:

   ```
   aws secretsmanager get-secret-value --secret-id arn:aws:secretsmanager:region:account:secret:name --version-id version
   ```

**To generate a JWT token**

1. Create a JWT with the following required claims:
   + `exp` – Expiration time of the JWT, in seconds since the Unix epoch (1970-01-01 00:00:00 UTC)
   + `iat` – Issue time (normally the current time), in seconds since the Unix epoch
   + `sun` – The username for authentication
   + `uid` – The POSIX user ID
   + `gid` – The POSIX group ID
   + `id` – Additional POSIX identity properties
     + `gecos` – User comment field, often used to store a human-readable name
     + `dir` – User's home directory
     + `shell` – User's default shell
     + `gids` – List of additional POSIX group IDs the user is in

1. Set an appropriate expiration time in the `exp` claim.

1. Sign the JWT using the signing key retrieved from Secrets Manager.

**Note**  
As an alternative to the `sun` claim, you can identify the user with any of the following:
+ A `username` claim
+ A custom claim whose name you set with the `userclaimfield` option of the `AuthAltParameters` Slurm custom setting
+ A `name` field within the `id` claim
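The claim list above maps directly onto a nested dictionary. This minimal sketch assembles that payload before signing; the helper name `build_claims` and its defaults (one-hour lifetime, `/bin/bash` shell) are illustrative assumptions, and the identity values must match a real POSIX account on your cluster nodes.

```python
"""Sketch: assemble the JWT claim set described above, before signing.

The helper and its defaults are hypothetical; use the username, UID, GID,
home directory, and groups of a real user on your cluster nodes.
"""
from __future__ import annotations

import time


def build_claims(username: str, uid: int, gid: int, *, gids: list[int],
                 home: str, shell: str = "/bin/bash", gecos: str = "",
                 lifetime_s: int = 3600, now: int | None = None) -> dict:
    """Return a payload with the exp, iat, sun, uid, gid, and id claims."""
    issued_at = int(time.time()) if now is None else now
    return {
        "exp": issued_at + lifetime_s,  # expiry, seconds since the Unix epoch
        "iat": issued_at,               # issue time, seconds since the Unix epoch
        "sun": username,                # user name Slurm authenticates
        "uid": uid,                     # POSIX user ID
        "gid": gid,                     # POSIX group ID
        "id": {
            "gecos": gecos,             # human-readable comment field
            "dir": home,                # home directory
            "shell": shell,             # login shell
            "gids": gids,               # group IDs the user belongs to
        },
    }


if __name__ == "__main__":
    print(build_claims("ec2-user", 1000, 1000, gids=[1000],
                       home="/home/ec2-user", gecos="EC2 User"))
```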

**To authenticate API requests**

1. Include the JWT token in your HTTP requests using one of these methods:
   + **Bearer token** – Add `Authorization: Bearer <jwt>` header
   + **Slurm header** – Add `X-SLURM-USER-TOKEN: <jwt>` header

1. Make HTTP requests to the REST API endpoint:

   The following example calls the `/ping` endpoint using curl and the `Authorization: Bearer` header.

   ```
   curl -X GET -H "Authorization: Bearer <jwt>" \
         http://<privateIpAddress>:6820/slurm/v0.0.43/ping
   ```
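The same `/ping` call can be made from Python without third-party packages. This minimal sketch builds the request with either supported header; `BASE_URL` is a placeholder, and the `v0.0.43` path assumes Slurm 25.05. Sending the request requires network access from inside the VPC.

```python
"""Sketch: call /ping with either supported JWT header, using only urllib.

BASE_URL and the token are placeholders; v0.0.43 assumes Slurm 25.05.
"""
import json
import urllib.request

BASE_URL = "http://192.0.2.1:6820"  # hypothetical slurmrestd address


def auth_headers(token: str, style: str = "bearer") -> dict:
    """Return the header for one of the two supported token styles."""
    if style == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if style == "slurm":
        return {"X-SLURM-USER-TOKEN": token}
    raise ValueError(f"unknown header style: {style!r}")


def ping_request(token: str, style: str = "bearer") -> urllib.request.Request:
    """Build (but do not send) a GET request for the /ping endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/slurm/v0.0.43/ping",
        headers=auth_headers(token, style),
        method="GET",
    )


def ping(token: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (needs VPC access)."""
    with urllib.request.urlopen(ping_request(token), timeout=timeout) as resp:
        return json.load(resp)
```

Note that `urllib` normalizes header-name capitalization (for example, `X-slurm-user-token`); HTTP header names are case-insensitive, so this does not change their meaning.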

## Example JWT generation
<a name="slurm-rest-api-authenticate-example"></a>

Fetch the AWS PCS cluster JWT signing key and store it as a local file. Replace **aws-region**, **secret-arn**, and **secret-version** with the values for your cluster.

```
#!/bin/bash
SECRET_KEY=$(aws secretsmanager get-secret-value \
  --region aws-region \
  --secret-id secret-arn \
  --version-id secret-version \
  --query 'SecretString' \
  --output text)
echo "$SECRET_KEY" | base64 --decode > jwt.key
```
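If you prefer to stay in Python, this minimal sketch performs the same download with boto3's `get_secret_value`. Like the shell script above, it assumes that the secret's `SecretString` holds the base64-encoded key. The ARN, version ID, and Region are placeholders, and the helper names are hypothetical.

```python
"""Sketch: download and decode the JWT signing key with boto3.

Assumes boto3 is installed and that SecretString is the base64-encoded key,
as the shell script above does. ARN, version ID, and Region are placeholders.
"""
import base64
import os


def decode_signing_key(secret_string: str) -> bytes:
    """Decode the base64 SecretString into raw key bytes."""
    return base64.b64decode(secret_string)


def fetch_signing_key(secret_arn: str, version_id: str, region: str) -> bytes:
    """Call Secrets Manager and return the decoded key (makes an AWS call)."""
    import boto3  # only needed for the AWS call

    client = boto3.client("secretsmanager", region_name=region)
    resp = client.get_secret_value(SecretId=secret_arn, VersionId=version_id)
    return decode_signing_key(resp["SecretString"])


def write_key_file(key: bytes, path: str = "jwt.key") -> None:
    """Write the key; a newly created file gets mode 0600 (owner-only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
```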

This Python example illustrates how to use the signing key to generate a JWT token:

```
#!/usr/bin/env python3
"""Generate an HS256-signed JWT for the AWS PCS Slurm REST API.

Requires the PyPI package `jwt` (GehirnInc python-jwt), not PyJWT, which
installs a module with the same name.
"""
import sys
import time

from jwt import JWT
from jwt.jwk import jwk_from_dict
from jwt.utils import b64encode

if len(sys.argv) != 3:
    sys.exit("Usage: gen_jwt.py [jwt_key_file] [expiration_time_seconds]")
SIGNING_KEY = sys.argv[1]
EXPIRATION_TIME = int(sys.argv[2])

# Read the raw (already base64-decoded) signing key written by the script above.
with open(SIGNING_KEY, "rb") as f:
    priv_key = f.read()

# Wrap the raw key as a symmetric ("oct") JWK; "k" must be base64url-encoded.
signing_key = jwk_from_dict({
    'kty': 'oct',
    'k': b64encode(priv_key)
})

now = int(time.time())
# Example identity for ec2-user; replace with the values for your cluster user.
message = {
    "exp": now + EXPIRATION_TIME,  # expiration, seconds since the Unix epoch
    "iat": now,                    # issue time, seconds since the Unix epoch
    "sun": "ec2-user",             # username
    "uid": 1000,                   # POSIX user ID
    "gid": 1000,                   # POSIX group ID
    "id": {
        "gecos": "EC2 User",
        "dir": "/home/ec2-user",
        "gids": [1000],
        "shell": "/bin/bash"
    }
}

compact_jws = JWT().encode(message, signing_key, alg='HS256')
print(compact_jws)
```

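Run the script with the key file and a token lifetime in seconds (for example, `python3 gen_jwt.py jwt.key 3600`). It prints the signed JWT to standard output: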

```
abcdefgtjwttoken...
```
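If you cannot install the third-party `jwt` package, HS256 signing is simple enough to do with the standard library. This minimal sketch is an alternative, not an AWS-provided tool. It follows the compact JWS form from RFC 7515/7519: base64url(header), a dot, base64url(payload), a dot, and base64url(HMAC-SHA256 over the first two parts). The file name `sign_jwt.py` and the example claims are hypothetical.

```python
"""Sketch: HS256 JWT signing with only the Python standard library.

Compact JWS form: b64url(header) "." b64url(payload) "." b64url(HMAC-SHA256).
"""
import base64
import hashlib
import hmac
import json
import sys
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    """Return a compact HS256 JWT for `claims`, signed with raw `key` bytes."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 sign_jwt.py jwt.key  (example claims for ec2-user)
    with open(sys.argv[1], "rb") as f:
        key = f.read()
    now = int(time.time())
    print(sign_hs256({"exp": now + 3600, "iat": now, "sun": "ec2-user",
                      "uid": 1000, "gid": 1000,
                      "id": {"gecos": "EC2 User", "dir": "/home/ec2-user",
                             "gids": [1000], "shell": "/bin/bash"}}, key))
```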

# Using Slurm REST API for job management in AWS PCS
<a name="slurm-rest-api-use"></a>

## Slurm REST API overview
<a name="slurm-rest-api-use-overview"></a>

The Slurm REST API provides programmatic access to cluster management functions through HTTP requests. Understanding these key characteristics will help you effectively use the API with AWS PCS:
+ **Access Protocol**: The API uses HTTP (not HTTPS) for communication within your cluster's private network.
+ **Connection Details**: Access the API using your cluster's private IP address and the `slurmrestd` port (typically 6820). The full base URL format is `http://<privateIpAddress>:6820`.
+ **API Versioning**: The API version corresponds to your Slurm installation. For Slurm 25.05, use version **v0.0.43**. The version number changes with each Slurm release. You can find the currently supported API versions in the [Slurm release notes](https://slurm.schedmd.com/release_notes.html).
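+ **URL Structure**: Endpoint URLs follow the pattern `http://<privateIpAddress>:<port>/<plugin>/<api-version>/<endpoint>`, where `<plugin>` is `slurm` for scheduler endpoints or `slurmdb` for accounting endpoints (for example, `http://<privateIpAddress>:6820/slurm/v0.0.43/ping`). Detailed usage information for REST API endpoints can be found in the [Slurm documentation](https://slurm.schedmd.com/rest_api.html).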

For more information on writing clients for the Slurm REST API, see the Slurm [REST API client documentation](https://slurm.schedmd.com/rest_clients.html).
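The paths used in the examples on this page, such as `/slurm/v0.0.43/ping`, put a prefix (`slurm`, or `slurmdb` for accounting) before the API version. This minimal sketch composes such URLs so that the version string lives in one place. The helper name `api_url` is hypothetical, and the default `v0.0.43` assumes Slurm 25.05.

```python
"""Sketch: build Slurm REST API URLs as base/prefix/version/endpoint.

Hypothetical helper; `v0.0.43` assumes Slurm 25.05.
"""
from urllib.parse import quote


def api_url(base_url: str, endpoint: str, *, plugin: str = "slurm",
            version: str = "v0.0.43") -> str:
    """Join the base URL, prefix, API version, and endpoint path."""
    if plugin not in ("slurm", "slurmdb"):
        raise ValueError(f"unexpected prefix: {plugin!r}")
    # Percent-encode each path segment, keeping the separators.
    path = "/".join(quote(part, safe="") for part in endpoint.strip("/").split("/"))
    return f"{base_url.rstrip('/')}/{plugin}/{version}/{path}"
```

For example, `api_url("http://192.0.2.1:6820", "job/42")` returns `http://192.0.2.1:6820/slurm/v0.0.43/job/42`. When you upgrade Slurm, only the `version` default needs to change.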

## Prerequisites
<a name="slurm-rest-api-use-prerequisites"></a>

Before using the Slurm REST API, ensure you have:
+ **Cluster configuration**: AWS PCS cluster with Slurm 25.05 or higher and REST API enabled.
+ **Authentication**: Valid JWT token with proper user identity claims.
+ **Network access**: Connectivity within your cluster's VPC with a security group allowing port 6820.

## Procedure
<a name="slurm-rest-api-use-procedure"></a>

**To submit a job using the REST API**

1. Create a job submission request with the required parameters:

   ```
   {
     "job": {
       "name": "my-job",
       "partition": "compute",
       "nodes": 1,
       "tasks": 1,
       "script": "#!/bin/bash\necho 'Hello from Slurm REST API'",
       "environment": ["PATH=/usr/local/bin:/usr/bin:/bin"]
     }
   }
   ```

1. Submit the job using an HTTP POST request:

   ```
   curl -X POST \
     -H "Authorization: Bearer <jwt>" \
     -H "Content-Type: application/json" \
     -d '<job-json>' \
     http://<privateIpAddress>:6820/slurm/v0.0.43/job/submit
   ```

1. Note the job ID returned in the response for monitoring purposes.
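The submit-and-note-the-ID steps can be scripted as in this minimal sketch. It assumes that the response carries a top-level `job_id` and an `errors` list, and that slurmrestd returns a JSON body even for failed requests. Check both assumptions against the v0.0.43 schema for your Slurm version. `BASE_URL` is a placeholder, and the helper names are hypothetical.

```python
"""Sketch: POST a job to /job/submit and return the new job ID.

Assumptions: response has `job_id` and `errors`; error responses are JSON.
"""
import json
import urllib.error
import urllib.request

BASE_URL = "http://192.0.2.1:6820"  # hypothetical slurmrestd address


def submit_request(job: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /job/submit."""
    body = json.dumps({"job": job}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/slurm/v0.0.43/job/submit",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def job_id_from(response: dict) -> int:
    """Return the new job ID, or raise if the response lists errors."""
    if response.get("errors"):
        raise RuntimeError(response["errors"])
    return int(response["job_id"])


def submit(job: dict, token: str) -> int:
    """Send the request and extract the job ID from the JSON reply."""
    try:
        with urllib.request.urlopen(submit_request(job, token), timeout=30) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        payload = json.load(err)  # assumes slurmrestd sent a JSON error body
    return job_id_from(payload)
```

Pass the `job` object from the request above as a Python dict; `submit` raises `RuntimeError` with the `errors` list if Slurm rejects the job.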

**To monitor job status**

1. Get information about a specific job:

   ```
   curl -X GET -H "Authorization: Bearer <jwt>" \
       http://<privateIpAddress>:6820/slurm/v0.0.43/job/<job-id>
   ```

1. List the jobs visible to the authenticated user:

   ```
   curl -X GET -H "Authorization: Bearer <jwt>" \
       http://<privateIpAddress>:6820/slurm/v0.0.43/jobs
   ```
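To wait for a job to finish instead of checking by hand, you can poll the single-job endpoint. This minimal sketch assumes that the response has a `jobs` list whose entries carry `job_state`, which may be a list of state names (recent API versions) or a single string (older ones). The set of states treated as active, the polling interval, and `BASE_URL` are illustrative assumptions.

```python
"""Sketch: poll GET /job/<job-id> until the job is no longer active.

Assumptions: `jobs[*].job_state` is a list of strings or a string; the
ACTIVE set and polling interval are illustrative choices.
"""
import json
import time
import urllib.request

BASE_URL = "http://192.0.2.1:6820"  # hypothetical slurmrestd address
ACTIVE = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING"}


def job_states(response: dict) -> set:
    """Collect state names from a job response, tolerating either shape."""
    states = set()
    for job in response.get("jobs", []):
        value = job.get("job_state", [])
        states.update([value] if isinstance(value, str) else value)
    return states


def get_job(job_id: int, token: str) -> dict:
    """Fetch one job's record as parsed JSON (needs VPC access)."""
    req = urllib.request.Request(
        f"{BASE_URL}/slurm/v0.0.43/job/{int(job_id)}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id: int, token: str, interval_s: float = 15.0) -> set:
    """Poll until no active state remains; return the final state names."""
    while True:
        states = job_states(get_job(job_id, token))
        if not states & ACTIVE:
            return states
        time.sleep(interval_s)
```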

**To cancel a job**
+ Send a DELETE request to cancel a specific job:

  ```
  curl -X DELETE -H "Authorization: Bearer <jwt>" \
      http://<privateIpAddress>:6820/slurm/v0.0.43/job/<job-id>
  ```
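The cancel request translates to Python in the same way. This minimal sketch sends the `DELETE` without a body. `BASE_URL` is a placeholder and `cancel_request` is a hypothetical helper. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, for example when the job ID does not exist.

```python
"""Sketch: cancel a job with DELETE /job/<job-id> using urllib.

BASE_URL is a placeholder; urlopen raises HTTPError on non-2xx replies.
"""
import urllib.request

BASE_URL = "http://192.0.2.1:6820"  # hypothetical slurmrestd address


def cancel_request(job_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one job."""
    return urllib.request.Request(
        f"{BASE_URL}/slurm/v0.0.43/job/{int(job_id)}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def cancel(job_id: int, token: str) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(cancel_request(job_id, token), timeout=30) as resp:
        return resp.status
```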

# Slurm REST API frequently asked questions in AWS PCS
<a name="slurm-rest-api-faq"></a>

This section answers frequently asked questions about the Slurm REST API in AWS PCS.

**What is the Slurm REST API?**  
The Slurm REST API is an HTTP interface that allows you to interact with the Slurm workload manager programmatically. You can use standard HTTP methods like GET, POST, and DELETE to submit jobs, monitor cluster status, and manage resources without requiring command-line access to the cluster.

**Can I use tokens generated by `scontrol token`?**  
No, standard `scontrol token` output is not compatible with AWS PCS. The AWS PCS Slurm REST API requires enriched JWT tokens containing specific identity claims, including the username (`sun`), POSIX user ID (`uid`), and group IDs (`gid` and `gids`). Standard Slurm tokens lack these claims and are rejected by the API.
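To see whether a token carries the claims this page lists, you can decode its payload locally. This minimal sketch does **not** verify the signature, so use it only to inspect tokens you generated yourself. The accepted user-name alternatives mirror the note in the authentication section. A custom `userclaimfield` name is not checked, and the helper names are hypothetical.

```python
"""Sketch: report missing AWS PCS identity claims in a JWT.

Decodes the payload WITHOUT verifying the signature. A custom
`userclaimfield` name configured in AuthAltParameters is not checked.
"""
import base64
import json
import time

REQUIRED = ("exp", "iat", "uid", "gid", "id")
USER_CLAIMS = ("sun", "username")  # alternatives for the user name


def jwt_payload(token: str) -> dict:
    """Base64url-decode the middle (payload) segment of a compact JWT."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def missing_claims(token: str, now=None) -> list:
    """Return the names of absent claims, plus a note if `exp` has passed."""
    claims = jwt_payload(token)
    problems = [name for name in REQUIRED if name not in claims]
    has_user = any(name in claims for name in USER_CLAIMS)
    if not has_user and "name" not in claims.get("id", {}):
        problems.append("sun")
    now = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= now:
        problems.append("exp (expired)")
    return problems
```

A token from `scontrol token` typically reports `uid`, `gid`, and `id` as missing.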

**Can I access the API from outside my VPC?**  
No, the REST API endpoint is accessible only from within your VPC, through the Slurm controller's private IP address. To enable access from outside the VPC, put a service such as Amazon API Gateway (with a VPC link) or an Application Load Balancer in front of the endpoint, or establish VPC peering or VPN connections for secure connectivity.

**Why does the API use HTTP instead of HTTPS?**  
The Slurm REST API is intended to be an internal endpoint within your cluster's private network. For production deployments requiring encryption, you can implement SSL/TLS termination at a higher level in your architecture, such as through an API gateway, load balancer, or reverse proxy.

**How do I control access to the REST API?**  
Configure your cluster's security group rules to restrict access to port 6820 on the Slurm controller. Set inbound rules to allow connections only from trusted IP ranges or specific sources within your VPC, blocking unauthorized access to the API endpoint.

**How do I rotate the JWT signing key?**  
Put your cluster in maintenance mode with no active instances, then initiate key rotation through AWS Secrets Manager. After rotation completes, re-enable the queues. All existing JWT tokens will become invalid and must be regenerated using the new signing key from Secrets Manager.

**Do I need Slurm accounting enabled to use the REST API?**  
No, Slurm accounting is not required for basic REST API operations such as job submission and monitoring. However, all `/slurmdb` endpoints require accounting to be enabled.

**What third-party tools work with the AWS PCS REST API?**  
Many existing Slurm REST API clients should work with AWS PCS, including Slurm Exporter for Prometheus, SlurmWeb, and custom applications that follow the standard Slurm REST API format. However, tools that rely on `scontrol token` for authentication will need modification to work with AWS PCS JWT requirements.

**Are there any additional costs for using the REST API?**  
No, there are no additional charges for enabling or using the Slurm REST API feature. You only pay for the underlying cluster resources as usual.

**How can I troubleshoot the REST API?**  
+ **Network connectivity issues**

  If you cannot reach the API endpoint, you'll see connection timeouts or "connection refused" errors when making HTTP requests to the cluster controller.

  **What to do**: Verify your client is in the same VPC or has proper network routing, and confirm your security group allows HTTP traffic on port 6820 from your source IP or subnet.
+ **Slurm REST authentication issues**

  If your JWT token is invalid, expired, or improperly signed, API requests will return "Protocol authentication error" in the errors field of the response.

  Example error message:

  ```
  {
    "errors": [
      {
        "description": "Batch job submission failed",
        "error_number": 1007,
        "error": "Protocol authentication error",
        "source": "slurm_submit_batch_job()"
      }
    ]
  }
  ```

  **What to do**: Check that your JWT token is properly formatted, has not expired, includes the required claims, and is signed with the current key from Secrets Manager. Also confirm that you're using a supported authentication header format.
+ **Job failing to run after submission**

  If your JWT token is valid but its claims have incorrect values or structure, the job may remain in the pending (`PD`) state with reason `JobHeldAdmin`. Use `scontrol show job <job-id>` to inspect the job. You'll see `JobState=PENDING`, `Reason=JobHeldAdmin`, and `SystemComment=slurm_cred_create failure, holding job`.

  **What to do**: The root cause may be incorrect claim values in the JWT. Verify that the token is properly structured and includes the required claims described in [Authenticating with Slurm REST API in AWS PCS](slurm-rest-api-authenticate.md).
+ **Working directory permission issues**

  If the user identity specified in your JWT lacks write permissions to the job's working directory, the job will fail with permission errors, similar to using `sbatch --chdir` with an inaccessible directory.

  **What to do**: Ensure the user specified in your JWT token has appropriate permissions for the job's working directory.
+ **Still running into problems?**

  1. Check SchedMD's [documentation](https://slurm.schedmd.com/rest_clients.html) on the REST API specification.

  1. Check the Slurm controller logs for more detailed information on errors (see [Scheduler logs in AWS PCS](monitoring_scheduler-logs.md) for more details).
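To tell a network problem from an authentication problem, it can help to test TCP reachability separately from the API call. In this minimal sketch, `port_open` only checks whether something accepts connections on port 6820 (a timeout or refusal points at routing or security groups), and `error_summary` flattens the `errors` list from a response like the authentication error example above. Both helpers are hypothetical.

```python
"""Sketch: separate network failures from Slurm REST API errors.

`port_open` tests TCP reachability only; `error_summary` flattens the
`errors` list of a slurmrestd JSON response. Both helpers are hypothetical.
"""
import socket


def port_open(host: str, port: int = 6820, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes timeouts and "connection refused"
        return False


def error_summary(response: dict) -> list:
    """Render each error entry as 'number: error (source)'."""
    return [f"{e.get('error_number')}: {e.get('error')} ({e.get('source')})"
            for e in response.get("errors", [])]
```

If `port_open` returns `False` from your client, review the network checks above; if it returns `True` but requests fail, inspect `error_summary` of the response body.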