

# Deadline Cloud fleets
<a name="manage-fleets"></a>

This section explains how to manage service-managed fleets and customer-managed fleets (CMF) for Deadline Cloud.

You can set up two types of Deadline Cloud fleets:
+ Service-managed fleets are fleets of workers that have default settings provided by Deadline Cloud. These default settings are designed to be efficient and cost effective.
+ Customer-managed fleets (CMFs) provide you with full control over your processing pipeline. A CMF can reside within AWS infrastructure, on premises, or in a co-located data center. With a CMF, you are responsible for provisioning, operating, managing, and decommissioning the workers in the fleet.

When you associate a fleet with multiple queues, the fleet divides its workers evenly among those queues.

**Topics**
+ [Service-managed fleets](smf-manage.md)
+ [Customer-managed fleets](manage-cmf.md)
+ [Auto scaling configuration](auto-scaling-configuration.md)

# Service-managed fleets
<a name="smf-manage"></a>

A service-managed fleet (SMF) is a fleet of workers that have default settings provided by Deadline Cloud. These default settings are designed to be efficient and cost-effective.

Some of the default settings limit the amount of time that workers and tasks can run. A worker can run for at most seven days, and a task for at most five days. When the limit is reached, the task or worker stops, and you might lose the work that it was performing. To avoid this, monitor your workers and tasks to ensure they don't exceed the maximum duration limits. To learn more about monitoring your workers, see [Using the Deadline Cloud monitor](working-with-deadline-monitor.md).

## Create a service-managed fleet
<a name="smf-create"></a>

There are three instance options for your service-managed fleet: spot, on-demand, and wait-and-save. Spot instances are unreserved capacity that you can use at a discounted price, but they might be interrupted by on-demand requests. On-demand instances are priced by the second, have no long-term commitment, and are not interrupted. Wait-and-save instances provide delayed job scheduling at a reduced cost and can be interrupted by on-demand and spot requests.

1. From the [Deadline Cloud console](https://console.aws.amazon.com/deadlinecloud/home), navigate to the farm you want to create the fleet in.

1. Select the **Fleets** tab, and then choose **Create fleet**.

1. Enter a **Name** for your fleet.

1. (Optional) Enter a **Description**. A clear description can help you quickly identify your fleet's purpose.

1. Select **Service-managed** fleet type.

1. Choose the **Spot**, **On-demand**, or **Wait and Save** instance market option for your fleet. By default, fleets use the Spot option.

1. For service access, select an existing role or create a new role for your fleet. A service role provides credentials to instances in the fleet so that they can process jobs, and to users in the monitor so that they can read log information.

1. Choose **Next**.

1. Choose between CPU only instances or GPU accelerated instances. GPU accelerated instances may be able to process your jobs faster, but can be more expensive.

1. Select the operating system for your workers. You can keep the default, **Linux**, or choose **Windows**.

1. (Optional) If you selected GPU accelerated instances, set the maximum and minimum number of GPUs in each instance. For testing purposes you are limited to one GPU. To request more for your production workloads, see [Requesting a quota increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.

1. Enter the minimum and maximum **vCPUs** that you require for your fleet.

1. Enter the minimum and maximum **memory** that you require for your fleet.

1. (Optional) You can allow or exclude specific instance types to control which instance types are used for this fleet.

1. (Optional) Set the maximum number of instances that the fleet can scale to so that capacity is available for the jobs in the queue. We recommend that you leave the minimum number of instances at **0** so that the fleet releases all instances when no jobs are queued.

1. (Optional) You can specify the size of the Amazon Elastic Block Store (Amazon EBS) gp3 volume that is attached to the workers in this fleet. For more information, see [General Purpose SSD volumes](https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html#gp3-ebs-volume-type) in the *Amazon EBS User Guide*.

1. Choose **Next**.

1. (Optional) Define custom worker capabilities that describe features of this fleet. Jobs can match these capabilities against the custom host requirements specified on job submissions. One example is a particular license type if you plan to connect your fleet to your own license server.

1. Choose **Next**.

1. (Optional) To associate your fleet with a queue, select a **queue** from the dropdown. If the queue is set up with the default conda queue environment, your fleet is automatically provided with packages that support partner DCC applications and renderers. For a list of provided packages, see [Default conda queue environment](create-queue-environment.md#conda-queue-environment).

1. Choose **Next**.

1. (Optional) To add a tag to your fleet, choose **Add new tag**, and then enter the **key** and **value** for that tag.

1. Choose **Next**.

1. Review your fleet settings, and then choose **Create fleet**. 
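The console steps above correspond to a `CreateFleet` API call. The following is a rough sketch of such a request; the field names are modeled on the AWS Deadline Cloud API as we understand it, and the farm ID and role ARN are placeholders, so verify the structure against the current API reference before use.

```python
# Sketch of a CreateFleet request mirroring the console steps above.
# Field names are assumptions modeled on the Deadline Cloud API; the
# farm ID and role ARN are placeholders.
request = {
    "farmId": "farm-EXAMPLE11111111111111111111111111",
    "displayName": "render-fleet",
    "roleArn": "arn:aws:iam::111122223333:role/DeadlineFleetRole",
    "minWorkerCount": 0,   # release all instances when no jobs are queued
    "maxWorkerCount": 10,
    "configuration": {
        "serviceManagedEc2": {
            "instanceMarketOptions": {"type": "spot"},  # or "on-demand"
            "instanceCapabilities": {
                "osFamily": "LINUX",
                "cpuArchitectureType": "x86_64",
                "vCpuCount": {"min": 4, "max": 16},
                "memoryMiB": {"min": 16384, "max": 65536},
            },
        }
    },
}

# With boto3, you would send it as:
#   import boto3
#   response = boto3.client("deadline").create_fleet(**request)
```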

# Use a GPU accelerator
<a name="smf-gpu"></a>

You can configure worker hosts in your service-managed fleets to use one or more GPUs to accelerate processing your jobs. Using an accelerator can reduce the time that it takes to process a job, but can increase the cost of each worker instance. Test your workloads to understand the trade-offs between fleets that use GPU accelerators and fleets that don't.

GPUs are not available for fleets with wait-and-save instances.

**Note**  
For testing purposes you are limited to one GPU. To request more for your production workloads, see [Requesting a quota increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.

You decide whether your fleet will use GPU accelerators when you specify the worker instance capabilities. If you decide to use GPUs, you can specify the minimum and maximum number of GPUs for each instance, the types of GPU chips to use, and the runtime driver for the GPUs.

The available GPU accelerators are:
+ `T4` - NVIDIA T4 Tensor Core GPU
+ `A10G` - NVIDIA A10G Tensor Core GPU
+ `L4` - NVIDIA L4 Tensor Core GPU
+ `L40s` - NVIDIA L40S Tensor Core GPU

You can choose from the following runtime drivers:
+ `latest` - Use the latest runtime available for the chip. If you specify `latest` and a new version of the runtime is released, the new version of the runtime is used.
+ `grid:r580` - [NVIDIA vGPU software 19](https://docs.nvidia.com/vgpu/19.0/index.html).
+ `grid:r570` - [NVIDIA vGPU software 18](https://docs.nvidia.com/vgpu/18.0/index.html).
+ `grid:r550` (deprecated) - [NVIDIA vGPU software 17](https://docs.nvidia.com/vgpu/17.0/index.html).

If you don't specify a runtime, Deadline Cloud uses `latest` as the default. However, if you have multiple accelerators and specify `latest` for some and leave others blank, Deadline Cloud raises an exception.
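If you configure fleets programmatically, the GPU choices above can be sketched as an accelerator-capabilities structure inside the fleet's instance capabilities. The field names here are assumptions modeled on the Deadline Cloud API; check the API reference before relying on them.

```python
# Sketch: accelerator capabilities for a GPU fleet. Structure and field
# names are assumptions modeled on the Deadline Cloud API.
accelerator_capabilities = {
    "selections": [
        # Chip names from the list above; each selection pins a runtime.
        {"name": "l4", "runtime": "latest"},
        {"name": "l40s", "runtime": "latest"},
        # Set a runtime on every selection: mixing "latest" on some
        # selections with unset runtimes on others raises an exception.
    ],
    "count": {"min": 1, "max": 1},  # testing quota allows one GPU
}
```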

# Software licensing for service-managed fleets
<a name="smf-licensing"></a>

Deadline Cloud provides usage-based licensing (UBL) for commonly used software packages. Supported software packages are automatically licensed when they run on a service-managed fleet. You don't need to configure or maintain a software license server, and licenses scale with your usage so you won't run out for larger jobs.

You can install software packages that support UBL using the built-in Deadline Cloud conda channel, or you can use your own packages. For more information about the conda channel, see [Create a queue environment](create-queue-environment.md).

For a list of supported software packages and information about pricing for UBL, see [AWS Deadline Cloud pricing](https://aws.amazon.com/deadline-cloud/pricing/). 

## Bring your own license with service-managed fleets
<a name="bring-your-own"></a>

With Deadline Cloud usage-based licensing (UBL), you don't need to manage separate license agreements with software vendors. However, if you have existing licenses or need to use software that isn't available through UBL, you can use your own software licenses with your Deadline Cloud service-managed fleets. You connect your SMF to the software license server over the internet to check out a license for each worker in the fleet.

For an example of connecting to a license server using a proxy, see [Connect service-managed fleets to a custom license server](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/smf-byol.html) in the *Deadline Cloud Developer Guide*.

# VFX Reference Platform compatibility
<a name="smf-vfx"></a>

The VFX Reference Platform is a common target platform for the VFX industry. Standard service-managed fleet Amazon EC2 instances run Amazon Linux 2023 (AL2023). Keep the following considerations in mind when you use software that supports the VFX Reference Platform on a service-managed fleet.

The VFX Reference Platform is updated annually. These considerations for AL2023-based Deadline Cloud service-managed fleets apply to the calendar year (CY) 2022 through 2024 Reference Platforms. For more information, see [https://vfxplatform.com/](https://vfxplatform.com/).

**Note**  
If you are creating a custom Amazon Machine Image (AMI) for a customer-managed fleet, you can add these requirements when you prepare the Amazon EC2 instance.

To use VFX Reference Platform supported software on an AL2023 Amazon EC2 instance, consider the following:
+ The glibc version installed with AL2023 is compatible for runtime use, but not for building software compatible with the VFX Reference Platform CY2024 or earlier.
+ Python 3.9 and 3.11 are provided with the service-managed fleet, making it compatible with VFX Reference Platform CY2022 and CY2024. Python 3.7 and 3.10 are not provided; software that requires them must provide the Python installation in the queue or job environment.
+ Some Boost library components provided in the service-managed fleet are version 1.75, which is not compatible with the VFX Reference Platform. If your application uses Boost, you must provide your own version of the library for compatibility.
+ Intel TBB update 3 is provided in the service-managed fleet. This version is compatible with VFX Reference Platform CY2022, CY2023, and CY2024.
+ Other libraries with versions specified by the VFX Reference Platform are not provided by the service-managed fleet. You must provide the library with any application used on a service-managed fleet. For a list of libraries, see the [reference platform](https://vfxplatform.com/).
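The Python availability constraint above can be checked at runtime before a job starts. The following is a minimal sketch; the `VFX_PYTHON` mapping and `interpreter_matches` helper are illustrative, not part of Deadline Cloud.

```python
import sys

# Python versions targeted by the VFX Reference Platform years discussed
# above. The service-managed fleet provides 3.9 and 3.11 but not 3.10,
# so CY2023 software must ship its own interpreter.
VFX_PYTHON = {"CY2022": (3, 9), "CY2023": (3, 10), "CY2024": (3, 11)}

def interpreter_matches(cy, version=(sys.version_info.major, sys.version_info.minor)):
    """Return True if `version` matches the platform year's Python."""
    return tuple(version) == VFX_PYTHON[cy]
```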

# Worker AMI software contents
<a name="ami-contents"></a>

This section provides information on software installed on AWS Deadline Cloud service-managed worker Amazon Machine Images (AMIs).

AWS Deadline Cloud service-managed worker AMIs are based on both Windows Server 2022 and Amazon Linux 2023, and include additional software specifically installed to support rendering workloads. These AMIs are continuously updated to maintain functionality.

The software on these AMIs is organized into one of the following support categories:

Service-provided software packages  
Software specifically installed and maintained for rendering workloads

Additional system software  
All other software that might change without notice

## Service-provided software packages
<a name="ami-contents-software-packages"></a>

These software packages are installed to support rendering workloads and are maintained for compatibility. You can safely take dependencies on these packages.

### Development Tools & Languages
<a name="ami-contents-development-tools-languages"></a>

**Linux (AL2023):**
+ Python 3.11
+ Git

**Windows (Server 2022):**
+ Python 3.11
+ Git for Windows

### AWS tools
<a name="ami-contents-aws-tools"></a>

**Both platforms:**
+ AWS Command Line Interface v2 (AWS CLI v2)

### System libraries & utilities
<a name="ami-contents-system-libraries-utilities"></a>

**Linux:**
+ FUSE and FUSE3 libraries for filesystem operations
+ Image Libraries
  + libpng
  + libjpeg
  + libtiff
+ OpenGL Libraries
  + mesa-libGLU
  + mesa-libGL
  + mesa-libEGL
  + libglvnd-opengl
+ Development Libraries:
  + json-c (JSON parsing)
  + libnsl (network services library)
  + libxcrypt-compat (encryption compatibility)
+ X Window Libraries
  + libXmu
  + libXpm
  + libXinerama
  + libXcomposite
  + libXrender
  + libXrandr
  + libXcursor
  + libXi
  + libxdamage
  + libXtst
  + libxkbcommon
  + libSM
+ Network and system utilities
  + tcsh

### GPU accelerated fleets
<a name="ami-contents-gpu-fleets"></a>
+ Nvidia grid drivers

### Package managers
<a name="ami-contents-package-managers"></a>

**Linux:**
+ conda/Mamba package manager (installed in `/opt/conda`)
+ DNF package manager (system packages)
+ pip (Python package installer)

**Windows:**
+ conda/Mamba package manager (installed in `C:\ProgramData\conda`)
+ pip (Python package installer)

### Additional system software
<a name="ami-contents-additional-software"></a>

All other software on the AMI can be updated, removed, or changed without notice. Do not take dependencies on any software not explicitly listed in the *Service-provided software packages* section above. This restriction includes, but is not limited to:
+ Operating system packages and libraries
+ Service management components
+ Base AMI software and drivers
+ Software dependencies and runtime libraries
+ System configuration tools and utilities

#### Additional system software examples
<a name="additional-system-software-examples"></a>

**Linux:** System packages such as systemd, kernel modules, hardware drivers, networking components, and the supporting libraries installed as part of the base AL2023 distribution.

**Windows:** Windows system components, Microsoft Edge, Amazon EC2 service software, hardware drivers, and Windows runtime components.

### Best practices
<a name="ami-contents-best-practices"></a>

**Dependency management**: Only take dependencies on software listed in the *Service-provided software packages* section.

**Package versions**: For specific software versions, install the packages you need using a package manager (such as pip or conda) rather than relying on AMI-provided versions.

**Environment isolation**: Use virtual environments (such as Python venv or conda environments) to isolate your specific dependencies.
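For example, the environment-isolation practice can be applied with Python's standard `venv` module. The path below is an example only; a real queue or job environment would use a persistent, job-scoped location.

```python
import pathlib
import tempfile
import venv

# Create an isolated environment for job dependencies instead of
# relying on AMI-provided package versions. The location is illustrative.
env_dir = pathlib.Path(tempfile.mkdtemp()) / "job-env"
venv.create(env_dir, with_pip=False)  # with_pip=True also bootstraps pip

# The environment's interpreter lives under bin/ (Scripts\ on Windows);
# install pinned packages into it rather than into the system Python.
```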

### AMI update model
<a name="ami-contents-update-model"></a>

Note the following information about how the worker AMI updates.
+ Worker AMIs are continuously updated with no versioning system.
+ Updates occur automatically as part of the service operation.
+ No advance notification system is provided for AMI updates.

# Customer-managed fleets
<a name="manage-cmf"></a>

When you want to use a fleet of workers that you manage, you can create a customer-managed fleet (CMF) that Deadline Cloud uses to process your jobs. Use a CMF when:
+ You have existing on-premises workers to integrate with Deadline Cloud.
+ You have workers in a co-located data center.
+ You want direct control of Amazon Elastic Compute Cloud (Amazon EC2) workers.

When you use a CMF, you have full control over and responsibility for the fleet. This includes provisioning, operations, management, and decommissioning workers in the fleet.

For more information, see [Create and use Deadline Cloud customer-managed fleets](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/manage-cmf.html) in the *Deadline Cloud Developer Guide*.

# Auto scaling configuration
<a name="auto-scaling-configuration"></a>

Deadline Cloud provides auto scaling configuration options that allow you to customize how your fleet scales workers up and down. These settings help you balance job processing speed with cost efficiency based on your workflow requirements.

You can configure the following auto scaling settings for your fleet:
+ **Minimum worker count** – Specifies the minimum number of workers maintained in the fleet at all times.
+ **Maximum worker count** – Limits how many workers can run simultaneously.
+ **Scale out rate** – Controls how quickly workers are added to your fleet.
+ **Worker idle duration** – Controls how long workers wait for new work before shutting down.
+ **Standby worker count** – Maintains a warm standby pool of idle workers to start jobs fast.

How auto scaling works depends on your fleet type:
+ **Service-managed fleets** – Deadline Cloud automatically implements auto scaling based on your configuration. You configure the settings and the service handles worker provisioning.
+ **Customer-managed fleets** – If you have completed the auto scaling setup for your customer-managed fleet, the auto scaling configuration works the same as for service-managed fleets. The service uses the configuration to calculate desired capacity and sends recommended fleet size events to your fleet. For more information, see [Set up auto scaling for customer-managed fleets](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/create-auto-scaling.html) in the *Deadline Cloud Developer Guide*.

## Scale out rate
<a name="auto-scaling-scale-out-rate"></a>

The **scale out rate** (`scaleOutWorkersPerMinute`) setting controls how many workers start launching per minute when your fleet scales out. Because Amazon EC2 instances can take several minutes to launch, workers may not be immediately available.

Consider the following when configuring the scale out rate:
+ A higher rate launches more workers quickly, which can reduce job completion time for large jobs.
+ A higher rate may launch more workers than necessary for short-lived tasks, increasing costs.
+ A lower rate can help detect job failures earlier and reduce costs from wasted compute on failing jobs.
+ For short-lived tasks, a conservative scaling approach can be more cost-effective because workers spend less time loading environments relative to actual task execution.
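A back-of-the-envelope calculation helps when choosing a scale out rate. The sketch below estimates how long a fleet takes to reach a target size; the per-instance launch time is a rough assumption, since Amazon EC2 instances can take several minutes to boot.

```python
import math

def minutes_to_scale(current, target, scale_out_per_minute, launch_minutes=3):
    """Rough minutes until `target` workers are available.

    `launch_minutes` is an assumed average instance boot time, not a
    value defined by Deadline Cloud.
    """
    workers_needed = max(0, target - current)
    if workers_needed == 0:
        return 0
    launch_waves = math.ceil(workers_needed / scale_out_per_minute)
    return launch_waves + launch_minutes

# Scaling from 0 to 100 workers at 25 workers/minute: 4 launch waves
# plus ~3 minutes of boot time.
print(minutes_to_scale(0, 100, 25))  # -> 7
```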

**Note**  
The scale out rate is a best-effort setting. Actual scaling speed may vary based on instance availability and other system factors. In rare conditions, the actual rate may briefly exceed the configured value.

## Worker idle duration
<a name="auto-scaling-worker-idle-duration"></a>

The **worker idle duration** (`workerIdleDurationSeconds`) setting controls how long a worker remains available after it finishes processing a job, measured in seconds. The default value is 300 seconds (5 minutes).

This setting is useful for iterative workflows where artists frequently revise and resubmit jobs. By keeping workers available longer, subsequent job submissions can start processing immediately without waiting for new workers to launch.

Consider the following when configuring worker idle duration:
+ A longer duration keeps workers available for rapid iteration, reducing wait times between job submissions. However, longer durations increase costs because idle workers continue to incur charges.
+ A shorter duration reduces costs by shutting down idle workers more quickly.
+ For service-managed fleets, the maximum value is 86,400 seconds (24 hours) because workers are refreshed every 24 hours. If a worker has been running for 23 hours and you set an idle duration of 10 hours, the worker shuts down after 1 hour when it reaches the 24-hour limit.

## Standby worker count
<a name="auto-scaling-standby-worker-count"></a>

The **standby worker count** (`standbyWorkerCount`) setting specifies the number of idle workers to maintain as a warm standby pool. These workers can process new jobs without the delay of launching new instances.

This setting is useful when you want to reduce job start latency. For example, standby workers are helpful when rendering with Windows instances, when using host configuration scripts that install local dependencies, or when workers require significant setup time. The fleet attempts to maintain the configured number of idle workers, but the idle count may temporarily drop while replacement workers are launching.

Consider the following when configuring standby worker count:
+ Standby workers incur costs even when not processing jobs. Balance the number of standby workers against your budget and job start latency requirements.
+ When the fleet reaches its maximum worker count, the standby pool may not be fully maintained. For example, if all workers are busy and the fleet is at its maximum size, no additional idle workers are launched.
+ When the standby worker count exceeds the minimum worker count, the minimum worker count is effectively overridden. For example, with a minimum of 1 and a standby of 2, the fleet keeps 2 idle workers when no work is available, making the minimum setting redundant.
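The interplay between the counts described above can be summarized as simple arithmetic. The sketch below shows how the settings might combine into a target fleet size; the service's actual calculation may differ in detail.

```python
def target_fleet_size(busy, min_count, max_count, standby):
    """Desired total workers: busy workers plus the standby pool,
    never below the minimum or above the maximum worker count."""
    desired = busy + standby
    return min(max_count, max(min_count, desired))

# Standby overrides the minimum when larger: min=1, standby=2 keeps
# 2 idle workers when no work is available.
print(target_fleet_size(busy=0, min_count=1, max_count=10, standby=2))   # -> 2

# At the maximum worker count, the standby pool is not fully maintained.
print(target_fleet_size(busy=10, min_count=1, max_count=10, standby=2))  # -> 10
```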

The following diagrams show how minimum worker count and standby worker count affect fleet scaling behavior.

**Minimum worker count**

![\[Diagram showing how minimum worker count maintains a fixed total of workers regardless of workload.\]](http://docs.aws.amazon.com/deadline-cloud/latest/userguide/images/auto-scaling-min-worker-count.png)

**Standby worker count**

![\[Diagram showing how standby worker count maintains a fixed number of idle workers, launching replacements as they pick up jobs.\]](http://docs.aws.amazon.com/deadline-cloud/latest/userguide/images/auto-scaling-standby-worker-count.png)

To automatically adjust your standby worker count on a schedule, use the sample AWS CloudFormation template at [fleet\_standby\_scheduling](https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/fleet_standby_scheduling) on GitHub.

## Configuring auto scaling settings
<a name="auto-scaling-configure"></a>

You can configure auto scaling settings when you create a fleet or update an existing fleet.

**To configure auto scaling settings**

1. Open the [Deadline Cloud console](https://console.aws.amazon.com/deadlinecloud/home).

1. Navigate to the farm that contains your fleet.

1. Choose the **Fleets** tab.

1. Select the fleet you want to configure, then choose **Edit**.

1. In the **Auto scaling** section, configure the following settings:
   + **Minimum worker count** – Enter the minimum number of workers to maintain.
   + **Maximum worker count** – Enter the maximum number of workers allowed.
   + **Scale out rate** – Enter the number of workers to launch per minute.
   + **Worker idle duration** – Enter the number of seconds that workers remain idle before shutting down.
   + **Standby worker count** – Enter the number of standby workers to maintain.

1. Choose **Save changes**.
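These console settings correspond to fields on the fleet resource. The following is a hedged sketch of an `UpdateFleet`-style request using the parameter names from this page; the exact field names and their placement in the API may differ, and the IDs are placeholders, so verify against the current Deadline Cloud API reference.

```python
# Sketch: auto scaling settings from this section assembled into an
# UpdateFleet-style request. Field placement is an assumption; the IDs
# are placeholders.
request = {
    "farmId": "farm-EXAMPLE11111111111111111111111111",
    "fleetId": "fleet-EXAMPLE1111111111111111111111111",
    "minWorkerCount": 0,
    "maxWorkerCount": 50,
    # Settings named in this section:
    "scaleOutWorkersPerMinute": 25,
    "workerIdleDurationSeconds": 300,  # default: 300 seconds (5 minutes)
    "standbyWorkerCount": 2,
}

# With boto3, you would send it as:
#   import boto3
#   boto3.client("deadline").update_fleet(**request)
```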