Request throttling for the Amazon ECS API
Amazon Elastic Container Service throttles all API requests for each AWS account on a per-Region basis. We do this to ensure consistent performance and fair usage of the service for all Amazon ECS customers. Throttling ensures that calls to the Amazon ECS API do not exceed the maximum allowed API request quotas for both Amazon ECS and the other AWS services that it integrates with. API calls are subject to the request quotas whether they originate from:
- A third-party application
- A command line tool
- The Amazon ECS console
If you exceed an API throttling quota, the request fails with the ThrottlingException error code, as in the following examples:
An error occurred (ThrottlingException) when calling the DescribeClusters operation (reached max retries: 4): Rate exceeded.
com.amazonaws.services.ecs.model.AmazonECSException: Rate exceeded (Service: AmazonECS; Status Code: 400; Error Code: ThrottlingException; Request ID: 5ed90669-e454-464d-9b2f-6523bc86f537; Proxy: null)
How throttling is applied
Amazon ECS uses the token bucket algorithm to implement API throttling.
Amazon ECS examines the rate of API request submissions for all Amazon ECS APIs in your account, per Region, and applies two types of API throttling quotas: sustained and burst. The sustained rate is the average number of API requests allowed per second over time for an operation. The burst rate is the maximum number of API requests allowed in any one second. With burst, you can periodically make more API requests than the sustained rate allows. After a burst, Amazon ECS throttles subsequent API requests until the request rate settles back to the sustained rate. In the token bucket algorithm, the bucket maximum capacity corresponds to the burst rate and the bucket refill rate corresponds to the sustained rate. The following example uses these terms to illustrate Amazon ECS API request throttling.
You are throttled on the number of API requests you make and each request removes one token
   from the token bucket. For example, the bucket size for Cluster read
   actions, such as the DescribeClusters API, is 50 tokens, so you can make
   up to 50 DescribeClusters requests in one second. If you exceed 50 requests in a
   second, you are throttled and the remaining requests within that second fail.
Buckets automatically refill at a set rate. If the bucket is below its maximum capacity, a
   set number of tokens is added back to it every second until it reaches its maximum capacity. If
   the bucket is full when refill tokens arrive, they are discarded. The bucket cannot hold more
   than its maximum number of tokens. For example, the bucket size for Cluster read
    actions, such as the DescribeClusters API, is 50 tokens, and the refill
   rate is 20 tokens per second. If you make 50 DescribeClusters API requests in a
   second, the bucket is immediately reduced to zero tokens. The bucket is then refilled by 20
   tokens every second, until it reaches its maximum capacity of 50 tokens. This means that, if
   you make no further requests, the previously empty bucket reaches its maximum capacity after
   2.5 seconds (50 tokens at 20 tokens per second).
You do not need to wait for the bucket to be completely full before you can make API
   requests. You can use tokens as they are added to the bucket. If you immediately use the refill
   tokens, the bucket does not reach its maximum capacity. For example, the bucket size for
    Cluster read actions, such as the DescribeClusters API, is 50
   tokens, and the refill rate is 20 tokens per second. If you deplete the bucket by making 50 API
   requests in a second, you can continue to make 20 API requests per second. The bucket can refill
   to the maximum capacity only if you make fewer than 20 API requests per second.
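The three behaviors described above (burst, refill, and spending tokens as they arrive) can be sketched with a small simulation. This is a minimal illustration, not the Amazon ECS implementation: the `TokenBucket` class, its method names, and the simulated clock are hypothetical, and it assumes tokens are added continuously rather than once per second.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; not Amazon ECS's actual implementation."""
    capacity: float        # bucket maximum capacity (burst rate)
    refill_rate: float     # tokens added per second (sustained rate)
    tokens: float = -1.0   # current tokens; a negative value means "start full"
    now: float = 0.0       # simulated clock, in seconds

    def __post_init__(self) -> None:
        if self.tokens < 0:
            self.tokens = self.capacity

    def advance(self, seconds: float) -> None:
        """Move the simulated clock forward and refill, discarding overflow."""
        self.now += seconds
        self.tokens = min(self.capacity, self.tokens + seconds * self.refill_rate)

    def try_request(self) -> bool:
        """Spend one token if available; False models a ThrottlingException."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Cluster read actions: capacity 50, refill rate 20 tokens per second.
bucket = TokenBucket(capacity=50, refill_rate=20)

# 60 requests within the same second: 50 are accepted, 10 are throttled.
burst = [bucket.try_request() for _ in range(60)]
accepted = sum(burst)
throttled = len(burst) - accepted

# One second later, 20 tokens have been added back.
bucket.advance(1.0)
after_one_second = bucket.tokens

# With no further requests, the bucket is full 2.5 seconds after it emptied.
bucket.advance(1.5)
after_refill = bucket.tokens
```

If you call `try_request()` 20 times after each `advance(1.0)` instead of waiting, the token count stays near zero, which mirrors the point that the bucket only refills to capacity when you stay below the sustained rate.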
Request token bucket sizes and refill rates
For request rate limiting purposes, API actions are grouped into categories. All API actions
   in a category share the same token bucket. For instance, the DescribeClusters and
    ListClusters APIs share the Cluster read actions bucket, which has a
   capacity of 50 tokens and a refill rate of 20 tokens per second. This means that the cumulative number of API requests
   for all Cluster read actions is throttled by the same burst rate quota of 50
   API requests. Thus, you can make 25 DescribeClusters and 25
    ListClusters API requests in one second, or 30 DescribeClusters and 20
    ListClusters, or 50 DescribeClusters and 0 ListClusters,
   or 0 DescribeClusters and 50 ListClusters, but you cannot make 50
    DescribeClusters and 50 ListClusters requests in the same second.
   The sustained rate is similarly applied cumulatively to all API requests within a bucket.
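The shared-bucket arithmetic above can be checked with a short sketch. The `CATEGORY_OF` mapping, the `CAPACITY` dictionary, and the `count_throttled` helper are hypothetical names introduced for illustration. The sketch covers only the two actions named above, starts from a full bucket, and ignores refill within the second.

```python
# Hypothetical mapping from API action to category, limited to the two
# actions named in the text above.
CATEGORY_OF = {
    "DescribeClusters": "Cluster read actions",
    "ListClusters": "Cluster read actions",
}

# Burst capacity per category, taken from the text above.
CAPACITY = {"Cluster read actions": 50}


def count_throttled(requests: list[str]) -> int:
    """Count requests rejected when all of them arrive within the same second.

    Starts from full buckets and ignores refill, which is a simplification.
    """
    remaining = dict(CAPACITY)
    throttled = 0
    for action in requests:
        category = CATEGORY_OF[action]
        if remaining[category] > 0:
            remaining[category] -= 1  # both actions draw from the same bucket
        else:
            throttled += 1
    return throttled


mixed = ["DescribeClusters"] * 25 + ["ListClusters"] * 25
too_many = ["DescribeClusters"] * 50 + ["ListClusters"] * 50

print(count_throttled(mixed))     # no requests throttled
print(count_throttled(too_many))  # the last 50 requests are throttled
```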
The following table shows the bucket capacity (or burst) and refill rate (or sustained) for all AWS Regions. All API action categories enforce rate quotas for each AWS account on a per-Region basis.
| API action category | Actions | Bucket maximum capacity (or Burst rate) | Bucket refill rate (or Sustained rate) |
|---|---|---|---|
| Cluster modify actions | | 20 | 1 |
| Cluster read actions | | 50 | 20 |
| Task definition modify actions | | 20 | 1 |
| Task definition read actions | | 50 | 20 |
| Task definition deletion actions | | 5 | 1 |
| Capacity provider modify actions | | 10 | 1 |
| Capacity provider read actions | | 50 | 20 |
| Tag modify actions | | 20 | 10 |
| Tag read actions | | 50 | 20 |
| Setting modify actions | | 10 | 1 |
| Setting read actions | | 50 | 20 |
| Cluster resource modify actions | | 100 | 40 |
| Cluster resource read actions | | 100 | 20 |
| Agent modify actions | | 200 | 120 |
| Service modify actions | | 50 | 5 |
| Service read actions | | 100 | 20 |
| Service deployment actions | | 50 | 20 |
| Service revision actions | | 50 | 20 |
| Task protection actions | | 200 | 80 |
| Cluster service resource read actions | | 10 | 1 |
1 AWS Fargate additionally throttles the Amazon ECS RunTask API to the rates listed in the Amazon ECS Developer Guide.
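To get a feel for what the table's numbers mean in practice, the following sketch derives two figures from a few of its rows: how long an empty bucket takes to refill, and an upper bound on accepted requests over a time window. The `QUOTAS` dictionary and both helper functions are hypothetical names, and the calculation assumes continuous refill and no other callers in the same account and Region.

```python
# (burst capacity, sustained refill rate per second) for a few categories,
# copied from the table above.
QUOTAS = {
    "Cluster modify actions": (20, 1),
    "Cluster read actions": (50, 20),
    "Service modify actions": (50, 5),
    "Task protection actions": (200, 80),
}


def seconds_to_refill(category: str) -> float:
    """Time for an empty bucket to refill completely if no requests are made."""
    capacity, refill_rate = QUOTAS[category]
    return capacity / refill_rate


def max_requests(category: str, seconds: float) -> float:
    """Upper bound on accepted requests in a window that starts with a full bucket."""
    capacity, refill_rate = QUOTAS[category]
    return capacity + refill_rate * seconds


print(seconds_to_refill("Cluster read actions"))
print(seconds_to_refill("Cluster modify actions"))
print(max_requests("Service modify actions", 60))
```

Categories with a low refill rate relative to their capacity, such as Cluster modify actions, recover slowly from a burst, so a full burst there is best treated as an occasional event.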
Adjusting API throttling quotas
You can request an increase for API throttling quotas for your AWS account. To request a
   quota adjustment, contact the AWS Support Center.
Handling API throttling
You can implement an error retry and exponential backoff strategy to limit the impact of throttling errors on your workloads. If you use an AWS SDK, automatic retry logic is built in and configurable. You can refer to the following resources for more details:
- Error retries and exponential backoff in AWS in the AWS General Reference Guide
- Exponential backoff and jitter blog post
- Timeouts, retries, and backoff with jitter article in the Amazon Builders' Library
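If you need retry logic outside an SDK, a retry loop with capped exponential backoff and jitter might look like the following sketch. It is an illustration under stated assumptions, not an AWS-provided implementation: `ThrottledError`, `call_with_backoff`, and `flaky_describe_clusters` are hypothetical names, the delay values are arbitrary, and the jitter variant shown ("full jitter") is one of several possible choices.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK error whose code is ThrottlingException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ThrottledError with capped exponential backoff.

    Uses "full jitter": each wait is a random value between 0 and the capped
    exponential delay for that attempt.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)


# Demo: an operation that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_describe_clusters():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError("Rate exceeded")
    return {"clusters": []}


# Inject a recording "sleep" and a fixed "rng" so the demo runs instantly
# and deterministically.
waits = []
result = call_with_backoff(flaky_describe_clusters, sleep=waits.append, rng=lambda: 1.0)
```

The `sleep` and `rng` parameters exist only to make the sketch testable; in real use you would leave them at their defaults. If you use the AWS SDK for Python (Boto3), you would typically rely on its built-in retries instead, which can be tuned with `botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"})`.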