Quotas for Amazon Bedrock

Your AWS account has default quotas, formerly referred to as limits, for Amazon Bedrock. To view them, see the quotas table in Amazon Bedrock service quotas.

To maintain the performance of the service and to ensure appropriate usage of Amazon Bedrock, the default quotas assigned to an account might be updated depending on Regional factors, payment history, fraudulent usage, and/or approval of a quota increase request.

Request an increase for Amazon Bedrock quotas

The steps for requesting a quota increase for your account depend on the value in the Adjustable column in the quotas table in Amazon Bedrock service quotas:

  • If a quota is marked as Yes, you can adjust it by following the steps at Requesting a Quota Increase in the Service Quotas User Guide.

  • If a quota is marked as No, you can submit a request through the limit increase form to be considered for an increase.

  • For any model, you can request an increase for the following quotas together:

    • Cross-Region InvokeModel tokens per minute for ${model}

    • Cross-Region InvokeModel requests per minute for ${model}

    • On-demand InvokeModel tokens per minute for ${model}

    • On-demand InvokeModel requests per minute for ${model}

    To request an increase for any combination of these quotas, request an increase for the Cross-Region InvokeModel tokens per minute for ${model} quota by following the steps at Requesting a Quota Increase in the Service Quotas User Guide. After you do so, the support team will reach out and offer you the option of also increasing the other three quotas.

    Note

    Due to overwhelming demand, priority will be given to customers who generate traffic that consumes their existing quota allocation. Your request might be denied if you don't meet this condition.
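As a rough illustration of the Adjustable check described above, the following Python sketch lists the Bedrock quotas through the Service Quotas API and separates adjustable quotas from the rest. It assumes the boto3 SDK is installed, that credentials are configured, and that the service code for Amazon Bedrock is "bedrock"; the helper names (split_by_adjustable, fetch_bedrock_quotas, request_increase) are hypothetical and not part of any AWS library.

```python
def split_by_adjustable(quotas):
    """Partition Service Quotas entries by their Adjustable flag."""
    adjustable = [q for q in quotas if q.get("Adjustable")]
    fixed = [q for q in quotas if not q.get("Adjustable")]
    return adjustable, fixed


def fetch_bedrock_quotas(region_name):
    """Return all Service Quotas entries for Amazon Bedrock in one Region."""
    import boto3  # imported lazily so split_by_adjustable works without the SDK

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    # Assumption: "bedrock" is the Service Quotas service code for Amazon Bedrock.
    for page in paginator.paginate(ServiceCode="bedrock"):
        quotas.extend(page["Quotas"])
    return quotas


def request_increase(region_name, quota_code, desired_value):
    """Submit an increase request for a quota whose Adjustable value is Yes."""
    import boto3

    client = boto3.client("service-quotas", region_name=region_name)
    response = client.request_service_quota_increase(
        ServiceCode="bedrock",
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )
    return response["RequestedQuota"]
```

A caller might run split_by_adjustable(fetch_bedrock_quotas("us-east-1")) and pass the QuotaCode of an adjustable entry to request_increase. Quotas in the non-adjustable group still go through the limit increase form, as described above.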

Token burndown rate for Anthropic Claude 4 models

This section describes inference quotas for models with non-standard token burndown rates.

Amazon Bedrock model inference quotas are measured in three dimensions: RPM (requests per minute), TPM (tokens per minute), and TPD (tokens per day). You reach a quota as soon as usage in any one of these dimensions hits its limit, whichever occurs first.

A burndown rate is a ratio that converts the input and output tokens to token quota usage by the throttling system. This ratio represents the rate at which input and output tokens count toward the token quotas.

Most models have a burndown rate of 1:1, meaning that each input token and each output token counts as one token toward the quota. Anthropic Claude 4 models are the exception. See the table below for Anthropic Claude 4 burndown rates. For more information on token use and pricing in Amazon Bedrock, see Amazon Bedrock Pricing.

We use the max_tokens value specified in the API request to estimate the output burndown toward token quotas when we receive your request. We adjust the output burndown to the actual usage at the completion of the request. To avoid early throttling, select a max_tokens value close to your expected output tokens.

Model token non-standard burndown rates

Model             Input token               Output token
Claude Opus 4     1 token per input token   5 tokens per output token
Claude Sonnet 4   1 token per input token   5 tokens per output token
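
The following Python sketch works through the arithmetic implied by the burndown rates and the max_tokens behavior described above. It is an illustration only, not the actual throttling implementation: the function names and the example token counts are assumptions, and the sketch assumes the quota usage is simply the input tokens times the input rate plus the output tokens times the output rate.

```python
# Burndown rates from the table above: quota tokens consumed per actual token.
BURNDOWN_RATES = {
    "Claude Opus 4": {"input": 1, "output": 5},
    "Claude Sonnet 4": {"input": 1, "output": 5},
}
# Most other models count 1 quota token per input or output token.
DEFAULT_RATE = {"input": 1, "output": 1}


def quota_tokens(model, input_tokens, output_tokens):
    """Tokens counted toward the token quota for the given token counts."""
    rate = BURNDOWN_RATES.get(model, DEFAULT_RATE)
    return input_tokens * rate["input"] + output_tokens * rate["output"]


def estimate_and_settle(model, input_tokens, max_tokens, actual_output_tokens):
    """Return (estimated usage at request time, adjusted usage at completion)."""
    # At request time the output burndown is estimated from max_tokens.
    estimated = quota_tokens(model, input_tokens, max_tokens)
    # At completion the output burndown is adjusted to the actual output.
    settled = quota_tokens(model, input_tokens, actual_output_tokens)
    return estimated, settled


estimated, settled = estimate_and_settle("Claude Sonnet 4", 1000, 4096, 300)
print(estimated, settled)  # 21480 2500
```

In this hypothetical request, a max_tokens value of 4096 initially counts 21,480 tokens toward the quota even though the completed request settles at 2,500. Setting max_tokens closer to the expected 300 output tokens would shrink the initial estimate and reduce the chance of early throttling.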