Quotas for Amazon Bedrock
Your AWS account has default quotas, formerly referred to as limits, for Amazon Bedrock. To view service quotas for Amazon Bedrock, do one of the following:
- Follow the steps at Viewing service quotas and select Amazon Bedrock as the service.
- Refer to the Amazon Bedrock service quotas in the AWS General Reference.
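Quotas can also be read programmatically. The following is a minimal sketch, assuming a boto3 Service Quotas client and assuming that `bedrock` is the service code for Amazon Bedrock. The helper name `list_bedrock_quotas` and its name filter are illustrative and not part of any AWS API.

```python
from typing import Any, Dict, List


def list_bedrock_quotas(client: Any, name_contains: str = "") -> List[Dict[str, Any]]:
    """Return applied Amazon Bedrock quotas, optionally filtered by name.

    `client` is assumed to be a boto3 Service Quotas client, for example
    boto3.client("service-quotas", region_name="us-east-1").
    """
    paginator = client.get_paginator("list_service_quotas")
    results: List[Dict[str, Any]] = []
    # Assumption: "bedrock" is the Service Quotas service code for Amazon Bedrock.
    for page in paginator.paginate(ServiceCode="bedrock"):
        for quota in page.get("Quotas", []):
            # Case-insensitive substring match; an empty filter matches everything.
            if name_contains.lower() in quota["QuotaName"].lower():
                results.append(
                    {
                        "name": quota["QuotaName"],
                        "code": quota["QuotaCode"],
                        "value": quota["Value"],
                        "adjustable": quota["Adjustable"],
                    }
                )
    return results
```

With AWS credentials configured, calling `list_bedrock_quotas(boto3.client("service-quotas"), "tokens per minute")` would return only the tokens-per-minute quotas, including whether each one is adjustable.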
To maintain the performance of the service and to ensure appropriate usage of Amazon Bedrock, the default quotas assigned to an account might be updated depending on Regional factors, payment history, fraudulent usage, and/or approval of a quota increase request.
Request an increase for Amazon Bedrock quotas
The steps for requesting a quota increase for your account depend on the value in the Adjustable column in the quotas table in Amazon Bedrock service quotas:
- If a quota is marked as Yes, you can adjust it by following the steps at Requesting a Quota Increase in the Service Quotas User Guide.
- If a quota is marked as No, you can submit a request through the limit increase form to be considered for an increase.
- For any model, you can request an increase for the following quotas together:
  - Cross-Region InvokeModel tokens per minute for ${model}
  - Cross-Region InvokeModel requests per minute for ${model}
  - On-demand InvokeModel tokens per minute for ${model}
  - On-demand InvokeModel requests per minute for ${model}

  To request an increase for any combination of these quotas, request an increase for the Cross-Region InvokeModel tokens per minute for ${model} quota by following the steps at Requesting a Quota Increase in the Service Quotas User Guide. After you do so, the support team will reach out and offer you the option of also increasing the other three quotas.

  Note: Due to overwhelming demand, priority will be given to customers who generate traffic that consumes their existing quota allocation. Your request might be denied if you don't meet this condition.
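The choice between the two request paths above (adjustable versus non-adjustable quotas) can be sketched in code. This is a rough sketch, assuming a boto3 Service Quotas client and a quota record shaped like the entries returned by its `list_service_quotas` call. The function name `request_increase` and the string `"submit-limit-increase-form"` are hypothetical, and `bedrock` is assumed to be the service code.

```python
from typing import Any, Dict


def request_increase(client: Any, quota: Dict[str, Any], desired_value: float) -> str:
    """Request a quota increase, or report that the form is needed.

    `quota` is assumed to be one entry from a Service Quotas
    list_service_quotas response (keys: QuotaCode, Value, Adjustable).
    """
    if desired_value <= quota["Value"]:
        raise ValueError("desired_value must exceed the current quota value")

    if not quota["Adjustable"]:
        # Quotas marked "No" cannot be raised through Service Quotas;
        # they go through the limit increase form instead.
        return "submit-limit-increase-form"

    # Quotas marked "Yes" can be raised with a Service Quotas request.
    response = client.request_service_quota_increase(
        ServiceCode="bedrock",  # assumption: service code for Amazon Bedrock
        QuotaCode=quota["QuotaCode"],
        DesiredValue=desired_value,
    )
    # The response carries the status of the request (for example PENDING).
    return response["RequestedQuota"]["Status"]
```

For the four per-model quotas that can be raised together, the sketch would be called once, with the Cross-Region InvokeModel tokens per minute quota, since the other three are offered by the support team afterward.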
Token burndown rate for Anthropic Claude 4 models
This section describes inference quotas for models with non-standard token burndown rates.
Amazon Bedrock model inference quotas are measured in three dimensions: RPM (requests per minute), TPM (tokens per minute), and TPD (tokens per day). Requests are throttled as soon as the quota in any one dimension is reached, whichever occurs first.
A burndown rate is a ratio that converts the input and output tokens to token quota usage by the throttling system. This ratio represents the rate at which input and output tokens count toward the token quotas.
Most models have a burndown rate of 1 token per input token and 1 token per output token. Anthropic Claude 4 models are the exception. See the table below for Anthropic Claude 4 burndown rates.
For more information on token use and pricing in Amazon Bedrock, see Amazon Bedrock Pricing.
We use the max_tokens value specified in the API request to estimate the output burndown toward token quotas when we receive your request. We adjust the output burndown to the actual usage at the completion of the request. To avoid early throttling, select a max_tokens value close to your expected output tokens.
Model | Input token | Output token |
---|---|---|
Claude Opus 4 | 1 token per input token | 5 tokens per output token |
Claude Sonnet 4 | 1 token per input token | 5 tokens per output token |
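
To make the burndown arithmetic concrete, here is a minimal sketch of how the rates in the table above translate into token quota usage, and how the max_tokens value affects the estimate made when a request arrives. The function names and the `BURNDOWN_RATES` dictionary are illustrative. The actual throttling logic is internal to Amazon Bedrock and may differ in detail.

```python
# Burndown rates (input, output) taken from the table above.
# Models not listed are assumed to use the default rate of 1 for both.
BURNDOWN_RATES = {
    "Claude Opus 4": (1, 5),
    "Claude Sonnet 4": (1, 5),
}
DEFAULT_RATE = (1, 1)


def quota_tokens(model: str, input_tokens: int, output_tokens: int) -> int:
    """Tokens counted toward the token quota for one request."""
    in_rate, out_rate = BURNDOWN_RATES.get(model, DEFAULT_RATE)
    return input_tokens * in_rate + output_tokens * out_rate


def initial_estimate(model: str, input_tokens: int, max_tokens: int) -> int:
    """Estimate made at request time: output is assumed to reach max_tokens."""
    return quota_tokens(model, input_tokens, max_tokens)


# Example: 1,000 input tokens, max_tokens=4,000, 300 tokens actually generated.
reserved = initial_estimate("Claude Sonnet 4", 1_000, 4_000)
final = quota_tokens("Claude Sonnet 4", 1_000, 300)
print(reserved)  # 21000 = 1000 * 1 + 4000 * 5
print(final)     # 2500  = 1000 * 1 + 300 * 5
```

In this example, lowering max_tokens from 4,000 to 500 would reduce the initial estimate from 21,000 to 3,500 tokens, which is much closer to the 2,500 tokens the request finally consumes.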