

# Increase throughput with cross-Region inference

With cross-Region inference, you can choose either a cross-Region inference profile tied to a specific geography (such as US or EU), or you can choose a global inference profile. When you choose an inference profile tied to a specific geography, Amazon Bedrock automatically selects the optimal commercial AWS Region within that geography to process your inference request. With global inference profiles, Amazon Bedrock automatically selects the optimal commercial AWS Region to process the request, which optimizes available resources and increases model throughput.

Both types of cross-Region inference work through [inference profiles](inference-profiles.md), which define a foundation model (FM) and the AWS Regions to which requests can be routed. When running model inference in on-demand mode, your requests might be restricted by service quotas or during peak usage times. Cross-Region inference enables you to seamlessly manage unplanned traffic bursts by utilizing compute across different AWS Regions.

You can also increase throughput for a model by purchasing [Provisioned Throughput](prov-throughput.md). Inference profiles currently don't support Provisioned Throughput.

To see the Regions and models with which you can use inference profiles to run cross-Region inference, refer to [Supported Regions and models for inference profiles](inference-profiles-support.md).

**Topics**
+ [Choosing between Geographic and Global cross-Region inference](#cross-region-inference-comparison)
+ [General considerations](#cross-region-inference-general-considerations)
+ [Geographic cross-Region inference](geographic-cross-region-inference.md)
+ [Global cross-Region inference](global-cross-region-inference.md)

## Choosing between Geographic and Global cross-Region inference


Amazon Bedrock provides two types of cross-Region inference profiles, each designed for different use cases and compliance requirements:


| Feature | Geographic Cross-Region Inference | Global Cross-Region Inference | Recommendation | 
| --- | --- | --- | --- | 
| Data residency | Within geographic boundaries (US, EU, APAC, etc.) | Any supported AWS commercial Region worldwide | Choose Geographic for compliance requirements | 
| Throughput | Higher than single-region | Highest available | Choose Global for maximum performance | 
| Cost | Standard pricing | Approximately 10% savings | Choose Global for cost optimization | 
| SCP requirements | Allow all destination Regions in profile | Allow "aws:RequestedRegion": "unspecified" | Configure based on your organizational policies | 
| Best suited for | Organizations with data residency regulations | Organizations prioritizing cost and performance | Assess your compliance and performance needs | 

Choose Geographic cross-Region inference when you have data residency requirements and need to ensure data processing remains within specific geographic boundaries. Choose Global cross-Region inference when you want maximum throughput and cost savings without geographic restrictions.

## General considerations


Note the following information about cross-Region inference:
+ There's no additional routing cost for using cross-Region inference. The price is calculated based on the Region from which you call an inference profile. For information about pricing, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).
+ Cross-Region inference can route requests to AWS Regions that are not manually enabled in your AWS account. Manual Region enablement is not required for cross-Region inference to function.
+ All data transmitted during cross-Region operations remains on the AWS network and does not traverse the public internet. Data is encrypted in transit between AWS Regions.
+ All cross-Region inference requests are logged in CloudTrail in your source Region. Look for the `additionalEventData.inferenceRegion` field to identify where requests were processed.
+ AWS services powered by Amazon Bedrock might also use cross-Region inference (CRIS). See service-specific documentation for more details.
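For example, the CloudTrail point above can be checked with a short script that reads the `additionalEventData.inferenceRegion` field from recent events. This is a sketch assuming default credentials and a `us-east-1` source Region; the parsing helper works on any raw CloudTrail event record:

```python
import json

def inference_region(event_record: str):
    """Return additionalEventData.inferenceRegion from a raw CloudTrail event, if present."""
    detail = json.loads(event_record)
    return detail.get("additionalEventData", {}).get("inferenceRegion")

def recent_inference_regions(source_region="us-east-1", max_results=10):
    """List the Regions that processed your most recent InvokeModel calls."""
    import boto3  # imported here so the parsing helper above has no SDK dependency
    cloudtrail = boto3.client("cloudtrail", region_name=source_region)
    events = cloudtrail.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "InvokeModel"}],
        MaxResults=max_results,
    )
    return [inference_region(e["CloudTrailEvent"]) for e in events["Events"]]
```

Calling `recent_inference_regions()` requires valid AWS credentials with `cloudtrail:LookupEvents` permission in your source Region.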

# Geographic cross-Region inference


Geographic cross-Region inference keeps data processing within specified geographic boundaries (US, EU, APAC, etc.) while providing higher throughput than single-region inference. This option is ideal for organizations with data residency requirements and compliance regulations.

## Geographic cross-Region inference considerations


Note the following information about Geographic cross-Region inference:
+ Cross-Region inference requests to an inference profile tied to a geography (such as US, EU, and APAC) are kept within the AWS Regions that are part of the geography where the data originally resides. For example, a request made within the US is kept within the AWS Regions in the US. Although the data remains stored only in the source Region, your input prompts and output results might move outside of your source Region during cross-Region inference. All data is transmitted encrypted across the AWS network.
+ To see the default quotas for cross-Region throughput when using inference profiles tied to a geography (such as US, EU, and APAC), refer to the **Cross-region model inference requests per minute for ${Model}** and **Cross-region model inference tokens per minute for ${Model}** values in [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the *AWS General Reference*.

## IAM policy requirements for Geographic cross-Region inference


To allow an IAM user or role to invoke a Geographic cross-Region inference profile, you need to allow access to the following resources:

1. The geography-specific cross-Region inference profile (these profiles have geographic prefixes such as `us`, `eu`, `apac`)

1. The foundation model in the source Region

1. The foundation model in all destination Regions listed in the geographic profile

The following example policy grants the required permissions to use the Claude Sonnet 4.5 foundation model with a Geographic cross-Region inference profile for the US, where the source Region is `us-east-1` and the destination Regions are `us-east-1`, `us-east-2`, and `us-west-2`:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GrantGeoCrisInferenceProfileAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0"
            ]
        },
        {
            "Sid": "GrantGeoCrisModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:us-east-2::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0"
            ],
            "Condition": {
                "StringEquals": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0"
                }
            }
        }
    ]
}
```

The first statement grants `bedrock:InvokeModel` API access to the Geographic cross-Region inference profile for requests originating from the requesting Region. The second statement grants `bedrock:InvokeModel` API access to the foundation model in both the requesting Region and all destination Regions listed in the inference profile.

## Service Control Policy requirements for Geographic cross-Region inference


Many organizations implement Regional access controls through Service Control Policies in AWS Organizations for security and compliance. If your organization's security policy uses SCPs to block unused Regions, you must ensure that your Region-specific SCP conditions allow access to all destination Regions listed in the Geographic cross-Region inference profile for your source Region.

For Geographic cross-Region inference, you need to understand the relationship between your source Region (where you make the API call) and the destination Regions (where requests can be routed). Check the inference profile documentation to identify all destination Regions for your source Region, then ensure your SCPs allow access to all those destination Regions.

For example, if you're calling from `us-east-1` (source Region) using the US Anthropic Claude Sonnet 4.5 Geographic profile, requests can be routed to `us-east-1`, `us-east-2`, and `us-west-2` (destination Regions). If an SCP restricts access to only `us-east-1`, cross-Region inference will fail when trying to route to `us-east-2` or `us-west-2`. Therefore, you need to allow all three destination Regions in your SCP, regardless of which Region you're calling from.

When configuring SCPs for Region exclusion, remember that blocking any destination Region in the inference profile will prevent cross-Region inference from functioning properly, even if your source Region remains accessible. For SCP requirements for Global cross-Region inference, see [Service Control Policy requirements for Global cross-Region inference](global-cross-region-inference.md#global-cris-scp-setup).

To improve security, consider using the `bedrock:InferenceProfileArn` condition to limit access to specific inference profiles. This allows you to grant access to the required Regions while restricting which inference profiles can be used.

## Use Geographic cross-Region inference


To use Geographic cross-Region inference, you include an [inference profile](inference-profiles.md) when running model inference in the following ways:
+ **On-demand model inference** – Specify the ID of the inference profile as the `modelId` when sending an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) request. An inference profile defines one or more Regions to which it can route inference requests originating from your source Region. Cross-Region inference increases throughput and performance by dynamically routing model invocation requests across the Regions defined in the inference profile, accounting for user traffic, demand, and resource utilization. For more information, see [Submit prompts and generate responses with model inference](inference.md).
+ **Batch inference** – Submit requests asynchronously with batch inference by specifying the ID of the inference profile as the `modelId` when sending a [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) request. Using an inference profile lets you utilize compute across multiple AWS Regions and achieve faster processing times for your batch jobs. After the job is complete, you can retrieve the output files from the Amazon S3 bucket in the source Region.
+ **Agents** – Specify the ID of the inference profile in the `foundationModel` field in a [CreateAgent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgent.html) request. For more information, see [Create and configure agent manually](agents-create.md).
+ **Knowledge base response generation** – You can use cross-Region inference when generating a response after querying a knowledge base. For more information, see [Test your knowledge base with queries and responses](knowledge-base-test.md).
+ **Model evaluation** – You can submit an inference profile as a model to evaluate when submitting a model evaluation job. For more information, see [Evaluate the performance of Amazon Bedrock resources](evaluation.md).
+ **Prompt management** – You can use cross-Region inference when generating a response for a prompt you created in Prompt management. For more information, see [Construct and store reusable prompts with Prompt management in Amazon Bedrock](prompt-management.md).
+ **Prompt flows** – You can use cross-Region inference when generating a response for a prompt you define inline in a prompt node in a prompt flow. For more information, see [Build an end-to-end generative AI workflow with Amazon Bedrock Flows](flows.md).
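As a minimal sketch of the on-demand case, the following Python code sends a Converse request through the US Geographic inference profile for Claude Sonnet 4.5. The profile ID is formed by prefixing the base model ID with the geography prefix (`us` here); the helper and function names are illustrative:

```python
def geo_profile_id(prefix, base_model_id):
    """Build a Geographic inference profile ID, for example 'us.' + base model ID."""
    return f"{prefix}.{base_model_id}"

def converse_with_geo_profile(prompt, source_region="us-east-1"):
    """Send a single-turn Converse request through the US Geographic profile."""
    import boto3  # imported lazily; the ID helper above has no SDK dependency
    bedrock = boto3.client("bedrock-runtime", region_name=source_region)
    model_id = geo_profile_id("us", "anthropic.claude-sonnet-4-5-20250929-v1:0")
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```

Calling `converse_with_geo_profile()` requires valid AWS credentials and the IAM permissions described in the policy example above.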

To learn how to use an inference profile to send model invocation requests across Regions, see [Use an inference profile in model invocation](inference-profiles-use.md).

To learn more about cross-Region inference, see [Getting started with cross-Region inference in Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/getting-started-with-cross-region-inference-in-amazon-bedrock/).

For detailed information about global cross-Region inference, including IAM setup and service quota management, see [Global cross-Region inference](global-cross-region-inference.md).

# Global cross-Region inference


Global cross-Region inference extends cross-Region inference beyond geographic boundaries, enabling the routing of inference requests to supported commercial AWS Regions worldwide, optimizing available resources and enabling higher model throughput.

## Benefits of global cross-Region inference


Global cross-Region inference for Anthropic's Claude Sonnet 4.5 delivers multiple advantages over traditional geographic cross-Region inference profiles:
+ **Enhanced throughput during peak demand** – Global cross-Region inference provides improved resilience during periods of peak demand by automatically routing requests to AWS Regions with available capacity. This dynamic routing happens seamlessly without additional configuration or intervention from developers. Unlike traditional approaches that might require complex client-side load balancing between AWS Regions, global cross-Region inference handles traffic spikes automatically. This is particularly important for business-critical applications where downtime or degraded performance can have significant financial or reputational impacts.
+ **Cost-efficiency** – Global cross-Region inference for Anthropic's Claude Sonnet 4.5 offers approximately 10% savings on both input and output token pricing compared to geographic cross-Region inference. The price is calculated based on the AWS Region from which the request is made (source AWS Region). This means organizations can benefit from improved resilience with even lower costs. This pricing model makes global cross-Region inference a cost-effective solution for organizations looking to optimize their generative AI deployments. By improving resource utilization and enabling higher throughput without additional costs, it helps organizations maximize the value of their investment in Amazon Bedrock.
+ **Streamlined monitoring** – When using global cross-Region inference, CloudWatch and CloudTrail continue to record log entries in your source AWS Region, simplifying observability and management. Even though your requests are processed across different AWS Regions worldwide, you maintain a centralized view of your application's performance and usage patterns through your familiar AWS monitoring tools.
+ **On-demand quota flexibility** – With global cross-Region inference, your workloads are no longer limited by individual Regional capacity. Instead of being restricted to the capacity available in a specific AWS Region, your requests can be dynamically routed across the AWS global infrastructure. This provides access to a much larger pool of resources, making it less complicated to handle high-volume workloads and sudden traffic spikes.

## Global cross-Region inference considerations


Note the following information about Global cross-Region inference:
+ Global Cross-Region inference profiles provide higher throughput than an inference profile tied to a particular geography. An inference profile tied to a particular geography offers higher throughput than single-region inference.
+ To see the default quotas for cross-Region throughput when using Global inference profiles, refer to the **Global cross-region model inference requests per minute for ${Model}** and **Global cross-region model inference tokens per minute for ${Model}** values in [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the *AWS General Reference*.

  You can request, view, and manage quotas for Global cross-Region inference profiles from the [Service Quotas console](https://console.aws.amazon.com/servicequotas/home/services/bedrock/quotas) or by using AWS CLI commands in your **source Region**.
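As a sketch, you could also list the Global cross-Region quotas programmatically with the Service Quotas API from your source Region. The quota-name prefix used in the filter below is an assumption based on the naming pattern above:

```python
def is_global_cris_quota(quota_name):
    """Match quota names following the Global cross-Region naming pattern (assumed prefix)."""
    return quota_name.lower().startswith("global cross-region model inference")

def list_global_cris_quotas(source_region="us-east-1"):
    """Return (name, value) pairs for Global cross-Region inference quotas."""
    import boto3  # imported lazily; the name filter above has no SDK dependency
    quotas = boto3.client("service-quotas", region_name=source_region)
    results = []
    paginator = quotas.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        for quota in page["Quotas"]:
            if is_global_cris_quota(quota["QuotaName"]):
                results.append((quota["QuotaName"], quota["Value"]))
    return results
```

Calling `list_global_cris_quotas()` requires valid AWS credentials with `servicequotas:ListServiceQuotas` permission.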

## IAM policy requirements for global cross-Region inference


To enable global cross-Region inference for your users, you must attach a three-part IAM policy to the relevant IAM role. The following example IAM policy provides granular control. Replace `<REQUESTING REGION>` in the example policy with the AWS Region you are operating in.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "unspecified",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        }
    ]
}
```

The first part of the policy grants access to the Regional inference profile in your requesting AWS Region. The second part provides access to the Regional FM resource. The third part grants access to the global FM resource, which enables the cross-Region routing capability.

When implementing these policies, make sure all three resource Amazon Resource Names (ARNs) are included in your IAM statements:
+ The Regional inference profile ARN follows the pattern `arn:aws:bedrock:REGION:ACCOUNT:inference-profile/global.MODEL-NAME`. This is used to give access to the global inference profile in the source AWS Region.
+ The Regional FM uses `arn:aws:bedrock:REGION::foundation-model/MODEL-NAME`. This is used to give access to the FM in the source AWS Region.
+ The global FM requires `arn:aws:bedrock:::foundation-model/MODEL-NAME`. This is used to give access to the FM in different global AWS Regions.

The global FM ARN has no AWS Region or account specified, which is intentional and required for the cross-Region functionality.
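The three ARNs can be derived mechanically from the Region, account ID, and model name. The following helper (hypothetical, for illustration only) makes the pattern explicit:

```python
def global_cris_arns(region, account_id, model_name):
    """Return the three resource ARNs needed in the IAM policy for global CRIS."""
    return {
        # Global inference profile in the source Region
        "inference_profile": f"arn:aws:bedrock:{region}:{account_id}:inference-profile/global.{model_name}",
        # Foundation model in the source Region
        "regional_model": f"arn:aws:bedrock:{region}::foundation-model/{model_name}",
        # Global foundation model: Region and account are intentionally empty
        "global_model": f"arn:aws:bedrock:::foundation-model/{model_name}",
    }
```

All three values belong in the `Resource` elements of the corresponding policy statements shown above.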

### Disable global cross-Region inference


You can choose from two primary approaches to deny global CRIS for specific IAM roles, each with different use cases and implications:
+ **Remove an IAM policy** – The first method involves removing one or more of the three required IAM policies from user permissions. Because global CRIS requires all three policies to function, removing a policy will result in denied access.
+ **Implement a deny policy** – The second approach is to implement an explicit deny policy that specifically targets global CRIS inference profiles. This method provides clear documentation of your security intent and makes sure that even if someone accidentally adds the required allow policies later, the explicit deny takes precedence. The deny policy should use a `StringEquals` condition with `"aws:RequestedRegion": "unspecified"`, combined with a condition on `bedrock:InferenceProfileArn` that targets inference profiles with the `global` prefix.

When implementing deny policies, it's crucial to understand that global CRIS changes how the `aws:RequestedRegion` field behaves. Traditional AWS Region-based deny policies that use `StringEquals` conditions with specific AWS Region names such as `"aws:RequestedRegion": "us-west-2"` will not work as expected with global CRIS because the service sets this field to `unspecified` rather than the actual destination AWS Region. However, as mentioned earlier, a condition matching `"aws:RequestedRegion": "unspecified"` will result in the deny effect.

## Service Control Policy requirements for Global cross-Region inference


For Global cross-Region inference, if your organization's security policy uses SCPs to block unused Regions, you must update your Region-specific SCP conditions to allow access with `"aws:RequestedRegion": "unspecified"`. This condition is specific to Amazon Bedrock Global cross-Region inference and ensures that requests can be routed to all supported AWS commercial Regions.

The following example SCP blocks all AWS API calls outside of approved Regions while allowing Amazon Bedrock Global cross-Region inference calls that use `"unspecified"` as the Region for global routing:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "us-east-1",
                        "us-east-2",
                        "us-west-2",
                        "unspecified"
                    ]
                }
            }
        }
    ]
}
```

### Disable global cross-Region inference


Organizations with data residency or compliance requirements should assess whether Global cross-Region inference fits their compliance framework, since requests may be processed in other supported AWS commercial Regions. To explicitly disable Global cross-Region inference, implement the following SCP:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyGlobalCrossRegionInference",
            "Effect": "Deny",
            "Action": "bedrock:*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "unspecified"
                },
                "ArnLike": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:*:*:inference-profile/global.*"
                }
            }
        }
    ]
}
```

This SCP explicitly denies Global cross-Region inference because the `"aws:RequestedRegion"` is `"unspecified"` and the `"ArnLike"` condition targets inference profiles with the `global` prefix in the ARN.

### AWS Control Tower implementation


Manually editing SCPs managed by AWS Control Tower is strongly discouraged as it can cause drift. Instead, use the mechanisms provided by Control Tower to manage these exceptions. The core principles involve either extending existing Region-deny controls or enabling Regions and then applying a custom, conditional blocking policy.

For detailed, step-by-step guidance on implementing cross-Region inference with Control Tower, see the blog post [Enable Amazon Bedrock cross-Region inference in multi-account environments](https://aws.amazon.com/blogs/machine-learning/enable-amazon-bedrock-cross-region-inference-in-multi-account-environments/). This covers extending existing Region-deny SCPs, enabling denied Regions with custom SCPs, and using Customizations for AWS Control Tower (CfCT) to deploy custom SCPs as infrastructure as code.

## Request limit increases for global cross-Region inference


You can use global CRIS inference profiles from more than 20 supported source AWS Regions. Because the quota is global, requests to view, manage, or increase quotas for global cross-Region inference profiles must be made through the Service Quotas console or the AWS Command Line Interface (AWS CLI) in your source AWS Region.

Complete the following steps to request a limit increase:

1. Sign in to the Service Quotas console in your AWS account.

1. In the navigation pane, choose **AWS services**.

1. From the list of services, find and choose **Amazon Bedrock**.

1. In the list of quotas for Amazon Bedrock, use the search filter to find the specific global CRIS quotas. For example:
   + Global cross-Region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

1. Select the quota you want to increase.

1. Choose **Request increase at account level**.

1. Enter your desired new quota value.

1. Choose **Request** to submit your request.

When calculating your required quota increase, remember to account for the burndown rate, defined as the rate at which input and output tokens are converted into token quota usage for the throttling system. The following models have a **5x burndown rate for output tokens (1 output token consumes 5 tokens from your quotas)**:
+ Anthropic Claude Opus 4
+ Anthropic Claude Sonnet 4.5
+ Anthropic Claude Sonnet 4
+ Anthropic Claude 3.7 Sonnet

For all other models, the burndown rate is **1:1** (1 output token consumes 1 token from your quota). For input tokens, the token to quota ratio is 1:1. The calculation for the total number of tokens per request is as follows:

`Input token count + Cache write input tokens + (Output token count x Burndown rate)`
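This formula can be checked with a short calculation. For example, a request to Claude Sonnet 4.5 (5x output burndown) with 1,000 input tokens, 200 cache-write tokens, and 500 output tokens consumes 1,000 + 200 + (500 × 5) = 3,700 tokens from the quota:

```python
def quota_tokens(input_tokens, cache_write_tokens, output_tokens, burndown_rate=1):
    """Tokens counted against your quota for one request."""
    return input_tokens + cache_write_tokens + output_tokens * burndown_rate

# Claude Sonnet 4.5 has a 5x burndown rate for output tokens.
print(quota_tokens(1000, 200, 500, burndown_rate=5))  # 3700
```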

## Use Global cross-Region inference


To use global cross-Region inference with Anthropic's Claude Sonnet 4.5, developers must complete the following key steps:
+ **Use the global inference profile ID** – When making API calls to Amazon Bedrock, specify the global inference profile ID for Anthropic's Claude Sonnet 4.5 (`global.anthropic.claude-sonnet-4-5-20250929-v1:0`) instead of an AWS Region-specific model ID.
+ **Configure IAM permissions** – Grant appropriate IAM permissions to access the inference profile and FMs in potential destination AWS Regions.

Global cross-Region inference is supported for:
+ On-demand model inference
+ Batch inference
+ Agents
+ Model evaluation
+ Prompt management
+ Prompt flows


## Implement global cross-Region inference


Implementing global cross-Region inference with Anthropic's Claude Sonnet 4.5 is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:

```
import boto3

# Create an Amazon Bedrock Runtime client in the source Region.
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Specify the global inference profile ID instead of a Region-specific model ID.
model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
response = bedrock.converse(
    messages=[{"role": "user", "content": [{"text": "Explain cloud computing in 2 sentences."}]}],
    modelId=model_id,
)

print("Response:", response['output']['message']['content'][0]['text'])
print("Token usage:", response['usage'])
print("Total tokens:", response['usage']['totalTokens'])
```