

# Inference provider targets
<a name="gateway-target-inference-provider"></a>

Inference provider targets give you explicit control over the endpoint, model mappings, and operations for a model provider. Use a provider configuration when you need to customize which models are available, set per-model token limits, configure path rewriting, or connect to a provider that doesn’t have a built-in connector.

**Topics**
+ [Target configuration](#gateway-target-inference-provider-config)
+ [Creating a provider inference target](#gateway-target-inference-provider-create)
+ [Invoking a provider inference target](#gateway-target-inference-provider-invoke)
+ [Model-based routing](#gateway-target-inference-provider-routing)
+ [Streaming](#gateway-target-inference-provider-streaming)
+ [Outbound authorization](#gateway-target-inference-provider-auth)

## Target configuration
<a name="gateway-target-inference-provider-config"></a>

The target configuration for an inference provider target uses the following structure:

```
{
    "inference": {
        "provider": {
            "endpoint": "https://api.openai.com",
            "operations": [
                {
                    "path": "/v1/chat/completions",
                    "models": [
                        {"model": "gpt-5.5"},
                        {"model": "gpt-5.4"},
                        {"model": "gpt-5.4-mini"}
                    ]
                },
                {
                    "path": "/v1/responses",
                    "models": [
                        {"model": "gpt-5.5"},
                        {"model": "gpt-5.4"}
                    ]
                }
            ]
        }
    }
}
```
+  **endpoint** (required) – The HTTPS URL of the model provider.
+  **modelMapping** (optional) – Model ID translation configuration.
  +  **providerPrefix** (optional) – Configures how clients can omit the provider prefix from model IDs. If omitted, no prefix translation is applied and clients must use the provider’s full model IDs.
    +  **strip** (optional) – When `true`, clients can use model IDs without the provider prefix (for example, `claude-opus-4-7` instead of `anthropic.claude-opus-4-7`). Defaults to `false`.
    +  **separator** (optional) – The separator character between the provider prefix and the model name (for example, `.`).
+  **operations** (optional) – A list of operation configurations that map request paths to supported models:
  +  **path** (required) – The request path for this operation (for example, `/v1/chat/completions`).
  +  **providerPath** (optional) – The path to forward to on the provider if it differs from the request path.
  +  **models** (optional) – The models supported for this operation. Each entry includes a **model** field (required) containing a model ID or glob pattern (for example, `anthropic.claude-opus-*`).
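
To make the interaction between `providerPrefix` and the `models` patterns more concrete, the following Python sketch resolves a client-supplied model ID against one operation's patterns. It is an illustration only. It assumes that glob patterns behave like Python's `fnmatch` and that, with `strip` enabled, a short ID matches when it equals the provider ID without its prefix. `resolve_model` is a hypothetical helper, not part of the gateway API, and the gateway's actual matching may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional


def resolve_model(client_model: str, patterns: list[str],
                  strip: bool = False, separator: str = ".") -> Optional[str]:
    """Return the provider-side model ID for client_model, or None if unsupported.

    Assumption: patterns use fnmatch-style globs.
    """
    for pattern in patterns:
        # A full provider model ID matches the pattern directly.
        if fnmatchcase(client_model, pattern):
            return client_model
        # With strip enabled, also try the pattern without its provider prefix.
        if strip and separator in pattern:
            prefix, short_pattern = pattern.split(separator, 1)
            if fnmatchcase(client_model, short_pattern):
                return f"{prefix}{separator}{client_model}"
    return None


patterns = ["anthropic.claude-opus-*", "openai.gpt-oss-*"]
print(resolve_model("claude-opus-4-7", patterns, strip=True))            # anthropic.claude-opus-4-7
print(resolve_model("anthropic.claude-opus-4-7", patterns, strip=True))  # anthropic.claude-opus-4-7
print(resolve_model("claude-opus-4-7", patterns))                        # None
```

With `strip` left at its default of `false`, only the full provider model ID resolves, which mirrors the documented behavior that clients must use the provider's full model IDs.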

## Creating a provider inference target
<a name="gateway-target-inference-provider-create"></a>

The following example creates an OpenAI inference target using a provider configuration:

```
aws bedrock-agentcore-control create-gateway-target --cli-input-json '{
    "gatewayIdentifier": "GATEWAY_ID",
    "name": "openai",
    "targetConfiguration": {
        "inference": {
            "provider": {
                "endpoint": "https://api.openai.com",
                "operations": [
                    {
                        "path": "/v1/chat/completions",
                        "models": [
                            {"model": "gpt-5.5"},
                            {"model": "gpt-5.4"},
                            {"model": "gpt-5.4-mini"}
                        ]
                    },
                    {
                        "path": "/v1/responses",
                        "models": [
                            {"model": "gpt-5.5"},
                            {"model": "gpt-5.4"}
                        ]
                    }
                ]
            }
        }
    },
    "credentialProviderConfigurations": [
        {
            "credentialProviderType": "API_KEY",
            "credentialProvider": {
                "apiKeyCredentialProvider": {
                    "providerArn": "arn:aws:bedrock-agentcore:us-west-2:111122223333:token-vault/default/apikeycredentialprovider/openai-key",
                    "credentialLocation": "HEADER",
                    "credentialParameterName": "Authorization",
                    "credentialPrefix": "Bearer "
                }
            }
        }
    ]
}'
```

The following example creates a Bedrock inference target with explicit provider configuration and model mapping. The `modelMapping` configuration with `providerPrefix` allows clients to use short model names (like `claude-opus-4-7`) while the gateway translates them to provider-prefixed names (like `anthropic.claude-opus-4-7`):

```
aws bedrock-agentcore-control create-gateway-target --cli-input-json '{
    "gatewayIdentifier": "GATEWAY_ID",
    "name": "bedrock",
    "targetConfiguration": {
        "inference": {
            "provider": {
                "endpoint": "https://bedrock-mantle.us-east-1.api.aws",
                "modelMapping": {
                    "providerPrefix": {"strip": true, "separator": "."}
                },
                "operations": [
                    {
                        "path": "/v1/chat/completions",
                        "models": [
                            {"model": "anthropic.claude-opus-*"},
                            {"model": "anthropic.claude-sonnet-*"},
                            {"model": "openai.gpt-oss-*"}
                        ]
                    },
                    {
                        "path": "/v1/messages",
                        "providerPath": "/anthropic/v1/messages",
                        "models": [
                            {"model": "anthropic.claude-opus-*"},
                            {"model": "anthropic.claude-sonnet-*"}
                        ]
                    }
                ]
            }
        }
    },
    "credentialProviderConfigurations": [
        {"credentialProviderType": "GATEWAY_IAM_ROLE"}
    ]
}'
```
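
If you manage several targets, generating the `--cli-input-json` payload programmatically can reduce copy-and-paste errors. The following sketch builds the same request shape as the preceding Bedrock example using only the Python standard library. `build_target_request` is a hypothetical helper introduced for illustration, and `GATEWAY_ID` remains a placeholder.

```python
import json


def build_target_request(gateway_id, name, endpoint, operations,
                         credentials, model_mapping=None):
    """Assemble a create-gateway-target request body as a plain dict."""
    provider = {"endpoint": endpoint}
    if model_mapping is not None:
        provider["modelMapping"] = model_mapping
    provider["operations"] = []
    for op in operations:
        entry = {"path": op["path"]}
        # providerPath is only needed when the provider's path differs.
        if "providerPath" in op:
            entry["providerPath"] = op["providerPath"]
        entry["models"] = [{"model": model} for model in op["models"]]
        provider["operations"].append(entry)
    return {
        "gatewayIdentifier": gateway_id,
        "name": name,
        "targetConfiguration": {"inference": {"provider": provider}},
        "credentialProviderConfigurations": credentials,
    }


request = build_target_request(
    gateway_id="GATEWAY_ID",
    name="bedrock",
    endpoint="https://bedrock-mantle.us-east-1.api.aws",
    model_mapping={"providerPrefix": {"strip": True, "separator": "."}},
    operations=[
        {"path": "/v1/chat/completions",
         "models": ["anthropic.claude-opus-*", "anthropic.claude-sonnet-*"]},
        {"path": "/v1/messages",
         "providerPath": "/anthropic/v1/messages",
         "models": ["anthropic.claude-opus-*"]},
    ],
    credentials=[{"credentialProviderType": "GATEWAY_IAM_ROLE"}],
)
print(json.dumps(request, indent=4))
```

The printed JSON can be passed as the value of `--cli-input-json`.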

## Invoking a provider inference target
<a name="gateway-target-inference-provider-invoke"></a>

To invoke an inference target, send requests to the gateway’s `/inference` path. The gateway routes each request to the correct target based on the `model` field in the request body. The `model` value can be either a plain model ID (for example, `gpt-5.5`) or a target-qualified model ID in the form `{targetName}/{modelId}` (for example, `openai/gpt-5.5`). For details on how the `model` value is matched to a target, see [Model-based routing](#gateway-target-inference-provider-routing).

The URL format is:

```
https://{gatewayId}.gateway.bedrock-agentcore.{region}.amazonaws.com/inference/{path}
```

Replace `{path}` with the inference operation path (for example, `v1/chat/completions`, `v1/responses`, or `v1/messages`).
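
As a small illustration of the URL format, the following Python sketch assembles the invocation URL from its parts. `inference_url` is a hypothetical helper, and the gateway ID shown is a made-up placeholder.

```python
def inference_url(gateway_id: str, region: str, path: str) -> str:
    """Build the gateway inference URL for an operation path."""
    # Accept the path with or without a leading slash.
    clean_path = path.lstrip("/")
    return (
        f"https://{gateway_id}.gateway.bedrock-agentcore."
        f"{region}.amazonaws.com/inference/{clean_path}"
    )


# "my-gateway-abc123" is a placeholder, not a real gateway ID.
print(inference_url("my-gateway-abc123", "us-west-2", "/v1/chat/completions"))
```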

### Using the OpenAI SDK
<a name="_using_the_openai_sdk"></a>

Set the gateway’s `/inference/v1` path as the `base_url`:

```
from openai import OpenAI

client = OpenAI(
    base_url="https://<gateway-id>.gateway.bedrock-agentcore.us-west-2.amazonaws.com/inference/v1",
    api_key="<gateway-auth-token>"
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### Using the Anthropic SDK
<a name="_using_the_anthropic_sdk"></a>

Set the gateway’s `/inference` path as the `base_url`:

```
import anthropic

client = anthropic.Anthropic(
    base_url="https://<gateway-id>.gateway.bedrock-agentcore.us-west-2.amazonaws.com/inference",
    api_key="<gateway-auth-token>"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### Using awscurl
<a name="_using_awscurl"></a>

```
awscurl --service bedrock-agentcore --region us-west-2 -X POST \
    "https://<gateway-id>.gateway.bedrock-agentcore.us-west-2.amazonaws.com/inference/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-5.5", "messages": [{"role": "user", "content": "Hello!"}]}'
```

### Qualified model routing
<a name="_qualified_model_routing"></a>

When multiple targets serve the same model, prefix the model ID with the target name to route the request to a specific target:

```
# Reuses the OpenAI `client` configured in the earlier SDK example.
# Route explicitly to the "bedrock" target
response = client.chat.completions.create(
    model="bedrock/claude-opus-4-7",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Model-based routing
<a name="gateway-target-inference-provider-routing"></a>

The gateway routes inference requests based on the `model` field in the request body:

1.  **Qualified routing** – If the model ID contains a `/` and the prefix matches a target name, the request is routed to that target (for example, `openai/gpt-5.5` routes to the `openai` target).

1.  **Unqualified routing** – If the model ID does not contain a `/`, the gateway matches it against all configured targets. An exact match takes priority over glob patterns. If exactly one target matches, the request is routed to it.

1.  **Collision handling** – When multiple targets match the same model at the same specificity, the gateway defaults to the Amazon Bedrock target if one is among the matches. Otherwise, it distributes requests across the matching targets in round-robin order. To pin requests to a specific target, qualify the model with the target name as a prefix (for example, `bedrock/claude-opus-4-7`).
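
The routing rules above can be modeled with a short Python sketch. This is an illustrative model of the documented behavior, not the gateway's implementation. It makes several assumptions: glob matching follows `fnmatch`, all glob patterns count as the same specificity, the Amazon Bedrock target is identified by a hypothetical `is_bedrock` flag, round-robin uses a single shared counter, and a qualified prefix that matches no target raises an error (the documented rules don't cover that case).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from itertools import count


@dataclass(frozen=True)
class Target:
    name: str
    patterns: tuple[str, ...]
    is_bedrock: bool = False  # assumption: how this sketch marks the Bedrock target


class Router:
    def __init__(self, targets):
        self.targets = list(targets)
        self._counter = count()  # simplification: one shared round-robin counter

    def _matches(self, model_id):
        # An exact match takes priority over glob patterns.
        exact = [t for t in self.targets if model_id in t.patterns]
        if exact:
            return exact
        return [t for t in self.targets
                if any(fnmatchcase(model_id, p) for p in t.patterns)]

    def route(self, model):
        """Return (target_name, model_id) for the request's model value."""
        # 1. Qualified routing: "{targetName}/{modelId}".
        if "/" in model:
            prefix, model_id = model.split("/", 1)
            for target in self.targets:
                if target.name == prefix:
                    return target.name, model_id
            # Assumption: unknown prefixes are treated as an error here.
            raise LookupError(f"no target named {prefix!r}")
        # 2. Unqualified routing.
        matches = self._matches(model)
        if not matches:
            raise LookupError(f"no target serves {model!r}")
        if len(matches) == 1:
            return matches[0].name, model
        # 3. Collision handling: prefer Bedrock, otherwise round-robin.
        for target in matches:
            if target.is_bedrock:
                return target.name, model
        return matches[next(self._counter) % len(matches)].name, model


router = Router([
    Target("openai", ("gpt-5.5", "gpt-5.4")),
    Target("mirror", ("gpt-5.5",)),  # hypothetical second target serving gpt-5.5
    Target("bedrock", ("claude-opus-*",), is_bedrock=True),
])
print(router.route("openai/gpt-5.5"))   # ('openai', 'gpt-5.5')
print(router.route("claude-opus-4-7"))  # ('bedrock', 'claude-opus-4-7')
print(router.route("gpt-5.5"))          # ('openai', 'gpt-5.5')
print(router.route("gpt-5.5"))          # ('mirror', 'gpt-5.5')
```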

## Streaming
<a name="gateway-target-inference-provider-streaming"></a>

Streaming follows the OpenAI SSE convention. Set `"stream": true` in the request body, and the gateway passes through the SSE stream from the provider without transformation:

```
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True
)

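for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip them instead of printing "None".
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")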
```
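
Because the stream is passed through unchanged, clients that don't use an SDK can parse the server-sent events themselves. The following sketch shows one way to decode an OpenAI-style chat completions stream, assuming each event is a single `data:` line containing JSON and that the stream ends with `data: [DONE]`. `iter_sse_events` is a hypothetical helper, the sample lines are hand-written rather than captured from a provider, and other operations may use a different event layout.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)


# Hand-written sample lines for illustration.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    event["choices"][0]["delta"].get("content") or ""
    for event in iter_sse_events(sample)
)
print(text)  # Hello
```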

## Outbound authorization
<a name="gateway-target-inference-provider-auth"></a>

Inference provider targets support the following outbound authorization types:
+  **IAM (SigV4)** – Use `GATEWAY_IAM_ROLE` for providers that accept IAM authentication (such as Amazon Bedrock).
+  **API key** – Use `API_KEY` for providers that require an API key (such as OpenAI and Anthropic). The gateway injects the stored API key into outbound requests.
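
As a rough mental model of API key injection, the following Python sketch applies an `apiKeyCredentialProvider` configuration to a set of outbound headers. It is an assumption about the observable effect, not a description of the gateway's internals. `apply_api_key` is a hypothetical helper, `sk-example` is a placeholder secret, and only the `HEADER` location used in the earlier OpenAI example is handled.

```python
def apply_api_key(headers: dict, provider_config: dict, secret: str) -> dict:
    """Return a copy of headers with the API key added per provider_config."""
    location = provider_config.get("credentialLocation", "HEADER")
    if location != "HEADER":
        # Other locations are outside the scope of this sketch.
        raise NotImplementedError(f"unsupported credentialLocation: {location}")
    name = provider_config["credentialParameterName"]
    prefix = provider_config.get("credentialPrefix", "")
    updated = dict(headers)  # leave the caller's dict untouched
    updated[name] = f"{prefix}{secret}"
    return updated


config = {
    "credentialLocation": "HEADER",
    "credentialParameterName": "Authorization",
    "credentialPrefix": "Bearer ",
}
# "sk-example" is a placeholder, not a real key.
print(apply_api_key({"Content-Type": "application/json"}, config, "sk-example"))
```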