

# Anthropic Claude Messages API


This section provides inference parameters and code examples for using the Anthropic Claude Messages API.

**Topics**
+ [Anthropic Claude Messages API overview](#model-parameters-anthropic-claude-messages-overview)
+ [Tool use](model-parameters-anthropic-claude-messages-tool-use.md)
+ [Extended thinking](claude-messages-extended-thinking.md)
+ [Adaptive thinking](claude-messages-adaptive-thinking.md)
+ [Thinking encryption](claude-messages-thinking-encryption.md)
+ [Differences in thinking across model versions](claude-messages-thinking-differences.md)
+ [Compaction](claude-messages-compaction.md)
+ [Get validated JSON results from models](claude-messages-structured-outputs.md)
+ [Request and Response](model-parameters-anthropic-claude-messages-request-response.md)
+ [Code examples](api-inference-examples-claude-messages-code-examples.md)
+ [Supported models](claude-messages-supported-models.md)

## Anthropic Claude Messages API overview


You can use the Messages API to create chatbot or virtual assistant applications. The API manages the conversational exchanges between a user and an Anthropic Claude model (assistant). 

**Note**  
This topic shows how to use the Anthropic Claude messages API with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). However, we recommend that you use the Converse API to implement messages in your application. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

Anthropic trains Claude models to operate on alternating user and assistant conversational turns. When creating a new message, you specify the prior conversational turns with the `messages` parameter. The model then generates the next message in the conversation.

Each input message must be an object with a role and content. You can specify a single user-role message, or you can include multiple user and assistant messages.

If you are using the technique of prefilling the response from Claude (filling in the beginning of Claude's response by using a final assistant role Message), Claude will respond by picking up from where you left off. With this technique, Claude will still return a response with the assistant role. 

If the final message uses the assistant role, the response content will continue immediately from the content in that message. You can use this to constrain part of the model's response. 

Example with a single user message:

```
[{"role": "user", "content": "Hello, Claude"}]
```

Example with multiple conversational turns:

```
[
  {"role": "user", "content": "Hello there."},
  {"role": "assistant", "content": "Hi, I'm Claude. How can I help you?"},
  {"role": "user", "content": "Can you explain LLMs in plain English?"},
]
```

Example with a partially-filled response from Claude:

```
[
  {"role": "user", "content": "Please describe yourself using only JSON"},
  {"role": "assistant", "content": "Here is my JSON description:\n{"},
]
```
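When the model replies, its continuation can be concatenated with the prefilled text to recover the complete output. A minimal sketch, with a hypothetical continuation string standing in for the model's actual response:

```python
import json

# The prefilled beginning of the assistant turn (from the request above)
prefill = "Here is my JSON description:\n{"

# Hypothetical continuation returned by the model (illustrative only)
continuation = '"name": "Claude", "creator": "Anthropic"}'

# The complete assistant output is the prefill plus the continuation
full_text = prefill + continuation

# The JSON part starts at the brace that was prefilled
description = json.loads(full_text[full_text.index("{"):])
```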

Each input message content may be either a single string or an array of content blocks, where each block has a specific type. Using a string is shorthand for an array of one content block of type "text". The following input messages are equivalent:

```
{"role": "user", "content": "Hello, Claude"}
```

```
{"role": "user", "content": [{"type": "text", "text": "Hello, Claude"}]}
```
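This equivalence is easy to normalize in client code. The following sketch (a hypothetical helper, not part of any SDK) expands the string shorthand into the content-block form:

```python
def normalize_content(message):
    """Return a copy of the message with string content expanded
    into the equivalent list of content blocks."""
    content = message["content"]
    if isinstance(content, str):
        # A bare string is shorthand for one text block
        content = [{"type": "text", "text": content}]
    return {"role": message["role"], "content": content}

short = {"role": "user", "content": "Hello, Claude"}
full = normalize_content(short)
```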

For information about creating prompts for Anthropic Claude models, see [Intro to prompting](https://docs.anthropic.com/claude/docs/intro-to-prompting) in the Anthropic Claude documentation. If you have existing [Text Completion](model-parameters-anthropic-claude-text-completion.md) prompts that you want to migrate to the messages API, see [Migrating from Text Completions](https://docs.anthropic.com/claude/reference/migrating-from-text-completions-to-messages).

**Important**  
The timeout period for inference calls to Anthropic Claude 3.7 Sonnet and Claude 4 models is 60 minutes. By default, AWS SDK clients timeout after 1 minute. We recommend that you increase the read timeout period of your AWS SDK client to at least 60 minutes. For example, in the AWS Python botocore SDK, change the value of the `read_timeout` field in [botocore.config](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#) to at least 3600.

### System prompts


You can also include a system prompt in the request. A system prompt lets you provide context and instructions to Anthropic Claude, such as specifying a particular goal or role. Specify a system prompt in the `system` field, as shown in the following example. 

```
"system": "You are Claude, an AI assistant created by Anthropic to be helpful,
                harmless, and honest. Your goal is to provide informative and substantive responses
                to queries while avoiding potential harms."
```

For more information, see [System prompts](https://docs.anthropic.com/en/docs/system-prompts) in the Anthropic documentation.
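Putting this together, a full `InvokeModel` request body combines the `system` field with the `messages` array. A minimal sketch, using the field names from the examples in this topic:

```python
import json

# Sketch of an InvokeModel request body that combines a system
# prompt with the messages array
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": "You are Claude, an AI assistant created by Anthropic.",
    "messages": [
        {"role": "user", "content": "Hello, Claude"}
    ],
}

payload = json.dumps(body)  # InvokeModel expects a JSON string body
```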

### Multimodal prompts


A multimodal prompt combines multiple modalities (images and text) in a single prompt. You specify the modalities in the `content` input field. The following example shows how you could ask Anthropic Claude to describe the content of a supplied image. For example code, see [Multimodal code examples](api-inference-examples-claude-messages-code-examples.md#api-inference-examples-claude-multimodal-code-example). 

```
{
    "anthropic_version": "bedrock-2023-05-31", 
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": "iVBORw..."
                    }
                },
                {
                    "type": "text",
                    "text": "What's in these images?"
                }
            ]
        }
    ]
}
```

Each image you include in a request counts towards your token usage. For more information, see [Image costs](https://docs.anthropic.com/claude/docs/vision#image-costs) in the Anthropic documentation.
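The base64 `data` field in a request like the one above can be produced from raw image bytes. A minimal sketch (the `image_block` helper is hypothetical):

```python
import base64

def image_block(image_bytes, media_type="image/jpeg"):
    """Build an image content block from raw image bytes.

    The Messages API expects base64-encoded data in the "data" field,
    as in the request above.
    """
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }

block = image_block(b"\x89PNG...", media_type="image/png")
```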

# Tool use


**Warning**  
Several functions below are offered in beta as indicated. These features are made available to you as a "Beta Service" as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA.

With Anthropic Claude models, you can specify a tool that the model can use to answer a message. For example, you could specify a tool that gets the most popular song on a radio station. If the user passes the message *What's the most popular song on WZPZ?*, the model determines that the tool you specified can help answer the question. In its response, the model requests that you run the tool on its behalf. You then run the tool and pass the tool result to the model, which then generates a response for the original message. For more information, see [Tool use (function calling)](https://docs.anthropic.com/en/docs/tool-use) in the Anthropic Claude documentation.

**Tip**  
We recommend that you use the Converse API for integrating tool use into your application. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md). 

**Important**  
Claude Sonnet 4.5 now preserves intentional formatting in tool call string parameters. Previously, trailing newlines in string parameters were sometimes incorrectly stripped. This fix ensures that tools requiring precise formatting (like text editors) receive parameters exactly as intended. This is a behind-the-scenes improvement with no API changes required. However, tools with string parameters may now receive values with trailing newlines that were previously stripped.

**Note**  
Claude Sonnet 4.5 includes automatic optimizations to improve model performance. These optimizations may add small amounts of tokens to requests, but you are not billed for these system-added tokens.

You specify the tools that you want to make available to a model in the `tools` field. The following example is for a tool that gets the most popular songs on a radio station. 

```
[
    {
        "name": "top_song",
        "description": "Get the most popular song played on a radio station.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sign": {
                    "type": "string",
                    "description": "The call sign for the radio station for which you want the most popular song. Example calls signs are WZPZ and WKRP."
                }
            },
            "required": [
                "sign"
            ]
        }
    }
]
```

When the model needs a tool to generate a response to a message, it returns information about the requested tool, and the input to the tool, in the message `content` field. It also sets the stop reason for the response to `tool_use`.

```
{
    "id": "msg_bdrk_01USsY5m3XRUF4FCppHP8KBx",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 375,
        "output_tokens": 36
    },
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
            "name": "top_song",
            "input": {
                "sign": "WZPZ"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

In your code, you run the tool on the model's behalf. You then pass the tool result (`tool_result`) in a user message to the model.

```
{
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
            "content": "Elemental Hotel"
        }
    ]
}
```

In its response, the model uses the tool result to generate a response for the original message.

```
{
    "id": "msg_bdrk_012AaqvTiKuUSc6WadhUkDLP",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "According to the tool, the most popular song played on radio station WZPZ is \"Elemental Hotel\"."
        }
    ],
    "stop_reason": "end_turn"
}
```
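The round trip above (detect `tool_use`, run the tool, return a `tool_result`) can be sketched as a small dispatch loop. This is a hypothetical harness; the `top_song` lookup is a stand-in for your own implementation:

```python
def top_song(sign):
    # Stand-in for a real lookup against your radio-station data source
    return {"WZPZ": "Elemental Hotel"}.get(sign, "unknown")

TOOLS = {"top_song": top_song}

def run_tool_requests(response):
    """Execute each tool_use block in a model response and build the
    follow-up user message carrying the tool results."""
    results = []
    for block in response["content"]:
        if block["type"] == "tool_use":
            output = TOOLS[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            })
    return {"role": "user", "content": results}

response = {
    "stop_reason": "tool_use",
    "content": [{
        "type": "tool_use",
        "id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
        "name": "top_song",
        "input": {"sign": "WZPZ"},
    }],
}
follow_up = run_tool_requests(response)
```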

## Fine-grained tool streaming


Fine-grained tool streaming is an Anthropic Claude model capability available with Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, and Claude Opus 4. With fine-grained tool streaming, Claude developers can stream tool use parameters without buffering or JSON validation, reducing the latency to begin receiving large parameters.

**Note**  
When using fine-grained tool streaming, you may potentially receive invalid or partial JSON inputs. Please make sure to account for these edge cases in your code.

To use this feature, add the beta header `fine-grained-tool-streaming-2025-05-14` to the `anthropic_beta` field of a tool use request.

Here’s an example of how to specify the fine-grained tool streaming header:

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 1024,
  "anthropic_beta": ["fine-grained-tool-streaming-2025-05-14"],
  "messages": [
    {
      "role": "user",
      "content": "Can you write a long poem and make a file called poem.txt?"
    }
  ],
  "tools": [
    {
      "name": "make_file",
      "description": "Write text to a file",
      "input_schema": {
        "type": "object",
        "properties": {
          "filename": {
            "type": "string",
            "description": "The filename to write text to"
          },
          "lines_of_text": {
            "type": "array",
            "description": "An array of lines of text to write to the file"
          }
        },
        "required": [
          "filename",
          "lines_of_text"
        ]
      }
    }
  ]
}
```

In this example, fine-grained tool streaming enables Claude to stream the lines of a long poem into the tool call `make_file` without buffering to validate whether the `lines_of_text` parameter is valid JSON. This means you can watch the parameter stream in as it arrives, without waiting for the entire parameter to buffer and validate.

With fine-grained tool streaming, tool use chunks start streaming faster, and are often longer and contain fewer word breaks. This is due to differences in chunking behavior.

For example, without fine-grained streaming (15s delay):

```
Chunk 1: '{"'
Chunk 2: 'query": "Ty'
Chunk 3: 'peScri'
Chunk 4: 'pt 5.0 5.1 '
Chunk 5: '5.2 5'
Chunk 6: '.3'
Chunk 8: ' new f'
Chunk 9: 'eatur'
...
```

With fine-grained streaming (3s delay):

```
Chunk 1: '{"query": "TypeScript 5.0 5.1 5.2 5.3'
Chunk 2: ' new features comparison'
```

**Note**  
Because fine-grained streaming sends parameters without buffering or JSON validation, there is no guarantee that the resulting stream forms a valid JSON string. In particular, if the `max_tokens` stop reason is reached, the stream may end midway through a parameter and be incomplete. You will generally have to write specific handling for when `max_tokens` is reached.
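Given this caveat, client code should accumulate the `partial_json` chunks and treat a parse failure as a signal that the input is partial. A minimal sketch:

```python
import json

def parse_streamed_input(chunks):
    """Accumulate partial_json chunks and report whether the final
    string parsed as valid JSON (it may not, e.g. on max_tokens)."""
    buffer = "".join(chunks)
    try:
        return json.loads(buffer), True
    except json.JSONDecodeError:
        return buffer, False  # caller must handle the partial input

complete, ok = parse_streamed_input(
    ['{"query": "TypeScript 5.0 5.1 5.2 5.3', ' new features comparison"}']
)
truncated, ok2 = parse_streamed_input(['{"query": "TypeScr'])
```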

## Computer use (Beta)


Computer use is an Anthropic Claude model capability (in beta) available with Claude 3.5 Sonnet v2, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. With computer use, Claude can help you automate tasks through basic GUI actions.

**Warning**  
Computer use feature is made available to you as a ‘Beta Service’ as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA. Please be aware that the Computer Use API poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the Computer Use API to interact with the Internet. To minimize risks, consider taking precautions such as:  
+ Operate computer use functionality in a dedicated virtual machine or container with minimal privileges, to prevent direct system attacks or accidents.
+ Avoid giving the Computer Use API access to sensitive accounts or data, to prevent information theft.
+ Limit the Computer Use API's internet access to required domains, to reduce exposure to malicious content.
+ Keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service).

Any content that you enable Claude to see or access can potentially override instructions or cause Claude to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Claude from sensitive surfaces, is essential, including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks and obtain their consent as appropriate. 

The computer use API offers several pre-defined computer use tools for you to use. You can then create a prompt with your request, such as “send an email to Ben with the notes from my last meeting”, and a screenshot (when required). The response contains a list of `tool_use` actions in JSON format (for example, `scroll_down`, `left_button_press`, `screenshot`). Your code runs the computer actions and provides Claude with a screenshot showing the output (when requested).

With the release of Claude 3.5 Sonnet v2, the tools parameter was updated to accept polymorphic tool types; a `tool.type` property was added to distinguish them. `type` is optional; if omitted, the tool is assumed to be a custom tool (previously the only supported tool type). To access computer use, you must use the `anthropic_beta` parameter with a corresponding enum value, which depends on the model version in use. See the following table for more information.

Only requests made with this parameter and enum can use the computer use tools. It can be specified as follows: `"anthropic_beta": ["computer-use-2025-01-24"]`.


| Model | Beta header | 
| --- | --- | 
|  Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, Claude 3.7 Sonnet  | computer-use-2025-01-24 | 
| Claude 3.5 Sonnet v2 | computer-use-2024-10-22 | 

For more information, see [Computer use (beta)](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) in the Anthropic documentation.

The following is an example response that assumes the request contained a screenshot of your desktop with a Firefox icon. 

```
{
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "content": [
        {
            "type": "text",
            "text": "I see the Firefox icon. Let me click on it and then navigate to a weather website."
        },
        {
            "type": "tool_use",
            "id": "toolu_123",
            "name": "computer",
            "input": {
                "action": "mouse_move",
                "coordinate": [
                    708,
                    736
                ]
            }
        },
        {
            "type": "tool_use",
            "id": "toolu_234",
            "name": "computer",
            "input": {
                "action": "left_click"
            }
        }
    ],
    "stop_reason": "tool_use",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 3391,
        "output_tokens": 132
    }
}
```

## Anthropic defined tools


Anthropic provides a set of tools that enable certain Claude models to use computers effectively. When specifying an Anthropic defined tool, the `description` and `tool_schema` fields are neither necessary nor allowed. Anthropic defined tools are defined by Anthropic, but you must still explicitly evaluate the results of the tool and return the `tool_result` to Claude. As with any tool, the model does not automatically execute the tool. Each Anthropic defined tool has versions optimized for specific models, as shown in the following table:


| Model | Tool | Notes | 
| --- | --- | --- | 
|  Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4  |  <pre>{ <br />    "type": "text_editor_20250124", <br />    "name": "str_replace_based_edit_tool" <br />}</pre>  | Update to existing `str_replace_editor` tool | 
|  Claude 3.7 Sonnet  |  <pre>{ <br />    "type": "computer_20250124", <br />    "name": "computer" <br />}</pre>  |  Includes new actions for more precise control  | 
|  Claude 3.7 Sonnet  |  <pre>{ <br />    "type": "text_editor_20250124", <br />    "name": "str_replace_editor"<br />}</pre>  | Same capabilities as 20241022 version | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "bash_20250124", <br />    "name": "bash" <br />}</pre>  |  Same capabilities as 20241022 version  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "text_editor_20241022", <br />    "name": "str_replace_editor"<br />}</pre>  |  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "bash_20241022", <br />    "name": "bash"<br />}</pre>  |  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "computer_20241022", <br />    "name": "computer"<br />}</pre>  |  | 

The `type` field identifies the tool and its parameters for validation purposes; the `name` field is the tool name exposed to the model.

If you want to prompt the model to use one of these tools, you can explicitly refer to the tool by the `name` field. The `name` field must be unique within the tool list; you cannot define a tool with the same `name` as an Anthropic defined tool in the same API call.

## Automatic tool call clearing (Beta)


**Warning**  
Automatic tool call clearing is made available as a "Beta Service" as defined in the AWS Service Terms.

**Note**  
This feature is currently supported on Claude Sonnet 4/4.5, Claude Haiku 4.5, and Claude Opus 4/4.1/4.5.

Automatic tool call clearing is an Anthropic Claude model capability (in beta). With this feature, Claude can automatically clear old tool use results as you approach token limits, allowing for more efficient context management in multi-turn tool use scenarios. To use tool call clearing, add `context-management-2025-06-27` to the list of beta headers in the `anthropic_beta` request parameter. You also need to specify the `clear_tool_uses_20250919` strategy and choose from the following configuration options.

These are the available controls for the `clear_tool_uses_20250919` context management strategy. All are optional or have defaults:


| **Configuration Option** | **Description** | 
| --- | --- | 
|  `trigger` default: 100,000 input tokens  |  Defines when the context editing strategy activates. Once the prompt exceeds this threshold, clearing will begin. You can specify this value in either `input_tokens` or `tool_uses`.  | 
|  `keep` default: 3 tool uses  |  Defines how many recent tool use/result pairs to keep after clearing occurs. The API removes the oldest tool interactions first, preserving the most recent ones. Helpful when the model needs access to recent tool interactions to continue the conversation effectively.  | 
|  `clear_at_least` (optional)  |  Ensures a minimum number of tokens are cleared each time the strategy activates. If the API can't clear at least the specified amount, the strategy will not be applied. This is useful for determining whether context clearing is worth breaking your prompt cache for.  | 
|  `exclude_tools` (optional)  |  List of tool names whose tool uses and results should never be cleared. Useful for preserving important context.  | 
|  `clear_tool_inputs` (optional, default False)  |  Controls whether the tool call parameters are cleared along with the tool results. By default, only the tool results are cleared while keeping Claude's original tool calls visible, so Claude can see what operations were performed even after the results are removed.  | 

**Note**  
Tool clearing will invalidate your cache if your prefixes contain your tools.

------
#### [ Request ]

```
response = client.beta.messages.create(
    betas=["context-management-2025-06-27"],
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Create a simple command line calculator app using Python"
        }
    ],
    tools=[
        {
            "type": "text_editor_20250728",
            "name": "str_replace_based_edit_tool",
            "max_characters": 10000
        },
        {
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3
        }
    ],
    extra_body={
        "context_management": {
            "edits": [
                {
                    "type": "clear_tool_uses_20250919",
                    # The parameters below are OPTIONAL:
                    # Trigger clearing when this threshold is exceeded
                    "trigger": {
                        "type": "input_tokens",
                        "value": 30000
                    },
                    # Number of tool uses to keep after clearing
                    "keep": {
                        "type": "tool_uses",
                        "value": 3
                    },
                    # Optional: clear at least this many tokens
                    "clear_at_least": {
                        "type": "input_tokens",
                        "value": 5000
                    },
                    # Exclude these tools' uses from being cleared
                    "exclude_tools": ["web_search"]
                }
            ]
        }
    }
)
```

------
#### [ Response ]

```
{
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_456",
            "name": "data_analyzer",
            "input": {
                "data": "sample data"
            }
        }
    ],
    "context_management": {
        "applied_edits": [
            {
                "type": "clear_tool_uses_20250919",
                "cleared_tool_uses": 8,  # Number of tool use/result pairs that were cleared
                "cleared_input_tokens": 50000  # Total number of input tokens removed from the prompt
            }
        ]
    },
    "stop_reason": "tool_use",
    "usage": {
        "input_tokens": 150,
        "output_tokens": 50
    }
}
```

------
#### [ Streaming Response ]

```
data: {"type": "message_start", "message": {"id": "msg_123", "type": "message", "role": "assistant"}}

data: {"type": "content_block_start", "index": 0, "content_block": {"type": "tool_use", "id": "toolu_456", "name": "data_analyzer", "input": {}}}

data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": "{\"data\": \"sample"}}

data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": " data\"}"}}

data: {"type": "content_block_stop", "index": 0}

data: {"type": "message_delta", "delta": {"stop_reason": "tool_use"}}

data: {"type": "message_stop"}

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 1024
  },
  "context_management": {
    "applied_edits": [...]
  }
}
```

------

**Note**  
Bedrock does not currently support `clear_tool_uses_20250919` context management on the CountTokens API.
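Streaming responses like the one above arrive as `data:`-prefixed server-sent-event lines. A minimal sketch of extracting the events and the stop reason (the sample events are illustrative):

```python
import json

def parse_sse_events(stream_text):
    """Parse 'data: {...}' lines from a streaming response body."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

stream = (
    'data: {"type": "message_start", "message": {"id": "msg_123"}}\n'
    '\n'
    'data: {"type": "message_delta", "delta": {"stop_reason": "tool_use"}}\n'
    '\n'
    'data: {"type": "message_stop"}\n'
)
events = parse_sse_events(stream)
stop_reason = next(
    e["delta"]["stop_reason"] for e in events if e["type"] == "message_delta"
)
```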

## Memory Tool (Beta)


**Warning**  
Memory Tool is made available as a "Beta Service" as defined in the AWS Service Terms.

Claude Sonnet 4.5 includes a new memory tool that provides customers with a way to manage memory across conversations. With this feature, customers can allow Claude to retrieve information outside the context window by providing access to a local directory. This capability is available as a beta feature. To use it, you must use the `context-management-2025-06-27` beta header.

Tool definition:

```
{
  "type": "memory_20250818",
  "name": "memory"
}
```

Example Request:

```
{
    "max_tokens": 2048,
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["context-management-2025-06-27"],
    "tools": [{
        "type": "memory_20250818",
        "name": "memory"
    }],
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Remember that my favorite color is blue and I work at Amazon?"}]
        }
    ]
}
```

Example Response:

```
{
    "id": "msg_vrtx_014mQ5ficCRB6PEa5k5sKqHd",
    "type": "message",
    "role": "assistant",
    "model": "claude-sonnet-4-20250514",
    "content": [
        {
            "type": "text",
            "text": "I'll start by checking your memory directory and then record this important information about you."
        },
        {
            "type": "tool_use",
            "id": "toolu_vrtx_01EU1UrCDigyPMRntr3VYvUB",
            "name": "memory",
            "input": {
                "command": "view",
                "path": "/memories"
            }
        }
    ],
    "stop_reason": "tool_use",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 1403,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "output_tokens": 87
    },
    "context_management": {
        "applied_edits": []
    }
}
```
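When the model issues the `view` command shown above, your code must answer it with a `tool_result`, typically by listing the contents of the directory you map to `/memories`. A minimal sketch (the `handle_memory_view` helper and the directory layout are hypothetical):

```python
import os
import tempfile

def handle_memory_view(path, root):
    """Answer the memory tool's 'view' command by listing the files
    under the memory directory. 'root' maps the virtual '/memories'
    path onto a real local directory that you control."""
    local = os.path.join(root, path.lstrip("/"))
    entries = sorted(os.listdir(local)) if os.path.isdir(local) else []
    return "\n".join(entries)

# Set up an example memory directory with one file in it
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "memories"), exist_ok=True)
open(os.path.join(root, "memories", "preferences.md"), "w").close()

listing = handle_memory_view("/memories", root)
```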

## Cost considerations for tool use


Tool use requests are priced based on the following factors:

1. The total number of input tokens sent to the model (including in the tools parameter).

1. The number of output tokens generated.

Tools are priced the same as all other Claude API requests, but do include additional tokens per request. The additional tokens from tool use come from the following:
+ The `tools` parameter in the API requests. For example, tool names, descriptions, and schemas.
+ Any `tool_use` content blocks in API requests and responses.
+ Any `tool_result` content blocks in API requests.

When you use tools, the Anthropic models automatically include a special system prompt that enables tool use. The number of tool use tokens required for each model is listed in the following table. This table excludes the additional tokens described previously. Note that this table assumes at least one tool is provided. If no tools are provided, then a tool choice of none uses 0 additional system prompt tokens.


| Model | Tool choice | Tool use system prompt token count | 
| --- | --- | --- | 
|  Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 Sonnet v2  | auto or none | 346 | 
|  Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 Sonnet v2  | any or tool | 313 | 
|  Claude 3.5 Sonnet  | auto or none | 294 | 
|  Claude 3.5 Sonnet  | any or tool | 261 | 
|  Claude 3 Opus  | auto or none | 530 | 
|  Claude 3 Opus  | any or tool | 281 | 
|  Claude 3 Sonnet  | auto or none | 159 | 
|  Claude 3 Sonnet  | any or tool | 235 | 
|  Claude 3 Haiku  | auto or none | 264 | 
|  Claude 3 Haiku  | any or tool | 340 | 

## Tool search tool (beta)


The tool search tool allows Claude to work with hundreds or even thousands of tools without loading all of their definitions into the context window upfront. Instead of declaring all tools immediately, you can mark them with `defer_loading: true`; Claude then finds and loads only the tools it needs through the tool search mechanism.

To access this feature you must use the beta header `tool-search-tool-2025-10-19`. Note that this feature is currently only available via the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) APIs.

Tool definition:

```
{
    "type": "tool_search_tool_regex",
    "name": "tool_search_tool_regex"
}
```

Request example:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": [
        "tool-search-tool-2025-10-19"
    ],
    "max_tokens": 4096,
    "tools": [{
            "type": "tool_search_tool_regex",
            "name": "tool_search_tool_regex"
        },
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            },
            "defer_loading": true
        },
        {
            "name": "search_files",
            "description": "Search through files in the workspace",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string"
                    },
                    "file_types": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["query"]
            },
            "defer_loading": true
        }
    ],
    "messages": [{
        "role": "user",
        "content": "What's the weather in Seattle?"
    }]
}
```
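With model access configured, a request like the one above can be assembled and sent through `InvokeModel`. The following sketch (Python with `boto3` assumed; the client call is left commented because it requires AWS credentials and model access) builds the body with the required beta header and deferred tool definitions:

```python
def build_tool_search_request(user_text, deferred_tools):
    """Build an InvokeModel body that enables the Tool Search Tool.

    Every tool except the search tool itself is marked defer_loading,
    so its full schema stays out of the context window until Claude
    discovers it through tool search.
    """
    tools = [{"type": "tool_search_tool_regex", "name": "tool_search_tool_regex"}]
    for tool in deferred_tools:
        tools.append({**tool, "defer_loading": True})
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["tool-search-tool-2025-10-19"],
        "max_tokens": 4096,
        "tools": tools,
        "messages": [{"role": "user", "content": user_text}],
    }

body = build_tool_search_request(
    "What's the weather in Seattle?",
    [{
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }],
)

# With credentials configured, the request could then be sent with:
# import json, boto3
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(modelId="<model-id>", body=json.dumps(body))
```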

Response example:

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I'll search for the appropriate tools to help with this task."
        },
        {
            "type": "server_tool_use",
            "id": "srvtoolu_01ABC123",
            "name": "tool_search_tool_regex",
            "input": {
                "pattern": "weather"
            }
        },
        {
            "type": "tool_search_tool_result",
            "tool_use_id": "srvtoolu_01ABC123",
            "content": {
                "type": "tool_search_tool_search_result",
                "tool_references": [{
                    "type": "tool_reference",
                    "tool_name": "get_weather"
                }]
            }
        },
        {
            "type": "text",
            "text": "Now I can check the weather."
        },
        {
            "type": "tool_use",
            "id": "toolu_01XYZ789",
            "name": "get_weather",
            "input": {
                "location": "Seattle",
                "unit": "fahrenheit"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

Streaming example:

```
# Event 1: content_block_start (with complete server_tool_use block)
{
    "type": "content_block_start",
    "index": 0,
    "content_block": {
        "type": "server_tool_use",
        "id": "srvtoolu_01ABC123",
        "name": "tool_search_tool_regex"
    }
}

# Event 2: content_block_delta (input JSON streamed)
{
    "type": "content_block_delta",
    "index": 0,
    "delta": {
        "type": "input_json_delta",
        "partial_json": "{\"regex\": \".*weather.*\"}"
    }
}

# Event 3: content_block_stop (tool_use complete)
{
    "type": "content_block_stop",
    "index": 0
}

# Event 4: content_block_start (COMPLETE result in a single chunk)
{
    "type": "content_block_start",
    "index": 1,
    "content_block": {
        "type": "tool_search_tool_result",
        "tool_use_id": "srvtoolu_01ABC123",
        "content": {
            "type": "tool_search_tool_search_result",
            "tool_references": [{
                "type": "tool_reference",
                "tool_name": "get_weather"
            }]
        }
    }
}

# Event 5: content_block_stop (result complete)
{
    "type": "content_block_stop",
    "index": 1
}
```

**Custom tool search tools**  
You can implement custom tool search tools (for example, using embeddings) by defining a tool that returns `tool_reference` blocks. The custom search tool itself must have `defer_loading: false`, while the other tools should have `defer_loading: true`. When you define your own Tool Search Tool, it should return a tool result containing `tool_reference` content blocks that point to the tools you want Claude to use.

The expected customer-defined Tool Search Tool result response format:

```
{
    "type": "tool_result",
    "tool_use_id": "toolu_01ABC123",
    "content": [{
            "type": "tool_reference",
            "tool_name": "get_weather"
        },
        {
            "type": "tool_reference",
            "tool_name": "weather_forecast"
        }
    ]
}
```

Each `tool_name` must match a tool defined in the request with `defer_loading: true`. Claude then has access to those tools' full schemas.
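The matching rule above can be enforced client-side before the result is returned. This hypothetical helper builds the tool result and rejects references to tools that were not defined with `defer_loading: true` (which the API would reject with a 400 error):

```python
def tool_reference_result(tool_use_id, matched_names, deferred_names):
    """Build a tool_result whose content is tool_reference blocks.

    matched_names: tool names found by your custom search (for example,
    by embedding similarity). deferred_names: names of tools defined in
    the request with defer_loading: true. Any reference outside that set
    would fail server-side validation, so we fail fast locally instead.
    """
    unknown = [name for name in matched_names if name not in deferred_names]
    if unknown:
        raise ValueError(f"references to undefined deferred tools: {unknown}")
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            {"type": "tool_reference", "tool_name": name} for name in matched_names
        ],
    }
```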

**Custom search tools - Detailed example**  
You can implement custom tool search tools (for example, using embeddings or semantic search) by defining a tool that returns `tool_reference` blocks. This enables sophisticated tool discovery mechanisms beyond regex matching.

Request example with custom TST:

```
{
    "model": "claude-sonnet-4-5-20250929",
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["tool-search-tool-2025-10-19"],
    "max_tokens": 4096,
    "tools": [{
            "name": "semantic_tool_search",
            "description": "Search for available tools using semantic similarity. Returns the most relevant tools for the given query.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Natural language description of what kind of tool is needed"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of tools to return (default: 5)"
                    }
                },
                "required": ["query"]
            },
            "defer_loading": false
        },
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            },
            "defer_loading": true
        },
        {
            "name": "search_flights",
            "description": "Search for available flights between locations",
            "input_schema": {
                "type": "object",
                "properties": {
                    "origin": {
                        "type": "string"
                    },
                    "destination": {
                        "type": "string"
                    },
                    "date": {
                        "type": "string"
                    }
                },
                "required": ["origin", "destination", "date"]
            },
            "defer_loading": true
        }
    ],
    "messages": [{
        "role": "user",
        "content": "What's the weather forecast in Seattle for the next 3 days?"
    }]
}
```

Claude's response (calling custom TST):

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I'll search for the appropriate tools to help with weather information."
        },
        {
            "type": "tool_use",
            "id": "toolu_01ABC123",
            "name": "semantic_tool_search",
            "input": {
                "query": "weather forecast multiple days",
                "top_k": 3
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

**Customer-provided tool result**  
After performing semantic search on the tool library, the customer returns matching tool references:

```
{
    "role": "user",
    "content": [{
        "type": "tool_search_tool_result",
        "tool_use_id": "toolu_01ABC123",
        "content": {
            "type": "tool_search_tool_search_result",
            "tool_references": [{
                "type": "tool_reference",
                "tool_name": "get_weather"
            }]
        }
    }]
}
```

Claude's follow-up (using the discovered tool):

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I found the forecast tool. Let me get the weather forecast for Seattle."
        },
        {
            "type": "tool_use",
            "id": "toolu_01DEF456",
            "name": "get_weather",
            "input": {
                "location": "Seattle, WA"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

**Error handling**
+ Setting `defer_loading: true` for all tools (including the Tool Search Tool) results in a 400 error.
+ Passing a `tool_reference` without a corresponding tool definition results in a 400 error.

## Tool use examples (beta)


Claude Opus 4.5 supports user-provided examples in tool definitions to improve Claude's tool use performance. You provide examples as full function calls, formatted exactly as real model outputs would be, without translating them into another format. To use this feature, you must pass the beta header `tool-examples-2025-10-29`.

Tool definition example:

```
{
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    },
    "input_examples": [{
            "location": "San Francisco, CA",
            "unit": "fahrenheit"
        },
        {
            "location": "Tokyo, Japan",
            "unit": "celsius"
        },
        {
            "location": "New York, NY"
        }
    ]
}
```

**Validation rules**
+ Schema conformance: Each example in `input_examples` must be valid according to the tool's `input_schema`.
  + Required fields must be present in each example.
  + Field types must match the schema.
  + Enum values must come from the allowed set.
  + If validation fails, the API returns a 400 error with details about which example failed validation.
+ Array requirements: `input_examples` must be an array (which can be empty).
  + An empty array `[]` is valid and equivalent to omitting the field.
  + A single example must still be wrapped in an array: `[{...}]`.
  + Length limit: at most 20 examples per tool definition.
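As a sketch of these rules, the following hypothetical pre-check mirrors the documented validation locally before a request is sent. It is a client-side convenience, not the API's actual validator, and it only covers required fields, enum membership, and the array constraints described above:

```python
def check_input_examples(tool):
    """Return an error string if input_examples violates the documented
    rules, or None if the examples pass this local pre-check."""
    schema = tool["input_schema"]
    examples = tool.get("input_examples", [])
    if not isinstance(examples, list):
        return "input_examples must be an array"
    if len(examples) > 20:
        return "at most 20 examples per tool definition"
    props = schema.get("properties", {})
    for i, example in enumerate(examples):
        # Required fields must be present.
        for field in schema.get("required", []):
            if field not in example:
                return f"input_examples[{i}]: missing required property '{field}'"
        # Enum values must come from the allowed set.
        for key, value in example.items():
            allowed = props.get(key, {}).get("enum")
            if allowed is not None and value not in allowed:
                return f"input_examples[{i}]: '{value}' not in enum for '{key}'"
    return None
```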

Error examples:

```
// Invalid: Example doesn't match schema (missing required field)
{
    "type": "invalid_request_error",
    "message": "Tool 'get_weather' input_examples[0] is invalid: Missing required property 'location'"
}

// Invalid: Example has wrong type for field
{
    "type": "invalid_request_error",
    "message": "Tool 'search_products' input_examples[1] is invalid: Property 'filters.price_range.min' must be a number, got string"
}

// Invalid: input_examples on server-side tool
{
    "type": "invalid_request_error",
    "message": "input_examples is not supported for server-side tool"
}
```

# Extended thinking


Extended thinking gives Claude enhanced reasoning capabilities for complex tasks, while providing varying levels of transparency into its step-by-step thought process before it delivers its final answer. When you enable thinking, you must set a budget for the maximum number of tokens that Claude can use for its internal reasoning process.

The supported models are as follows:


| Model | Model ID | 
| --- | --- | 
| Claude Opus 4.5 | `anthropic.claude-opus-4-5-20251101-v1:0` | 
| Claude Opus 4 | `anthropic.claude-opus-4-20250514-v1:0` | 
| Claude Sonnet 4 | `anthropic.claude-sonnet-4-20250514-v1:0` | 
| Claude Sonnet 4.5 | `anthropic.claude-sonnet-4-5-20250929-v1:0` | 
| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | 
| Claude 3.7 Sonnet | `anthropic.claude-3-7-sonnet-20250219-v1:0` | 

**Note**  
API behavior differs between Claude 3.7 and Claude 4 models. For more information, see [Differences in thinking across model versions](claude-messages-thinking-differences.md).

**Topics**
+ [

## Best practices and considerations for extended thinking
](#claude-messages-extended-thinking-bps)
+ [

## How extended thinking works
](#claude-messages-how-extended-thinking-works)
+ [

## How to use extended thinking
](#claude-messages-use-extended-thinking)
+ [

## Extended thinking with tool use
](#claude-messages-extended-thinking-tool-use)
+ [

## Thinking block clearing (beta)
](#claude-messages-thinking-block-clearing)
+ [

## Extended thinking with prompt caching
](#claude-messages-extended-thinking-prompt-caching)
+ [

## Understanding thinking block caching behavior
](#claude-messages-extended-thinking-caching-behavior)
+ [

## Max tokens and context window size with extended thinking
](#claude-messages-extended-thinking-max-tokens)
+ [

## Extended thinking token cost considerations
](#claude-messages-extended-thinking-cost)

## Best practices and considerations for extended thinking


Usage guidelines
+ **Task selection**: Use extended thinking for particularly complex tasks that benefit from step-by-step reasoning like math, coding, and analysis.
+ **Context handling**: You do not need to remove previous thinking blocks yourself. The Anthropic API automatically ignores thinking blocks from previous turns and they are not included when calculating context usage.
+ **Prompt engineering**: Review Anthropic's [extended thinking prompting tips](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/extended-thinking-tips) if you want to maximize Claude's thinking capabilities.

Performance considerations
+ **Response times**: Be prepared for potentially longer response times due to the additional processing required for the reasoning process. Factor in that generating thinking blocks might increase the overall response time.
+ **Streaming requirements**: Streaming is required when `max_tokens` is greater than 21,333. When streaming, be prepared to handle both `thinking` and `text` content blocks as they arrive.

Feature compatibility
+ Thinking isn't compatible with `temperature`, `top_p`, or `top_k` modifications or [forced tool use](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use#forcing-tool-use).
+ You cannot pre-fill responses when thinking is enabled.
+ Changes to the thinking budget invalidate cached prompt prefixes that include messages. However, cached system prompts and tool definitions will continue to work when thinking parameters change.

Working with thinking budgets
+ **Minimum and optimal settings**: The minimum budget is 1,024 tokens. We suggest starting at the minimum and increasing the thinking budget incrementally to find the optimal range for your use case. Higher token counts might allow for more comprehensive and nuanced reasoning, but there can also be diminishing returns depending on the task. The thinking budget is a target rather than a strict limit; actual token usage varies with the task.
+ **Experimentation**: The model might perform differently at different max thinking budget settings. Increasing the max thinking budget can make the model think better or harder, at the tradeoff of increased latency. For critical tasks, consider testing different budget settings to find the optimal balance between quality and performance.
+ **Large budgets**: For thinking budgets above 32K, we recommend using batch processing to avoid networking issues. Requests that push the model to think beyond 32K tokens become long running and might hit system timeouts and open connection limits. Note that `max_tokens` limits vary among Claude models. For more information, see [Max tokens and context window size with extended thinking](#claude-messages-extended-thinking-max-tokens).
+ **Token usage tracking**: Monitor thinking token usage to optimize costs and performance.

## How extended thinking works


When extended thinking is turned on, Claude creates `thinking` content blocks where it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting a final response. The API response will include `thinking` content blocks, followed by `text` content blocks.

Here’s an example of the default response format:

```
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text", 
      "text": "Based on my analysis..."
    }
  ]
}
```

For more information about the response format of extended thinking, see Anthropic’s Messages API [Request and Response](model-parameters-anthropic-claude-messages-request-response.md).

## How to use extended thinking


To turn on extended thinking, add a `thinking` object with the `type` parameter set to `enabled` and `budget_tokens` set to a specified token budget for extended thinking.

The `budget_tokens` parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. In Claude 4 models, this limit applies to full thinking tokens, and not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis for complex problems, although Claude might not use the entire budget allocated, especially at ranges above 32K.

You must set `budget_tokens` to a value less than `max_tokens`. However, when using [Interleaved thinking (beta)](#claude-messages-extended-thinking-tool-use-interleaved) with tools, you can exceed this limit because the token limit becomes your entire context window (200K tokens).
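A minimal sketch of building a thinking-enabled request body, with the constraints above checked client-side (the function name and default values are illustrative, not part of the API):

```python
def thinking_request(prompt, max_tokens=10000, budget_tokens=4000):
    """Build a Messages API body with extended thinking enabled.

    Enforces the documented constraints locally: the minimum budget is
    1,024 tokens, and budget_tokens must be less than max_tokens (except
    under interleaved thinking, which this sketch does not cover).
    """
    if budget_tokens < 1024:
        raise ValueError("thinking budget must be at least 1,024 tokens")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```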

### Summarized thinking


With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude’s full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:
+ You’re charged for the full thinking tokens generated by the original request, not the summary tokens.
+ The billed output token count will not match the count of tokens you see in the response.
+ The prompt provided to the summarizer model is subject to change.
+ The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.

**Note**  
Claude 3.7 Sonnet still returns the full thinking output.  
To access the full thinking output for Claude 4 models, contact your account team.

### Streaming thinking


You can stream extended thinking responses using server-sent events (SSE). When streaming is enabled for extended thinking, you receive thinking content via `thinking_delta` events. Streamed events are not guaranteed to return at a constant rate. There can be delays between streaming events. For more documentation on streaming via the Messages API, see [Streaming messages](https://docs.anthropic.com/en/docs/build-with-claude/streaming).

Here’s how to handle streaming with thinking using **InvokeModelWithResponseStream**:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is 27 * 453?"
        }
    ]
}
```

Response:

```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}
```

**About streaming behavior with thinking**  
When using streaming with thinking enabled, you might notice that text sometimes arrives in larger chunks alternating with smaller, token-by-token delivery. This is expected behavior, especially for thinking content. The streaming system needs to process content in batches for optimal performance, which can result in this delivery pattern.
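When consuming the stream, you can accumulate `thinking` and `text` content separately. The sketch below operates on already-decoded event dictionaries, as you would obtain by parsing each chunk delivered by `InvokeModelWithResponseStream`; it ignores event types it does not need (such as `signature_delta` and `content_block_stop`):

```python
def collect_stream(events):
    """Accumulate content blocks from decoded Messages API stream events.

    Returns a mapping of block index -> (block type, assembled content),
    joining thinking_delta and text_delta fragments in arrival order.
    """
    blocks = {}
    for event in events:
        if event["type"] == "content_block_start":
            blocks[event["index"]] = {
                "type": event["content_block"]["type"],
                "parts": [],
            }
        elif event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                blocks[event["index"]]["parts"].append(delta["thinking"])
            elif delta["type"] == "text_delta":
                blocks[event["index"]]["parts"].append(delta["text"])
            # signature_delta and other delta types are ignored here.
    return {i: (b["type"], "".join(b["parts"])) for i, b in blocks.items()}
```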

## Extended thinking with tool use


Extended thinking can be used alongside [Tool use](model-parameters-anthropic-claude-messages-tool-use.md) allowing Claude to reason through tool selection and results processing. When using extended thinking with tool use, be aware of the following limitations:
+ **Tool choice limitation**: Tool use with thinking only supports `tool_choice: auto` (the default) or `tool_choice: none`. It does not support `any` or providing a specific tool, because those options force tool use, which is incompatible with extended thinking.
+ **Preserving thinking blocks**: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete unmodified block back to the API to maintain reasoning continuity.

The following example request combines extended thinking with tool use:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
    },
    "tools": [
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string"
                    }
                },
                "required": [
                    "location"
                ]
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What's the weather in Paris?"
        }
    ]
}
```

The first response is the following:

```
{
    "content": [
        {
            "type": "thinking",
            "thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`...",
            "signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxYsNrcs...."
        },
        {
            "type": "text",
            "text": "I can help you get the current weather information for Paris. Let me check that for you"
        },
        {
            "type": "tool_use",
            "id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
            "name": "get_weather",
            "input": {
                "location": "Paris"
            }
        }
    ]
}
```

Continuing the conversation with the tool result generates another response. Notice that the `thinking` block is passed back along with the `tool_use` block. If it is not passed in, an error occurs.

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 10000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4000
  },
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": [
          "location"
        ]
      }
    }
  ],
      "messages": [
        {
          "role": "user",
          "content": "What's the weather in Paris?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "thinking",
              "thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`…",
              "signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxY"
            },
            {
              "type": "tool_use",
              "id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
              "name": "get_weather",
              "input": {
                "location": "Paris"
              }
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "tool_use_id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
              "content": "Current temperature: 88°F"
            }
          ]
        }
      ]
    }
```

The API response will now include only text:

```
{
  "content": [
    {
      "type": "text",
      "text": "Currently in Paris, the temperature is 88°F (31°C)"
    }
  ]
}
```

### Preserve thinking blocks


During tool use, you must pass thinking blocks back to the API, and you must include the complete unmodified block back to the API. This is critical for maintaining the model’s reasoning flow and conversation integrity.

**Tip**  
While you can omit `thinking` blocks from prior `assistant` role turns, we suggest always passing back all thinking blocks to the API for any multi-turn conversation. The API will do the following:  
+ Automatically filter the provided thinking blocks
+ Use the relevant thinking blocks necessary to preserve the model's reasoning
+ Only bill for the input tokens for the blocks shown to Claude

When Claude invokes tools, it is pausing its construction of a response to await external information. When tool results are returned, Claude will continue building that existing response. This necessitates preserving thinking blocks during tool use, for the following reasons:
+ **Reasoning continuity**: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
+ **Context maintenance**: While tool results appear as user messages in the API structure, they’re part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls.

**Important**  
When providing thinking blocks, the entire sequence of consecutive thinking blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.
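A minimal sketch of continuing a conversation after a tool call while respecting these rules (the helper name is illustrative): the assistant's content blocks are passed back verbatim, thinking blocks and signatures included, with the tool result appended as the next user message:

```python
def continue_with_tool_result(messages, assistant_content, tool_use_id, result_text):
    """Extend a conversation after a tool call, preserving thinking blocks.

    assistant_content must be the model's previous content blocks exactly
    as returned: thinking blocks may not be reordered, edited, or dropped
    mid-sequence, or the API rejects the request.
    """
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result_text,
        }]},
    ]
```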

### Interleaved thinking (beta)


**Warning**  
Interleaved thinking is made available to you as a ‘Beta Service’ as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA.

Claude 4 models support interleaved thinking, a feature that enables Claude to think between tool calls and run more sophisticated reasoning after receiving tool results. This allows for more complex agentic interactions where Claude can do the following:
+ Reason about the results of a tool call before deciding what to do next
+ Chain multiple tool calls with reasoning steps in between
+ Make more nuanced decisions based on intermediate results

To enable interleaved thinking, add the beta header `interleaved-thinking-2025-05-14` to your API request.

**Note**  
With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.

## Thinking block clearing (beta)


**Warning**  
Thinking block clearing is made available as a "Beta Service" as defined in the AWS Service Terms.

**Note**  
This feature is currently supported on Claude Sonnet 4/4.5, Claude Haiku 4.5, and Claude Opus 4/4.1/4.5.

Thinking block clearing is an Anthropic Claude model capability (in beta). With this feature, Claude can automatically clear older thinking blocks from previous turns. To use thinking block clearing, add `context-management-2025-06-27` to the list of beta headers in the `anthropic_beta` request parameter. You also need to specify the `clear_thinking_20251015` edit type and choose from the following configuration options.

These are the available controls for the `clear_thinking_20251015` context management strategy. All are optional or have defaults:


| **Configuration Option** | **Description** | 
| --- | --- | 
|  `keep` (default: 1 thinking turn)  |  Defines how many recent assistant turns with thinking blocks to preserve. Use `{"type": "thinking_turns", "value": N}` (where N must be greater than 0) to keep the last N turns, or `{"type": "all"}` to keep all thinking blocks.  | 

------
#### [ Request ]

```
{
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 10000,
      "anthropic_beta": [
        "context-management-2025-06-27"
      ],
      "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
      },
      "tools": [
        {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "input_schema": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string"
              }
            },
            "required": [
              "location"
            ]
          }
        }
      ],
      "messages": [
        {
          "role": "user",
          "content": "What's the weather in Paris?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "thinking",
              "thinking": "The user is asking for the weather in Paris. I have access to a get_weather function that takes a location as a parameter. I have all the information I need to make this call - the location is \"Paris\".\n\nLet me call the get_weather function with \"Paris\" as the location.",
              "signature": "ErgDCkgIChABGAIqQC/Ccv8GC+5VfcMEiq78XmpU2Ef2cT+96pHKMedKcRNuPz1x0kFlo5HBpW0r1NcQFVQUPuj6PDmP7jdHY7GsrUwSDKNBMogjaM7wYkwfPhoMswjlmfF09JLjZfFlIjB03NkghGOxLbr3VCQHIY0lMaV9UBvt7ZwTpJKzlz+mulBysfvAmDfcnvdJ/6CZre4qnQJsTZaiXdEgASwPIc5jOExBguerrtYSWVC/oPjSi7KZM8PfhP/SPXupyLi8hwYxeqomqkeG7AQhD+3487ecerZJcpJSOSsf0I1OaMpmQEE/b7ehnvTV/A4nLhxIjP4msyIBW+dVwHNFRFlpJLBHUJvN99b4run6YmqBSf4y9TyNMfOr+FtfxedGE0HfJMBd4FHXmUFyW5y91jAHMWqwNxDgacaKkFCAMaqce5rm0ShOxXn1uwDUAS3jeRP26Pynihq8fw5DQwlqOpo7vvXtqb5jjiCmqfOe6un5xeIdhhbzWddhEk1Vmtg7I817pM4MZjVaeQN02drPs8QgDxihnP6ZooGhd6FCBP2X3Ymdlj5zMlbVHxmSkA4wcNtg4IAYAQ=="
            },
            {
              "type": "tool_use",
              "id": "toolu_bdrk_01U7emCvL5v5z5GT7PDr2vzc",
              "name": "get_weather",
              "input": {
                "location": "Paris"
              }
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "tool_use_id": "toolu_bdrk_01U7emCvL5v5z5GT7PDr2vzc",
              "content": "Current temperature: 88°F"
            }
          ]
        }
      ],
      "context_management": {
        "edits": [
          {
            "type": "clear_thinking_20251015",
            "keep": {
              "type": "thinking_turns",
              "value": 1
            }
          }
        ]
      }
    }
```

------
#### [ Response ]

```
{
      "model": "claude-haiku-4-5-20251001",
      "id": "msg_bdrk_01KyTbyFbdG2kzPwWMJY1kum",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "The current weather in Paris is **88°F** (approximately 31°C). It's quite warm! If you need more detailed information like humidity, wind conditions, or a forecast, please let me know."
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 736,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "cache_creation": {
          "ephemeral_5m_input_tokens": 0,
          "ephemeral_1h_input_tokens": 0
        },
        "output_tokens": 47
      },
      "context_management": {
        "applied_edits": [...]
      }
    }
```

------

## Extended thinking with prompt caching


[Prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) with thinking has several important considerations:

**Thinking block context removal**
+ Thinking blocks from previous turns are removed from context, which can affect cache breakpoints.
+ When continuing conversations with tool use, thinking blocks are cached and count as input tokens when read from the cache. This creates a tradeoff: thinking blocks don't visibly consume context window space, but they still count toward your input token usage when cached.
+ If you disable thinking, requests fail when you pass thinking content in the current tool use turn. In other contexts, thinking content passed to the API is simply ignored.

**Cache invalidation patterns**
+ Changes to thinking parameters (such as enabling, disabling, or altering the budget allocation) invalidate message cache breakpoints.
+ [Interleaved thinking (beta)](#claude-messages-extended-thinking-tool-use-interleaved) amplifies cache invalidation, as thinking blocks can occur between multiple tool calls.
+ System prompts and tools remain cached despite thinking parameter changes or block removal.

**Note**  
While thinking blocks are removed for caching and context calculations, they must be preserved when continuing conversations with tool use, especially with interleaved thinking.

## Understanding thinking block caching behavior


When using extended thinking with tool use, thinking blocks exhibit specific caching behavior that affects token counting. The following sequence demonstrates how this works.

1. Caching only occurs when you make a subsequent request that includes tool results.

1. When the subsequent request is made, the previous conversation history (including thinking blocks) can be cached.

1. These cached thinking blocks count as input tokens in your usage metrics when they are read from the cache.

1. When a non-tool-result user block is included, all previous thinking blocks are ignored and stripped from context.

Here is a detailed example flow:

Request 1:

```
User: "What's the weather in Paris?"
```

Response 1:

```
[thinking_block 1] + [tool_use block 1]
```

Request 2:

```
User: "What's the weather in Paris?",
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True]
```

Response 2:

```
[thinking_block 2] + [text block 2]
```

Request 2 writes a cache of the request content (not the response). The cache includes the original user message, the first thinking block, tool use block, and the tool result.

Request 3:

```
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [thinking_block_2] + [text block 2],
User: [Text response, cache=True]
```

Because a non-tool-result user block was included, all previous thinking blocks are ignored. This request will be processed the same as the following request:

Request 3 Alternate:

```
User: ["What's the weather in Paris?"]
Assistant: [tool_use block 1]
User: [tool_result_1, cache=True]
Assistant: [text block 2]
User: [Text response, cache=True]
```

This behavior is consistent whether using regular thinking or interleaved thinking.
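The message assembly in Request 2 above can be sketched in Python. The block contents here are placeholders; in a real request, the assistant blocks are copied verbatim from the previous response (including the `signature`), and the `cache=True` annotation in the flow corresponds to a `cache_control` field on the tool result block.

```python
# Sketch of assembling Request 2 from the flow above.
# Block contents are placeholders; in practice they come from the
# previous model response verbatim (including the signature).
thinking_block_1 = {"type": "thinking", "thinking": "...", "signature": "..."}
tool_use_block_1 = {
    "type": "tool_use",
    "id": "toolu_1",
    "name": "get_weather",
    "input": {"location": "Paris"},
}
tool_result_1 = {
    "type": "tool_result",
    "tool_use_id": "toolu_1",
    "content": "Current temperature: 88°F",
    "cache_control": {"type": "ephemeral"},  # cache=True in the flow above
}

messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [thinking_block_1, tool_use_block_1]},
    {"role": "user", "content": [tool_result_1]},
]
```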

## Max tokens and context window size with extended thinking


In older Claude models (prior to Claude 3.7 Sonnet), if the sum of prompt tokens and `max_tokens` exceeded the model's context window, the system would automatically adjust `max_tokens` to fit within the context limit. This meant you could set a large `max_tokens` value and the system would silently reduce it as needed. With Claude 3.7 and 4 models, `max_tokens` (which includes your thinking budget when thinking is enabled) is enforced as a strict limit. The system now returns a validation error if prompt tokens + `max_tokens` exceeds the context window size.

### The context window with extended thinking


When calculating context window usage with thinking enabled, there are some considerations to be aware of:
+ Thinking blocks from previous turns are removed and not counted towards your context window.
+ Current turn thinking counts towards your `max_tokens` limit for that turn.

The effective context window is calculated as follows:

`context window = (current input tokens - previous thinking tokens) + (thinking tokens + encrypted thinking tokens + text output tokens)`

### Managing tokens with extended thinking and tool use


When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results. The effective context window calculation for extended thinking with tool use becomes the following:

`context window = (current input tokens + previous thinking tokens + tool use tokens) + (thinking tokens + encrypted thinking tokens + text output tokens)`

### Managing tokens with extended thinking


Given the context window and `max_tokens` behavior with extended thinking Claude 3.7 and 4 models, you might need to perform one of the following actions:
+ More actively monitor and manage your token usage.
+ Adjust `max_tokens` values as your prompt length changes.
+ Be aware that previous thinking blocks don’t accumulate in your context window. This change has been made to provide more predictable and transparent behavior, especially as maximum token limits have increased significantly.
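Given the strict `max_tokens` limit described above, a client-side pre-flight check can catch validation errors before a request is sent. This sketch is illustrative: it assumes a 200K context window, token counts you have already measured, and non-interleaved thinking (where the budget must fit inside `max_tokens`).

```python
# Pre-flight check before calling the API. The 200K context window is an
# assumption; adjust for your model. Assumes non-interleaved thinking,
# where budget_tokens must be less than max_tokens.
CONTEXT_WINDOW = 200_000

def validate_request(prompt_tokens: int, max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError if the request would fail the strict max_tokens limit."""
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    if prompt_tokens + max_tokens > CONTEXT_WINDOW:
        raise ValueError(
            f"prompt ({prompt_tokens}) + max_tokens ({max_tokens}) "
            f"exceeds the {CONTEXT_WINDOW}-token context window"
        )
```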

## Extended thinking token cost considerations


The thinking process incurs charges for the following:
+ Tokens used during thinking (output tokens)
+ Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
+ Standard text output tokens

**Tip**  
When extended thinking is enabled, a specialized 28- or 29-token system prompt is automatically included to support this feature.

The `budget_tokens` parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. Larger budgets can improve response quality by enabling more thorough analysis for complex problems, although Claude may not use the entire budget allocated, especially at ranges above 32K.

With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter as it represents the total budget across all thinking blocks within one assistant turn.

When using summarized thinking, keep the following information in mind:
+ **Input tokens**: Tokens in your original request
+ **Output tokens (billed)**: The original thinking tokens that Claude generated internally
+ **Output tokens (visible)**: The summarized thinking tokens you see in the response
+ **No charge**: Tokens used to generate the summary
+ The `summary_status` field can indicate if token limits affected summarization
+ The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.

# Adaptive thinking


Adaptive thinking is the recommended way to use [Extended thinking](claude-messages-extended-thinking.md) with Claude Opus 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically decide when and how much to think based on the complexity of each request. Adaptive thinking reliably drives better performance than extended thinking with a fixed `budget_tokens`, and we recommend moving to adaptive thinking to get the most intelligent responses from Claude Opus 4.6. No beta header is required.

The supported models are as follows:


| Model | Model ID | 
| --- | --- | 
| Claude Opus 4.6 | `anthropic.claude-opus-4-6-v1` | 
| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | 

**Note**  
`thinking.type: "enabled"` and `budget_tokens` are deprecated on Claude Opus 4.6 and will be removed in a future model release. Use `thinking.type: "adaptive"` with the effort parameter instead.  
Older models (Claude Sonnet 4.5, Claude Opus 4.5, etc.) do not support adaptive thinking and require `thinking.type: "enabled"` with `budget_tokens`.

## How adaptive thinking works


In adaptive mode, Claude evaluates the complexity of each request and decides whether and how much to think. At the default effort level (`high`), Claude will almost always think. At lower effort levels, Claude may skip thinking for simpler problems.

Adaptive thinking also automatically enables [Interleaved thinking (beta)](claude-messages-extended-thinking.md#claude-messages-extended-thinking-tool-use-interleaved). This means Claude can think between tool calls, making it especially effective for agentic workflows.

Set `thinking.type` to `"adaptive"` in your API request:

------
#### [ CLI ]

```
aws bedrock-runtime invoke-model \
--model-id "us.anthropic.claude-opus-4-6-v1" \
--body '{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 16000,
"thinking": {
"type": "adaptive"
},
"messages": [
{
"role": "user",
"content": "Three players A, B, C play a game. Each has a jar with 100 balls numbered 1-100. Simultaneously, each draws one ball. A beats B if As number > Bs number (mod 100, treating 100 as 0 for comparison). Similarly for B vs C and C vs A. The overall winner is determined by majority of pairwise wins (ties broken randomly). Is there a mixed strategy Nash equilibrium where each player draws uniformly? If not, characterize the equilibrium."
}
]
}' \
--cli-binary-format raw-in-base64-out \
output.json && cat output.json | jq '.content[] | {type, thinking: .thinking[0:200], text}'
```

------
#### [ Python ]

```
import boto3
import json

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-2'
)

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "messages": [{
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }]
    })
)

response_body = json.loads(response["body"].read())

for block in response_body["content"]:
    if block["type"] == "thinking":
        print(f"\nThinking: {block['thinking']}")
    elif block["type"] == "text":
        print(f"\nResponse: {block['text']}")
```

------
#### [ TypeScript ]

```
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            max_tokens: 16000,
            thinking: {
                type: "adaptive"
            },
            messages: [{
                role: "user",
                content: "Explain why the sum of two even numbers is always even."
            }]
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    for (const block of responseBody.content) {
        if (block.type === "thinking") {
            console.log(`\nThinking: ${block.thinking}`);
        } else if (block.type === "text") {
            console.log(`\nResponse: ${block.text}`);
        }
    }
}

main().catch(console.error);
```

------

## Adaptive thinking with the effort parameter


You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance for Claude's thinking allocation:


| Effort level | Thinking behavior | 
| --- | --- | 
| max | Claude always thinks with no constraints on thinking depth. Claude Opus 4.6 only — requests using max on other models will return an error. | 
| high (default) | Claude always thinks. Provides deep reasoning on complex tasks. | 
| medium | Claude uses moderate thinking. May skip thinking for very simple queries. | 
| low | Claude minimizes thinking. Skips thinking for simple tasks where speed matters most. | 
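For illustration, a request pairing adaptive thinking with a lower effort level might look like the following. The placement of the `effort` parameter shown here is an assumption based on the examples above; check the request schema for your model version before relying on it.

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "effort": "low",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
}
```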

## Prompt caching


Consecutive requests using `adaptive` thinking preserve prompt cache breakpoints. However, switching between `adaptive` and `enabled`/`disabled` thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.

## Tuning thinking behavior


If Claude is thinking more or less often than you'd like, you can add guidance to your system prompt:

```
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.
```
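One way to apply this guidance, sketched with the same request shape as the earlier boto3 examples, is to pass it in the top-level `system` field:

```python
import json

# Sketch: request body pairing adaptive thinking with the system-prompt
# guidance above. Pass request_json as body= to invoke_model, with the
# model ID and region configured as in the earlier examples.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 16000,
    "system": (
        "Extended thinking adds latency and should only be used when it "
        "will meaningfully improve answer quality. When in doubt, respond directly."
    ),
    "thinking": {"type": "adaptive"},
    "messages": [{"role": "user", "content": "What's 15% of 80?"}],
}

request_json = json.dumps(body)
```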

**Warning**  
Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.

# Thinking encryption


Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API. When streaming responses, the signature is added via a `signature_delta` inside a `content_block_delta` event just before the `content_block_stop` event.

**Note**  
It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise, you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.  
If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

## Thinking redaction in Claude 3.7 Sonnet


**Note**  
The following information applies specifically to Claude 3.7 Sonnet. Claude 4 models handle thinking differently and do not produce redacted thinking blocks.

In Claude 3.7 Sonnet, the following applies:
+ Occasionally Claude’s internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a `redacted_thinking` block. `redacted_thinking` blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.
+ `thinking` and `redacted_thinking` blocks are returned before the text blocks in the response.

When building customer-facing applications that use extended thinking with Claude 3.7 Sonnet, consider the following:
+ Be aware that redacted thinking blocks contain encrypted content that isn’t human-readable.
+ Consider providing a simple explanation like: “Some of Claude’s internal reasoning has been automatically encrypted for safety reasons. This doesn’t affect the quality of responses.”
+ If you display thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks.
+ Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted.
+ Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI.
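A display-side filter that follows these recommendations might look like the following sketch. The unfiltered content list must still be passed back to the API unchanged; only what you render to users is filtered.

```python
def displayable_blocks(content: list) -> list:
    """Return blocks safe to render to users, dropping encrypted redacted thinking.

    Keep the original content list intact for subsequent API calls.
    """
    return [block for block in content if block.get("type") != "redacted_thinking"]

# Placeholder response content with both normal and redacted thinking blocks.
content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "redacted_thinking", "data": "EmwKAhgB..."},
    {"type": "text", "text": "Based on my analysis..."},
]

visible = displayable_blocks(content)
```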

Here’s an example showing both normal and redacted thinking blocks:

```
{
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me analyze this step by step...",
            "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
        },
        {
            "type": "redacted_thinking",
            "data":"EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
        },
        {
            "type": "text",
            "text": "Based on my analysis..."
        }
    ]
}
```

**Tip**  
Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.  
If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: `ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB`

When passing `thinking` and `redacted_thinking` blocks back to the API in a multi-turn conversation, you must pass the complete, unmodified block back to the API for the last assistant turn. This is critical for maintaining the model’s reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see [Preserve thinking blocks](claude-messages-extended-thinking.md#claude-messages-extended-thinking-tool-use-thinking-blocks) in the Extended thinking with tool use section.

The following example uses the **InvokeModelWithResponseStream** API to demonstrate the request and response structure when using thinking tokens with redactions.

When streaming is enabled, you’ll receive thinking content from `thinking_delta` events. Here’s how to handle streaming with thinking:

**Request**

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 24000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is 27 * 453?"
        }
    ]
}
```

**Response**

```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}
```
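The stream above can be folded back into completed content blocks with a small accumulator. This sketch assumes the events have already been parsed into dicts (on Amazon Bedrock, each streamed chunk's `bytes` payload decodes to one such dict):

```python
def accumulate_blocks(events: list) -> list:
    """Fold content_block_start/_delta events into completed content blocks."""
    blocks = []
    for event in events:
        if event["type"] == "content_block_start":
            blocks.append(dict(event["content_block"]))
        elif event["type"] == "content_block_delta":
            delta = event["delta"]
            block = blocks[event["index"]]
            if delta["type"] == "thinking_delta":
                block["thinking"] += delta["thinking"]
            elif delta["type"] == "text_delta":
                block["text"] += delta["text"]
            elif delta["type"] == "signature_delta":
                block["signature"] = delta["signature"]
    return blocks

# Events as they would decode from the stream above (abbreviated).
events = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step..."}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "signature_delta", "signature": "EqQBCg..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}},
]
blocks = accumulate_blocks(events)
```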

# Differences in thinking across model versions


The Messages API handles thinking differently across Claude 3.7 Sonnet and Claude 4 models, primarily in redaction and summarization behavior. The following table summarizes those differences.


| Feature | Claude 3.7 Sonnet | Claude 4 Models | 
| --- | --- | --- | 
| Thinking output | Returns the full thinking output | Returns summarized thinking | 
| Redaction handling | Uses `redacted_thinking` blocks | Redacts and encrypts full thinking, returned in a `signature` field | 
| Interleaved thinking | Not supported | Supported with a beta header | 

# Compaction


**Tip**  
Server-side compaction is recommended for managing context in long-running conversations and agentic workflows as it handles context management automatically with minimal integration work.

**Note**  
Compaction is currently in beta. Include the beta header `compact-2026-01-12` in your API requests to use this feature. Compaction is not currently supported by the Converse API; however, it is supported with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

Compaction extends the effective context length for long-running conversations and tasks by automatically summarizing older context when approaching the context window limit. This is ideal for:
+ Chat-based, multi-turn conversations where you want users to use one chat for a long period of time
+ Task-oriented prompts that require a lot of follow-up work (often tool use) that may exceed the 200K context window

Compaction is supported on the following models:


| Model | Model ID | 
| --- | --- | 
| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | 
| Claude Opus 4.6 | `anthropic.claude-opus-4-6-v1` | 

**Note**  
The top-level `input_tokens` and `output_tokens` in the `usage` field do not include compaction iteration usage, and reflect the sum of all non-compaction iterations. To calculate the total tokens consumed and billed for a request, sum across all entries in the `usage.iterations` array.  
If you previously relied on `usage.input_tokens` and `usage.output_tokens` for cost tracking or auditing, you will need to update your tracking logic to aggregate across `usage.iterations` when compaction is enabled. The `iterations` array is only present when a new compaction is triggered during the request. Re-applying a previous `compaction` block incurs no additional compaction cost, and the top-level usage fields remain accurate in that case.

## How compaction works


When compaction is enabled, Claude automatically summarizes your conversation when it approaches the configured token threshold. The API:

1. Detects when input tokens exceed your specified trigger threshold.

1. Generates a summary of the current conversation.

1. Creates a `compaction` block containing the summary.

1. Continues the response with the compacted context.

On subsequent requests, append the response to your messages. The API automatically drops all message blocks prior to the `compaction` block, continuing the conversation from the summary.

## Basic usage


Enable compaction by adding the `compact_20260112` strategy to `context_management.edits` in your Messages API request.

------
#### [ CLI ]

```
aws bedrock-runtime invoke-model \
    --model-id "us.anthropic.claude-opus-4-6-v1" \
    --body '{
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [
            {
                "role": "user",
                "content": "Help me build a website"
            }
        ],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    }' \
    --cli-binary-format raw-in-base64-out \
    /tmp/response.json

echo "Response:"
cat /tmp/response.json | jq '.content[] | {type, text: .text[0:500]}'
```

------
#### [ Python ]

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Append the response (including any compaction block) to continue the conversation
messages.append({"role": "assistant", "content": response_body["content"]})

for block in response_body["content"]:
    if block.get("type") == "compaction":
        print(f"[COMPACTION]: {block['content'][:200]}...")
    elif block.get("type") == "text":
        print(f"[RESPONSE]: {block['text']}")
```

------
#### [ TypeScript ]

```
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const messages: Array<{role: string, content: string | object[]}> = [
        { role: "user", content: "Help me build a website" }
    ];

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            anthropic_beta: ["compact-2026-01-12"],
            max_tokens: 4096,
            messages,
            context_management: {
                edits: [
                    {
                        type: "compact_20260112"
                    }
                ]
            }
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    // Append response to continue conversation
    messages.push({ role: "assistant", content: responseBody.content });

    for (const block of responseBody.content) {
        if (block.type === "compaction") {
            console.log(`[COMPACTION]: ${block.content.substring(0, 200)}...`);
        } else if (block.type === "text") {
            console.log(`[RESPONSE]: ${block.text}`);
        }
    }
}

main().catch(console.error);
```

------

## Parameters



| Parameter | Type | Default | Description | 
| --- | --- | --- | --- | 
| type | string | Required | Must be `compact_20260112` | 
| trigger | object | 150,000 tokens | When to trigger compaction. Must be at least 50,000 tokens. | 
| pause_after_compaction | boolean | false | Whether to pause after generating the compaction summary | 
| instructions | string | null | Custom summarization prompt. Completely replaces the default prompt when provided. | 

## Trigger configuration


Configure when compaction triggers using the `trigger` parameter:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {
                        "type": "input_tokens",
                        "value": 100000
                    }
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])
```

## Custom summarization instructions


By default, compaction uses the following summarization prompt:

```
You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a <summary></summary> block.
```

You can provide custom instructions via the `instructions` parameter to replace this prompt entirely. Custom instructions don't supplement the default; they completely replace it:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "instructions": "Focus on preserving code snippets, variable names, and technical decisions."
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])
```

## Pausing after compaction


Use `pause_after_compaction` to pause the API after generating the compaction summary. This allows you to add additional content blocks (such as preserving recent messages or specific instruction-oriented messages) before the API continues with the response.

When enabled, the API returns a message with the `compaction` stop reason after generating the compaction block:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "pause_after_compaction": True
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Check if compaction triggered a pause
if response_body.get("stop_reason") == "compaction":
    # Response contains only the compaction block
    messages.append({"role": "assistant", "content": response_body["content"]})

    # Continue the request
    response = bedrock_runtime.invoke_model(
        modelId="us.anthropic.claude-opus-4-6-v1",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "anthropic_beta": ["compact-2026-01-12"],
            "max_tokens": 4096,
            "messages": messages,
            "context_management": {
                "edits": [{"type": "compact_20260112"}]
            }
        })
    )
    response_body = json.loads(response["body"].read())

print(response_body["content"][-1]["text"])
```

## Working with compaction blocks


When compaction is triggered, the API returns a `compaction` block at the start of the assistant response.

A long-running conversation may result in multiple compactions. The last compaction block reflects the final state of the prompt, replacing content prior to it with the generated summary.

```
{
  "content": [
    {
      "type": "compaction",
      "content": "Summary of the conversation: The user requested help building a web scraper..."
    },
    {
      "type": "text",
      "text": "Based on our conversation so far..."
    }
  ]
}
```

## Streaming


When streaming responses with compaction enabled, the compaction block streams differently from text blocks. You'll receive a `content_block_start` event when compaction begins, followed by a single `content_block_delta` that carries the complete summary content (no intermediate streaming), and then a `content_block_stop` event.
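A handler for these events might look like the following sketch. It assumes events already parsed into dicts, and it assumes a `content` field on the compaction delta; the exact delta payload shape may differ, so verify against a real stream.

```python
def find_compaction_summary(events: list):
    """Return the compaction summary from a parsed event stream, or None.

    Assumes the summary arrives as a `content` field on the single delta
    for the compaction block (an assumption; verify against a real stream).
    """
    compaction_index = None
    for event in events:
        if (event["type"] == "content_block_start"
                and event["content_block"]["type"] == "compaction"):
            compaction_index = event["index"]
        elif (compaction_index is not None
                and event["type"] == "content_block_delta"
                and event["index"] == compaction_index):
            # The complete summary arrives in a single delta.
            return event["delta"].get("content")
    return None
```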

## Prompt caching


You may add a `cache_control` breakpoint on compaction blocks, which caches the full system prompt along with the summarized content. The original compacted content is ignored. Note that when compaction is triggered, it can result in a cache miss on the subsequent request.

```
{
    "role": "assistant",
    "content": [
        {
            "type": "compaction",
            "content": "[summary text]",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Based on our conversation..."
        }
    ]
}
```

## Understanding usage


Compaction requires an additional sampling step, which contributes to rate limits and billing. The API returns detailed usage information in the response:

```
{
  "usage": {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
      {
        "type": "compaction",
        "input_tokens": 180000,
        "output_tokens": 3500
      },
      {
        "type": "message",
        "input_tokens": 23000,
        "output_tokens": 1000
      }
    ]
  }
}
```

The `iterations` array shows usage for each sampling iteration. When compaction occurs, you'll see a `compaction` iteration followed by the main `message` iteration. The final iteration's token counts reflect the effective context size after compaction.
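As an accounting sketch, you can sum token counts across the `iterations` array. Whether the top-level counts already include the compaction iteration is not stated here, so treat this hypothetical helper as illustrative rather than a billing formula:

```python
def total_usage(usage):
    """Sum input/output tokens across all sampling iterations.
    Falls back to the top-level counts when no iterations are reported."""
    iterations = usage.get("iterations")
    if not iterations:
        return usage["input_tokens"], usage["output_tokens"]
    return (
        sum(i["input_tokens"] for i in iterations),
        sum(i["output_tokens"] for i in iterations),
    )
```

For the example response above, this returns 203,000 input tokens and 4,500 output tokens across both iterations.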

# Get validated JSON results from models

You can use structured outputs with Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.5, and Claude Opus 4.6. To learn more, see [Get validated JSON results from models](structured-output.md).

# Request and Response


The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).

**Note**  
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

**Warning**  
Claude Sonnet 4.5 and Claude Haiku 4.5 support specifying either the `temperature` or `top_p` parameter, but not both. This restriction does not apply to older models.
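A simple client-side guard for this restriction might look like the following sketch (`validate_sampling_params` is a hypothetical helper, not part of any SDK):

```python
def validate_sampling_params(body):
    """Reject request bodies that set both temperature and top_p,
    which Claude Sonnet 4.5 and Claude Haiku 4.5 do not allow."""
    if "temperature" in body and "top_p" in body:
        raise ValueError("Specify either temperature or top_p, not both.")
    return body
```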

------
#### [ Request ]

Anthropic Claude has the following inference parameters for a messages inference call. 

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["computer-use-2024-10-22"],
    "max_tokens": int,
    "system": string,
    "messages": [
        {
            "role": string,
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "content image bytes" } },
                { "type": "text", "text": "content text" }
            ]
        }
    ],
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "tools": [
        {
            "type": "custom",
            "name": string,
            "description": string,
            "input_schema": json
        },
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_height_px": int,
            "display_width_px": int,
            "display_number": int
        },
        {
            "type": "bash_20241022",
            "name": "bash"
        },
        {
            "type": "text_editor_20241022",
            "name": "str_replace_editor"
        }
    ],
    "tool_choice": {
        "type": string,
        "name": string
    },
    "stop_sequences": [string]
}
```

The following are required parameters.
+ **anthropic_version** – (Required) The anthropic version. The value must be `bedrock-2023-05-31`.
+ **max_tokens** – (Required) The maximum number of tokens to generate before stopping.

  Note that Anthropic Claude models might stop generating tokens before reaching the value of `max_tokens`. Different Anthropic Claude models have different maximum values for this parameter. For more information, see [Model comparison](https://docs.anthropic.com/claude/docs/models-overview#model-comparison).
+ **messages** – (Required) The input messages.
  + **role** – The role of the conversation turn. Valid values are `user` and `assistant`.     
  + **content** – (required) The content of the conversation turn, as an array of objects. Each object contains a **type** field, in which you can specify one of the following values:
    + `text` – If you specify this type, you must include a **text** field and specify the text prompt as its value. If another object in the array is an image, this text prompt applies to the images.
    + `image` – If you specify this type, you must include a **source** field that maps to an object with the following fields:
      + **type** – (required) The encoding type for the image. You can specify `base64`. 
      + **media_type** – (required) The type of the image. You can specify the following image formats. 
        + `image/jpeg`
        + `image/png`
        + `image/webp` 
        + `image/gif`
      + **data** – (required) The base64 encoded image bytes for the image. The maximum image size is 3.75MB. The maximum height and width of an image is 8000 pixels. 
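As a sketch, the image requirements above can be enforced client-side before sending a request. The helper name is hypothetical, and whether the 3.75 MB limit applies to the raw bytes or the base64-encoded payload is an assumption here (the raw bytes are checked); the 8,000-pixel dimension check is omitted because it would require an image library:

```python
import base64

MAX_IMAGE_BYTES = int(3.75 * 1024 * 1024)  # documented 3.75 MB limit

SUPPORTED_MEDIA_TYPES = ("image/jpeg", "image/png", "image/webp", "image/gif")

def image_content_block(image_bytes, media_type):
    """Build an image content object for the messages array,
    validating the documented media types and size limit."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("Image exceeds the 3.75 MB limit.")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }
```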

The following are optional parameters.
+  **system** – (Optional) The system prompt for the request.

  A system prompt is a way of providing context and instructions to Anthropic Claude, such as specifying a particular goal or role. For more information, see [System prompts](https://docs.anthropic.com/en/docs/system-prompts) in the Anthropic documentation. 
**Note**  
You can use system prompts with Anthropic Claude version 2.1 or higher.
+ **anthropic_beta** – (Optional) A list of beta header strings that opt the request in to a particular set of beta features.
**Note**  
The 1 million token context length variant of Claude Sonnet 4 is available to you in select AWS Regions as a "Beta Service" as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA. Please see the [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) page for more information about the pricing for longer context requests. Separate service quotas apply (for more information, see **Service Quotas** in the AWS Management Console).

  Available beta headers depend on the model and feature; pass each header string in the `anthropic_beta` array, as shown in the request examples in this section.
+ **stop_sequences** – (Optional) Custom text sequences that cause the model to stop generating. Anthropic Claude models normally stop when they have naturally completed their turn; in this case, the value of the `stop_reason` response field is `end_turn`. If you want the model to stop generating when it encounters custom strings of text, use the `stop_sequences` parameter. If the model encounters one of the custom text strings, the value of the `stop_reason` response field is `stop_sequence` and the value of `stop_sequence` contains the matched stop sequence.

  The maximum number of entries is 8191. 
+ **temperature** – (Optional) The amount of randomness injected into the response.
+ **top_p** – (Optional) Use nucleus sampling.

  In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by `top_p`. When adjusting sampling parameters, modify either `temperature` or `top_p`, but not both at the same time.
+ **top_k** – (Optional) Only sample from the top K options for each subsequent token.

  Use `top_k` to remove long-tail, low-probability responses.
+  **tools** – (Optional) Definitions of tools that the model may use.
**Note**  
Requires an Anthropic Claude 3 model.

  If you include `tools` in your request, the model may return `tool_use` content blocks that represent the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return results back to the model using `tool_result` content blocks.

  You can pass the following tool types:

**Custom**  
Definition for a custom tool.
  + (optional) **type** – The type of the tool. If defined, use the value `custom`.
  + **name** – The name of the tool.
  + **description** – (optional, but strongly recommended) The description of the tool.
  + **input_schema** – The JSON schema for the tool.

**Computer**  
Definition for the computer tool that you use with the computer use API.
  + **type** – The value must be `computer_20241022`.
  + **name** – The value must be `computer`.
  + (Required) **display_height_px** – The height of the display being controlled by the model, in pixels.
  + (Required) **display_width_px** – The width of the display being controlled by the model, in pixels.
  + (Optional) **display_number** – The display number to control (only relevant for X11 environments). If specified, the tool is provided a display number in the tool definition.

**bash**  
Definition for the bash tool that you use with the computer use API.
  + (optional) **type** – The value must be `bash_20241022`.
  + **name** – The value must be `bash`.

**text editor**  
Definition for the text editor tool that you use with the computer use API.
  + (optional) **type** – The value must be `text_editor_20241022`.
  + **name** – The value must be `str_replace_editor`.
+ **tool_choice** – (Optional) Specifies how the model should use the provided tools. The model can use a specific tool, any available tool, or decide by itself.
**Note**  
Requires an Anthropic Claude 3 model.
  + **type** – The type of tool choice. Possible values are `any` (use any available tool), `auto` (the model decides), and `tool` (use the specified tool).
  + **name** – (Optional) The name of the tool to use. Required if you specify `tool` in the `type` field.
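As an illustration, the three `tool_choice` modes can be built with a small hypothetical helper that enforces the rule that the `tool` type requires a name:

```python
def make_tool_choice(choice_type, name=None):
    """Build a tool_choice object. The 'tool' type requires a tool name;
    'auto' and 'any' take no name."""
    if choice_type not in ("auto", "any", "tool"):
        raise ValueError(f"Unknown tool_choice type: {choice_type}")
    if choice_type == "tool":
        if name is None:
            raise ValueError("tool_choice type 'tool' requires a name.")
        return {"type": "tool", "name": name}
    return {"type": choice_type}
```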

------
#### [ Response ]

The Anthropic Claude model returns the following fields for a messages inference call. 

```
{
    "id": string,
    "model": string,
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": string,
            "text": string,
            "image": json,
            "id": string,
            "name": string,
            "input": json
        }
    ],
    "stop_reason": string,
    "stop_sequence": string,
    "usage": {
        "input_tokens": integer,
        "output_tokens": integer
    }
}
```

Example responses with new `stop_reason` values:

```
// Example with refusal
{
    "stop_reason": "refusal",
    "content": [
        {
            "type": "text",
            "text": "I can't help with that request."
        }
    ]
}

// Example with tool_use
{
    "stop_reason": "tool_use",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_123",
            "name": "calculator",
            "input": {"expression": "2+2"}
        }
    ]
}

// Example with model_context_window_exceeded (Claude Sonnet 4.5)
{
    "stop_reason": "model_context_window_exceeded",
    "content": [
        {
            "type": "text",
            "text": "The response was truncated due to context window limits..."
        }
    ]
}
```
+ **id** – The unique identifier for the response. The format and length of the ID might change over time.
+ **model** – The ID for the Anthropic Claude model that made the request.
+ **stop_reason** – The reason why Anthropic Claude stopped generating the response.
  + **end_turn** – The model reached a natural stopping point.
  + **max_tokens** – The generated text exceeded the value of the `max_tokens` input field or exceeded the maximum number of tokens that the model supports.
  + **stop_sequence** – The model generated one of the stop sequences that you specified in the `stop_sequences` input field.
  + **refusal** – Claude refused to generate a response due to safety concerns.
  + **tool_use** – Claude is calling a tool and expects you to execute it.
  + **model_context_window_exceeded** – The model stopped generating because it hit the context window limit. New with Claude Sonnet 4.5.
+ **stop_sequence** – The stop sequence that ended the generation.
+ **type** – The type of response. The value is always `message`.
+ **role** – The conversational role of the generated message. The value is always `assistant`.
+ **content** – The content generated by the model, returned as an array. There are three types of content: *text*, *tool_use*, and *image*.
  + *text* – A text response.
    + **type** – The type of the content. This value is `text`. 
    + **text** – If the value of `type` is text, contains the text of the content. 
  + *tool use* – A request from the model to use a tool.
    + **type** – The type of the content. This value is `tool_use`.
    + **id** – The ID for the tool that the model is requesting use of.
    + **name** – Contains the name of the requested tool. 
    + **input** – The input parameters to pass to the tool.
  + *Image* – An image in the content.
    + **type** – The type of the content. This value is `image`.
    + **source** – Contains the image. For more information, see [Multimodal prompts](model-parameters-anthropic-claude-messages.md#model-parameters-anthropic-claude-messages-multimodal-prompts).
+ **usage** – Container for the number of tokens that you supplied in the request and the number of tokens that the model generated in the response.
  + **input_tokens** – The number of input tokens in the request.
  + **output_tokens** – The number of tokens that the model generated in the response.

------

## Effort parameter (beta)


The `effort` parameter is an alternative to thinking token budgets for Claude Opus 4.5. This parameter tells Claude how liberally it should spend tokens to produce the best result, adjusting token usage across thinking, tool calls, and user communication. It can be used with or without extended thinking mode.

The effort parameter can be set to:
+ `high` (default) – Claude spends as many tokens as needed for the best result
+ `medium` – Balanced token usage
+ `low` – Conservative token usage

To use this feature, you must pass the beta header `effort-2025-11-24`.

Request example:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": [
        "effort-2025-11-24"
    ],
    "max_tokens": 4096,
    "output_config": {
        "effort": "medium"
    },
    "messages": [{
        "role": "user",
        "content": "Analyze this complex dataset and provide insights"
    }]
}
```

# Code examples


The following code examples show how to use the messages API. 

**Topics**
+ [

## Messages code example
](#api-inference-examples-claude-messages-code-example)
+ [

## Multimodal code examples
](#api-inference-examples-claude-multimodal-code-example)

## Messages code example


This example shows how to send a single turn user message and a user turn with a prefilled assistant message to an Anthropic Claude 3 Sonnet model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate a message with Anthropic Claude (on demand).
"""
import boto3
import json
import logging

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):

    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }  
    )  

    
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
   
    return response_body


def main():
    """
    Entrypoint for Anthropic Claude message example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
        system_prompt = "Please respond only with emoji."
        max_tokens = 1000

        # Prompt with user turn only.
        user_message =  {"role": "user", "content": "Hello World"}
        messages = [user_message]

        response = generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens)
        print("User turn only.")
        print(json.dumps(response, indent=4))

        # Prompt with both user turn and prefilled assistant response.
        # Anthropic Claude continues by using the prefilled assistant text.
        assistant_message =  {"role": "assistant", "content": "<emoji>"}
        messages = [user_message, assistant_message]
        response = generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens)
        print("User turn and prefilled assistant response.")
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message=err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
            format(message))

if __name__ == "__main__":
    main()
```

## Multimodal code examples


The following examples show how to pass an image and prompt text in a multimodal message to an Anthropic Claude 3 Sonnet model.

**Topics**
+ [

### Multimodal prompt with InvokeModel
](#api-inference-examples-claude-multimodal-code-example-invoke-model)
+ [

### Streaming multimodal prompt with InvokeModelWithResponseStream
](#api-inference-examples-claude-multimodal-code-example-streaming)

### Multimodal prompt with InvokeModel


The following example shows how to send a multimodal prompt to Anthropic Claude 3 Sonnet with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to run a multimodal prompt with Anthropic Claude (on demand) and InvokeModel.
"""

import json
import logging
import base64
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    """
    Invokes a model with a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send to the model.
        max_tokens (int) : The maximum  number of tokens to generate.
    Returns:
        None.
    """



    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Anthropic Claude multimodal prompt example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
        max_tokens = 1000
        input_text = "What's in this image?"
        input_image = "/path/to/image" # Replace with actual path to image file
 
        # Read reference image from file and encode as base64 strings.
        image_ext = input_image.split(".")[-1]
        with open(input_image, "rb") as image_file:
            content_image = base64.b64encode(image_file.read()).decode('utf8')

        message = {
            "role": "user",
            "content": [
                {
                    "type": "image", 
                    "source": {
                        "type": "base64",
                        "media_type": f"image/{image_ext}", 
                        "data": content_image
                    }
                },
                {
                    "type": "text", 
                    "text": input_text
                }
            ]
        }

    
        messages = [message]

        response = run_multi_modal_prompt(
            bedrock_runtime, model_id, messages, max_tokens)
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))


if __name__ == "__main__":
    main()
```

### Streaming multimodal prompt with InvokeModelWithResponseStream


The following example shows how to stream the response from a multimodal prompt sent to Anthropic Claude 3 Sonnet with [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html). 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to stream the response from Anthropic Claude Sonnet (on demand) for a 
multimodal request.
"""

import json
import base64
import logging
import boto3

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def stream_multi_modal_prompt(bedrock_runtime, model_id, input_text, image, max_tokens):
    """
    Streams the response from a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        input_text (str) : The prompt text
        image (str) : The path to  an image that you want in the prompt.
        max_tokens (int) : The maximum  number of tokens to generate.
    Returns:
        None.
    """

    with open(image, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())

    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": input_text},
                    {"type": "image", "source": {"type": "base64",
                                                 "media_type": "image/jpeg", "data": encoded_string.decode('utf-8')}}
                ]
            }
        ]
    })

    response = bedrock_runtime.invoke_model_with_response_stream(
        body=body, modelId=model_id)

    for event in response.get("body"):
        chunk = json.loads(event["chunk"]["bytes"])

        if chunk['type'] == 'message_delta':
            print(f"\nStop reason: {chunk['delta']['stop_reason']}")
            print(f"Stop sequence: {chunk['delta']['stop_sequence']}")
            print(f"Output tokens: {chunk['usage']['output_tokens']}")

        if chunk['type'] == 'content_block_delta':
            if chunk['delta']['type'] == 'text_delta':
                print(chunk['delta']['text'], end="")


def main():
    """
    Entrypoint for Anthropic Claude Sonnet multimodal prompt example.
    """

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What can you tell me about this image?"
    image = "/path/to/image"
    max_tokens = 100

    try:

        bedrock_runtime = boto3.client('bedrock-runtime')

        stream_multi_modal_prompt(
            bedrock_runtime, model_id, input_text, image, max_tokens)

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))


if __name__ == "__main__":
    main()
```

# Supported models


You can use the Messages API with the following Anthropic Claude models.
+ Anthropic Claude Opus 4.5
+ Anthropic Claude Opus 4.1
+ Anthropic Claude Opus 4 
+ Anthropic Claude Sonnet 4.5 
+ Anthropic Claude Haiku 4.5 
+ Anthropic Claude Sonnet 4 
+ Anthropic Claude 3.7 Sonnet 
+ Anthropic Claude 3.5 Sonnet v2 
+ Anthropic Claude 3.5 Sonnet 
+ Anthropic Claude 3 Opus 
+ Anthropic Claude 3 Sonnet 
+ Anthropic Claude 3 Haiku 
+ Anthropic Claude 2 v2.1 
+ Anthropic Claude 2 v2 
+ Anthropic Claude Instant v1.2