Handling Custom Chat Templates and Tokenizers

Custom chat templates and tokenizers are essential components for properly formatting conversational inputs and managing token generation in your imported models. Custom Model Import supports importing models with custom chat templates that define how multi-turn conversations are structured and tokenized.

Importing Models with Custom Chat Templates

When importing a model with a custom chat template, ensure that you follow the best practices outlined in the main Custom Model Import documentation, including providing model weights in the safetensors format and supplying all necessary configuration files.

Chat Template Format Requirements

If your model uses a custom chat template and you want to use that template with Amazon Bedrock, you must include the chat template file in one of the following formats:

  • chat_template.jinja — A Jinja2-based template file that defines how messages are formatted.

  • chat_template.json — A JSON file containing the chat template as a string value.

  • tokenizer_config.json with embedded chat template — Alternatively, you can include the chat template directly within your tokenizer_config.json file as a chat_template field. For an example, see the Hermes-2-Pro tokenizer_config.json on Hugging Face.

Custom Model Import automatically processes these files and places them in the correct directories during import.
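
For reference, here is a minimal sketch of the embedded approach. The field names follow the Hugging Face tokenizer_config.json convention; the token values and template string are illustrative placeholders, and a real tokenizer_config.json contains additional tokenizer fields:

{
  "bos_token": "<|im_start|>",
  "eos_token": "<|im_end|>",
  "chat_template": "{% for message in messages %}<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}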

Chat Template Precedence

If you provide multiple chat template sources, Amazon Bedrock applies the following precedence rules:

  1. Separate chat template files take precedence — If you include both a separate chat template file (chat_template.jinja or chat_template.json) AND a chat_template field in your tokenizer_config.json, the separate file will be used and the embedded template in tokenizer_config.json will be ignored.

  2. Embedded template as fallback — If you do not provide a separate chat template file, Amazon Bedrock will use the chat_template field from your tokenizer_config.json if present.
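
Because of these precedence rules, it can help to check your local model directory for conflicting template sources before you upload it. The following is a minimal local sketch (the directory path is a placeholder):

import json
from pathlib import Path

# Placeholder path to the local copy of your model artifacts
model_dir = Path("./my-model")

# Separate chat template files take precedence over an embedded template
separate_files = [name for name in ("chat_template.jinja", "chat_template.json")
                  if (model_dir / name).exists()]

# Check whether tokenizer_config.json also embeds a chat_template field
config_path = model_dir / "tokenizer_config.json"
embedded = config_path.exists() and "chat_template" in json.loads(config_path.read_text())

if separate_files and embedded:
    print(f"Warning: {separate_files} will override the embedded chat_template "
          "in tokenizer_config.json. Consider keeping only one source.")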

Warning

Choose one approach: To avoid confusion and ensure predictable behavior, we strongly recommend using only ONE of these approaches:

  • Option 1: Provide separate chat template files (chat_template.jinja or chat_template.json) without including a chat_template field in your tokenizer_config.json.

  • Option 2: Include the chat_template field directly in your tokenizer_config.json and do not provide separate template files.

If you have custom tool templates or complex chat template configurations, we recommend using Option 2 (embedding the template in tokenizer_config.json) as it allows you to define multiple named templates (such as "default" and "tool_use") within a single configuration file.
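
A minimal sketch of the named-template form is shown below. It follows the Hugging Face convention of supplying chat_template as a list of objects with name and template keys; the template bodies here are elided placeholders:

{
  "chat_template": [
    {
      "name": "default",
      "template": "{% for message in messages %}...{% endfor %}"
    },
    {
      "name": "tool_use",
      "template": "{% for message in messages %}...{% endfor %}"
    }
  ]
}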

Note

The chat template file must follow the Hugging Face format and naming conventions. Ensure that your template is compatible with the Transformers library.

Chat Template Format Examples

Here are examples of the two supported chat template formats:

Jinja Format (chat_template.jinja)

A simplified example of a Jinja2-based chat template:

{% for message in messages %}
{% if loop.first and message['role'] != 'system' %}
<|im_start|>system
You are a helpful assistant.<|im_end|>
{% endif %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
{% if add_generation_prompt %}
<|im_start|>assistant
{% endif %}

For a complete example, see the GPT-OSS chat template on Hugging Face.
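
If you want to preview what a template like this produces before importing your model, you can render it locally with the jinja2 package. This is a quick sketch (the Transformers library also renders templates with Jinja2, although it adds extra filters that a simple template like this one does not need):

from jinja2 import Template

# The simplified template from above, written with explicit newlines
template_text = (
    "{% for message in messages %}"
    "{% if loop.first and message['role'] != 'system' %}"
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "{% endif %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [{"role": "user", "content": "Hello, how are you?"}]

# Render with the generation prompt appended, as done before inference
rendered = Template(template_text).render(messages=messages, add_generation_prompt=True)
print(rendered)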

JSON Format (chat_template.json)

A simplified example of a JSON-based chat template with vision support:

{ "chat_template": "{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' %}<|vision_start|><|image_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}" }

For a complete example with multimodal support, see the Qwen2-VL chat template on Hugging Face.
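
If you maintain the template as a .jinja file but need the JSON form, json.dump handles the string escaping for you (newlines become \n and quotes are escaped). A minimal conversion sketch with placeholder filenames:

import json

# Placeholder filenames - adjust to your local paths
with open("chat_template.jinja", "r", encoding="utf-8") as f:
    template_text = f.read()

# json.dump escapes newlines and quotes inside the template string
with open("chat_template.json", "w", encoding="utf-8") as f:
    json.dump({"chat_template": template_text}, f, indent=2)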

Important

Ensure that your chat template file follows the exact naming convention (chat_template.jinja or chat_template.json) and format shown in the examples above. Incorrectly formatted templates may cause import or inference failures.

Invoking Models with Custom Chat Templates

Once you have imported a model with a custom chat template, you have two options for invoking the model with properly formatted conversational inputs:

Using OpenAI ChatCompletion API with Messages

If you are providing inputs in the messages schema format, you should use the OpenAI ChatCompletion API. When you use this API format with a model that has an uploaded chat template (chat_template.jinja or chat_template.json), Amazon Bedrock will automatically convert your input messages into the correct format using the chat template.

This is the recommended approach as it provides the most seamless integration and allows Amazon Bedrock to handle the chat template application automatically.

Example: Using OpenAI ChatCompletion API with custom chat template

import json
import boto3

# Initialize Bedrock Runtime client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Define the model ARN for your imported model with custom chat template
model_id = 'arn:aws:bedrock:us-east-1:123456789012:imported-model/your-model-id'

# Prepare the request payload using the messages format
payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello, how are you?"
        }
    ],
    "max_tokens": 150,
    "temperature": 0.7
}

# Invoke the model
response = client.invoke_model(
    modelId=model_id,
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

# Parse and display the response
response_body = json.loads(response['body'].read())
print(json.dumps(response_body, indent=2))

In this example, Amazon Bedrock will automatically apply your custom chat template to the messages array, converting it into the appropriate format for your model.

Manual Tokenization Approach

Alternatively, if you prefer to have full control over the chat template application and tokenization process, you can manually apply the chat template to your conversation and then use the Completion API (not ChatCompletion) with the pre-formatted text.

This approach is useful when you need to customize the template application logic or when working with specialized tokenization requirements.

Step 1: Apply Chat Template Locally

Use the following code snippet to load your custom chat template and apply it to a conversation locally:

from transformers import AutoTokenizer

# Configuration paths - update these with your actual paths
TOKENIZER_PATH = ""  # Path to tokenizer directory

# Load the tokenizer, which includes your custom chat template
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH)

# Test the chat template with a sample conversation
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Apply the chat template and display the formatted output to verify
# that the template works as expected
formatted_chat = tokenizer.apply_chat_template(chat_history, tokenize=False)
print(formatted_chat)

This code demonstrates how to:

  1. Load your tokenizer, including the custom chat template, from the model files

  2. Apply the chat template to a sample conversation history

  3. Generate the formatted text that can be sent to the model
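
When you format a conversation for inference rather than just inspecting the template, you typically also want the generation prompt appended so the model knows to respond as the assistant. In the Transformers library this is the add_generation_prompt parameter:

# Append the assistant header so the model continues the conversation
formatted_chat = tokenizer.apply_chat_template(
    chat_history,
    tokenize=False,
    add_generation_prompt=True,
)
print(formatted_chat)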

Step 2: Invoke with Completion API

After applying the chat template locally, use the formatted text with the Completion API:

import json
import boto3

# Initialize Bedrock Runtime client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Define the model ARN for your imported model
model_id = 'arn:aws:bedrock:us-east-1:123456789012:imported-model/your-model-id'

# Use the formatted_chat output from Step 1 as the prompt
payload = {
    "prompt": formatted_chat,
    "max_tokens": 150,
    "temperature": 0.7
}

# Invoke the model using Completion format (not ChatCompletion)
response = client.invoke_model(
    modelId=model_id,
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

# Parse and display the response
response_body = json.loads(response['body'].read())
print(json.dumps(response_body, indent=2))

Warning

Always use the max_tokens parameter: When using the Completion API with Custom Model Import, always set max_tokens to ensure OpenAI completion schema compatibility. This avoids parameter-translation ambiguity and ensures consistent behavior across SDK implementations. Do not use model-specific parameter names such as max_gen_len or similar variants.

Important

When using the manual tokenization approach, you must use the Completion API format (with prompt field), not the ChatCompletion API format (with messages field). The ChatCompletion API expects raw messages and will attempt to apply the chat template again, resulting in incorrect formatting.
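
To make the distinction concrete, here is a short sketch contrasting the two payload shapes, reusing formatted_chat from Step 1:

# Incorrect: the ChatCompletion path treats this as a raw message and
# applies the chat template a second time, double-wrapping the text
bad_payload = {
    "messages": [{"role": "user", "content": formatted_chat}],
    "max_tokens": 150
}

# Correct: the Completion path sends the pre-formatted text verbatim
good_payload = {
    "prompt": formatted_chat,
    "max_tokens": 150
}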

Best Practices

  • Use ChatCompletion API when possible — The OpenAI ChatCompletion API with messages format provides the most seamless experience and allows Amazon Bedrock to handle chat template application automatically.

  • Validate your chat template — Before importing your model, test your chat template locally using the Transformers library to ensure it produces the expected output format.

  • Include all special tokens — Ensure your chat template includes all necessary special tokens (such as beginning-of-sequence, end-of-sequence, and role markers) that your model expects.

  • Test with multi-turn conversations — Validate that your chat template correctly handles multi-turn conversations with alternating user and assistant messages; a local validation sketch follows this list.

  • Consider vision support — If your model supports multimodal inputs, ensure your chat template includes logic for handling image and video content markers.
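
The following sketch validates multi-turn handling locally before import. The directory path is a placeholder, and the assertions assume a ChatML-style template with <|im_start|> role markers; adjust them to your model's tokens:

from transformers import AutoTokenizer

# Placeholder path to your local model/tokenizer directory
tokenizer = AutoTokenizer.from_pretrained("./my-model")

# A multi-turn conversation with alternating user and assistant messages
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And of Germany?"},
]

formatted = tokenizer.apply_chat_template(
    chat_history, tokenize=False, add_generation_prompt=True
)

# Spot-check that each user turn appears and the output ends with the
# assistant header, ready for the model to generate a reply
assert formatted.count("<|im_start|>user") == 2
assert formatted.rstrip().endswith("<|im_start|>assistant")
print(formatted)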

Warning

Incorrectly formatted chat templates can result in poor model performance, unexpected outputs, or inference failures. Always test your chat template thoroughly before deploying to production.