Prepare data for open-weight models
When you fine-tune open-weight models with reinforcement fine-tuning (RFT) using OpenAI-compatible APIs, you provide the training data yourself: upload your own prompts as a JSONL file with the purpose fine-tune.
Training data format and requirements
Training data must follow the OpenAI chat completions format and contain between 100 and 20,000 examples. Each training example contains:

- messages – The input prompt provided to the model, as a list of messages with system, user, or assistant roles.

- reference_answer – The expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs; it can contain any format that helps your reward function evaluate quality.

- (Optional) Any additional fields that your grader Lambda function needs for grading.
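As an illustration, a single training record (one line of the JSONL file) can be built and serialized like this; the prompt and answer values below are made up for the example:

```python
import json

# Illustrative RFT training record: "messages" holds the prompt
# conversation and "reference_answer" holds whatever your reward
# function needs to score the model's response.
record = {
    "messages": [
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "What is 12 * 8?"},
    ],
    "reference_answer": "96",
}

# Each record is written as exactly one line of the JSONL training file.
jsonl_line = json.dumps(record)
print(jsonl_line)
```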
Requirements:

- JSONL format with prompts in OpenAI chat completions format (one prompt per line)

- File purpose must be set to fine-tune

- A minimum of 100 records in the training dataset

- Amazon Bedrock automatically validates the training dataset format
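Before uploading, you can run a rough local pre-check against these requirements. This sketch is not Bedrock's actual validator; it only mirrors the documented rules (valid JSON per line, required fields, minimum record count):

```python
import json

def precheck_training_file(lines, min_records=100):
    """Rough local sanity check of RFT training data before upload.
    Mirrors the documented requirements; not Bedrock's actual validation."""
    count = 0
    for i, line in enumerate(lines, start=1):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            raise ValueError(f"line {i}: not valid JSON")
        # Each record needs the conversation and the grading reference.
        if "messages" not in rec or "reference_answer" not in rec:
            raise ValueError(f"line {i}: missing messages or reference_answer")
        count += 1
    if count < min_records:
        raise ValueError(f"need at least {min_records} records, got {count}")
    return count
```

For example, `precheck_training_file(open("train.jsonl"))` returns the record count on success and raises on the first problem it finds.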
Files API
You can use the OpenAI-compatible Files API to upload your training data for fine-tuning jobs. Files are stored securely in Amazon Bedrock and are used when creating fine-tuning jobs. For complete API details, see the OpenAI Files documentation.
The Files API supports four operations on training files: uploading a file, retrieving details about a specific file, listing uploaded files, and deleting a file.
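As a sketch, the four operations map onto the standard OpenAI Python SDK calls. The client configuration shown in the comments (base URL and credentials) is a placeholder you must adapt to your Bedrock OpenAI-compatible endpoint:

```python
# Thin wrappers around the OpenAI Files API, usable with any
# OpenAI-compatible client. To create a real client:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="<your-bedrock-openai-compatible-endpoint>",
#                   api_key="<your-credentials>")
#
# The endpoint above is a placeholder, not a real Bedrock URL.

def upload_training_file(client, path):
    # Purpose must be "fine-tune" for RFT training data.
    with open(path, "rb") as f:
        return client.files.create(file=f, purpose="fine-tune")

def get_file(client, file_id):
    # Retrieve details about a specific uploaded file.
    return client.files.retrieve(file_id)

def list_files(client):
    # List all uploaded files.
    return client.files.list()

def delete_file(client, file_id):
    # Delete an uploaded file.
    return client.files.delete(file_id)
```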
Characteristics of effective training data
Effective RFT training data requires three key characteristics:
- Clarity and consistency – Use clear, unambiguous prompts with consistent formatting. Avoid contradictory labels, ambiguous instructions, or conflicting reference answers that mislead training.

- Diversity – Include varied input formats, edge cases, and difficulty levels that reflect production usage patterns across different user types and scenarios.

- Efficient reward functions – Design functions that execute quickly (seconds, not minutes), parallelize with AWS Lambda, and return consistent scores for cost-effective training.
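A minimal grader Lambda in that spirit might look like the sketch below. The event shape shown (model_response, reference_answer) is an assumption for illustration; match it to the payload your RFT recipe actually sends to the Lambda:

```python
def lambda_handler(event, context):
    """Sketch of a fast, deterministic grader Lambda.

    The event fields used here are illustrative, not a documented
    contract: "model_response" is the model's output and
    "reference_answer" comes from the training record.
    """
    response = event["model_response"].strip()
    reference = event["reference_answer"].strip()
    # Exact-match scoring: cheap, deterministic, and parallelizable.
    # Replace with any comparison that stays fast and consistent.
    score = 1.0 if response == reference else 0.0
    return {"score": score}
```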
Additional properties
The RFT data format supports custom fields beyond the core schema requirements (messages and reference_answer). This flexibility allows you to add any additional data your reward function needs for proper evaluation.
Note
You don't need to configure this in your recipe. The data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the metadata field.
Common additional properties
- task_id – Unique identifier for tracking

- difficulty_level – Problem complexity indicator

- domain – Subject area or category

- expected_reasoning_steps – Number of steps in the solution
These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.
Examples with additional properties
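For example, a training record can extend the core schema with custom fields like those above; the values here are made up for illustration:

```python
import json

# Illustrative record extending the core schema (messages,
# reference_answer) with custom fields. The extra fields are passed
# to the reward function in its metadata field.
record = {
    "messages": [
        {"role": "user", "content": "Solve: 3x + 5 = 20"},
    ],
    "reference_answer": "x = 5",
    "task_id": "algebra-0042",
    "difficulty_level": "easy",
    "domain": "algebra",
    "expected_reasoning_steps": 2,
}
print(json.dumps(record))
```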