Prepare data for open-weight models
When you fine-tune open-weight models with reinforcement fine-tuning (RFT) using OpenAI-compatible APIs, you provide the training data yourself: upload your own prompts as a JSONL file with the purpose fine-tune.
Training data format and requirements
Training data must follow the OpenAI chat completions format and contain between 100 and 20,000 examples. Each training example contains:

- messages – The input prompt provided to the model, as a list of messages with system, user, or assistant roles.

- reference_answer – The expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs; it can contain any format that helps your reward function evaluate quality.

- (Optional) Any additional fields that your grader Lambda function needs for grading.
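As an illustration, a single training record (one line of the JSONL file) can be built and serialized like this; the prompt and answer values below are made up for the example:

```python
import json

# Illustrative RFT training record: "messages" holds the prompt
# conversation and "reference_answer" holds whatever your reward
# function needs to score the model's response.
record = {
    "messages": [
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "What is 12 * 8?"},
    ],
    "reference_answer": "96",
}

# Each record is written as exactly one line of the JSONL training file.
jsonl_line = json.dumps(record)
print(jsonl_line)
```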
Requirements:

- JSONL format with prompts in OpenAI chat completions format (one prompt per line)

- File purpose must be set to fine-tune

- A minimum of 100 records in the training dataset

- Amazon Bedrock automatically validates the training dataset format
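Before uploading, you can run a rough local pre-check against these requirements. This sketch is not Bedrock's actual validator; it only mirrors the documented rules (valid JSON per line, required fields, minimum record count):

```python
import json

def precheck_training_file(lines, min_records=100):
    """Rough local sanity check of RFT training data before upload.
    Mirrors the documented requirements; not Bedrock's actual validation."""
    count = 0
    for i, line in enumerate(lines, start=1):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            raise ValueError(f"line {i}: not valid JSON")
        # Each record needs the conversation and the grading reference.
        if "messages" not in rec or "reference_answer" not in rec:
            raise ValueError(f"line {i}: missing messages or reference_answer")
        count += 1
    if count < min_records:
        raise ValueError(f"need at least {min_records} records, got {count}")
    return count
```

For example, `precheck_training_file(open("train.jsonl"))` returns the record count on success and raises on the first problem it finds.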
Files API
You can use the OpenAI-compatible Files API to upload your training data for fine-tuning jobs. Files are stored securely in Amazon Bedrock and are used when creating fine-tuning jobs. For complete API details, see the OpenAI Files documentation.
The Files API supports four operations on training files: uploading a file, retrieving details about a specific file, listing uploaded files, and deleting a file.
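As a sketch, the four operations map onto the standard OpenAI Python SDK calls. The client configuration shown in the comments (base URL and credentials) is a placeholder you must adapt to your Bedrock OpenAI-compatible endpoint:

```python
# Thin wrappers around the OpenAI Files API, usable with any
# OpenAI-compatible client. To create a real client:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="<your-bedrock-openai-compatible-endpoint>",
#                   api_key="<your-credentials>")
#
# The endpoint above is a placeholder, not a real Bedrock URL.

def upload_training_file(client, path):
    # Purpose must be "fine-tune" for RFT training data.
    with open(path, "rb") as f:
        return client.files.create(file=f, purpose="fine-tune")

def get_file(client, file_id):
    # Retrieve details about a specific uploaded file.
    return client.files.retrieve(file_id)

def list_files(client):
    # List all uploaded files.
    return client.files.list()

def delete_file(client, file_id):
    # Delete an uploaded file.
    return client.files.delete(file_id)
```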
Characteristics of effective training data
Effective RFT training data requires three key characteristics:
- Clarity and consistency – Use clear, unambiguous prompts with consistent formatting. Avoid contradictory labels, ambiguous instructions, or conflicting reference answers that mislead training.

- Diversity – Include varied input formats, edge cases, and difficulty levels that reflect production usage patterns across different user types and scenarios.

- Efficient reward functions – Design functions that execute quickly (seconds, not minutes), parallelize with AWS Lambda, and return consistent scores for cost-effective training.
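A minimal grader Lambda in that spirit might look like the sketch below. The event shape shown (model_response, reference_answer) is an assumption for illustration; match it to the payload your RFT recipe actually sends to the Lambda:

```python
def lambda_handler(event, context):
    """Sketch of a fast, deterministic grader Lambda.

    The event fields used here are illustrative, not a documented
    contract: "model_response" is the model's output and
    "reference_answer" comes from the training record.
    """
    response = event["model_response"].strip()
    reference = event["reference_answer"].strip()
    # Exact-match scoring: cheap, deterministic, and parallelizable.
    # Replace with any comparison that stays fast and consistent.
    score = 1.0 if response == reference else 0.0
    return {"score": score}
```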
Additional properties
The RFT data format supports custom fields beyond the core schema requirements (messages and reference_answer). This flexibility allows you to add any additional data your reward function needs for proper evaluation.
Note
You don't need to configure this in your recipe. The data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the metadata field.
Common additional properties
- task_id – Unique identifier for tracking

- difficulty_level – Problem complexity indicator

- domain – Subject area or category

- expected_reasoning_steps – Number of steps in the solution
These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.
Examples with additional properties
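For example, a training record can extend the core schema with custom fields like those above; the values here are made up for illustration:

```python
import json

# Illustrative record extending the core schema (messages,
# reference_answer) with custom fields. The extra fields are passed
# to the reward function in its metadata field.
record = {
    "messages": [
        {"role": "user", "content": "Solve: 3x + 5 = 20"},
    ],
    "reference_answer": "x = 5",
    "task_id": "algebra-0042",
    "difficulty_level": "easy",
    "domain": "algebra",
    "expected_reasoning_steps": 2,
}
print(json.dumps(record))
```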