Prepare your training data and reward functions for reinforcement fine-tuning

To create a reinforcement fine-tuning (RFT) job, you need training data and reward functions that evaluate response quality. Unlike traditional fine-tuning, which requires input-output pairs, RFT uses prompts and reward signals to guide model learning.
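For illustration, the sketch below writes a prompts-only dataset as JSONL, where each record carries a prompt and optional metadata that a reward function could use for scoring. The field names (`prompt`, `metadata`, `reference_answer`) and the file layout are assumptions for this example, not the documented Amazon Bedrock schema; consult the dataset format reference for the exact fields.

```python
import json

# Hypothetical JSONL training records for an RFT job. Each line holds a
# prompt (no target output) plus optional metadata a reward function might
# consult. Field names are illustrative, not the official Bedrock schema.
records = [
    {
        "prompt": "Solve: what is 17 * 24?",
        "metadata": {"reference_answer": "408"},
    },
    {
        "prompt": "Summarize the following paragraph in one sentence: ...",
        "metadata": {"reference_answer": None},
    },
]

# Write one JSON object per line (JSONL).
with open("rft_training_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```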

You can use existing Amazon Bedrock API invocation logs as training data or upload new datasets. Reward functions define what makes a good response; they can use rule-based verification (reinforcement learning with verifiable rewards, RLVR) or AI-based judgment (reinforcement learning from AI feedback, RLAIF).
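To make the RLVR idea concrete, here is a minimal sketch of a rule-based reward that checks a model's answer against a known reference and returns a score. The function signature and scoring logic are assumptions for illustration; the actual reward-function interface is defined by your Amazon Bedrock RFT job configuration.

```python
import re

def rule_based_reward(prompt: str, response: str, reference_answer: str) -> float:
    """Hypothetical RLVR-style reward: deterministically verify the response
    against a reference answer and return a score in [0.0, 1.0].

    The signature is illustrative, not the Bedrock reward-function contract.
    """
    # Treat the last number in the response as the model's final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0  # No verifiable answer produced.
    return 1.0 if numbers[-1] == reference_answer else 0.0

# Example: a response ending in the correct answer earns the full reward.
print(rule_based_reward("What is 17 * 24?", "17 * 24 = 408", "408"))  # 1.0
```

An RLAIF-style reward would instead prompt a judge model to grade the response, which suits open-ended tasks where no deterministic check exists.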

Important

You can provide a maximum of 20,000 prompts to Amazon Bedrock for reinforcement fine-tuning a model.