Prepare your training data and reward functions for reinforcement fine-tuning
To create a reinforcement fine-tuning job, you need training data and reward functions that evaluate response quality. Unlike traditional fine-tuning, which requires input-output pairs, RFT uses prompts and reward signals to guide model learning.
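The contrast between the two data formats can be sketched as follows. The field names (`prompt`, `completion`) and the JSONL layout here are illustrative assumptions, not the exact Amazon Bedrock schema; consult the service documentation for the required format.

```python
import json

# Traditional fine-tuning: each record pairs an input with a target output.
# (Field names are hypothetical, for illustration only.)
traditional_record = {
    "prompt": "Summarize the following text: ...",
    "completion": "A short summary.",
}

# Reinforcement fine-tuning: each record is a prompt only. There is no
# reference completion; a reward function scores the model's responses.
rft_record = {"prompt": "Summarize the following text: ..."}

# Training data is commonly serialized as JSON Lines, one record per line.
with open("rft_train.jsonl", "w") as f:
    f.write(json.dumps(rft_record) + "\n")
```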
You can use existing Amazon Bedrock API invocation logs as training data or upload new datasets. Reward functions define what makes a good response. They can use rule-based verification, as in reinforcement learning with verifiable rewards (RLVR), or AI-based judgment, as in reinforcement learning from AI feedback (RLAIF).
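A rule-based (RLVR-style) reward function can be sketched as a function that programmatically checks a response and returns a score. This is a minimal illustration, not the signature Amazon Bedrock expects; the function name and parameters are assumptions.

```python
import re

def exact_answer_reward(response: str, expected_answer: str) -> float:
    """RLVR-style rule-based reward (illustrative sketch).

    Returns 1.0 if the last number appearing in the response matches
    the expected answer, else 0.0. Because the check is a verifiable
    rule rather than a learned judgment, no judge model is needed.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if numbers and numbers[-1] == expected_answer:
        return 1.0
    return 0.0
```

An RLAIF-style reward would instead send the prompt and response to a judge model and parse its score; the rule-based form above is cheaper and more reproducible when correctness can be verified mechanically.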
Important
You can provide a maximum of 20,000 prompts to Amazon Bedrock for reinforcement fine-tuning the model.