Create and manage fine-tuning jobs for Amazon Nova models
You can create a reinforcement fine-tuning (RFT) job using the Amazon Bedrock console or API. An RFT job can take several hours to complete, depending on the size of your training data, the number of epochs, and the complexity of your reward functions.
Prerequisites
- Create an IAM service role with the required permissions. For comprehensive security and permissions information, including RFT-specific permissions, see Access and security for Amazon Nova models.
- (Optional) Encrypt input and output data, your RFT job, or inference requests made to custom models. For more information, see Encryption of custom models.
Create your RFT job
Choose the tab for your preferred method, and then follow the steps:
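If you use the API, the job is submitted with the `create_model_customization_job` operation. The sketch below shows one way to assemble such a request with boto3; the job and model names, S3 paths, hyperparameter values, and especially the `customizationType` value for RFT are illustrative assumptions, not confirmed API values, so check the Amazon Bedrock API reference before using them.

```python
def build_rft_job_request(role_arn: str, bucket: str) -> dict:
    """Assemble an illustrative payload for create_model_customization_job."""
    return {
        "jobName": "my-nova-rft-job",                       # hypothetical name
        "customModelName": "my-nova-rft-model",             # hypothetical name
        "roleArn": role_arn,                                # IAM service role from Prerequisites
        "baseModelIdentifier": "amazon.nova-lite-v1:0",     # example base model ID
        "customizationType": "REINFORCEMENT_FINE_TUNING",   # ASSUMED enum value for RFT
        "trainingDataConfig": {"s3Uri": f"s3://{bucket}/train/data.jsonl"},
        "outputDataConfig": {"s3Uri": f"s3://{bucket}/output/"},
        "hyperParameters": {"epochCount": "2"},             # illustrative hyperparameter
    }

def submit_rft_job(request: dict) -> str:
    """Submit the job (requires AWS credentials) and return its ARN."""
    import boto3
    bedrock = boto3.client("bedrock")
    response = bedrock.create_model_customization_job(**request)
    return response["jobArn"]
```

The returned job ARN is what you pass to the status and monitoring calls described in the next section.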
Monitor your RFT training job
Amazon Bedrock provides real-time monitoring with visual graphs and metrics during RFT training. These metrics help you understand whether the model converges properly and if the reward function effectively guides the learning process.
Job status tracking
You can monitor your RFT job status through the validation and training phases in the Amazon Bedrock console.
Completion indicators:
- Job status changes to Completed when training finishes successfully
- The custom model ARN becomes available for deployment
- Training metrics reach convergence thresholds
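Outside the console, the same status can be tracked programmatically with the `get_model_customization_job` operation. A minimal polling sketch, assuming the standard model-customization status lifecycle (InProgress, Completed, Failed, Stopped):

```python
import time

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_job(client, job_arn: str, poll_seconds: int = 300) -> str:
    """Poll the job until it reaches a terminal status; return that status."""
    while True:
        job = client.get_model_customization_job(jobIdentifier=job_arn)
        status = job["status"]
        if status in TERMINAL_STATUSES:
            # On Completed, job["outputModelArn"] holds the custom model ARN.
            return status
        time.sleep(poll_seconds)
```

Pass a `boto3.client("bedrock")` instance as `client`; the function returns once training has completed, failed, or been stopped.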
Real-time training metrics
Amazon Bedrock provides real-time monitoring during RFT training with visual graphs displaying training and validation metrics.
Core training metrics
- Training loss - Measures how well the model is learning from the training data
- Training reward statistics - Shows the reward scores assigned by your reward functions
- Reward margin - Measures the difference between the rewards for good and bad responses
- Accuracy on training and validation sets - Shows model performance on both the training data and held-out data
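To make the reward statistics and reward margin concrete, here is an illustrative computation over a batch of reward-function scores. The split into "good" and "bad" response groups for the margin is an assumption for demonstration; Bedrock computes these metrics internally during training.

```python
def reward_stats(rewards: list[float]) -> dict:
    """Mean/max/min of the reward scores assigned in one training step."""
    return {
        "mean": sum(rewards) / len(rewards),
        "max": max(rewards),
        "min": min(rewards),
    }

def reward_margin(good: list[float], bad: list[float]) -> float:
    """Average gap between rewards for good and bad responses."""
    return sum(good) / len(good) - sum(bad) / len(bad)
```

A healthy run typically shows the mean reward rising and the margin widening as the model learns to prefer high-reward responses.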
Detailed metric categories
- Reward metrics – critic/rewards/mean, critic/rewards/max, critic/rewards/min (reward distribution), and val-score/rewards/mean@1 (validation rewards)
- Model behavior – actor/entropy (policy variation; higher equals more exploratory)
- Training health – actor/pg_loss (policy gradient loss), actor/pg_clipfrac (frequency of clipped updates), and actor/grad_norm (gradient magnitude)
- Response characteristics – prompt_length/mean, prompt_length/max, prompt_length/min (input token statistics), response_length/mean, response_length/max, response_length/min (output token statistics), and response/aborted_ratio (incomplete generation rate; 0 equals all completed)
- Performance – perf/throughput (training throughput), perf/time_per_step (time per training step), and timing_per_token_ms/* (per-token processing times)
- Resource usage – perf/max_memory_allocated_gb, perf/max_memory_reserved_gb (GPU memory), and perf/cpu_memory_used_gb (CPU memory)
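One way to use the training-health and response metrics is a simple automated check over a snapshot of metric values. The threshold numbers below are assumptions chosen purely for illustration, not Bedrock recommendations; tune them to your own runs.

```python
def check_training_health(metrics: dict) -> list[str]:
    """Return human-readable warnings for suspicious metric values."""
    warnings = []
    if metrics.get("actor/grad_norm", 0.0) > 10.0:         # assumed threshold
        warnings.append("gradient norm is spiking; training may be unstable")
    if metrics.get("actor/pg_clipfrac", 0.0) > 0.3:        # assumed threshold
        warnings.append("many policy updates are clipped; consider a lower learning rate")
    if metrics.get("response/aborted_ratio", 0.0) > 0.05:  # assumed threshold
        warnings.append("responses are frequently truncated; raise the max response length")
    return warnings
```

An empty list means the snapshot passed every check.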
Training progress visualization
The console displays interactive graphs that update in real time as your RFT job progresses. These visualizations can help you:
- Track convergence toward optimal performance
- Identify potential training issues early
- Determine optimal stopping points
- Compare performance across different epochs
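Tracking convergence and stopping points can also be done programmatically from a series of validation reward values. The sketch below treats the curve as converged once the mean reward over the most recent epochs stops improving by more than a tolerance; the window size and tolerance are illustrative assumptions.

```python
def has_converged(val_rewards: list[float], window: int = 3, tol: float = 0.01) -> bool:
    """True if the last `window` epochs improved mean validation reward by less than `tol`."""
    if len(val_rewards) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(val_rewards[-window:]) / window
    earlier = sum(val_rewards[-2 * window:-window]) / window
    return recent - earlier < tol
```

Feeding this the per-epoch val-score/rewards/mean@1 values gives a rough signal for when additional epochs are no longer paying off.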
Set up inference
After the job completes, deploy the RFT model for on-demand inference, or use Provisioned Throughput for consistent performance. For setting up inference, see Set up inference for a custom model.
Use Test in Playground to evaluate and compare responses with the base model. For evaluating your completed RFT model, see Evaluate your RFT model.
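The same base-versus-custom comparison can be scripted with the bedrock-runtime Converse API. In this sketch the model IDs are placeholders: the base model ID is an example, and a deployed custom model is invoked through its deployment or Provisioned Throughput ARN, which you substitute for your own.

```python
def ask(client, model_id: str, prompt: str) -> str:
    """Send one user turn via the Converse API and return the model's text reply."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

def compare_models(prompt: str, base_model_id: str, custom_model_arn: str) -> dict:
    """Run the same prompt against both models (requires AWS credentials)."""
    import boto3
    runtime = boto3.client("bedrock-runtime")
    return {
        "base": ask(runtime, base_model_id, prompt),
        "custom": ask(runtime, custom_model_arn, prompt),
    }
```

Running the same prompts through both models side by side makes it easy to spot where the reward functions changed the model's behavior.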