Builder
Properties
Number of training epochs to run during reinforcement fine-tuning. Higher values may improve performance but increase training time.
Interval between evaluation runs during RFT training, measured in training steps. A smaller interval gives more frequent progress monitoring at the cost of additional evaluation compute.
Maximum number of tokens the model can generate in response to each prompt during RFT training.
Learning rate used for reinforcement fine-tuning. Controls how quickly the model adapts to reward signals.
Maximum length of input prompts during RFT training, measured in tokens. Longer prompts allow more context but increase memory usage and training time.
Level of reasoning effort applied during RFT training. Higher values may improve response quality but increase training time.
Number of response samples generated per prompt during RFT training. More samples yield a more reliable estimate of the reward signal but increase compute per training step.
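
These properties map naturally onto a fluent builder. The sketch below is a minimal illustration of how such a builder might be assembled; every name in it (RftTrainingConfig, RftConfigBuilder, nEpochs, evalInterval, and the default values) is an assumption for demonstration, not the library's actual API surface.

```typescript
// Hypothetical shape of the RFT training configuration. All names and
// defaults here are illustrative placeholders, not the real API.
interface RftTrainingConfig {
  nEpochs: number;              // training epochs to run
  evalInterval: number;         // training steps between evaluation runs
  maxCompletionTokens: number;  // cap on tokens generated per prompt
  learningRate: number;         // step size for reward-driven updates
  maxPromptLength: number;      // cap on input prompt tokens
  reasoningEffort: "low" | "medium" | "high"; // reasoning effort level
  samplesPerPrompt: number;     // responses sampled per prompt
}

// Fluent builder: each setter returns `this` so calls can be chained,
// and build() returns an immutable snapshot of the configuration.
class RftConfigBuilder {
  private config: RftTrainingConfig = {
    nEpochs: 3,
    evalInterval: 100,
    maxCompletionTokens: 1024,
    learningRate: 1e-5,
    maxPromptLength: 2048,
    reasoningEffort: "medium",
    samplesPerPrompt: 4,
  };

  nEpochs(value: number): this {
    this.config.nEpochs = value;
    return this;
  }

  evalInterval(steps: number): this {
    this.config.evalInterval = steps;
    return this;
  }

  maxCompletionTokens(tokens: number): this {
    this.config.maxCompletionTokens = tokens;
    return this;
  }

  learningRate(rate: number): this {
    this.config.learningRate = rate;
    return this;
  }

  maxPromptLength(tokens: number): this {
    this.config.maxPromptLength = tokens;
    return this;
  }

  reasoningEffort(level: "low" | "medium" | "high"): this {
    this.config.reasoningEffort = level;
    return this;
  }

  samplesPerPrompt(count: number): this {
    this.config.samplesPerPrompt = count;
    return this;
  }

  build(): RftTrainingConfig {
    return { ...this.config };
  }
}

// Usage: trade training time against reward-signal quality by sampling
// more responses per prompt while evaluating more often.
const config = new RftConfigBuilder()
  .nEpochs(2)
  .evalInterval(50)
  .samplesPerPrompt(8)
  .build();
```

Keeping the defaults inside the builder means callers only override the properties they care about, which is the usual motivation for this pattern in training-configuration APIs.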