RFTHyperParameters
Hyperparameters for controlling the reinforcement fine-tuning training process, including learning settings and evaluation intervals.
Contents
- batchSize
-
Number of training samples processed in each batch during reinforcement fine-tuning (RFT) training. Larger batches may improve training stability.
Type: Integer
Valid Range: Minimum value of 16. Maximum value of 512.
Required: No
- epochCount
-
Number of training epochs to run during reinforcement fine-tuning. Higher values may improve performance but increase training time.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 50.
Required: No
- evalInterval
-
Interval between evaluation runs during RFT training, measured in training steps. A smaller interval provides more frequent monitoring of training progress but adds evaluation overhead.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 100.
Required: No
- inferenceMaxTokens
-
Maximum number of tokens the model can generate in response to each prompt during RFT training.
Type: Integer
Required: No
- learningRate
-
Learning rate for reinforcement fine-tuning. Controls how quickly the model adapts to reward signals.
Type: Float
Valid Range: Minimum value of 1.0e-07. Maximum value of 0.001.
Required: No
- maxPromptLength
-
Maximum length of input prompts during RFT training, measured in tokens. Longer prompts allow more context but increase memory usage and training time.
Type: Integer
Required: No
- reasoningEffort
-
Level of reasoning effort applied during RFT training. Higher values may improve response quality but increase training time.
Type: String
Valid Values: low | medium | high
Required: No
- trainingSamplePerPrompt
-
Number of response samples generated per prompt during RFT training. More samples provide better reward signal estimation.
Type: Integer
Valid Range: Minimum value of 2. Maximum value of 16.
Required: No
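To show how these members fit together, the sketch below assembles a hyperparameter payload and checks each value against the valid ranges and values listed above. The helper function, its default values, and the dictionary shape are illustrative assumptions only; the defaults are not service defaults, and the surrounding request that would carry this object is not shown.

```python
# Hypothetical helper: builds an RFTHyperParameters-style payload and validates
# it against the ranges documented in this reference. Default values below are
# illustrative picks within the valid ranges, not service defaults.

def build_rft_hyperparameters(
    batch_size=32,
    epoch_count=5,
    eval_interval=10,
    inference_max_tokens=1024,
    learning_rate=1e-5,
    max_prompt_length=2048,
    reasoning_effort="medium",
    training_sample_per_prompt=4,
):
    # (value, min, max) bounds taken from the Valid Range entries above.
    bounds = {
        "batchSize": (batch_size, 16, 512),
        "epochCount": (epoch_count, 1, 50),
        "evalInterval": (eval_interval, 1, 100),
        "learningRate": (learning_rate, 1.0e-07, 0.001),
        "trainingSamplePerPrompt": (training_sample_per_prompt, 2, 16),
    }
    for name, (value, lo, hi) in bounds.items():
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the valid range [{lo}, {hi}]")

    if reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoningEffort must be one of: low | medium | high")

    # All members are optional; include only the ones you want to set.
    return {
        "batchSize": batch_size,
        "epochCount": epoch_count,
        "evalInterval": eval_interval,
        "inferenceMaxTokens": inference_max_tokens,
        "learningRate": learning_rate,
        "maxPromptLength": max_prompt_length,
        "reasoningEffort": reasoning_effort,
        "trainingSamplePerPrompt": training_sample_per_prompt,
    }


# Example: override only the values you want to change from the illustrative defaults.
hyperparameters = build_rft_hyperparameters(learning_rate=5e-6, epoch_count=3)
```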
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: