RFTHyperParameters
Hyperparameters for controlling the reinforcement fine-tuning training process, including learning settings and evaluation intervals.
Contents
- batchSize
-
Number of training samples processed in each batch during reinforcement fine-tuning (RFT) training. Larger batches may improve training stability.
Type: Integer
Valid Range: Minimum value of 16. Maximum value of 512.
Required: No
- epochCount
-
Number of training epochs to run during reinforcement fine-tuning. Higher values may improve performance but increase training time.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 50.
Required: No
- evalInterval
-
Interval between evaluation runs during RFT training, measured in training steps. A smaller interval provides more frequent monitoring of training progress but adds evaluation overhead.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 100.
Required: No
- inferenceMaxTokens
-
Maximum number of tokens the model can generate in response to each prompt during RFT training.
Type: Integer
Required: No
- learningRate
-
Learning rate for reinforcement fine-tuning. Controls how quickly the model adapts to reward signals.
Type: Float
Valid Range: Minimum value of 1.0e-07. Maximum value of 0.001.
Required: No
- maxPromptLength
-
Maximum length of input prompts during RFT training, measured in tokens. Longer prompts allow more context but increase memory usage and training time.
Type: Integer
Required: No
- reasoningEffort
-
Level of reasoning effort applied during RFT training. Higher values may improve response quality but increase training time.
Type: String
Valid Values: low | medium | high
Required: No
- trainingSamplePerPrompt
-
Number of response samples generated per prompt during RFT training. More samples provide better reward signal estimation.
Type: Integer
Valid Range: Minimum value of 2. Maximum value of 16.
Required: No
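To show how these members fit together, the sketch below assembles a hyperparameter payload and checks each value against the valid ranges and values listed above. The helper function, its default values, and the dictionary shape are illustrative assumptions only; the defaults are not service defaults, and the surrounding request that would carry this object is not shown.

```python
# Hypothetical helper: builds an RFTHyperParameters-style payload and validates
# it against the ranges documented in this reference. Default values below are
# illustrative picks within the valid ranges, not service defaults.

def build_rft_hyperparameters(
    batch_size=32,
    epoch_count=5,
    eval_interval=10,
    inference_max_tokens=1024,
    learning_rate=1e-5,
    max_prompt_length=2048,
    reasoning_effort="medium",
    training_sample_per_prompt=4,
):
    # (value, min, max) bounds taken from the Valid Range entries above.
    bounds = {
        "batchSize": (batch_size, 16, 512),
        "epochCount": (epoch_count, 1, 50),
        "evalInterval": (eval_interval, 1, 100),
        "learningRate": (learning_rate, 1.0e-07, 0.001),
        "trainingSamplePerPrompt": (training_sample_per_prompt, 2, 16),
    }
    for name, (value, lo, hi) in bounds.items():
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the valid range [{lo}, {hi}]")

    if reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoningEffort must be one of: low | medium | high")

    # All members are optional; include only the ones you want to set.
    return {
        "batchSize": batch_size,
        "epochCount": epoch_count,
        "evalInterval": eval_interval,
        "inferenceMaxTokens": inference_max_tokens,
        "learningRate": learning_rate,
        "maxPromptLength": max_prompt_length,
        "reasoningEffort": reasoning_effort,
        "trainingSamplePerPrompt": training_sample_per_prompt,
    }


# Example: override only the values you want to change from the illustrative defaults.
hyperparameters = build_rft_hyperparameters(learning_rate=5e-6, epoch_count=3)
```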
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: