learningRate

Learning rate for the reinforcement fine-tuning. Controls how quickly the model adapts to reward signals.