Reinforcement Learning via Verifiable Rewards (RLVR)Reinforcement Learning via AI Feedback (RLAIF)Lambda function implementation details

Setting up reward functions for Amazon Nova models

Reward functions evaluate response quality and provide feedback signals for model training. You can set up reward functions using custom Lambda functions or Amazon Bedrock-hosted foundation models as judges. Guided templates are available to simplify reward function creation for common tasks like instruction following and format validation. Choose the approach that matches your task requirements.

Reinforcement Learning via Verifiable Rewards (RLVR)

RLVR optimizes models for objective tasks such as code generation or math reasoning using verifiable rule-based graders or ready-to-use templates.

You have two options for RLVR (Custom Code):

Amazon Bedrock console provides sample templates for grader Lambda functions:

Mathematical reasoning with ground truth verification
Format validation and constraint checking
Generic grader Lambda template with boilerplate code

Follow the instructions in the provided template on the Create RFT job page in the Amazon Bedrock console.

Create custom reward functions using your own Lambda ARN for complex logic, external APIs, multi-step calculations, or combining multiple evaluation criteria.

Note

If you bring your own Lambda function, keep the following in mind:

Increase the Lambda timeout from default 3 seconds to maximum 15 minutes for complex evaluations.
The Lambda execution role needs permissions to invoke models as described in Access and security for Amazon Nova models.

Reinforcement Learning via AI Feedback (RLAIF)

RLAIF optimizes models for subjective tasks such as instruction following or chatbot interactions using AI-based judges with ready-to-use templates.

For RLAIF (Model as Judge):

Select an Amazon Bedrock hosted base Model as Judge
Configure instructions for evaluation
Define evaluation criteria and scoring guidelines

Available LLM-as-Judge prompt templates in the Amazon Bedrock console:

Instruction following (Judge model training)
Summarization (Multi-turn dialogs)
Reasoning evaluation (CoT for specialized domains)
RAG faithfulness (Context-grounded Q&A)

Note

The console's Model as Judge option automatically converts your configuration into a Lambda function during training.

Lambda function implementation details

When implementing custom Lambda reward functions, your function must accept and return data in the following format.

Design guidelines

Rank responses – Give the best answer a clearly higher score
Use consistent checks – Evaluate task completion, format adherence, safety, and reasonable length
Maintain stable scaling – Keep scores normalized and non-exploitable

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Prepare data

Create fine-tuning jobs