EvaluatorInferenceConfig
- class aws_cdk.aws_bedrock_agentcore_alpha.EvaluatorInferenceConfig(*, max_tokens=None, temperature=None, top_p=None)
Bases: object

(experimental) Inference configuration for a custom LLM-as-a-Judge evaluator.
Controls how the foundation model generates evaluation responses.
- Parameters:
max_tokens (Union[int, float, None]) – (experimental) The maximum number of tokens to generate in the model response. Default: - The foundation model's default maximum token limit is used
temperature (Union[int, float, None]) – (experimental) The temperature value that controls randomness in the model's responses. Higher values produce more diverse outputs. Range: 0.0 to 1.0. Default: - The foundation model's default temperature is used
top_p (Union[int, float, None]) – (experimental) The top-p sampling parameter that controls the diversity of the model's responses. Range: 0.0 to 1.0. Default: - The foundation model's default top-p value is used
- Stability:
experimental
- ExampleMetadata:
fixture=default infused
Example:
# LLM-as-a-Judge with categorical rating scale
categorical_evaluator = agentcore.Evaluator(self, "CategoricalEvaluator",
    evaluator_name="domain_accuracy_evaluator",
    level=agentcore.EvaluationLevel.SESSION,
    description="Evaluates domain-specific accuracy of agent responses",
    evaluator_config=agentcore.EvaluatorConfig.llm_as_a_judge(
        instructions="Evaluate whether the agent response is accurate within the healthcare domain.",
        model_id="us.anthropic.claude-sonnet-4-6",
        rating_scale=agentcore.EvaluatorRatingScale.categorical([
            {"label": "Accurate", "definition": "The response contains factually correct healthcare information."},
            {"label": "Inaccurate", "definition": "The response contains incorrect or misleading healthcare information."}
        ])
    )
)

# LLM-as-a-Judge with numerical rating scale and inference config
numerical_evaluator = agentcore.Evaluator(self, "NumericalEvaluator",
    evaluator_name="response_quality_evaluator",
    level=agentcore.EvaluationLevel.TRACE,
    evaluator_config=agentcore.EvaluatorConfig.llm_as_a_judge(
        instructions="Rate the overall quality of the agent response on a scale of 1 to 5.",
        model_id="us.anthropic.claude-sonnet-4-6",
        rating_scale=agentcore.EvaluatorRatingScale.numerical([
            {"label": "Poor", "definition": "Inadequate response.", "value": 1},
            {"label": "Below Average", "definition": "Partially addresses the query.", "value": 2},
            {"label": "Average", "definition": "Adequately addresses the query.", "value": 3},
            {"label": "Good", "definition": "Well-structured and accurate response.", "value": 4},
            {"label": "Excellent", "definition": "Outstanding response exceeding expectations.", "value": 5}
        ]),
        inference_config=agentcore.EvaluatorInferenceConfig(
            max_tokens=1024,
            temperature=0.1
        )
    )
)
Attributes
- max_tokens
(experimental) The maximum number of tokens to generate in the model response.
- Default:
The foundation model’s default maximum token limit is used
- Stability:
experimental
- temperature
(experimental) The temperature value that controls randomness in the model’s responses.
Higher values produce more diverse outputs. Range: 0.0 to 1.0.
- Default:
The foundation model’s default temperature is used
- Stability:
experimental
- top_p
(experimental) The top-p sampling parameter that controls the diversity of the model’s responses.
Range: 0.0 to 1.0.
- Default:
The foundation model’s default top-p value is used
- Stability:
experimental
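The documented ranges can be sketched as a small plain-Python validation helper. This is illustrative only (the function name and dict shape are hypothetical; the CDK construct performs its own validation at synthesis time):

```python
def validate_inference_config(max_tokens=None, temperature=None, top_p=None):
    """Hypothetical helper mirroring EvaluatorInferenceConfig's documented ranges.

    temperature and top_p must fall within [0.0, 1.0]; max_tokens, when
    provided, must be positive. Unset parameters are omitted so the
    foundation model's defaults apply.
    """
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be a positive number")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in the range 0.0 to 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in the range 0.0 to 1.0")
    params = {"max_tokens": max_tokens, "temperature": temperature, "top_p": top_p}
    return {name: value for name, value in params.items() if value is not None}


# Matches the inference_config in the example above: a low temperature
# keeps the judge's ratings close to deterministic.
config = validate_inference_config(max_tokens=1024, temperature=0.1)
```

Setting a low temperature (as in the example's 0.1) is a common choice for evaluators, since rating tasks benefit from reproducible rather than diverse outputs.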