Interface EvaluatorInferenceConfig

All Superinterfaces:
software.amazon.jsii.JsiiSerializable
All Known Implementing Classes:
EvaluatorInferenceConfig.Jsii$Proxy

@Generated(value="jsii-pacmak/1.130.0 (build 048a5ee)", date="2026-05-19T19:44:37.075Z")
@Stability(Stable)
public interface EvaluatorInferenceConfig
extends software.amazon.jsii.JsiiSerializable
Inference configuration for a custom LLM-as-a-Judge evaluator.

Controls how the foundation model generates evaluation responses.

Example:

 // LLM-as-a-Judge with categorical rating scale
 Evaluator categoricalEvaluator = Evaluator.Builder.create(this, "CategoricalEvaluator")
         .evaluatorName("domain_accuracy_evaluator")
         .level(EvaluationLevel.SESSION)
         .description("Evaluates domain-specific accuracy of agent responses")
         .evaluatorConfig(EvaluatorConfig.llmAsAJudge(LlmAsAJudgeOptions.builder()
                 .instructions("Evaluate whether the agent response is accurate within the healthcare domain.")
                 .modelId("us.anthropic.claude-sonnet-4-6")
                 .ratingScale(EvaluatorRatingScale.categorical(List.of(
                         CategoricalRatingOption.builder().label("Accurate").definition("The response contains factually correct healthcare information.").build(),
                         CategoricalRatingOption.builder().label("Inaccurate").definition("The response contains incorrect or misleading healthcare information.").build())))
                 .build()))
         .build();
 // LLM-as-a-Judge with numerical rating scale and inference config
 Evaluator numericalEvaluator = Evaluator.Builder.create(this, "NumericalEvaluator")
         .evaluatorName("response_quality_evaluator")
         .level(EvaluationLevel.TRACE)
         .evaluatorConfig(EvaluatorConfig.llmAsAJudge(LlmAsAJudgeOptions.builder()
                 .instructions("Rate the overall quality of the agent response on a scale of 1 to 5.")
                 .modelId("us.anthropic.claude-sonnet-4-6")
                 .ratingScale(EvaluatorRatingScale.numerical(List.of(
                         NumericalRatingOption.builder().label("Poor").definition("Inadequate response.").value(1).build(),
                         NumericalRatingOption.builder().label("Below Average").definition("Partially addresses the query.").value(2).build(),
                         NumericalRatingOption.builder().label("Average").definition("Adequately addresses the query.").value(3).build(),
                         NumericalRatingOption.builder().label("Good").definition("Well-structured and accurate response.").value(4).build(),
                         NumericalRatingOption.builder().label("Excellent").definition("Outstanding response exceeding expectations.").value(5).build())))
                 .inferenceConfig(EvaluatorInferenceConfig.builder()
                         .maxTokens(1024)
                         .temperature(0.1)
                         .build())
                 .build()))
         .build();
 
  • Method Details

    • getMaxTokens

      @Stability(Stable) @Nullable default Number getMaxTokens()
      The maximum number of tokens to generate in the model response.

      Default: - The foundation model's default maximum token limit is used

    • getTemperature

      @Stability(Stable) @Nullable default Number getTemperature()
      The temperature value that controls randomness in the model's responses.

      Higher values produce more diverse outputs. Range: 0.0 to 1.0.

      Default: - The foundation model's default temperature is used

    • getTopP

      @Stability(Stable) @Nullable default Number getTopP()
      The top-p sampling parameter that controls the diversity of the model's responses.

      Range: 0.0 to 1.0.

      Default: - The foundation model's default top-p value is used

    • builder

      @Stability(Stable) static EvaluatorInferenceConfig.Builder builder()
      Returns:
      an EvaluatorInferenceConfig.Builder of EvaluatorInferenceConfig
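
The nullable getters and the documented 0.0 to 1.0 ranges can be handled as in the following minimal sketch. It is not part of the construct library: InferenceConfigChecks, inUnitRange, and describe are hypothetical helper names introduced here for illustration. The sketch assumes that a null return value means "use the foundation model's default", as the Default notes for getMaxTokens(), getTemperature(), and getTopP() indicate.

```java
// Hypothetical helper, not part of the CDK library. It only mirrors the
// documented contract: values are optional (null) and, for temperature and
// top-p, expected to lie in the range 0.0 to 1.0.
final class InferenceConfigChecks {

    private InferenceConfigChecks() {
    }

    /**
     * Returns true when the value is unset (null) or lies within 0.0 to 1.0
     * inclusive. NaN is rejected because both comparisons are false for it.
     */
    static boolean inUnitRange(Number value) {
        if (value == null) {
            return true; // unset: the foundation model's default applies
        }
        double v = value.doubleValue();
        return v >= 0.0 && v <= 1.0;
    }

    /**
     * Renders a setting for logging, making the "model default" case explicit
     * instead of printing "null".
     */
    static String describe(String name, Number value) {
        return value == null ? name + ": model default" : name + "=" + value;
    }
}
```

A caller could, for example, pass the result of getTemperature() or getTopP() to inUnitRange before synthesis to reject a value such as 1.5 early, and use describe to log which settings fall back to the model default.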