/AWS1/IF_BDC=>EVALUATE()

About Evaluate

Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.

Method Signature

METHODS /AWS1/IF_BDC~EVALUATE
  IMPORTING
    !IV_EVALUATORID TYPE /AWS1/BDCEVALUATORID OPTIONAL
    !IO_EVALUATIONINPUT TYPE REF TO /AWS1/CL_BDCEVALUATIONINPUT OPTIONAL
    !IO_EVALUATIONTARGET TYPE REF TO /AWS1/CL_BDCEVALUATIONTARGET OPTIONAL
    !IT_EVALUATIONREFERENCEINPUTS TYPE /AWS1/CL_BDCEVALREFERENCEINPUT=>TT_EVALUATIONREFERENCEINPUTS OPTIONAL
  RETURNING
    VALUE(OO_OUTPUT) TYPE REF TO /aws1/cl_bdcevaluateresponse
  RAISING
    /AWS1/CX_BDCACCESSDENIEDEX
    /AWS1/CX_BDCCONFLICTEXCEPTION
    /AWS1/CX_BDCDUPLICATEIDEX
    /AWS1/CX_BDCINTERNALSERVEREX
    /AWS1/CX_BDCRESOURCENOTFOUNDEX
    /AWS1/CX_BDCSERVICEQUOTAEXCDEX
    /AWS1/CX_BDCTHROTTLINGEX
    /AWS1/CX_BDCUNAUTHORIZEDEX
    /AWS1/CX_BDCVALIDATIONEX
    /AWS1/CX_BDCCLIENTEXC
    /AWS1/CX_BDCSERVEREXC
    /AWS1/CX_RT_TECHNICAL_GENERIC
    /AWS1/CX_RT_SERVICE_GENERIC.
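
The RAISING clause mixes service-specific exceptions with the generic runtime ones, so a caller will usually want to handle a few of them explicitly. The following is a minimal sketch for use inside a report, not a definitive pattern. It assumes the usual AWS SDK for SAP ABAP conventions: an SDK profile (the name `ZDEMO` is hypothetical), a client obtained from `/aws1/cl_bdc_factory` (assumed from the SDK naming scheme, not stated on this page), and that the service-specific exception classes derive from `/aws1/cx_rt_service_generic`.

```abap
" Assumption: 'ZDEMO' is an SDK profile configured in your system.
DATA(lo_session) = /aws1/cl_rt_session_aws=>create( iv_profile_id = 'ZDEMO' ).
" Assumption: the client factory follows the SDK's usual naming convention.
DATA(lo_client) = /aws1/cl_bdc_factory=>create( lo_session ).

" Spans would normally be loaded from your trace store; left empty in this sketch.
DATA lt_spans TYPE /aws1/cl_rt_document=>tt_list.

TRY.
    DATA(lo_result) = lo_client->evaluate(
      iv_evaluatorid     = |Builtin.Helpfulness|
      io_evaluationinput = NEW /aws1/cl_bdcevaluationinput( it_sessionspans = lt_spans ) ).
  CATCH /aws1/cx_bdcthrottlingex INTO DATA(lo_throttled).
    " Request rate exceeded; a caller could retry after a delay.
    WRITE / |Throttled: { lo_throttled->get_text( ) }|.
  CATCH /aws1/cx_bdcvalidationex INTO DATA(lo_invalid).
    WRITE / |Invalid request: { lo_invalid->get_text( ) }|.
  CATCH /aws1/cx_rt_service_generic INTO DATA(lo_service).
    " Assumption: the remaining service exceptions derive from this class.
    WRITE / |Service error: { lo_service->get_text( ) }|.
  CATCH /aws1/cx_rt_technical_generic INTO DATA(lo_technical).
    WRITE / |Technical error: { lo_technical->get_text( ) }|.
ENDTRY.
```

The specific handlers come first because ABAP requires a subclass to be caught before its superclass.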

IMPORTING

Required arguments:

`iv_evaluatorid` `TYPE /AWS1/BDCEVALUATORID`

The unique identifier of the evaluator to use for scoring. This can be a built-in evaluator (for example, Builtin.Helpfulness or Builtin.Correctness) or the ID of a custom evaluator created through the control plane API.

`io_evaluationinput` `TYPE REF TO /AWS1/CL_BDCEVALUATIONINPUT`

The input data containing agent session spans to be evaluated. Includes a list of spans in OpenTelemetry format from supported frameworks like Strands (AgentCore Runtime) or LangGraph with OpenInference instrumentation.
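
As a sketch of how the span list might be assembled, the snippet below converts JSON strings into documents and wraps them in the input object. The `lt_span_json` table and its placeholder content are hypothetical; only `from_json_str` and the `it_sessionspans` parameter are taken from the syntax example on this page.

```abap
" Hypothetical source: each entry holds one OpenTelemetry span serialized as JSON.
DATA lt_span_json TYPE string_table.
APPEND `{"name":"example-span"}` TO lt_span_json.  " placeholder, not a complete span

" Convert each JSON string into a document and collect the documents.
DATA lt_spans TYPE /aws1/cl_rt_document=>tt_list.
LOOP AT lt_span_json INTO DATA(lv_json).
  APPEND /aws1/cl_rt_document=>from_json_str( lv_json ) TO lt_spans.
ENDLOOP.

" Wrap the span list in the evaluation input object.
DATA(lo_input) = NEW /aws1/cl_bdcevaluationinput( it_sessionspans = lt_spans ).
```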

Optional arguments:

`io_evaluationtarget` `TYPE REF TO /AWS1/CL_BDCEVALUATIONTARGET`

The specific trace or span IDs to evaluate within the provided input. Allows targeting evaluation at different levels: individual tool calls, single request-response interactions (traces), or entire conversation sessions.
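
For instance, targeting a single request-response interaction might look like the sketch below. The trace ID is a made-up value, and the snippet assumes that `it_spanids` can be omitted when only traces are targeted.

```abap
" Hypothetical trace ID; replace it with an ID present in the submitted spans.
DATA(lo_target) = NEW /aws1/cl_bdcevaluationtarget(
  it_traceids = VALUE /aws1/cl_bdctraceids_w=>tt_traceids(
    ( NEW /aws1/cl_bdctraceids_w( |0af7651916cd43dd8448eb211c80319c| ) ) ) ).
```

To target an individual tool call instead, the same pattern would apply to `it_spanids` with `/aws1/cl_bdcspanids_w`.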

`it_evaluationreferenceinputs` `TYPE /AWS1/CL_BDCEVALREFERENCEINPUT=>TT_EVALUATIONREFERENCEINPUTS`

Ground truth data to compare against agent responses during evaluation. Use it to provide expected responses, assertions, and expected tool trajectories at different evaluation levels. Session-level reference inputs apply to the entire conversation, while trace-level reference inputs target specific request-response interactions identified by trace ID.
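
The difference between the two levels can be sketched as follows. All IDs and texts are invented for illustration, and the snippet assumes that constructor parameters not relevant to a given level can be left out; the class and parameter names come from the syntax example on this page.

```abap
DATA lt_refs TYPE /aws1/cl_bdcevalreferenceinput=>tt_evaluationreferenceinputs.

" Session-level reference: the context carries only the session ID.
APPEND NEW /aws1/cl_bdcevalreferenceinput(
  io_context = NEW /aws1/cl_bdccontext(
    io_spancontext = NEW /aws1/cl_bdcspancontext( iv_sessionid = |session-1| ) )
  it_assertions = VALUE /aws1/cl_bdcevaluationcontent=>tt_evaluationcontentlist(
    ( NEW /aws1/cl_bdcevaluationcontent( |The agent confirms the booking.| ) ) )
) TO lt_refs.

" Trace-level reference: the context also names the trace being compared.
APPEND NEW /aws1/cl_bdcevalreferenceinput(
  io_context = NEW /aws1/cl_bdccontext(
    io_spancontext = NEW /aws1/cl_bdcspancontext(
      iv_sessionid = |session-1|
      iv_traceid   = |trace-1| ) )
  io_expectedresponse = NEW /aws1/cl_bdcevaluationcontent( |Your flight is booked.| )
) TO lt_refs.
```

The resulting `lt_refs` table would then be passed as `it_evaluationreferenceinputs`.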

RETURNING

`oo_output` `TYPE REF TO /AWS1/CL_BDCEVALUATERESPONSE`

Examples

Syntax Example

This example shows the syntax for calling the method. It includes every possible argument and initializes every possible value. The data provided is not necessarily semantically accurate: for example, the value "string" may be provided for something that is intended to be an instance ID, and in some cases two arguments may be mutually exclusive. The example shows the ABAP syntax for creating the various data structures.

DATA(lo_result) = lo_client->evaluate(
  io_evaluationinput = new /aws1/cl_bdcevaluationinput(
    it_sessionspans = VALUE /aws1/cl_rt_document=>tt_list(
      ( /AWS1/CL_RT_DOCUMENT=>FROM_JSON_STR( |\{"foo":"this is a JSON object..."\}| ) )
    )
  )
  io_evaluationtarget = new /aws1/cl_bdcevaluationtarget(
    it_spanids = VALUE /aws1/cl_bdcspanids_w=>tt_spanids(
      ( new /aws1/cl_bdcspanids_w( |string| ) )
    )
    it_traceids = VALUE /aws1/cl_bdctraceids_w=>tt_traceids(
      ( new /aws1/cl_bdctraceids_w( |string| ) )
    )
  )
  it_evaluationreferenceinputs = VALUE /aws1/cl_bdcevalreferenceinput=>tt_evaluationreferenceinputs(
    (
      new /aws1/cl_bdcevalreferenceinput(
        io_context = new /aws1/cl_bdccontext(
          io_spancontext = new /aws1/cl_bdcspancontext(
            iv_sessionid = |string|
            iv_spanid = |string|
            iv_traceid = |string|
          )
        )
        io_expectedresponse = new /aws1/cl_bdcevaluationcontent( |string| )
        io_expectedtrajectory = new /aws1/cl_bdcevalexpectedtraj00(
          it_toolnames = VALUE /aws1/cl_bdcevaltoolnames_w=>tt_evaluationtoolnames(
            ( new /aws1/cl_bdcevaltoolnames_w( |string| ) )
          )
        )
        it_assertions = VALUE /aws1/cl_bdcevaluationcontent=>tt_evaluationcontentlist(
          ( new /aws1/cl_bdcevaluationcontent( |string| ) )
        )
      )
    )
  )
  iv_evaluatorid = |string|
).

The following example reads every possible response value:

" lo_result holds the response object returned by evaluate( ).
IF lo_result IS NOT INITIAL.
  LOOP AT lo_result->get_evaluationresults( ) INTO DATA(lo_row).
    IF lo_row IS NOT INITIAL.
      " Evaluator identification and explanation
      DATA(lv_evaluatorarn) = lo_row->get_evaluatorarn( ).
      DATA(lv_evaluatorid) = lo_row->get_evaluatorid( ).
      DATA(lv_evaluatorname) = lo_row->get_evaluatorname( ).
      DATA(lv_explanation) = lo_row->get_explanation( ).
      " Session, trace, and span that the result refers to
      DATA(lo_context) = lo_row->get_context( ).
      IF lo_context IS NOT INITIAL.
        DATA(lo_spancontext) = lo_context->get_spancontext( ).
        IF lo_spancontext IS NOT INITIAL.
          DATA(lv_sessionid) = lo_spancontext->get_sessionid( ).
          DATA(lv_traceid) = lo_spancontext->get_traceid( ).
          DATA(lv_spanid) = lo_spancontext->get_spanid( ).
        ENDIF.
      ENDIF.
      " Numeric value and label
      DATA(lv_value) = lo_row->get_value( ).
      DATA(lv_label) = lo_row->get_label( ).
      " Token usage
      DATA(lo_tokenusage) = lo_row->get_tokenusage( ).
      IF lo_tokenusage IS NOT INITIAL.
        DATA(lv_inputtokens) = lo_tokenusage->get_inputtokens( ).
        DATA(lv_outputtokens) = lo_tokenusage->get_outputtokens( ).
        DATA(lv_totaltokens) = lo_tokenusage->get_totaltokens( ).
      ENDIF.
      " Error details
      DATA(lv_errormessage) = lo_row->get_errormessage( ).
      DATA(lv_errorcode) = lo_row->get_errorcode( ).
      " Reference input fields that were ignored
      LOOP AT lo_row->get_ignoredrefinputfields( ) INTO DATA(lo_field).
        IF lo_field IS NOT INITIAL.
          DATA(lv_ignoredfield) = lo_field->get_value( ).
        ENDIF.
      ENDLOOP.
    ENDIF.
  ENDLOOP.
ENDIF.
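
In practice a caller rarely needs every field. The sketch below prints one line per result for use inside a report. It assumes, without confirmation from this page, that a populated error code indicates a result that could not be scored; `lo_result` is expected to come from a prior `evaluate( )` call.

```abap
" Assumed to be filled by a prior evaluate( ) call.
DATA lo_result TYPE REF TO /aws1/cl_bdcevaluateresponse.

IF lo_result IS BOUND.
  LOOP AT lo_result->get_evaluationresults( ) INTO DATA(lo_eval).
    DATA(lv_id)   = lo_eval->get_evaluatorid( ).
    DATA(lv_code) = lo_eval->get_errorcode( ).
    IF lv_code IS NOT INITIAL.
      " Assumption: an error code means this result carries no usable score.
      WRITE / |{ lv_id } failed ({ lv_code }): { lo_eval->get_errormessage( ) }|.
    ELSE.
      WRITE / |{ lv_id }: { lo_eval->get_label( ) } ({ lo_eval->get_value( ) })|.
    ENDIF.
  ENDLOOP.
ENDIF.
```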
