Agent details - Evaluations
Evaluations provides continuous quality monitoring metrics for your AI agents. You can use the information provided by the dashboard to assess the performance, quality, and reliability of your AI agents.
Instead of relying on simulated test cases, evaluations capture real user sessions and agent interactions, providing a comprehensive view of agent performance from input to final output. With agent evaluations, you can define sampling rules to evaluate only a percentage of the sessions or traces, and then apply a variety of evaluators to assess and score an AI agent's operational performance. The resulting assessments and scores are displayed in the Evaluations dashboard, where you can monitor trends, identify potential quality issues, set alarms, and investigate and diagnose problems.
The Evaluations dashboard lists all of the evaluations that have been enabled and configured for the selected agent. For more information about configuring evaluations for an agent, see AgentCore evaluations. You can expand each evaluation to view the sessions, traces, and spans that were evaluated.
Evaluations details
For each evaluation, the dashboard includes the following sections:
Evaluations graphs
The Evaluations dashboard also includes a bar graph for each evaluator. The graphs show the trends for each evaluator over time and let you set alarms for specific metric values. To set an alarm, click a bar in the graph, and then choose the Alarm (bell) icon. For more information, see Using Amazon CloudWatch alarms.
Work with evaluation results
If you need direct access to your evaluation results data, or if you want to create custom visualizations or work outside the AgentCore Evaluations console, you can access your evaluation results directly through CloudWatch Logs, CloudWatch Metrics, and CloudWatch dashboards.
Accessing evaluation results in CloudWatch Logs
Your evaluation results are automatically published to CloudWatch Logs in Embedded Metric Format (EMF).
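In EMF, each log event is a JSON object whose `_aws.CloudWatchMetrics` block declares a namespace, dimension sets, and metric names, while the actual dimension and metric values are stored as top-level members of the same object. The following minimal sketch shows how such an event can be parsed. The sample event is hypothetical: `EvaluatorName` and `Score` are assumed names for illustration, not the documented fields of AgentCore evaluation logs.

```python
import json
from typing import Any

# Hypothetical EMF event for illustration only: "EvaluatorName" and "Score"
# are assumed names, not the documented fields of AgentCore evaluation logs.
SAMPLE_EVENT = json.dumps({
    "_aws": {
        "Timestamp": 1700000000000,
        "CloudWatchMetrics": [{
            "Namespace": "Bedrock AgentCore/Evaluations",
            "Dimensions": [["EvaluatorName"]],
            "Metrics": [{"Name": "Score", "Unit": "None"}],
        }],
    },
    "EvaluatorName": "Correctness",
    "Score": 0.85,
})


def extract_emf_metrics(message: str) -> list[dict[str, Any]]:
    """Return one record per metric declared in the `_aws` block of an EMF event."""
    event = json.loads(message)
    records = []
    for directive in event.get("_aws", {}).get("CloudWatchMetrics", []):
        # Dimensions is a list of dimension sets; take the union of their keys.
        dim_names = sorted({name for dim_set in directive.get("Dimensions", []) for name in dim_set})
        dimensions = {name: event.get(name) for name in dim_names}
        for metric in directive.get("Metrics", []):
            records.append({
                "namespace": directive["Namespace"],
                "metric": metric["Name"],
                # In EMF, a metric's value is the top-level member with the same name.
                "value": event.get(metric["Name"]),
                "unit": metric.get("Unit", "None"),
                "dimensions": dimensions,
            })
    return records
```

Events without an `_aws` block, such as plain log lines, simply yield an empty list.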
To find your evaluation results log group
-
Open the CloudWatch console.
-
In the navigation pane, choose Logs Management > Log groups.
-
Search for or navigate to the log groups with the prefix /aws/bedrock-agentcore/evaluations/.
-
In these log groups, the log events contain the evaluation results.
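The same lookup can be done programmatically. This is a sketch that assumes the AWS SDK for Python (boto3) and passes the CloudWatch Logs client in as a parameter; it pages through `describe_log_groups` filtered by the documented prefix. The helper names are hypothetical.

```python
from typing import Any, Iterable

# Log group name prefix for AgentCore evaluation results, as given in the steps above.
EVALUATIONS_LOG_GROUP_PREFIX = "/aws/bedrock-agentcore/evaluations/"


def names_from_pages(pages: Iterable[dict[str, Any]]) -> list[str]:
    """Collect log group names from DescribeLogGroups response pages."""
    return [group["logGroupName"] for page in pages for group in page.get("logGroups", [])]


def list_evaluation_log_groups(logs_client: Any, prefix: str = EVALUATIONS_LOG_GROUP_PREFIX) -> list[str]:
    """Return every log group whose name starts with `prefix`.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client,
    for example boto3.client("logs").
    """
    paginator = logs_client.get_paginator("describe_log_groups")
    return names_from_pages(paginator.paginate(logGroupNamePrefix=prefix))
```

For example, `list_evaluation_log_groups(boto3.client("logs"))` returns the names you would otherwise find by searching in the console. Using the paginator avoids missing groups when there are more than fit in one response page.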
For more information about working with log groups and querying log data, see Working with Log Groups and Log Streams and Analyzing Log Data with CloudWatch Logs Insights.
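As a starting point for querying these log groups with CloudWatch Logs Insights, the following sketch (again assuming boto3, with the client passed in) starts a query and polls until it finishes. The query string is a generic example that returns the 20 most recent events; it does not rely on any evaluation-specific field names, which this section does not document. The helper names are hypothetical.

```python
import time
from typing import Any

# Generic Logs Insights query: the 20 most recent log events.
RECENT_EVENTS_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 20"


def rows_to_dicts(results: list[list[dict[str, str]]]) -> list[dict[str, str]]:
    """Convert GetQueryResults rows (lists of field/value cells) into dicts."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def run_insights_query(logs_client: Any, log_group_names: list[str], start_time: int,
                       end_time: int, query: str = RECENT_EVENTS_QUERY,
                       poll_seconds: float = 1.0) -> list[dict[str, str]]:
    """Run a Logs Insights query and wait for it to finish.

    `logs_client` is assumed to be boto3.client("logs"); times are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupNames=log_group_names,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return rows_to_dicts(response["results"])
        if status in ("Failed", "Cancelled", "Timeout", "Unknown"):
            raise RuntimeError(f"Logs Insights query {query_id} ended with status {status}")
        time.sleep(poll_seconds)  # still Scheduled or Running
```

Converting each row into a dictionary keyed by field name makes the results easier to filter or load into other tools.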
Accessing evaluation metrics in CloudWatch Metrics
Evaluation results metrics are automatically extracted from the Embedded Metric Format (EMF) logs and published to CloudWatch Metrics.
To find your evaluation metrics
-
Open the CloudWatch console.
-
In the navigation pane, choose Metrics > All metrics.
-
Select the Bedrock AgentCore/Evaluations namespace.
-
Browse available metrics by dimensions.
For more information about viewing and working with metrics, see Using CloudWatch Metrics and Graphing Metrics.
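To browse the same metrics outside the console, you can call `list_metrics` on the namespace and group the results by their dimension names. This is a sketch assuming boto3, with the CloudWatch client passed in; the actual metric and dimension names in the namespace are not listed in this section, so the function discovers them rather than assuming them. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable

# Namespace as named in the console steps above.
EVALUATIONS_NAMESPACE = "Bedrock AgentCore/Evaluations"


def group_by_dimension_keys(pages: Iterable[dict[str, Any]]) -> dict[tuple[str, ...], list[str]]:
    """Group metric names by their sorted tuple of dimension names."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for page in pages:
        for metric in page.get("Metrics", []):
            key = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
            # The same metric name appears once per dimension-value combination.
            if metric["MetricName"] not in groups[key]:
                groups[key].append(metric["MetricName"])
    return dict(groups)


def list_evaluation_metrics(cloudwatch_client: Any,
                            namespace: str = EVALUATIONS_NAMESPACE) -> dict[tuple[str, ...], list[str]]:
    """List metrics in `namespace`, grouped by dimension names.

    `cloudwatch_client` is assumed to be boto3.client("cloudwatch").
    """
    paginator = cloudwatch_client.get_paginator("list_metrics")
    return group_by_dimension_keys(paginator.paginate(Namespace=namespace))
```

Grouping by dimension names gives a view similar to choosing a dimension combination in the console before picking individual metrics.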
Creating custom dashboards
You can create custom dashboards to visualize your evaluation metrics alongside other operational metrics.
To create a dashboard with evaluation metrics
-
In the CloudWatch console, choose Dashboards from the navigation pane.
-
Choose Create dashboard.
-
Add widgets and select metrics from the Bedrock AgentCore/Evaluations namespace.
-
Customize the time range, statistic, and visualization type for your needs.
For detailed instructions, see Creating and Working with Custom Dashboards and Using CloudWatch Dashboards.
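Dashboards can also be created with the `PutDashboard` API, which takes the dashboard definition as a JSON string. The sketch below, assuming boto3 with the CloudWatch client passed in, builds a body with a single time-series metric widget. The metric name and dimensions are parameters because the actual names in the `Bedrock AgentCore/Evaluations` namespace are not given here; the helper names are hypothetical.

```python
import json
from typing import Any

EVALUATIONS_NAMESPACE = "Bedrock AgentCore/Evaluations"


def build_dashboard_body(metric_name: str, dimensions: dict[str, str], region: str,
                         stat: str = "Average", period: int = 300) -> str:
    """Return a DashboardBody JSON string with one time-series metric widget."""
    metric = [EVALUATIONS_NAMESPACE, metric_name]
    for name, value in dimensions.items():
        metric.extend([name, value])  # dimensions are flattened as name, value pairs
    body = {
        "widgets": [{
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [metric],
                "view": "timeSeries",
                "stat": stat,
                "period": period,
                "region": region,
                "title": f"{metric_name} ({stat})",
            },
        }]
    }
    return json.dumps(body)


def create_dashboard(cloudwatch_client: Any, name: str, body: str) -> dict[str, Any]:
    """Create the dashboard, or replace it if one with this name exists.

    `cloudwatch_client` is assumed to be boto3.client("cloudwatch").
    """
    return cloudwatch_client.put_dashboard(DashboardName=name, DashboardBody=body)
```

Because `put_dashboard` replaces the entire contents of an existing dashboard with the same name, keep the full body under version control if you manage dashboards this way.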
Setting alarms on evaluation metrics
You can set alarms to notify you when evaluation metrics cross thresholds that you specify, such as when correctness drops below an acceptable level.
To create an alarm on evaluation metrics
-
In the CloudWatch console, choose Alarms > All alarms.
-
Choose Create alarm.
-
Choose Select metric and navigate to the Bedrock AgentCore/Evaluations namespace.
-
Select the metric you want to monitor.
-
Configure the threshold conditions and notification actions. If you don't want to specify a static threshold value, you can use an anomaly detection threshold, which CloudWatch calculates dynamically.
For detailed instructions, see Using CloudWatch Alarms and Creating a CloudWatch Alarm Based on a Static Threshold.
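Both kinds of threshold can also be configured through `PutMetricAlarm`. The following sketch, assuming boto3, builds the request parameters for a static alarm that fires when a metric's average drops below a fixed value, and for an alarm based on an `ANOMALY_DETECTION_BAND` expression. The metric name, dimensions, and SNS topic ARN are caller-supplied placeholders, and the helper names are hypothetical.

```python
from typing import Any

EVALUATIONS_NAMESPACE = "Bedrock AgentCore/Evaluations"


def _dims(dimensions: dict[str, str]) -> list[dict[str, str]]:
    """Convert {name: value} into the CloudWatch Dimensions list format."""
    return [{"Name": k, "Value": v} for k, v in dimensions.items()]


def static_alarm_params(alarm_name: str, metric_name: str, dimensions: dict[str, str],
                        threshold: float, sns_topic_arn: str, period: int = 300,
                        evaluation_periods: int = 3) -> dict[str, Any]:
    """put_metric_alarm parameters: alarm when the average drops below `threshold`."""
    return {
        "AlarmName": alarm_name,
        "Namespace": EVALUATIONS_NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": _dims(dimensions),
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def anomaly_alarm_params(alarm_name: str, metric_name: str, dimensions: dict[str, str],
                         sns_topic_arn: str, band_width: float = 2.0, period: int = 300,
                         evaluation_periods: int = 3) -> dict[str, Any]:
    """put_metric_alarm parameters: alarm when the metric falls below its anomaly band."""
    return {
        "AlarmName": alarm_name,
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": EVALUATIONS_NAMESPACE,
                        "MetricName": metric_name,
                        "Dimensions": _dims(dimensions),
                    },
                    "Period": period,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            # The band expression replaces a static Threshold value.
            {"Id": "ad1", "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})", "ReturnData": True},
        ],
        "ThresholdMetricId": "ad1",
        "ComparisonOperator": "LessThanLowerThreshold",
        "EvaluationPeriods": evaluation_periods,
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

You would pass either result to the client, for example `boto3.client("cloudwatch").put_metric_alarm(**static_alarm_params(...))`. Treating missing data as not breaching is a design choice for sampled evaluations, where some periods may have no evaluated sessions; adjust it to your needs.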