Results and output
Online evaluation results are automatically saved to Amazon CloudWatch. When you create an online evaluation configuration, the service creates a dedicated CloudWatch log group to store your evaluation results in JSON format.
Log group structure
Evaluation results are stored in a CloudWatch log group named /aws/bedrock-agentcore/evaluations/results/<online-evaluation-config-id>. You can open the log group from the evaluation configuration details page in the Amazon Bedrock AgentCore console.
Each evaluation generates a separate log entry within this log group. Additionally, evaluation scores are emitted as CloudWatch metrics for monitoring and analysis.
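As a minimal sketch of the naming scheme above, the following Python helper builds the log group name from a configuration ID and reads recent entries from it. The function names are illustrative, not part of any AWS SDK. The `logs_client` argument is assumed to be a CloudWatch Logs client such as `boto3.client("logs")`, and the sketch makes no assumption about the fields inside each JSON entry.

```python
import json

# Prefix taken from the log group format described above.
RESULTS_LOG_GROUP_PREFIX = "/aws/bedrock-agentcore/evaluations/results/"


def results_log_group(config_id: str) -> str:
    """Return the results log group name for an online evaluation config ID."""
    if not config_id or "/" in config_id:
        raise ValueError("config_id must be a non-empty ID without slashes")
    return RESULTS_LOG_GROUP_PREFIX + config_id


def read_recent_results(logs_client, config_id: str, limit: int = 20) -> list:
    """Fetch up to `limit` log events and parse each message as JSON.

    `logs_client` is assumed to be a CloudWatch Logs client, for example
    boto3.client("logs"). Messages that are not valid JSON are skipped.
    """
    response = logs_client.filter_log_events(
        logGroupName=results_log_group(config_id),
        limit=limit,
    )
    results = []
    for event in response.get("events", []):
        try:
            results.append(json.loads(event["message"]))
        except json.JSONDecodeError:
            continue
    return results
```

With boto3 installed and credentials configured, a call might look like `read_recent_results(boto3.client("logs"), "<online-evaluation-config-id>")`, where the ID is the one shown for your own configuration.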
Result format
Evaluation results follow the OpenTelemetry semantic conventions for GenAI evaluation result events. Each event is parented to the original span ID when possible and contains references to the original trace ID and session ID.
You can use CloudWatch Logs Insights to query and analyze your evaluation results, and CloudWatch Metrics to monitor evaluation trends over time.
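One way to run such a query programmatically is sketched below. It starts a Logs Insights query and polls until the query finishes. The helper name, the query string, and the polling defaults are illustrative choices. `logs_client` is assumed to be a CloudWatch Logs client such as `boto3.client("logs")`, and the query selects only the built-in `@timestamp` and `@message` fields, so it does not depend on any particular result schema.

```python
import time

# Generic Logs Insights query: newest entries first, raw message only.
DEFAULT_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 20"


def run_insights_query(logs_client, log_group, start_time, end_time,
                       query=DEFAULT_QUERY, poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return rows as {field: value} dicts.

    `start_time` and `end_time` are epoch seconds. `logs_client` is assumed
    to be a CloudWatch Logs client, for example boto3.client("logs").
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError("query did not complete within the polling budget")
```

Each returned row carries the raw JSON entry in its `@message` value, which you can pass to `json.loads` for further analysis.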
Viewing results in CloudWatch Observability Console
You can view and analyze your evaluation results using the CloudWatch Observability Console. The console provides visualizations, metrics, and detailed logs of your agent evaluations.
To view evaluation results
- Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/
- In the navigation pane, choose GenAI Observability > Bedrock AgentCore
- Under the Agents section, select the agent and endpoint associated with your evaluation configuration
- Navigate to the Evaluations tab for detailed results
For more details, see the AWS CloudWatch session trace evaluations documentation.
Viewing evaluation scores in CloudWatch Metrics
Evaluation scores are published as CloudWatch metrics. You can view them directly in the CloudWatch Metrics console.
To view evaluation scores
- Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/
- In the navigation pane, choose Metrics > All Metrics
- In the Browse tab, select Bedrock-AgentCore/Evaluations
- Optionally, select dimension combinations to narrow the results by evaluator type or evaluation label
For more details, see the AWS CloudWatch session trace evaluations documentation.
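The same metrics can also be discovered outside the console. The sketch below lists every metric in the Bedrock-AgentCore/Evaluations namespace together with its dimensions. The helper name is illustrative, and `cloudwatch_client` is assumed to be a CloudWatch client such as `boto3.client("cloudwatch")`. Metric and dimension names are read from the response rather than hard-coded, because this page does not list them.

```python
# Namespace shown in the Browse tab of the CloudWatch Metrics console.
EVALUATIONS_NAMESPACE = "Bedrock-AgentCore/Evaluations"


def list_evaluation_metrics(cloudwatch_client, namespace=EVALUATIONS_NAMESPACE):
    """Return (metric_name, dimensions) pairs for every metric in the namespace.

    `cloudwatch_client` is assumed to be a CloudWatch client, for example
    boto3.client("cloudwatch"). Pagination is followed through NextToken.
    """
    found = []
    kwargs = {"Namespace": namespace}
    while True:
        page = cloudwatch_client.list_metrics(**kwargs)
        for metric in page.get("Metrics", []):
            dimensions = {
                d["Name"]: d["Value"] for d in metric.get("Dimensions", [])
            }
            found.append((metric["MetricName"], dimensions))
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

The returned dimension dictionaries show which combinations are available for narrowing results, mirroring the optional dimension step in the console procedure.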