How it works
Amazon Bedrock AgentCore Evaluations provides capabilities to assess the performance of AI agents. It can compute metrics such as an agent's end-to-end task completion (goal attainment), the correctness and accuracy of the tools the agent invokes while handling a user request, and any custom metric defined to evaluate specific dimensions of an agent's behavior. AgentCore Evaluations can evaluate AI agents hosted in AgentCore Runtime as well as AI agents hosted outside of AgentCore.
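To make these metrics concrete, here is a minimal sketch of how scores like goal attainment and tool-invocation accuracy could be computed over a set of evaluation records. The record schema and field names below are illustrative assumptions, not the actual AgentCore Evaluations data model.

```python
# Illustrative only: the record fields ("goal_attained", "expected_tool",
# "invoked_tool") are hypothetical and do not reflect the real
# AgentCore Evaluations schema.
records = [
    {"goal_attained": True,  "expected_tool": "search", "invoked_tool": "search"},
    {"goal_attained": True,  "expected_tool": "lookup", "invoked_tool": "search"},
    {"goal_attained": False, "expected_tool": "search", "invoked_tool": "search"},
]

def goal_attainment_rate(recs):
    """Fraction of sessions in which the agent completed the user's goal."""
    return sum(r["goal_attained"] for r in recs) / len(recs)

def tool_accuracy(recs):
    """Fraction of requests where the agent invoked the expected tool."""
    return sum(r["invoked_tool"] == r["expected_tool"] for r in recs) / len(recs)

print(f"goal attainment: {goal_attainment_rate(records):.2f}")  # 0.67
print(f"tool accuracy:   {tool_accuracy(records):.2f}")         # 0.67
```

A custom metric would follow the same pattern: a function that maps each evaluation record to a score along whatever dimension of agent behavior you want to measure.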
You can create and manage evaluations and related resources using the AgentCore starter toolkit, the AgentCore Python SDK, the AWS Management Console, or directly through the AWS SDKs.