

# Start batch evaluation
<a name="batch-evaluations-start"></a>

Start a batch evaluation to run evaluators against multiple agent sessions. The service discovers sessions from CloudWatch Logs, runs each evaluator against each session, and produces aggregate results.

## Code samples
<a name="start-batch-eval-examples"></a>

**Example**  
The CLI resolves `serviceNames` and `logGroupNames` automatically from the project configuration when you use `--runtime`:  

```shell
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate Builtin.Helpfulness Builtin.Faithfulness
```
With optional flags:  

```shell
# Custom name and lookback window
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate \
  --name "baseline_eval" \
  --lookback-days 1

# Specific sessions
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate \
  --session-ids session-abc123 session-def456

# With ground truth
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate Builtin.Correctness \
  --ground-truth ground-truth.json
```
The CLI polls until the job reaches a terminal state (`COMPLETED`, `FAILED`, or `STOPPED`), displays per-evaluator average scores, and saves results to `.cli/eval-job-results/`.

```python
import boto3
import uuid
import time
import json

client = boto3.client("bedrock-agentcore", region_name="us-west-2")

# All sessions in the log group
response = client.start_batch_evaluation(
    batchEvaluationName=f"baseline_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
        {"evaluatorId": "Builtin.Helpfulness"},
        {"evaluatorId": "Builtin.Faithfulness"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
        }
    },
    clientToken=str(uuid.uuid4()),
)

batch_eval_id = response["batchEvaluationId"]
print(f"Started: {batch_eval_id}")

# Poll until complete
while True:
    result = client.get_batch_evaluation(batchEvaluationId=batch_eval_id)
    status = result["status"]
    print(f"Status: {status}")

    if status in ("COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED"):
        break
    time.sleep(30)

print(json.dumps(result, indent=4, default=str))
```
With session ID filtering:  

```python
response = client.start_batch_evaluation(
    batchEvaluationName=f"targeted_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
            "filterConfig": {
                "sessionIds": ["session-001", "session-002", "session-003"]
            },
        }
    },
    clientToken=str(uuid.uuid4()),
)
```
With time range filtering:  

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
response = client.start_batch_evaluation(
    batchEvaluationName=f"weekly_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
            "filterConfig": {
                "timeRange": {
                    "startTime": (now - timedelta(days=7)).isoformat(),
                    "endTime": now.isoformat(),
                }
            },
        }
    },
    clientToken=str(uuid.uuid4()),
)
```

## Request parameters
<a name="start-batch-eval-params"></a>


| Parameter | Type | Required | Description | 
| --- | --- | --- | --- | 
|  `batchEvaluationName`  | String | Yes | A name for the batch evaluation job. Must start with a letter and contain only alphanumeric characters and underscores; maximum 48 characters. | 
|  `dataSourceConfig`  | Object | Yes | Where to find agent sessions. Specify a `cloudWatchLogs` source with the log groups and service name for your agent. See [Session source](#start-batch-eval-session-source) below. | 
|  `evaluators`  | List | Yes | List of evaluators. Each entry has an `evaluatorId` field (for example, `Builtin.GoalSuccessRate`). Maximum 10 evaluators. | 
|  `evaluationMetadata`  | Object | No | Contains `sessionMetadata`, a list of per-session ground truth and metadata. Maximum 500 entries. | 
|  `clientToken`  | String | No | Idempotency token. If you retry a request with the same client token, the service returns the existing job instead of creating a new one. | 
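The exact schema of each `sessionMetadata` entry is not shown in this section, so the field names below (`sessionId`, `groundTruth`) are illustrative assumptions; consult the API reference for the authoritative shape. A minimal sketch of attaching per-session ground truth:

```python
# Hypothetical sketch: the "sessionId" and "groundTruth" field names inside
# each sessionMetadata entry are assumptions, not confirmed by this page.
evaluation_metadata = {
    "sessionMetadata": [
        {
            "sessionId": "session-abc123",
            "groundTruth": "The agent should book a one-way flight and confirm the total price.",
        },
    ]
}

# Passed alongside the other arguments, e.g.:
# client.start_batch_evaluation(..., evaluationMetadata=evaluation_metadata)

# The parameter table above caps sessionMetadata at 500 entries.
assert len(evaluation_metadata["sessionMetadata"]) <= 500
```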

## Session source
<a name="start-batch-eval-session-source"></a>

The `dataSourceConfig` parameter specifies the CloudWatch Logs location where the service discovers agent sessions.

### Required fields
<a name="start-batch-eval-session-source-required"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|  `cloudWatchLogs.serviceNames`  | List of strings (exactly 1) | The service name that identifies your agent’s traces in CloudWatch. Convention: `{RuntimeName}.DEFAULT`. | 
|  `cloudWatchLogs.logGroupNames`  | List of strings (1–5) | CloudWatch log group names where agent telemetry is stored. Convention: `/aws/bedrock-agentcore/runtimes/{agentId}-DEFAULT`. | 
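The conventions in the table above can be expressed as simple string templates. A small sketch (the runtime name and agent ID values are placeholders, and these helpers assume your deployment follows the default conventions):

```python
def default_service_name(runtime_name: str) -> str:
    # Convention from the table above: "{RuntimeName}.DEFAULT"
    return f"{runtime_name}.DEFAULT"


def default_log_group(agent_id: str) -> str:
    # Convention from the table above:
    # "/aws/bedrock-agentcore/runtimes/{agentId}-DEFAULT"
    return f"/aws/bedrock-agentcore/runtimes/{agent_id}-DEFAULT"


print(default_service_name("MyAgent"))        # MyAgent.DEFAULT
print(default_log_group("MyAgent-abc123"))    # /aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT
```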

### Optional fields
<a name="start-batch-eval-session-source-optional"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|  `cloudWatchLogs.filterConfig.sessionIds`  | List of strings | Evaluate only these specific session IDs. When omitted, the service discovers all sessions in the log group. | 
|  `cloudWatchLogs.filterConfig.timeRange.startTime`  | ISO 8601 datetime | Filter sessions created after this time. | 
|  `cloudWatchLogs.filterConfig.timeRange.endTime`  | ISO 8601 datetime | Filter sessions created before this time. | 

## Response
<a name="start-batch-eval-response"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|  `batchEvaluationId`  | String | Unique identifier for the batch evaluation. | 
|  `batchEvaluationArn`  | String | ARN of the batch evaluation. | 
|  `batchEvaluationName`  | String | The name you specified. | 
|  `status`  | String | Initial status. One of: `PENDING`, `IN_PROGRESS`. | 
|  `evaluators`  | List | The evaluators used. | 
|  `createdAt`  | Timestamp | When the job was created. | 
|  `outputConfig`  | Object | CloudWatch Logs destination for per-session results. | 

## Errors
<a name="start-batch-eval-errors"></a>


| Error | HTTP status | Description | 
| --- | --- | --- | 
|  `ValidationException`  | 400 | Invalid request parameters. Check field constraints and required fields. | 
|  `AccessDeniedException`  | 403 | Insufficient permissions. Verify IAM policies. | 
|  `ConflictException`  | 409 | A batch evaluation with the same client token already exists with different parameters. | 
|  `ThrottlingException`  | 429 | Request rate exceeded. Retry with exponential backoff. | 
|  `InternalServerException`  | 500 | Service-side error. Retry the request. | 