

# Dataset schema
<a name="dataset-evaluations-schema"></a>

A dataset contains one or more scenarios. Each scenario represents a conversation (session) with the agent. Both the [on-demand](dataset-evaluations-on-demand.md) and [batch](dataset-evaluations-batch.md) dataset runners use the same dataset format.

The AgentCore SDK supports two scenario types:
+  **Predefined scenarios** use a fixed sequence of turns that you author by hand. The runner replays the turns exactly as written.
+  **Simulated scenarios** use an LLM-backed actor to generate turns dynamically based on a persona and goal. See [User simulation](user-simulation.md) for details on actor profiles and simulation configuration.

`FileDatasetProvider` auto-detects the scenario type from the JSON structure: scenarios with a `turns` field are loaded as predefined; scenarios with an `actor_profile` field (and no `turns`) are loaded as simulated.
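The detection rule can be sketched as a small helper. This is illustrative only (the function name is hypothetical and the provider's internal logic is not part of the public API):

```python
def detect_scenario_type(scenario: dict) -> str:
    """Classify a raw scenario dict using the rule described above."""
    if "turns" in scenario:
        return "predefined"
    if "actor_profile" in scenario:
        return "simulated"
    raise ValueError("Scenario must contain either 'turns' or 'actor_profile'")
```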

## Predefined scenarios
<a name="dataset-schema-predefined"></a>

A predefined scenario specifies a fixed sequence of turns with known inputs and optional expected outputs.

### Single-turn example
<a name="dataset-schema-predefined-single-turn"></a>

Each scenario sends one prompt and checks the response:

```
{
  "scenarios": [
    {
      "scenario_id": "math-question",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        }
      ],
      "expected_trajectory": ["calculator"],
      "assertions": ["Agent used the calculator tool to compute the result"]
    },
    {
      "scenario_id": "weather-check",
      "turns": [
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["weather"],
      "assertions": ["Agent used the weather tool"]
    }
  ]
}
```

### Multi-turn example
<a name="dataset-schema-predefined-multi-turn"></a>

A multi-turn scenario contains several turns that execute sequentially within the same session, preserving conversation context. Each turn can have its own `expected_response`, while `assertions` and `expected_trajectory` apply to the entire session:

```
{
  "scenarios": [
    {
      "scenario_id": "math-then-weather",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        },
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["calculator", "weather"],
      "assertions": [
        "Agent used the calculator tool for the math question",
        "Agent used the weather tool when asked about weather"
      ]
    }
  ]
}
```

### Scenario fields
<a name="dataset-schema-predefined-fields"></a>


| Field | Required | Description | 
| --- | --- | --- | 
|  `scenario_id`  | Yes | Unique identifier for the scenario. | 
|  `turns`  | Yes | List of turns in the conversation. Each turn has `input` (required) and `expected_response` (optional). | 
|  `expected_trajectory`  | No | Expected sequence of tool names. Used by trajectory evaluators (`Builtin.TrajectoryExactOrderMatch`, `Builtin.TrajectoryInOrderMatch`, `Builtin.TrajectoryAnyOrderMatch`). | 
|  `assertions`  | No | Natural language assertions about expected behavior. Used by `Builtin.GoalSuccessRate`. | 
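The required fields in the tables above and below can be checked before a run with a small validator. This is a hypothetical helper based on the field tables; the SDK performs its own validation when loading a dataset:

```python
def validate_predefined_scenario(scenario: dict) -> list[str]:
    """Return a list of problems found; an empty list means the scenario is valid."""
    problems = []
    if not scenario.get("scenario_id"):
        problems.append("missing required field: scenario_id")
    turns = scenario.get("turns")
    if not isinstance(turns, list) or not turns:
        problems.append("missing or empty required field: turns")
    else:
        for i, turn in enumerate(turns):
            if "input" not in turn:
                problems.append(f"turn {i} is missing required field: input")
    return problems
```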

### Turn fields
<a name="dataset-schema-turn-fields"></a>


| Field | Required | Description | 
| --- | --- | --- | 
|  `input`  | Yes | The prompt sent to the agent for this turn. Can be a string or a dict. | 
|  `expected_response`  | No | The expected agent response for this turn. Used by `Builtin.Correctness`. Mapped positionally to the trace produced by this turn; turn 0 maps to trace 0, turn 1 maps to trace 1. | 
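The positional mapping of `expected_response` to traces can be illustrated with a plain `zip` (the trace IDs here are stand-ins for the trace objects a runner produces):

```python
turns = [
    {"input": "What is 15 + 27?", "expected_response": "15 + 27 = 42"},
    {"input": "What's the weather?"},  # no ground truth for this turn
]
traces = ["trace-0", "trace-1"]  # one trace produced per turn, in order

# Turn i supplies ground truth for trace i; turns without an
# expected_response contribute no Correctness ground truth.
pairs = [
    (trace, turn.get("expected_response"))
    for turn, trace in zip(turns, traces)
]
# pairs == [("trace-0", "15 + 27 = 42"), ("trace-1", None)]
```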

## Simulated scenarios
<a name="dataset-schema-simulated"></a>

A simulated scenario defines an actor profile and an initial input. The actor generates subsequent turns dynamically:

```
{
  "scenarios": [
    {
      "scenario_id": "geography-student",
      "scenario_description": "A curious student asks geography questions",
      "actor_profile": {
        "traits": {"expertise": "novice", "tone": "curious"},
        "context": "A student studying world geography who wants to learn about capitals",
        "goal": "Find out the capital cities of at least two different countries"
      },
      "input": "Hi! I'm studying geography. Can you help me learn about world capitals?",
      "max_turns": 5,
      "assertions": [
        "Agent provides accurate capital city information",
        "Agent is helpful and encouraging to the student"
      ]
    }
  ]
}
```

### Scenario fields
<a name="dataset-schema-simulated-fields"></a>


| Field | Required | Description | 
| --- | --- | --- | 
|  `scenario_id`  | Yes | Unique identifier for the scenario. | 
|  `actor_profile`  | Yes | The actor’s identity and objective, containing `context` (required), `goal` (required), and `traits` (optional). See [User simulation](user-simulation.md). | 
|  `input`  | Yes | The first message sent to your agent to start the conversation. | 
|  `scenario_description`  | No | Human-readable description of the scenario. Useful for organizing and identifying scenarios in results. | 
|  `max_turns`  | No | Maximum number of turns before the conversation stops. Default: 10. | 
|  `assertions`  | No | Natural language assertions about expected behavior. Used by `Builtin.GoalSuccessRate`. | 

**Note**  
Simulated scenarios do not support `expected_trajectory` or per-turn `expected_response` because the conversation flow is not known in advance. Use `assertions` for ground truth with simulated scenarios.

## Ground truth mapping
<a name="dataset-schema-ground-truth"></a>

Both runners automatically map dataset fields to the evaluators that use them:


| Evaluator | Ground truth field | Level | Description | 
| --- | --- | --- | --- | 
|  `Builtin.Correctness`  |  `turns[].expected_response`  | Trace | Measures how accurately the agent’s response matches the expected answer. | 
|  `Builtin.GoalSuccessRate`  |  `assertions`  | Session | Validates whether the agent’s behavior satisfies natural language assertions. | 
|  `Builtin.TrajectoryExactOrderMatch`  |  `expected_trajectory`  | Session | Checks that the actual tool call sequence matches exactly. | 
|  `Builtin.TrajectoryInOrderMatch`  |  `expected_trajectory`  | Session | Checks that expected tools appear in order, allowing extras between them. | 
|  `Builtin.TrajectoryAnyOrderMatch`  |  `expected_trajectory`  | Session | Checks that all expected tools are present, regardless of order. | 

+ Ground truth fields are optional. Evaluators that do not use ground truth (for example, `Builtin.Helpfulness`, `Builtin.Faithfulness`) evaluate based on session content alone.
+ You can include all ground truth fields in a single dataset. Each runner routes the relevant fields to the appropriate evaluators.
+ If no ground truth fields are present, evaluators fall back to their ground-truth-free mode.
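The three trajectory evaluators differ only in how strictly they compare the expected tool sequence against the actual tool calls. A minimal sketch of the matching semantics (illustrative only; the built-in evaluators are the authoritative implementations):

```python
def exact_order_match(expected, actual):
    """Actual tool calls must equal the expected sequence exactly."""
    return list(expected) == list(actual)

def in_order_match(expected, actual):
    """Expected tools must appear in order; extra calls may occur between them."""
    it = iter(actual)  # membership tests consume the iterator, enforcing order
    return all(tool in it for tool in expected)

def any_order_match(expected, actual):
    """All expected tools must be present, in any order."""
    return set(expected) <= set(actual)
```

For example, with actual calls `["calculator", "search", "weather"]`, the expected trajectory `["calculator", "weather"]` fails the exact match but passes both the in-order and any-order matches.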

For more details on ground truth fields and how they work with the Evaluate API, see [Ground truth evaluations](ground-truth-evaluations.md).

## Inline dataset construction
<a name="dataset-schema-inline-construction"></a>

Instead of loading from a JSON file, you can construct datasets directly in Python:

```
from bedrock_agentcore.evaluation import Dataset, PredefinedScenario, Turn

dataset = Dataset(
    scenarios=[
        PredefinedScenario(
            scenario_id="math-question",
            turns=[
                Turn(
                    input="What is 15 + 27?",
                    expected_response="15 + 27 = 42",
                ),
            ],
            expected_trajectory=["calculator"],
            assertions=["Agent used the calculator tool"],
        ),
        PredefinedScenario(
            scenario_id="weather-check",
            turns=[
                Turn(input="What's the weather?"),
            ],
            expected_trajectory=["weather"],
        ),
    ]
)
```

Or load from a JSON file:

```
from bedrock_agentcore.evaluation import FileDatasetProvider

dataset = FileDatasetProvider("dataset.json").get_dataset()
```