

# Create a prompt dataset for a RAG evaluation in Amazon Bedrock
<a name="knowledge-base-evaluation-prompt"></a>

To evaluate retrieval and generation for an Amazon Bedrock Knowledge Base or for your own Retrieval Augmented Generation (RAG) system, you provide a prompt dataset. When you provide response data from your own RAG system, Amazon Bedrock skips the Knowledge Base invoke step and performs the evaluation job directly on your data.

Prompt datasets must be stored in Amazon S3, use the JSON Lines format, and have the `.jsonl` file extension. Each line must be a valid JSON object. A dataset can contain up to 1000 prompts per evaluation job. For retrieve-and-generate evaluation jobs, each conversation can have a maximum of 5 turns. For retrieve-only evaluation jobs, you can specify only a single turn.
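You can check these constraints before uploading a dataset. The following is a minimal validation sketch in Python; the function name and limits are taken from this section, and nothing Bedrock-specific is called:

```python
import json

MAX_PROMPTS = 1000   # maximum prompts per evaluation job
MAX_TURNS = 5        # turn limit for retrieve-and-generate jobs (use 1 for retrieve-only)

def validate_dataset(lines, max_turns=MAX_TURNS):
    """Validate raw .jsonl lines against the dataset limits described above."""
    if len(lines) > MAX_PROMPTS:
        raise ValueError(f"{len(lines)} prompts exceeds the limit of {MAX_PROMPTS}")
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)  # every line must be a valid JSON object
        turns = record["conversationTurns"]
        if not 1 <= len(turns) <= max_turns:
            raise ValueError(f"line {number}: turn count {len(turns)} is out of range")
    return True
```

Pass `max_turns=1` when preparing a dataset for a retrieve-only job.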

For jobs created using the console, you must update the Cross Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see [Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets](model-evaluation-security-cors.md).

See the following topics to learn more about the key-value pairs that are required for the type of evaluation job you select.

**Topics**
+ [Create a prompt dataset for retrieve-only RAG evaluation jobs](knowledge-base-evaluation-prompt-retrieve.md)
+ [Creating a prompt dataset for retrieve-and-generate RAG evaluation jobs](knowledge-base-evaluation-prompt-retrieve-generate.md)

# Create a prompt dataset for retrieve-only RAG evaluation jobs
<a name="knowledge-base-evaluation-prompt-retrieve"></a>

A retrieve-only evaluation job requires a prompt dataset in JSON Lines format. You can have up to 1000 prompts in your dataset.

## Prepare a dataset for a retrieve-only evaluation job where Amazon Bedrock invokes your Knowledge Base
<a name="knowledge-base-evaluation-prompt-retrieve-invoke"></a>

To create a retrieve-only evaluation job where Amazon Bedrock invokes your Knowledge Base, your prompt dataset must contain the following key-value pairs:
+ `referenceResponses` – This parent key is used to specify the ground-truth response you would expect an end-to-end RAG system to return. This parameter does not represent the expected passages or chunks you expect to be retrieved from your Knowledge Base. Specify the ground truth in the `text` key. `referenceResponses` is required if you choose the **Context coverage** metric in your evaluation job.
+ `prompt` – This parent key is used to specify the prompt (user query) that you want the RAG system to respond to.

The following is an example custom dataset that contains six inputs and uses the JSON Lines format.

```
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
```

The following prompt is expanded for clarity. In your actual prompt dataset, each line (a prompt) must be a valid JSON object.

```
{
    "conversationTurns": [
        {
            "prompt": {
                "content": [
                    {
                        "text": "What is the recommended service interval for your product?"
                    }
                ]
            },
            "referenceResponses": [
                {
                    "content": [
                        {
                            "text": "The recommended service interval for our product is two years."
                        }
                    ]
                }
            ]
        }
    ]
}
```
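One way to produce lines in this shape is to build each record programmatically and serialize it compactly. The following Python sketch does that; the helper names and the output file name are illustrative, not part of any Bedrock API:

```python
import json

def make_turn(prompt_text, reference_text=None):
    """Build one conversation turn for a job where Amazon Bedrock invokes the Knowledge Base."""
    turn = {"prompt": {"content": [{"text": prompt_text}]}}
    if reference_text is not None:
        # referenceResponses is required only if you use the Context coverage metric
        turn["referenceResponses"] = [{"content": [{"text": reference_text}]}]
    return turn

def write_dataset(records, path="retrieve_only.jsonl"):
    """Write each record as one compact JSON object per line."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

records = [{"conversationTurns": [make_turn(
    "What is the recommended service interval for your product?",
    "The recommended service interval for our product is two years.")]}]
write_dataset(records)
```

Serializing with `json.dumps` guarantees each line is a single valid JSON object, which is what the JSON Lines format requires.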

## Prepare a dataset for a retrieve-only evaluation job using your own inference response data
<a name="knowledge-base-evaluation-prompt-retrieve-byoir"></a>

To create a retrieve-only evaluation job where you provide your own inference response data, your prompt dataset must contain the following:
+ `prompt` – This parent key is used to specify the prompt (user query) that you used to generate your inference response data.
+ `referenceResponses` – This parent key is used to specify the ground-truth response you would expect an end-to-end RAG system to return. This parameter does not represent the expected passages or chunks you expect to be retrieved from the knowledge base. Specify the ground truth in the `text` key. `referenceResponses` is required if you choose the **Context coverage** metric in your evaluation job.
+ `referenceContexts` (optional) – This optional parent key is used to specify the ground truth passages you would expect to be retrieved from the RAG source. You only need to include this key if you want to use it in your own custom evaluation metrics. The built-in metrics Amazon Bedrock provides don't use this property.
+ `knowledgeBaseIdentifier` – A customer-defined string that identifies the RAG source used to generate the retrieval results.
+ `retrievedResults` – A JSON object containing a list of retrieval results. For each result, you can supply an optional `name` and optional `metadata` specified as key-value pairs.

The following is an example custom dataset that contains six inputs and uses the JSON Lines format.

```
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"The prompt you used to generate your response"}]},"referenceResponses":[{"content":[{"text":"A ground-truth response"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedResults":{"retrievalResults":[{"name":"(Optional) a name for your reference context","content":{"text":"The output from your RAG inference"},"metadata":{"(Optional) a key for your metadata":"(Optional) a metadata value"}}]}}}]}
```

The following prompt is expanded for clarity. In your actual prompt dataset, each line (a prompt) must be a valid JSON object.

```
{
  "conversationTurns": [
    {
      "prompt": {
        "content": [
          {
            "text": "What is the recommended service interval for your product?"
          }
        ]
      },
      "referenceResponses": [
        {
          "content": [
            {
              "text": "The recommended service interval for our product is two years."
            }
          ]
        }
      ],
      "referenceContexts": [
        {
          "content": [
            {
              "text": "A ground truth for a received passage"
            }
          ]
        }
      ],
      "output": {
        "knowledgeBaseIdentifier": "RAG source 1",
        "retrievedResults": {
          "retrievalResults": [
            {
              "name": "(Optional) a name for your retrieval",
              "content": {
                "text": "The recommended service interval for our product is two years."
              },
              "metadata": {
                "(Optional) a key for your metadata": "(Optional) a value for your metadata"
              }
            }
          ]
        }
      }
    }
  ]
}
```
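A record in this shape can also be assembled programmatically. The following Python sketch builds one turn carrying your own retrieval results; the helper name is illustrative, and the optional `name` and `metadata` fields are omitted since they aren't required:

```python
import json

def make_byoi_turn(prompt_text, kb_id, retrieved_texts, reference_text=None):
    """Build one turn with your own retrieval results for a retrieve-only job."""
    turn = {
        "prompt": {"content": [{"text": prompt_text}]},
        "output": {
            "knowledgeBaseIdentifier": kb_id,  # your own label for the RAG source
            "retrievedResults": {
                "retrievalResults": [{"content": {"text": t}} for t in retrieved_texts]
            },
        },
    }
    if reference_text is not None:
        # required only if you use the Context coverage metric
        turn["referenceResponses"] = [{"content": [{"text": reference_text}]}]
    return turn

line = json.dumps({"conversationTurns": [make_byoi_turn(
    "What is the recommended service interval for your product?",
    "RAG source 1",
    ["The recommended service interval for our product is two years."])]})
```

Each `line` produced this way is one complete dataset entry, ready to be appended to a `.jsonl` file.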

# Creating a prompt dataset for retrieve-and-generate RAG evaluation jobs
<a name="knowledge-base-evaluation-prompt-retrieve-generate"></a>

A retrieve-and-generate evaluation job requires a prompt dataset in JSON Lines format. You can have up to 1000 prompts in your dataset.

## Prepare a dataset for a retrieve-and-generate evaluation job where Amazon Bedrock invokes your Knowledge Base
<a name="knowledge-base-evaluation-prompt-retrieve-generate-invoke"></a>

To create a retrieve-and-generate evaluation job where Amazon Bedrock invokes your Knowledge Base, your prompt dataset must contain the following key-value pairs:
+ `referenceResponses` – This parent key is used to specify the ground-truth response you expect the [RetrieveAndGenerate](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html) API to return. Specify the ground truth in the `text` key. `referenceResponses` is required if you choose the **Context coverage** metric in your evaluation job.
+ `prompt` – This parent key is used to specify the prompt (user query) that you want the model to respond to while the evaluation job is running.

The following is an example custom dataset that contains six inputs and uses the JSON Lines format.

```
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you want to use during inference"}]},"referenceResponses":[{"content":[{"text":"Specify a ground-truth response"}]}]}]}
```

The following prompt is expanded for clarity. In your actual prompt dataset, each line (a prompt) must be a valid JSON object.

```
{
    "conversationTurns": [
        {
            "prompt": {
                "content": [
                    {
                        "text": "What is the recommended service interval for your product?"
                    }
                ]
            },
            "referenceResponses": [
                {
                    "content": [
                        {
                            "text": "The recommended service interval for our product is two years."
                        }
                    ]
                }
            ]
        }
    ]
}
```

## Prepare a dataset for a retrieve-and-generate evaluation job using your own inference response data
<a name="knowledge-base-evaluation-prompt-retrieve-generate-byoir"></a>

To create a retrieve-and-generate evaluation job where you provide your own inference response data, your prompt dataset is a list of conversation turns and contains the following for each turn. You can only evaluate one RAG source per job.
+ `prompt` – The prompt you supplied to your model to generate the results.
+ `referenceResponses` – This parent key is used to specify the ground-truth response you would expect for the final output from your LLM after it has ingested the retrieval results and the input query.
+ `referenceContexts` (optional) – This optional parent key is used to specify the ground truth passages you would expect to be retrieved from the RAG source. You only need to include this key if you want to use it in your own custom evaluation metrics. The built-in metrics Amazon Bedrock provides don't use this property.
+ `output` – The output from your RAG source, comprising the following:
  + `text` – The final output from the LLM in your RAG system.
  + `retrievedPassages` – This parent key is used to specify the content your RAG source retrieved.

Your `output` data must also include a `knowledgeBaseIdentifier` string that identifies the RAG source you used to generate the inference responses. You can also include an optional `modelIdentifier` string that identifies the LLM you used. For the `retrievalResults` and `retrievedReferences`, you can supply optional names and metadata.

The following is an example custom dataset that contains six inputs and uses the JSON Lines format.

```
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
{"conversationTurns":[{"prompt":{"content":[{"text":"Provide the prompt you used to generate the response"}]},"referenceResponses":[{"content":[{"text":"A ground truth for the final response generated by the LLM"}]}],"referenceContexts":[{"content":[{"text":"A ground truth for a received passage"}]}],"output":{"text":"The output of the LLM","modelIdentifier":"(Optional) a string identifying your model","knowledgeBaseIdentifier":"A string identifying your RAG source","retrievedPassages":{"retrievalResults":[{"name":"(Optional) a name for your retrieval","content":{"text":"The retrieved content"},"metadata":{"(Optional) a key for your metadata":"(Optional) a value for your metadata"}}]}}}]}
```

The following shows the prompt dataset format expanded for clarity. In your actual prompt dataset, each line (a prompt) must be a valid JSON object.

```
{
    "conversationTurns": [
        {
            "prompt": {
                "content": [
                    {
                        "text": "Provide the prompt you used to generate the responses"
                    }
                ]
            },
            "referenceResponses": [
                {
                    "content": [
                        {
                            "text": "A ground truth for the final response generated by the LLM"
                        }
                    ]
                }
            ],
            "referenceContexts": [
                {
                    "content": [
                        {
                            "text": "A ground truth for a received passage"
                        }
                    ]
                }
            ],
            "output": {
                "text": "The output of the LLM",
                "modelIdentifier": "(Optional) a string identifying your model",
                "knowledgeBaseIdentifier": "A string identifying your RAG source",
                "retrievedPassages": {
                    "retrievalResults": [
                        {
                            "name": "(Optional) a name for your retrieval",
                            "content": {
                                "text": "The retrieved content"
                            },
                            "metadata": {
                                "(Optional) a key for your metadata": "(Optional) a value for your metadata"
                            }
                        }
                    ]
                }
            }
        }
    ]
}
```