
Create a custom prompt dataset for a model evaluation job that uses human workers

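To create a model evaluation job that uses human workers, you must specify a custom prompt dataset. These prompts are then used during inference with the models you select to evaluate.

If you want to evaluate non-Amazon Bedrock models using responses that you have already generated, include those responses in the prompt dataset as described in Perform an evaluation job using your own inference response data. When you provide your own inference response data, Amazon Bedrock skips the model invocation step and performs the evaluation job with the data you provide.

Custom prompt datasets must be stored in Amazon S3, use the JSON Lines format, and have the .jsonl file extension. Each line must be a valid JSON object. A dataset can contain up to 1000 prompts per automatic evaluation job.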
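The format rules above can be checked locally before the file is uploaded. The following is a minimal sketch, not part of any Amazon Bedrock API: the function name `validate_jsonl` and the constant `MAX_PROMPTS` are hypothetical, and the check covers only the two rules stated above (one valid JSON object per line, at most 1000 prompts).

```python
import json

MAX_PROMPTS = 1000  # limit stated in the text above


def validate_jsonl(text: str) -> list[str]:
    """Return a list of problems found in JSON Lines text (empty if none)."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Blank lines also land here, because "" is not valid JSON.
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
    if len(lines) > MAX_PROMPTS:
        problems.append(
            f"{len(lines)} prompts exceeds the limit of {MAX_PROMPTS}"
        )
    return problems
```

An empty result means the text passed both checks; it does not guarantee that the evaluation job will accept the dataset.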

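For jobs created using the console, you must update the Cross Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.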

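Perform an evaluation job where Amazon Bedrock invokes a model for you

To run an evaluation job where Amazon Bedrock invokes the models for you, provide a prompt dataset containing the following key-value pairs:

prompt – The prompt you want the models to respond to.
referenceResponse – (Optional) A ground truth response that your workers can reference during the evaluation.
category – (Optional) A key that you can use to filter results when reviewing them in the model evaluation report card.

Workers can see what you specify for prompt and referenceResponse in their UI.

The following is an example custom dataset that contains 6 inputs and uses the JSON Lines format.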


{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}

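For clarity, the following example shows a single entry expanded. In your actual prompt dataset, each entry must occupy a single line as a valid JSON object.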


{
  "prompt": "What is high intensity interval training?",
  "category": "Fitness",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods."
}
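An expanded entry like the one above has to be collapsed to a single line before it goes into the dataset. The following is a minimal sketch of one way to do that with Python's standard `json` module; the helper name `to_jsonl` is hypothetical, and dropping keys whose value is `None` is an illustrative choice for handling the optional fields, not a documented requirement.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize each record as one compact JSON object per line."""
    lines = []
    for record in records:
        if not record.get("prompt"):
            raise ValueError("every record needs a non-empty 'prompt'")
        # Drop optional keys (category, referenceResponse) left as None.
        cleaned = {k: v for k, v in record.items() if v is not None}
        # json.dumps escapes newlines inside strings, so each record
        # stays on a single line.
        lines.append(json.dumps(cleaned, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {
            "prompt": "What is high intensity interval training?",
            "category": "Fitness",
            "referenceResponse": None,  # optional, omitted from the output
        }
    ]
    print(to_jsonl(example), end="")
```

The returned string can be written to a file with the .jsonl extension and then uploaded to Amazon S3.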

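Perform an evaluation job using your own inference response data

To run an evaluation job using responses you have already generated, provide a prompt dataset containing the following key-value pairs:

prompt – The prompt your models used to generate the responses.
referenceResponse – (Optional) A ground truth response that your workers can reference during the evaluation.
category – (Optional) A key that you can use to filter results when reviewing them in the model evaluation report card.
modelResponses – The responses from your own inference that you want to evaluate. You can provide either one or two entries with the following properties in the modelResponses list.
- response – A string containing the response from your model inference.
- modelIdentifier – A string identifying the model that generated the response.

Every line in your prompt dataset must contain the same number of responses (either one or two). Additionally, you must specify the same model identifier or identifiers in each line, and you can't use more than 2 unique values for modelIdentifier in a single dataset.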
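The consistency rules above can be checked before the dataset is submitted. The following is a minimal sketch under one assumed reading of those rules: every line must carry the same set of modelIdentifier values as the first well-formed line, which also implies the same response count and at most 2 unique identifiers. The function name `check_model_responses` is hypothetical, and the sketch assumes each modelResponses entry is a JSON object.

```python
def check_model_responses(records: list[dict]) -> list[str]:
    """Return a list of problems with the modelResponses lists."""
    problems = []
    expected = None  # identifiers seen on the first well-formed line
    for number, record in enumerate(records, start=1):
        responses = record.get("modelResponses")
        if not isinstance(responses, list) or len(responses) not in (1, 2):
            problems.append(
                f"line {number}: need one or two modelResponses entries"
            )
            continue
        # Sort so that the order of entries within a line does not matter.
        ids = sorted(
            str(entry.get("modelIdentifier", "")) for entry in responses
        )
        if expected is None:
            expected = ids
        elif ids != expected:
            problems.append(
                f"line {number}: identifiers {ids} differ from {expected}"
            )
    return problems
```

Here `records` is the list of dictionaries obtained by parsing each line of the dataset with `json.loads`.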

The following is an example custom dataset with 6 inputs in JSON Lines format.


{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}

For clarity, the following example shows a single entry from a prompt dataset expanded.


{
    "prompt": "What is high intensity interval training?",
    "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods.",
    "category": "Fitness",
    "modelResponses": [
        {
            "response": "High intensity interval training (HIIT) is a workout strategy that alternates between short bursts of intense, maximum-effort exercise and brief recovery periods, designed to maximize calorie burn and improve cardiovascular fitness.",
            "modelIdentifier": "Model1"
        },
        {
            "response": "High-intensity interval training (HIIT) is a cardiovascular exercise strategy that alternates short bursts of intense, anaerobic exercise with less intense recovery periods, designed to maximize calorie burn, improve fitness, and boost metabolic rate.",
            "modelIdentifier": "Model2"
        }
    ]
}
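When the responses already exist as separate per-model lists, entries shaped like the one above can be assembled programmatically. The following is a minimal sketch; the function name `build_records` and the input layout (a mapping from a model identifier to a list of responses aligned by index with the prompts) are assumptions made for illustration.

```python
import json


def build_records(
    prompts: list[str], responses_by_model: dict[str, list[str]]
) -> list[dict]:
    """Pair each prompt with the response each model gave for it."""
    if not 1 <= len(responses_by_model) <= 2:
        raise ValueError("provide responses from one or two models")
    for model_id, responses in responses_by_model.items():
        if len(responses) != len(prompts):
            raise ValueError(
                f"{model_id}: expected {len(prompts)} responses"
            )
    records = []
    for i, prompt in enumerate(prompts):
        records.append(
            {
                "prompt": prompt,
                "modelResponses": [
                    {"response": responses[i], "modelIdentifier": model_id}
                    for model_id, responses in responses_by_model.items()
                ],
            }
        )
    return records


if __name__ == "__main__":
    built = build_records(
        ["What is high intensity interval training?"],
        {"Model1": ["First answer"], "Model2": ["Second answer"]},
    )
    for record in built:
        print(json.dumps(record))  # one JSON object per line
```

Because every record is built from the same mapping, each line carries the same model identifiers and the same number of responses.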
