Create a custom prompt dataset for a model evaluation job that uses human workers - Amazon Bedrock

This page is a machine translation. If this translation differs from the original English version, the English version prevails.


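To create a model evaluation job that uses human workers, you must specify a custom prompt dataset. During inference, these prompts are sent to the models you selected for evaluation.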

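If you want to evaluate models outside Amazon Bedrock using responses they have already generated, include those responses in the prompt dataset as described in Run an evaluation job using your own inference response data. When you supply your own inference response data, Amazon Bedrock skips the model-invocation step and runs the evaluation job on the data you provide.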

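A custom prompt dataset must be stored in Amazon S3, use the JSON Lines format, and have the .jsonl file extension. Each line must be a valid JSON object. A dataset can contain up to 1000 prompts per evaluation job.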
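Before uploading a dataset, you may want to check these format rules locally. The following Python sketch checks that every line parses as a JSON object and that the file holds no more than 1000 prompts. The function name `check_jsonl_dataset` and the `MAX_PROMPTS` constant are hypothetical and are not part of any Amazon Bedrock API. The service still performs its own validation when you create the job.

```python
import json

# Limit taken from the paragraph above: at most 1000 prompts per dataset.
MAX_PROMPTS = 1000


def check_jsonl_dataset(text: str) -> list[str]:
    """Return the problems found in JSON Lines text (an empty list means OK).

    Hypothetical helper: it checks only the generic format rules, not the
    keys a particular job type expects.
    """
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            # Blank lines also end up here, because "" is not valid JSON.
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
    if len(lines) > MAX_PROMPTS:
        problems.append(f"{len(lines)} prompts exceeds the limit of {MAX_PROMPTS}")
    return problems


sample = '{"prompt": "What is HIIT?"}\n["not", "an", "object"]\n'
print(check_jsonl_dataset(sample))  # prints ['line 2: not a JSON object']
```

Treating blank lines as errors is a deliberately strict reading of "each line must be a valid JSON object."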

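For jobs created in the console, you must update the cross-origin resource sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.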

Run an evaluation job in which Amazon Bedrock invokes a model for you

To run an evaluation job in which Amazon Bedrock invokes the models for you, provide a prompt dataset containing the following key-value pairs:

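  • prompt – The prompt you want the model to respond to.

  • referenceResponse – (Optional) A ground truth response that workers can refer to during the evaluation.

  • category – (Optional) A key you can use to filter results when you review them in the model evaluation report card.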

Workers see the values you specify for prompt and referenceResponse in their UI.
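The key rules above can be checked one parsed line at a time. In the following Python sketch, `prompt` is required and `referenceResponse` and `category` are optional. The helper name `check_invocation_record` is hypothetical. The requirement that each value be a JSON string is an assumption made for illustration, because the text above does not state value types.

```python
import json

REQUIRED_KEYS = {"prompt"}
OPTIONAL_KEYS = {"referenceResponse", "category"}


def check_invocation_record(record: dict) -> list[str]:
    """Hypothetical check of one dataset line for a job where Bedrock invokes the model."""
    problems = []
    for key in sorted(REQUIRED_KEYS):
        if key not in record:
            problems.append(f"missing required key '{key}'")
    for key in sorted(REQUIRED_KEYS | OPTIONAL_KEYS):
        # Assumption: every value is expected to be a JSON string.
        if key in record and not isinstance(record[key], str):
            problems.append(f"'{key}' should be a string")
    return problems


line = '{"prompt": "What is high intensity interval training?", "category": "Fitness"}'
print(check_invocation_record(json.loads(line)))  # prints []
```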

The following is an example custom dataset in JSON Lines format that contains 6 inputs.

{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}

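For clarity, the following example shows one entry expanded across several lines. In an actual prompt dataset, each entry must occupy a single line as a valid JSON object.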

{
  "prompt": "What is high intensity interval training?",
  "category": "Fitness",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods."
}
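Each entry must occupy exactly one line, so one straightforward approach is to build entries as Python dictionaries and serialize each one with `json.dumps`. When no `indent` argument is given, that function produces single-line output. The following sketch turns the expanded entry above into JSON Lines text. The helper name `to_jsonl` is hypothetical. Passing `ensure_ascii=False` is an optional choice that keeps non-ASCII prompt text readable instead of escaping it.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one compact JSON object per line (hypothetical helper)."""
    # Without an indent argument, json.dumps emits no raw newlines, so each record
    # stays on one line; newlines inside string values are escaped as \n.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


entry = {
    "prompt": "What is high intensity interval training?",
    "category": "Fitness",
    "referenceResponse": (
        "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach "
        "that involves short, intense bursts of exercise followed by brief recovery "
        "or rest periods."
    ),
}

print(to_jsonl([entry]), end="")
```

Save the output to a file with the .jsonl extension, then upload it to your S3 bucket.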

Run an evaluation job using your own inference response data

To run an evaluation job on responses you have already generated, provide a prompt dataset containing the following key-value pairs:

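  • prompt – The prompt your model used to generate the response.

  • referenceResponse – (Optional) A ground truth response that workers can refer to during the evaluation.

  • category – (Optional) A key you can use to filter results when you review them in the model evaluation report card.

  • modelResponses – The inference responses you want to evaluate. The modelResponses list can contain one or two entries, each with the following properties.

    • response – A string containing the model's inference response.

    • modelIdentifier – A string identifying the model that generated the response.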

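Every line in the prompt dataset must contain the same number of responses (one or two). Each line must also use the same model identifier or identifiers, and a single dataset can contain no more than 2 unique values for modelIdentifier.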
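These rules apply across the whole file, so they are easiest to check after every line has been parsed. The following Python sketch reflects one reading of the rules. Each modelResponses list must hold one or two entries, each entry needs string values for response and modelIdentifier, and every line must use the same set of identifiers. The helper name is hypothetical. The sketch also makes three assumptions: every line parses to a JSON object, every entry is an object, and the order of entries within a line does not matter (identifiers are compared after sorting).

```python
import json


def check_byo_responses(lines: list[str]) -> list[str]:
    """Hypothetical dataset-wide check of the modelResponses rules described above."""
    problems = []
    expected_ids = None  # sorted identifiers from the first well-formed line
    all_ids = set()      # every modelIdentifier value seen in the dataset
    for number, line in enumerate(lines, start=1):
        responses = json.loads(line).get("modelResponses")
        # Each line needs one or two response entries.
        if not isinstance(responses, list) or not 1 <= len(responses) <= 2:
            problems.append(f"line {number}: modelResponses must list one or two entries")
            continue
        ids = []
        for entry in responses:
            if not isinstance(entry.get("response"), str):
                problems.append(f"line {number}: each entry needs a string 'response'")
            ids.append(entry.get("modelIdentifier"))
        if not all(isinstance(i, str) for i in ids):
            problems.append(f"line {number}: each entry needs a string 'modelIdentifier'")
            continue
        ids.sort()
        all_ids.update(ids)
        # Every line must use the same identifier(s), and therefore the same response count.
        if expected_ids is None:
            expected_ids = ids
        elif ids != expected_ids:
            problems.append(f"line {number}: identifiers {ids} differ from {expected_ids}")
    if len(all_ids) > 2:
        problems.append(f"{len(all_ids)} unique modelIdentifier values; at most 2 are allowed")
    return problems
```

Because every line must match the first line's identifiers and each line holds at most two entries, the final unique-value check is largely redundant. It is kept so that the third rule appears explicitly in the code.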

The following is an example custom dataset in JSON Lines format that contains 6 inputs.

{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}

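For clarity, the following example shows one entry from the prompt dataset expanded across several lines. In the actual dataset, the entry must occupy a single line.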

{
  "prompt": "What is high intensity interval training?",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods.",
  "category": "Fitness",
  "modelResponses": [
    {
      "response": "High intensity interval training (HIIT) is a workout strategy that alternates between short bursts of intense, maximum-effort exercise and brief recovery periods, designed to maximize calorie burn and improve cardiovascular fitness.",
      "modelIdentifier": "Model1"
    },
    {
      "response": "High-intensity interval training (HIIT) is a cardiovascular exercise strategy that alternates short bursts of intense, anaerobic exercise with less intense recovery periods, designed to maximize calorie burn, improve fitness, and boost metabolic rate.",
      "modelIdentifier": "Model2"
    }
  ]
}
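If your prompts and your models' outputs are stored separately, a small script can assemble them into the record shape shown above. In the following Python sketch, `build_byo_record` is a hypothetical helper. Its `responses` argument maps a model identifier string of your choosing (here "Model1" and "Model2", matching the example) to that model's response text. The placeholder response strings are illustrative only.

```python
import json


def build_byo_record(prompt, responses, reference_response=None, category=None):
    """Assemble one dataset line from a prompt and a {modelIdentifier: response} dict.

    Hypothetical helper; the key names follow the list above.
    """
    if not 1 <= len(responses) <= 2:
        raise ValueError("provide one or two model responses per prompt")
    record = {"prompt": prompt}
    # Optional keys are included only when a value is supplied.
    if reference_response is not None:
        record["referenceResponse"] = reference_response
    if category is not None:
        record["category"] = category
    # Dicts keep insertion order, so entries appear in the order given.
    record["modelResponses"] = [
        {"response": text, "modelIdentifier": model_id}
        for model_id, text in responses.items()
    ]
    return json.dumps(record, ensure_ascii=False)


line = build_byo_record(
    "What is high intensity interval training?",
    {"Model1": "Response text from the first model", "Model2": "Response text from the second model"},
    category="Fitness",
)
print(line)
```

To keep the dataset consistent with the rules above, use the same identifiers for every prompt, for example by always passing responses from the same one or two models.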