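Create a custom prompt dataset for a model evaluation job that uses human workers
To create a model evaluation job that uses human workers, you must specify a custom prompt dataset. These prompts are then used during inference with the models you select to evaluate.
If you want to evaluate non-Amazon Bedrock models using responses that you have already generated, include those responses in the prompt dataset as described in Perform an evaluation job using your own inference response data. When you provide your own inference response data, Amazon Bedrock skips the model invocation step and performs the evaluation job with the data you provide.
Custom prompt datasets must be stored in Amazon S3 and use the JSON Lines format with the .jsonl file extension. Each line must be a valid JSON object. A dataset can contain up to 1000 prompts per automatic evaluation job.
For jobs created using the console, you must update the Cross Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.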
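The format rules above (one valid JSON object per line, at most 1000 prompts) can be checked locally before uploading the file to Amazon S3. The following is a minimal sketch, not part of any Amazon Bedrock tooling: the function name `check_jsonl` and the constant `MAX_PROMPTS` are hypothetical, and treating blank lines as errors is a conservative assumption rather than a documented rule.

```python
import json

MAX_PROMPTS = 1000  # limit stated in the paragraph above


def check_jsonl(text):
    """Return a list of problems found in JSON Lines text (empty if none)."""
    problems = []
    records = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Assumption: a blank line is not a valid JSON object, so flag it.
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            # Valid JSON, but an array, string, or number rather than an object.
            problems.append(f"line {number}: not a JSON object")
            continue
        records += 1
    if records > MAX_PROMPTS:
        problems.append(f"{records} prompts exceeds the limit of {MAX_PROMPTS}")
    return problems
```

To use it, read the .jsonl file as text, pass the contents to `check_jsonl`, and fix any reported lines before uploading.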
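Perform an evaluation job where Amazon Bedrock invokes a model for you
To run an evaluation job where Amazon Bedrock invokes the models for you, provide a prompt dataset that contains the following key-value pairs: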
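- prompt – The prompt you want the models to respond to.
- referenceResponse – (Optional) A ground truth response that your workers can reference during the evaluation.
- category – (Optional) A key that you can use to filter results when reviewing them in the model evaluation report card.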
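In the worker UI, your human workers can see what you specify for prompt and referenceResponse.
The following is an example custom dataset that contains 6 inputs and uses the JSON Lines format.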
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
For clarity, the following example shows a single entry expanded. In your actual prompt dataset, each line must be a valid JSON object.
{
  "prompt": "What is high intensity interval training?",
  "category": "Fitness",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods."
}
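One way to produce a dataset in this shape is to build each entry as a dictionary and serialize it onto a single line. The sketch below assumes nothing beyond the three keys described above; the helper names `make_record` and `to_jsonl` are hypothetical.

```python
import json


def make_record(prompt, reference_response=None, category=None):
    """Build one dataset entry, omitting optional keys that were not supplied."""
    record = {"prompt": prompt}
    if category is not None:
        record["category"] = category
    if reference_response is not None:
        record["referenceResponse"] = reference_response
    return record


def to_jsonl(records):
    """Serialize records as JSON Lines text: one compact object per line."""
    # json.dumps without indent never emits raw newlines, so each record
    # stays on its own line even if a prompt contains "\n".
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

The resulting text can be written to a file with the .jsonl extension and uploaded to the S3 bucket used for the evaluation job.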
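Perform an evaluation job using your own inference response data
To run an evaluation job using responses that you have already generated, provide a prompt dataset that contains the following key-value pairs: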
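- prompt – The prompt your models used to generate the responses.
- referenceResponse – (Optional) A ground truth response that your workers can reference during the evaluation.
- category – (Optional) A key that you can use to filter results when reviewing them in the model evaluation report card.
- modelResponses – The responses from your own inference that you want to evaluate. You can provide either one or two entries with the following properties in the modelResponses list.
  - response – A string containing the response from your model inference.
  - modelIdentifier – A string identifying the model that generated the response.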
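Every line in your prompt dataset must contain the same number of responses (either one or two). Additionally, you must specify the same model identifier or identifiers in each line, and you cannot use more than 2 unique values for modelIdentifier in a single dataset.
The following is an example custom dataset with 6 inputs in the JSON Lines format.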
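These consistency rules are easy to violate when responses are collected from several runs, so a local check can help. The following sketch takes already-parsed records (one dictionary per dataset line); the function name `check_model_responses` is hypothetical, and reading "the same model identifiers in each line" as set equality between lines is an interpretation, not a documented rule.

```python
def check_model_responses(records):
    """Return a list of problems with the modelResponses entries (empty if none)."""
    problems = []
    counts = set()           # distinct numbers of responses seen per line
    identifier_sets = set()  # distinct sets of model identifiers seen per line
    all_identifiers = set()  # every modelIdentifier value in the dataset
    for number, record in enumerate(records, start=1):
        responses = record.get("modelResponses")
        if not isinstance(responses, list) or len(responses) not in (1, 2):
            problems.append(f"line {number}: modelResponses must list one or two entries")
            continue
        identifiers = []
        for entry in responses:
            valid = (
                isinstance(entry, dict)
                and isinstance(entry.get("response"), str)
                and isinstance(entry.get("modelIdentifier"), str)
            )
            if not valid:
                problems.append(f"line {number}: entry needs string response and modelIdentifier")
                break
            identifiers.append(entry["modelIdentifier"])
        else:
            # Only lines whose entries are all well formed join the cross-line checks.
            counts.add(len(identifiers))
            identifier_sets.add(frozenset(identifiers))
            all_identifiers.update(identifiers)
    if len(counts) > 1:
        problems.append("lines contain different numbers of responses")
    if len(identifier_sets) > 1:
        problems.append("lines do not all use the same model identifiers")
    if len(all_identifiers) > 2:
        problems.append("more than 2 unique modelIdentifier values in the dataset")
    return problems
```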
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
{"prompt":"The prompt you used to generate the model responses","referenceResponse":"(Optional) a ground truth response","category":"(Optional) a category for the prompt","modelResponses":[{"response":"The response your first model generated","modelIdentifier":"A string identifying your first model"},{"response":"The response your second model generated","modelIdentifier":"A string identifying your second model"}]}
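For clarity, the following example shows a single entry in a prompt dataset expanded. In your actual prompt dataset, each entry must occupy a single line.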
{
  "prompt": "What is high intensity interval training?",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods.",
  "category": "Fitness",
  "modelResponses": [
    {
      "response": "High intensity interval training (HIIT) is a workout strategy that alternates between short bursts of intense, maximum-effort exercise and brief recovery periods, designed to maximize calorie burn and improve cardiovascular fitness.",
      "modelIdentifier": "Model1"
    },
    {
      "response": "High-intensity interval training (HIIT) is a cardiovascular exercise strategy that alternates short bursts of intense, anaerobic exercise with less intense recovery periods, designed to maximize calorie burn, improve fitness, and boost metabolic rate.",
      "modelIdentifier": "Model2"
    }
  ]
}