Meta Llama models

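This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need the model ID for that model. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.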

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check which modalities and Amazon Bedrock features Meta Llama models support, and which AWS Regions they are available in, see Supported foundation models in Amazon Bedrock.

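When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For prompt information specific to Meta Llama, see the Meta Llama prompt engineering guide.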

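Note

Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models cannot be used outside the AWS Regions listed as available for them in the Regions table.

This section provides information for using the following models from Meta.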

Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Llama 3.3 Instruct
Llama 4 Instruct

Request and response

The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.

Note

You can't use the InvokeModelWithResponseStream or ConverseStream (streaming) operations with Llama 4 Instruct.
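For the models that do support streaming, the chunks returned by InvokeModelWithResponseStream can be read roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name stream_generation is hypothetical, the client is expected to be a boto3 "bedrock-runtime" client, and the code assumes each streamed chunk decodes to a JSON object that may carry a generation field.

```python
import json


def stream_generation(client, model_id, native_request):
    """Yield text fragments from a streaming Llama inference call.

    `client` is expected to be a boto3 "bedrock-runtime" client.
    Assumption: each chunk's bytes decode to a JSON object that may
    contain a "generation" key with the next piece of text.
    """
    response = client.invoke_model_with_response_stream(
        modelId=model_id, body=json.dumps(native_request)
    )
    for event in response["body"]:
        # Skip events that do not carry a chunk payload.
        if "chunk" not in event:
            continue
        chunk = json.loads(event["chunk"]["bytes"])
        if "generation" in chunk:
            yield chunk["generation"]
```

A caller would typically iterate over the generator and print each fragment as it arrives, for example with print(text, end="").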

Request

Llama 3 Instruct, Llama 3.1 Instruct, Llama 3.2 Instruct, and Llama 4 Instruct models have the following inference parameters:


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

Note: Llama 3.2 and later models add images to the request structure, which is a list of strings. Example: images: Optional[List[str]]
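As a rough illustration of the images field, the sketch below adds a list of strings to the request body. It assumes each string is a base64-encoded image, which this section does not state, and the helper name build_image_request is hypothetical.

```python
import base64
import json
from typing import List, Optional


def build_image_request(prompt: str, images: Optional[List[bytes]] = None) -> str:
    """Return a JSON request body, adding "images" only when images are given.

    Assumption: each image is sent as a base64-encoded string.
    """
    request = {"prompt": prompt}
    if images:
        request["images"] = [
            base64.b64encode(image).decode("utf-8") for image in images
        ]
    return json.dumps(request)
```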

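The following are required parameters:

prompt – (Required) The prompt that you want to pass to the model. For optimal results, format the conversation with the following template.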


<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Example template with system prompt

The following is an example prompt that includes a system prompt.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Multi-turn conversation example

The following is an example prompt for a multi-turn conversation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather like in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
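The three templates above share one layout, so they can be produced by a single helper. The following is a minimal sketch under that assumption; the function name format_llama3_prompt and the message dictionary shape are hypothetical, and the output mirrors the templates exactly as printed here (a blank line after each header, no trailing newline after the final assistant header).

```python
from typing import Dict, List


def format_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages in the header-token layout of the templates.

    Each message is a dict with "role" ("system", "user", or "assistant")
    and "content". The prompt ends with an open assistant header so that
    the model writes the next assistant turn.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)
```

Passing a system message first, or alternating user and assistant messages, yields the system-prompt and multi-turn variants.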

For more information, see Meta Llama 3.

The following are optional parameters:

temperature – Use a lower value to decrease randomness in the response.

Default	Minimum	Maximum
0.5	0	1

top_p – Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.

Default	Minimum	Maximum
0.9	0	1

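max_gen_len – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

Default	Minimum	Maximum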
512	1	2048
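The defaults and ranges in the tables above can be checked on the client before a request is sent. The sketch below is one possible way to do that; the helper name build_native_request is hypothetical, and whether the service itself rejects out-of-range values is not covered here.

```python
import json


def build_native_request(
    prompt: str,
    temperature: float = 0.5,
    top_p: float = 0.9,
    max_gen_len: int = 512,
) -> str:
    """Build the JSON request body, checking the documented ranges."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if not 1 <= max_gen_len <= 2048:
        raise ValueError("max_gen_len must be between 1 and 2048")
    return json.dumps(
        {
            "prompt": prompt,
            "temperature": temperature,
            "top_p": top_p,
            "max_gen_len": max_gen_len,
        }
    )
```

The returned string can be passed as the body argument of an InvokeModel call.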

Response

Llama 3 Instruct models return the following fields for a text completion inference call.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}

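More information about each field is provided below.

generation – The generated text.
prompt_token_count – The number of tokens in the prompt.
generation_token_count – The number of tokens in the generated text.
stop_reason – The reason why the response stopped generating text. Possible values are:
- stop – The model has finished generating text for the input prompt.
- length – The length of the tokens for the generated text exceeds the value of max_gen_len in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again.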
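A caller can use stop_reason to detect a truncated answer. The following is a minimal sketch that works on an already decoded response body; the function name read_generation is hypothetical.

```python
from typing import Any, Dict, Tuple


def read_generation(model_response: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (text, truncated) from a decoded Llama response body.

    `truncated` is True when stop_reason is "length", meaning the output
    reached max_gen_len and a retry with a larger value may be worthwhile.
    """
    text = model_response["generation"]
    truncated = model_response.get("stop_reason") == "length"
    return text, truncated
```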

Code examples

This example shows how to call the Llama 3 Instruct model.


# Use the native inference API to send a text message to Meta Llama 3.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 70b Instruct.
model_id = "meta.llama3-70b-instruct-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Format the request payload using the model's native structure.
native_request = {
    "prompt": formatted_prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["generation"]
print(response_text)

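To control the generation length with Llama 3 Instruct models, adjust max_gen_len and include specific instructions in your prompt, for example when you want a detailed response or a summary.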
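One way to apply this is sketched below. It is an illustration rather than the original sample: the helper name generate_with_length_limit is hypothetical, the default of 256 tokens is arbitrary, and the client is expected to be a boto3 "bedrock-runtime" client created as in the previous example.

```python
import json


def generate_with_length_limit(client, model_id, prompt, max_gen_len=256):
    """Invoke a Llama 3 Instruct model with an explicit max_gen_len.

    Returns (generated_text, stop_reason). A stop_reason of "length"
    means the output was cut off at max_gen_len tokens.
    """
    formatted_prompt = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )
    native_request = {"prompt": formatted_prompt, "max_gen_len": max_gen_len}
    response = client.invoke_model(
        modelId=model_id, body=json.dumps(native_request)
    )
    model_response = json.loads(response["body"].read())
    return model_response["generation"], model_response.get("stop_reason")


# Example call (requires AWS credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-west-2")
# text, stop_reason = generate_with_length_limit(
#     client,
#     "meta.llama3-70b-instruct-v1:0",
#     "Summarize the purpose of unit tests in three sentences.",
#     max_gen_len=128,
# )
```

Stating the desired length in the prompt itself (as in "three sentences") helps the model finish within the limit instead of being cut off.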
