Meta Llama models

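This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether a specific Meta Llama model supports a feature, see Models at a glance. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.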

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Meta Llama models support, the Amazon Bedrock features that they support, and the AWS Regions where they are available, see Supported foundation models in Amazon Bedrock.

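When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For prompt information specific to Meta Llama, see the Meta Llama prompt engineering guide.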

Note

Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models cannot be used outside the AWS Regions listed for them in the Regions table.

This section provides information for using the following models from Meta.

Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Llama 3.3 Instruct
Llama 4 Instruct

Request and response

The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.

Note

You can't use the InvokeModelWithResponseStream or ConverseStream (streaming) operations with Llama 4 Instruct.
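For models that do support streaming, the generated text arrives in pieces rather than in one body. The following is a minimal sketch of how those pieces might be reassembled. It assumes each stream event carries a JSON payload under event["chunk"]["bytes"] with a partial "generation" field; the helper name collect_generation is hypothetical and not part of any SDK.

```python
import json
from typing import Iterable


def collect_generation(events: Iterable[dict]) -> str:
    """Join the partial "generation" fields from a stream of chunk events."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk is None:
            continue  # Skip events that carry no chunk payload.
        payload = json.loads(chunk["bytes"])
        if "generation" in payload:
            parts.append(payload["generation"])
    return "".join(parts)
```

With boto3, the iterable would typically be the "body" of the response returned by invoke_model_with_response_stream, for example collect_generation(response["body"]).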

Request

Llama 3 Instruct, Llama 3.1 Instruct, Llama 3.2 Instruct, and Llama 4 Instruct models have the following inference parameters:


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

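Note: Llama 3.2 and later models add images to the request structure, which is a list of strings. Example: images: Optional[List[str]]

The following are required parameters:

prompt – (Required) The prompt that you want to pass to the model. For optimal results, format the conversation with the following template.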


<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Example template with a system prompt

The following is an example prompt that includes a system prompt.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Multi-turn conversation example

The following is an example prompt for a multi-turn conversation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather like in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

For more information, see Meta Llama 3.
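The templates above follow one pattern: each turn is wrapped in header tokens and closed with <|eot_id|>, and the prompt ends with an open assistant header. The following is a minimal sketch of a helper that assembles such a prompt from a list of messages. The function name build_llama3_prompt and the {"role", "content"} message shape are assumptions made for illustration.

```python
from typing import Dict, List


def build_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Format {"role", "content"} messages with the Llama 3 special tokens."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # End with an open assistant header so the model writes the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

Passing a system message first, followed by alternating user and assistant messages, reproduces the system-prompt and multi-turn layouts shown above.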

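The following are optional parameters:

temperature – Use a lower value to decrease randomness in the response.

Default	Minimum	Maximum
0.5	0	1

top_p – Use a lower value to ignore less probable options. Set to 1.0 to disable.

Default	Minimum	Maximum
0.9	0	1

max_gen_len – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

Default	Minimum	Maximum
512	1	2048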
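The ranges in the tables above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; build_native_request is a hypothetical helper, and whether the service itself rejects or adjusts out-of-range values is not stated here.

```python
def build_native_request(prompt, temperature=0.5, top_p=0.9, max_gen_len=512):
    """Build the request dict, rejecting values outside the documented ranges."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if not 1 <= max_gen_len <= 2048:
        raise ValueError("max_gen_len must be between 1 and 2048")
    return {
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_gen_len": max_gen_len,
    }
```

The defaults mirror the documented defaults, so calling the helper with only a prompt produces a request equivalent to omitting the optional parameters.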

Response

Llama 3 Instruct models return the following fields for a text completion inference call.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}

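More information about each field is provided below.

generation – The generated text.
prompt_token_count – The number of tokens in the prompt.
generation_token_count – The number of tokens in the generated text.
stop_reason – The reason why the response stopped generating text. Possible values are:
- stop – The model has finished generating text for the input prompt.
- length – The length of the tokens for the generated text exceeds the value of max_gen_len in the call to InvokeModel (InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again.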
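The response fields above can be combined to detect a truncated answer and to total the token usage. The following is a minimal sketch that works on the decoded JSON body; summarize_response and the keys of its return value are hypothetical names chosen for illustration.

```python
import json


def summarize_response(body: str) -> dict:
    """Extract the text, total token count, and truncation flag from a response body."""
    data = json.loads(body)
    return {
        "text": data["generation"],
        "total_tokens": data["prompt_token_count"] + data["generation_token_count"],
        # A stop_reason of "length" means the output was cut at max_gen_len.
        "truncated": data["stop_reason"] == "length",
    }
```

When the truncated flag is set, a caller could retry with a larger max_gen_len, as the stop_reason description suggests.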

Example code

This example shows how to call the Llama 3 Instruct model.


# Use the native inference API to send a text message to Meta Llama 3.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 70b Instruct.
model_id = "meta.llama3-70b-instruct-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Format the request payload using the model's native structure.
native_request = {
    "prompt": formatted_prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["generation"]
print(response_text)

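This example shows how to control generation length with Llama 3 Instruct models. For detailed responses or summaries, adjust `max_gen_len` and include specific instructions in the prompt.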
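One way to apply this is to pair each token budget with a matching instruction so that the prompt and `max_gen_len` agree on the expected length. The following is a minimal sketch of that idea, not the original example. The preset names, token budgets, and instruction wording are illustrative assumptions, and length_controlled_body is a hypothetical helper.

```python
import json

# Illustrative presets (assumptions, not values from the documentation):
# a token budget paired with a matching instruction for the prompt.
LENGTH_PRESETS = {
    "summary": (128, "Answer in at most two sentences."),
    "detailed": (1024, "Give a thorough, step-by-step answer."),
}


def length_controlled_body(question: str, style: str) -> str:
    """Return a JSON request body whose prompt and max_gen_len agree on length."""
    max_gen_len, instruction = LENGTH_PRESETS[style]
    prompt = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{question} {instruction}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    return json.dumps(
        {"prompt": prompt, "max_gen_len": max_gen_len, "temperature": 0.5}
    )
```

The returned string can be passed as the body argument of invoke_model, in the same way as the request in the previous example.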
