Meta Llama models
This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need its model ID. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.
Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities and Amazon Bedrock features that Meta Llama models support, and the AWS Regions in which they are available, see Supported foundation models in Amazon Bedrock.
When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For Meta Llama-specific prompt information, see the Meta Llama prompt engineering guide.
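As a rough illustration of what a model-specific prompt format looks like, the following sketch assembles a single-turn Llama 3 Instruct prompt from a user message and an optional system message. The helper name `format_llama3_prompt` is hypothetical, and the special tokens follow the Llama 3 instruction format as commonly documented. Check the Meta Llama prompt engineering guide for the template that matches your model version.

```python
from typing import Optional


def format_llama3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap one user turn in Llama 3 Instruct special tokens (hypothetical helper)."""
    parts = ["<|begin_of_text|>"]
    if system_message:
        # Optional system turn that sets overall behavior.
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    )
    # End with an open assistant header so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


formatted_prompt = format_llama3_prompt(
    "Describe the purpose of a 'hello world' program in one line.",
    system_message="You are a concise assistant.",
)
```

The resulting string is what you would pass as the `prompt` value in the request body.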
Note
Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that you can't use these models outside the AWS Regions listed for them in the Regions table.
This section provides information for using the following models from Meta.
- Llama 3 Instruct 
- Llama 3.1 Instruct 
- Llama 3.2 Instruct 
- Llama 3.3 Instruct 
- Llama 4 Instruct 
Request and response
The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.
Note
You can't use the InvokeModelWithResponseStream or ConverseStream (streaming) operations with Llama 4 Instruct.
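For models that do support streaming, such as Llama 3 Instruct, a streamed call might look like the following sketch. It assumes that each streamed event carries a `chunk` whose `bytes` decode to JSON with a `generation` field, mirroring the non-streaming response. The helper names `iter_generation_text` and `stream_llama3` are hypothetical, and the model ID and Region are placeholders.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_generation_text(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield text fragments from streamed response events (assumed event shape)."""
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # Skip events that carry no payload.
        payload = json.loads(chunk["bytes"])
        text = payload.get("generation")
        if text:
            yield text


def stream_llama3(
    formatted_prompt: str,
    model_id: str = "meta.llama3-70b-instruct-v1:0",
    region_name: str = "us-west-2",
) -> None:
    """Print a streamed response as it arrives. Requires AWS credentials."""
    import boto3  # Imported here so the parsing helper works without the AWS SDK.

    client = boto3.client("bedrock-runtime", region_name=region_name)
    body = json.dumps(
        {"prompt": formatted_prompt, "max_gen_len": 512, "temperature": 0.5}
    )
    response = client.invoke_model_with_response_stream(modelId=model_id, body=body)
    for text in iter_generation_text(response["body"]):
        print(text, end="", flush=True)
```

Separating the parsing helper from the network call keeps the chunk handling easy to check against recorded events before you run it against a live endpoint.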
Example code
This example shows how to call the Llama 3 Instruct model.
```python
# Use the native inference API to send a text message to Meta Llama 3.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 70b Instruct.
model_id = "meta.llama3-70b-instruct-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Format the request payload using the model's native structure.
native_request = {
    "prompt": formatted_prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["generation"]
print(response_text)
```
This example shows how to control the generation length using Llama 3 Instruct models. For detailed responses or summaries, adjust `max_gen_len` and include specific instructions in your prompt.
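A minimal sketch of that idea follows. It builds request bodies with different `max_gen_len` values and checks whether a response was cut off by the length limit. The helper names `build_request` and `was_truncated` are hypothetical, the prompt is assumed to be already wrapped in the Llama 3 instruction format, and the `stop_reason` field with a `length` value is an assumption about the response shape that this section does not describe. Verify it against the response field reference for your model.

```python
import json
from typing import Any, Dict


def build_request(
    formatted_prompt: str, max_gen_len: int = 512, temperature: float = 0.5
) -> str:
    """Serialize a native Llama request body with an explicit length cap."""
    if max_gen_len < 1:
        raise ValueError("max_gen_len must be a positive integer")
    return json.dumps(
        {
            "prompt": formatted_prompt,
            "max_gen_len": max_gen_len,
            "temperature": temperature,
        }
    )


def was_truncated(model_response: Dict[str, Any]) -> bool:
    """Return True if generation stopped at the length cap (assumed field)."""
    return model_response.get("stop_reason") == "length"


# A tight cap paired with an instruction to be brief.
short_request = build_request(
    "Summarize the water cycle in one sentence.", max_gen_len=64
)

# A looser cap paired with an instruction to be thorough.
long_request = build_request(
    "Explain the water cycle in detail, step by step.", max_gen_len=1024
)
```

Either string can be passed as the `body` argument of `invoke_model`. If `was_truncated` returns True for the decoded response, consider raising `max_gen_len` or asking for a shorter answer in the prompt.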