On-demand inference

On-demand inference provides serverless access to Amazon Nova models without requiring provisioned capacity. This mode automatically scales to handle your workload and charges based on usage.

Benefits

On-demand inference offers several advantages:

  • No capacity planning: Automatically scales to meet demand

  • Pay per use: Charged only for tokens processed

  • Instant availability: No provisioning or warm-up time required

  • Cost effective: Ideal for variable or unpredictable workloads

Using on-demand inference

On-demand inference is the default mode for Amazon Nova models. Specify the model ID when making API calls, as in the following Converse API example:

import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.converse(
    modelId='us.amazon.nova-2-lite-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'Hello, Nova!'}]
        }
    ]
)

# Print the response text
content_list = response["output"]["message"]["content"]
text = next((item["text"] for item in content_list if "text" in item), None)
if text is not None:
    print(text)
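
For long responses, you can stream output as it is generated instead of waiting for the complete reply. The following is a minimal sketch using the ConverseStream API; it reuses the bedrock client and model ID from the example above and handles only the incremental text deltas.

# Stream the response as it is generated with the ConverseStream API
# (minimal sketch; reuses the `bedrock` client from the previous example)
response = bedrock.converse_stream(
    modelId='us.amazon.nova-2-lite-v1:0',
    messages=[
        {'role': 'user', 'content': [{'text': 'Hello, Nova!'}]}
    ]
)

for event in response["stream"]:
    # Generated text arrives incrementally in contentBlockDelta events
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="")
print()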

Pricing

On-demand inference is billed based on the number of input and output tokens processed. For current pricing details, see Amazon Bedrock pricing.
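
The Converse API response includes a usage field with the token counts for the request, which you can use to estimate per-request cost. The sketch below illustrates the calculation; the per-token rates shown are placeholders, not actual prices, so substitute the current rates for your model and Region from the Amazon Bedrock pricing page.

# Estimate request cost from the token counts returned by Converse.
# NOTE: these rates are hypothetical placeholders, not real prices.
PRICE_PER_1K_INPUT = 0.00006   # placeholder USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.00024  # placeholder USD per 1K output tokens

usage = response["usage"]  # returned alongside the model output
cost = (usage["inputTokens"] / 1000) * PRICE_PER_1K_INPUT \
     + (usage["outputTokens"] / 1000) * PRICE_PER_1K_OUTPUT
print(f"{usage['inputTokens']} input / {usage['outputTokens']} output "
      f"tokens -> approx. ${cost:.6f}")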

Quotas and limits

On-demand inference has default quotas that vary by model and region. To request quota increases, use the Service Quotas console.
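
If a traffic burst exceeds your quota, requests are throttled. One common pattern, sketched below with a hypothetical helper function, is to retry with exponential backoff when the service returns a ThrottlingException.

import time
import botocore.exceptions

def converse_with_backoff(client, max_retries=5, **kwargs):
    """Retry a Converse call with exponential backoff on throttling."""
    for attempt in range(max_retries):
        try:
            return client.converse(**kwargs)
        except botocore.exceptions.ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise  # only retry throttling; surface other errors
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("Request still throttled after retries")

Alternatively, boto3 can handle retries for you: pass a botocore Config with retries={'mode': 'adaptive'} when creating the client to enable client-side rate limiting and automatic retry of throttled requests.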