Core inference
Inference is the process of sending a request to an Amazon Nova model and receiving a generated response. Amazon Nova models support inference through two API options:
- Converse API (Converse, ConverseStream): Provides a consistent interface across different models, making it easier to switch between models or build applications that work with multiple models. Recommended for most use cases.
- Invoke API (InvokeModel, InvokeModelWithResponseStream): Accepts request payloads structured in each model's native format and runs inference using the prompt and inference parameters provided in the request body.
Both APIs support the same core features including multi-turn conversations, multimodal inputs (text, images, video, audio), tool use, guardrails and streaming responses. The request structure is nearly identical between the two APIs, differing only in how byte data (documents, images, video and audio) is encoded.
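To make the comparison concrete, the following is a minimal sketch of a single-turn text request using the Converse API with boto3. The model ID is a placeholder to be replaced with one of the IDs from the table below, and the inference parameter values are illustrative.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder: substitute one of the Amazon Nova model IDs from the table below.
MODEL_ID = "<nova-model-id>"

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key plot points of Hamlet."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7},
)

# The generated text is returned as content blocks on the output message.
print(response["output"]["message"]["content"][0]["text"])
```

The same conversation could be sent through InvokeModel instead by encoding the messages in the model's native request body.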
Note
Model request parameters unique to Amazon Nova models, such as reasoningConfig and topK, are placed within an additional inferenceConfig object inside additionalModelRequestFields. For InvokeModel and InvokeModelWithResponseStream, these are top-level parameters in the request body.
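For example, the sketch below shows both placements. The topK value and placeholder model ID are illustrative, and the schemaVersion field in the native body is an assumption carried over from earlier Nova models.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Converse API: Nova-specific parameters sit in a nested inferenceConfig
# object inside additionalModelRequestFields. The topK value is illustrative.
response = bedrock.converse(
    modelId="<nova-model-id>",  # placeholder: see the table below
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    additionalModelRequestFields={"inferenceConfig": {"topK": 20}},
)

# InvokeModel: the same parameters are top-level fields of the native request
# body. The schemaVersion value is an assumption based on earlier Nova models.
body = {
    "schemaVersion": "messages-v1",
    "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
    "inferenceConfig": {"topK": 20},
}
response = bedrock.invoke_model(
    modelId="<nova-model-id>",
    body=json.dumps(body),
)
```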
Set the modelId to one of the following to use Amazon Nova models:
| Model | Model ID |
|---|---|
| Nova 2 Lite | |
| Nova 2 Sonic | |
| Nova Multimodal Embeddings | amazon.nova-2-multimodal-embeddings-v1:0 |
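Both APIs also support streaming. As a sketch, a ConverseStream request (again with a placeholder model ID) can print the response as it is generated:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse_stream(
    modelId="<nova-model-id>",  # placeholder: use an ID from the table above
    messages=[
        {"role": "user", "content": [{"text": "Write a haiku about rivers."}]}
    ],
)

# Text arrives incrementally as contentBlockDelta events on the stream.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="")
print()
```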
Important
Amazon Nova inference requests can take up to 60 minutes to complete. Configure your client timeout settings accordingly:
```python
import boto3
from botocore.config import Config

# Raise the read timeout so long-running Nova requests are not cut off.
bedrock = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    config=Config(
        read_timeout=3600  # 60 minutes
    )
)
```