Using Amazon Nova embeddings
Amazon Nova Multimodal Embeddings is a multimodal embeddings model for agentic RAG and semantic search applications. It supports text, documents, images, video, and audio through a single model, enabling cross-modal retrieval. Nova Multimodal Embeddings maps each of these content types into a unified semantic space, so you can perform unimodal, cross-modal, and multimodal vector operations.
When a piece of content is passed through Nova embeddings, the model converts that content into a universal numerical format called a vector. A vector is a set of numerical values that can be used for various search functionalities. Similar content receives vectors that are closer together than dissimilar content. For example, content that could be described as "happy" receives a vector closer to the vector for "joyful" than to the vector for "sad".
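To make this concrete, the short sketch below compares vectors using cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors are made up for illustration only and are not real Nova embeddings.

import numpy as np

def cosine_similarity(a, b):
    """Return the cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-dimensional vectors for illustration only; real Nova
# embeddings have 256 to 3072 dimensions.
happy = [0.9, 0.1, 0.2]
joyful = [0.85, 0.15, 0.25]
sad = [-0.7, 0.6, 0.1]

print(cosine_similarity(happy, joyful))  # close to 1.0 (similar content)
print(cosine_similarity(happy, sad))     # much lower (dissimilar content)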
The Nova Embeddings API can be used in a variety of applications, such as:
- Semantic Content Retrieval and Recommendation: Generate embeddings for your content, then use them to find similar items or provide personalized recommendations to your users.
- Multimodal Search: Combine embeddings from different content types to enable cross-modal search capabilities.
- RAG: Generate embeddings from multimodal content, such as documents with interleaved text and images, to power the retrieval workflow of your GenAI applications.
Key features
- Support for text, image, document image, video, and audio in a unified semantic space. The maximum context length is 8K tokens, or 30 seconds of video and 30 seconds of audio.
- Synchronous and asynchronous APIs: The API supports both synchronous and asynchronous use.
- Large file segmentation: The async API makes it easy to work with large inputs by providing built-in segmentation for long text, video, and audio, controlled by user-defined parameters. The model generates a single embedding for each segment.
- Video with audio: Process video and its audio track simultaneously. The API lets you specify whether you want a single embedding representing both modalities or two separate embeddings representing the video and audio streams respectively.
- Embedding purpose: Nova Multimodal Embeddings lets you optimize your embeddings for the intended downstream application. Supported use cases include retrieval (RAG/search), classification, and clustering.
- Dimension sizes: Four dimension sizes let you trade off embedding accuracy against vector storage cost: 3072, 1024, 384, and 256.
- Input methods: You can pass content to be embedded either by specifying an S3 URI or inline as a Base64-encoded string, as sketched after this list.
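As a sketch of the inline input method, the example below encodes a local image as Base64 and passes it in the request body. The layout of the image field (a format plus an inline source) is an assumption modeled on the text and video examples in this section; consult the model's request schema for the exact field names.

import base64
import json

import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Read a local image and encode it as Base64 for inline input.
with open('photo.png', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

# Assumption: the 'image' field layout below mirrors the 'text' and 'video'
# examples in this section; verify the inline-source field names against
# the model's request schema.
request_body = {
    'taskType': 'SINGLE_EMBEDDING',
    'singleEmbeddingParams': {
        'embeddingPurpose': 'GENERIC_INDEX',
        'embeddingDimension': 1024,
        'image': {
            'format': 'png',
            'source': {'bytes': image_b64},
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    accept='application/json',
    contentType='application/json',
)
print(json.loads(response['body'].read()))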
Generating embeddings
You can generate embeddings synchronously or asynchronously.
For smaller content items, you can use the Bedrock Runtime InvokeModel API. This is a good option for quickly generating embeddings for text, images, or short audio/video files.
The following example generates a synchronous embedding for the text "Hello, World!":
import boto3
import json

# Create the Bedrock Runtime client.
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1',
)

# Define the request body.
request_body = {
    'taskType': 'SINGLE_EMBEDDING',
    'singleEmbeddingParams': {
        'embeddingPurpose': 'GENERIC_INDEX',
        'embeddingDimension': 3072,
        'text': {'truncationMode': 'END', 'value': 'Hello, World!'},
    },
}

try:
    # Invoke the Nova Embeddings model.
    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        accept='application/json',
        contentType='application/json',
    )

    # Print the request ID.
    print('Request ID:', response.get('ResponseMetadata').get('RequestId'))

    # Print the response body.
    response_body = json.loads(response.get('body').read())
    print(json.dumps(response_body, indent=2))
except Exception as e:
    # Add your own exception handling here.
    print(e)
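Continuing the example above, the snippet below reads the vector back out of the response. The response field names used here ('embeddings' and 'embedding') are assumptions; print the full response body, as shown above, to confirm the exact layout for your model version.

# Assumption: the response contains an 'embeddings' list whose entries hold
# an 'embedding' vector; confirm against the printed response body.
embeddings = response_body.get('embeddings', [])
if embeddings:
    vector = embeddings[0].get('embedding')
    print('Dimensions:', len(vector))
    print('First values:', vector[:5])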
For larger content files, you can use the Bedrock Runtime StartAsyncInvoke function to generate embeddings asynchronously. This allows you to submit a job and retrieve the results later, without blocking application execution. Results are saved to Amazon S3.
The following example starts an asynchronous embedding generation job for a video file:
import boto3

# Create the Bedrock Runtime client.
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1',
)

model_input = {
    'taskType': 'SEGMENTED_EMBEDDING',
    'segmentedEmbeddingParams': {
        'embeddingPurpose': 'GENERIC_INDEX',
        'embeddingDimension': 3072,
        'video': {
            'format': 'mp4',
            'embeddingMode': 'AUDIO_VIDEO_COMBINED',
            'source': {
                's3Location': {'uri': 's3://my-bucket/path/to/video.mp4'}
            },
            'segmentationConfig': {
                'durationSeconds': 15  # Segment into 15-second chunks
            },
        },
    },
}

try:
    # Start the asynchronous invocation of the Nova Embeddings model.
    response = bedrock_runtime.start_async_invoke(
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        modelInput=model_input,
        outputDataConfig={
            's3OutputDataConfig': {'s3Uri': 's3://my-bucket'}
        },
    )

    # Print the request ID.
    print('Request ID:', response.get('ResponseMetadata').get('RequestId'))

    # Print the invocation ARN.
    print('Invocation ARN:', response.get('invocationArn'))
except Exception as e:
    # Add your own exception handling here.
    print(e)
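After the job is submitted, you can check its progress with the Bedrock Runtime GetAsyncInvoke API. The sketch below polls the job started in the previous example until it finishes and then prints the S3 location that the results were written to:

import time

# Poll the asynchronous job until it leaves the 'InProgress' state.
invocation_arn = response.get('invocationArn')

while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job['status']
    if status != 'InProgress':
        break
    time.sleep(15)

if status == 'Completed':
    print('Results written to:', job['outputDataConfig']['s3OutputDataConfig']['s3Uri'])
else:
    print('Job ended with status:', status, job.get('failureMessage', ''))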
Batch inference
With batch inference, you can submit multiple requests and generate embeddings asynchronously. Batch inference helps you process a large number of requests efficiently by submitting them as a single job and writing the responses to an Amazon S3 bucket. After defining the model inputs in files that you create, you upload the files to an S3 bucket. You then submit a batch inference request that specifies the S3 bucket. After the job is complete, you can retrieve the output files from S3. You can use batch inference to improve the performance of model inference on large datasets.
Format and upload your batch inference data
You must add your batch inference data to an S3 location that you specify when submitting a model invocation job. The S3 location must contain at least one JSONL file that defines the model inputs. A JSONL file contains one JSON object per line. Your JSONL file must use the .jsonl extension and follow the format below (the record is expanded across multiple lines here for readability; in the actual file, each record occupies a single line):
{ "recordId": "record001", "modelInput": { "taskType": "SINGLE_EMBEDDING", "singleEmbeddingParams": { "embeddingPurpose": "GENERIC_INDEX", "embeddingDimension": 3072, "text": { "source": { "s3Location": { "uri": "s3://batch-inference-input-bucket/text_001.txt", "bucketOwner": "111122223333" } }, "truncationMode": "END" } } } }
You can use the following input file types:
- Image Formats: PNG, JPEG, WEBP, GIF
- Audio Formats: MP3, WAV, OGG
- Video Formats: MP4, MOV, MKV, WEBM, FLV, MPEG, MPG, WMV, 3GP
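After your JSONL file is in place in S3, you submit the batch job with the CreateModelInvocationJob API. The sketch below shows the call; the job name, IAM role ARN, and bucket names are placeholders, and the role must grant Bedrock read access to the input location and write access to the output location.

import boto3

# Batch inference jobs are managed through the control-plane 'bedrock'
# client, not 'bedrock-runtime'.
bedrock = boto3.client('bedrock', region_name='us-east-1')

response = bedrock.create_model_invocation_job(
    jobName='nova-embeddings-batch-001',                                 # placeholder
    roleArn='arn:aws:iam::111122223333:role/BedrockBatchInferenceRole',  # placeholder
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    inputDataConfig={
        's3InputDataConfig': {
            's3InputFormat': 'JSONL',
            's3Uri': 's3://batch-inference-input-bucket/input/',
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://batch-inference-output-bucket/',
        }
    },
)
print('Job ARN:', response['jobArn'])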