
TwelveLabs Marengo Embed 2.7

The TwelveLabs Marengo Embed 2.7 model generates embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning tasks. The model supports asynchronous inference through the StartAsyncInvoke API.

  • Provider — TwelveLabs

  • Categories — Embeddings, multimodal

  • Model ID — twelvelabs.marengo-embed-2-7-v1:0

  • Input modality — Video, Text, Audio, Image

  • Output modality — Embeddings

  • Max video size — Up to 2 hours in length (file size under 2 GB)

TwelveLabs Marengo Embed 2.7 request parameters

The following table describes the input parameters for the TwelveLabs Marengo Embed 2.7 model:

  • inputType — (string, required) Modality for the embedding. Valid values: video, text, audio, image.

  • inputText — (string, optional) Text to be embedded when inputType is text. Required if inputType is text. Text can be provided only through this field, not as an S3 URI.

  • startSec — (double, optional) The start offset in seconds from the beginning of the video or audio where processing should begin. A value of 0 starts from the beginning of the media. Default: 0, Min: 0.

  • lengthSec — (double, optional) The duration in seconds to process, measured from startSec. Default: media duration, Max: media duration.

  • useFixedLengthSec — (double, optional) For audio or video inputs only. The desired fixed duration in seconds of each clip for which the platform generates an embedding. Min: 2, Max: 10. If omitted, video is segmented dynamically using shot boundary detection, and audio is divided into evenly sized segments as close to 10 seconds as possible (for example, a 50-second clip produces 5 segments of 10 seconds each, and a 16-second clip produces 2 segments of 8 seconds each).

  • textTruncate — (string, optional) For text input only. Specifies how the platform truncates text that exceeds 77 tokens. Valid values: end (truncate the end of the text), none (return an error if the text exceeds the limit). Default: end.

  • embeddingOption — (list, optional) For video input only. Specifies which types of embeddings to retrieve. Valid values: visual-text (visual embeddings optimized for text search), visual-image (visual embeddings optimized for image search), audio (audio embeddings). If not provided, all available embeddings are returned.

  • mediaSource — (object, optional) Describes the media source. Required for the image, video, and audio input types.

  • mediaSource.base64String — (string, optional) Base64-encoded byte string for the media. Max: 36 MB. Either base64String or s3Location must be provided when mediaSource is used.

  • mediaSource.s3Location.uri — (string, optional) S3 URI from which the media can be downloaded. For video, max: 2 hours in length (file size under 2 GB). Required when using s3Location.

  • mediaSource.s3Location.bucketOwner — (string, optional) AWS account ID of the bucket owner.

  • minClipSec — (int, optional) For video input only. The minimum clip duration in seconds. useFixedLengthSec must be larger than this value. Default: 4, Min: 1, Max: 5.

TwelveLabs Marengo Embed 2.7 response fields

The following table describes the output fields for the TwelveLabs Marengo Embed 2.7 model:

  • embedding — (list of doubles) The embedding values.

  • embeddingOption — (string) The type of embedding for multi-vector output (only applicable to video). Valid values: visual-text (visual embeddings closely aligned with text embeddings), visual-image (visual embeddings closely aligned with image embeddings), audio (audio embeddings).

  • startSec — (double) The start offset of the clip in seconds. Not applicable to text and image embeddings.

  • endSec — (double) The end offset of the clip in seconds. Not applicable to text and image embeddings.

TwelveLabs Marengo Embed 2.7 request and response

The following examples show how to use the TwelveLabs Marengo Embed 2.7 model with different input types. Note that TwelveLabs Marengo Embed 2.7 uses the StartAsyncInvoke API for processing.
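Because the model is invoked asynchronously, a typical workflow submits the request with StartAsyncInvoke and keeps the returned invocation ARN to track the job. The following is a minimal sketch using the AWS SDK for Python (Boto3); the Region and bucket name are placeholders, not values from this page.

import boto3

# Bedrock Runtime client; the Region is a placeholder -- use one where the model is available.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Submit an asynchronous embedding request (text input shown here).
response = bedrock_runtime.start_async_invoke(
    modelId="twelvelabs.marengo-embed-2-7-v1:0",
    modelInput={
        "inputType": "text",
        "inputText": "Spiderman flies through a street and catches a car with his web",
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://your-bucket-name"  # placeholder output bucket
        }
    },
)

# The ARN identifies the job; results are written to the S3 location above when it completes.
invocation_arn = response["invocationArn"]
print(invocation_arn)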

Request

The following examples show request formats for the TwelveLabs Marengo Embed 2.7 model using the StartAsyncInvoke API.

Text input:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "text", "inputText": "Spiderman flies through a street and catches a car with his web" }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Image input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "image", "mediaSource": { "s3Location": { "uri": "s3://your-image-object-s3-path", "bucketOwner": "your-image-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Image input with base64 encoding:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "image", "mediaSource": { "base64String": "base_64_encoded_string_of_image" } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Video input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "video", "mediaSource": { "s3Location": { "uri": "s3://your-video-object-s3-path", "bucketOwner": "your-video-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Video input with base64 encoding and time range:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "video", "mediaSource": { "base64String": "base_64_encoded_string_of_video" }, "startSec": 0, "lengthSec": 13, "useFixedLengthSec": 5, "embeddingOption": ["visual-text", "audio"] }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Audio input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "audio", "mediaSource": { "s3Location": { "uri": "s3://your-audio-object-s3-path", "bucketOwner": "your-audio-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Audio input with base64 encoding and time range:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "audio", "mediaSource": { "base64String": "base_64_encoded_string_of_audio" }, "startSec": 0, "lengthSec": 13, "useFixedLengthSec": 10 }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }
Response

The following examples show response formats from the TwelveLabs Marengo Embed 2.7 model. Because this model uses StartAsyncInvoke, responses are not returned directly; they are delivered to the S3 output location specified in outputDataConfig.
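A common pattern is to poll GetAsyncInvoke until the job finishes and then read the output JSON from the S3 prefix given in outputDataConfig. The sketch below is an assumption-laden example, not documented behavior: the invocation ARN is a placeholder from the earlier request, and because the exact object key layout under the output prefix can vary, it lists the prefix and parses any JSON objects it finds.

import json
import time

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

invocation_arn = "your-invocation-arn"  # placeholder: returned by start_async_invoke

# Poll the asynchronous job until it leaves the InProgress state.
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    if job["status"] != "InProgress":
        break
    time.sleep(10)

if job["status"] == "Completed":
    # Results are written under the S3 URI from outputDataConfig; list that prefix
    # and parse the JSON objects found there (the key naming may vary).
    output_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    bucket, _, prefix = output_uri.removeprefix("s3://").partition("/")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in listing.get("Contents", []):
        if obj["Key"].endswith(".json"):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            embeddings = json.loads(body)
            print(embeddings)
else:
    print("Job failed:", job.get("failureMessage"))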

Text embedding response:

{ "embedding": [0.123, -0.456, 0.789, ...], "embeddingOption": null, "startSec": null, "endSec": null }

Image embedding response:

{ "embedding": [0.234, -0.567, 0.890, ...], "embeddingOption": null, "startSec": null, "endSec": null }

Video embedding response (single clip):

{ "embedding": [0.345, -0.678, 0.901, ...], "embeddingOption": "visual-text", "startSec": 0.0, "endSec": 5.0 }

Video embedding response (multiple clips with different embedding types):

[ { "embedding": [0.123, -0.456, 0.789, ...], "embeddingOption": "visual-text", "startSec": 0.0, "endSec": 5.0 }, { "embedding": [0.234, -0.567, 0.890, ...], "embeddingOption": "visual-text", "startSec": 5.0, "endSec": 10.0 }, { "embedding": [0.345, -0.678, 0.901, ...], "embeddingOption": "audio", "startSec": 0.0, "endSec": 10.0 } ]

Audio embedding response (multiple clips):

[ { "embedding": [0.456, -0.789, 0.012, ...], "embeddingOption": null, "startSec": 0.0, "endSec": 10.0 }, { "embedding": [0.567, -0.890, 0.123, ...], "embeddingOption": null, "startSec": 10.0, "endSec": 13.0 } ]