
TwelveLabs Marengo Embed 2.7

The TwelveLabs Marengo Embed 2.7 model generates embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning tasks. The model supports asynchronous inference through the StartAsyncInvoke API.

  • Provider — TwelveLabs

  • Categories — Embeddings, multimodal

  • Model ID — twelvelabs.marengo-embed-2-7-v1:0

  • Input modality — Video, Text, Audio, Image

  • Output modality — Embeddings

  • Max video size — Up to 2 hours in length (file size under 2 GB)

TwelveLabs Marengo Embed 2.7 request parameters

The following table describes the input parameters for the TwelveLabs Marengo Embed 2.7 model:

  • inputType — (string, required) Modality for the embedding. Valid values: video, text, audio, image.

  • inputText — (string, optional) Text to be embedded when inputType is text. Required if inputType is text. Text can be provided only through this field, not as an S3 URI.

  • startSec — (double, optional) The start offset in seconds from the beginning of the video or audio where processing should begin. A value of 0 starts from the beginning of the media. Default: 0, Min: 0.

  • lengthSec — (double, optional) The duration in seconds to process, measured from startSec. Default: media duration, Max: media duration.

  • useFixedLengthSec — (double, optional) For audio or video inputs only. The desired fixed duration in seconds of each clip for which the platform generates an embedding. Min: 2, Max: 10. If omitted, video is segmented dynamically using shot boundary detection, and audio is divided into evenly sized segments as close to 10 seconds as possible (for example, a 50-second clip produces 5 segments of 10 seconds each, and a 16-second clip produces 2 segments of 8 seconds each).

  • textTruncate — (string, optional) For text input only. Specifies how the platform truncates text that exceeds 77 tokens. Valid values: end (truncate the end of the text), none (return an error if the text exceeds the limit). Default: end.

  • embeddingOption — (list, optional) For video input only. Specifies which types of embeddings to retrieve. Valid values: visual-text (visual embeddings optimized for text search), visual-image (visual embeddings optimized for image search), audio (audio embeddings). If not provided, all available embeddings are returned.

  • mediaSource — (object, optional) Describes the media source. Required for the image, video, and audio input types.

  • mediaSource.base64String — (string, optional) Base64-encoded byte string for the media. Max: 36 MB. Either base64String or s3Location must be provided when mediaSource is used.

  • mediaSource.s3Location.uri — (string, optional) S3 URI from which the media can be downloaded. For video, max: 2 hours in length (file size under 2 GB). Required when using s3Location.

  • mediaSource.s3Location.bucketOwner — (string, optional) AWS account ID of the bucket owner.

  • minClipSec — (int, optional) For video input only. The minimum clip duration in seconds. useFixedLengthSec must be larger than this value. Default: 4, Min: 1, Max: 5.

TwelveLabs Marengo Embed 2.7 response fields

The following table describes the output fields for the TwelveLabs Marengo Embed 2.7 model:

  • embedding — (list of doubles) The embedding values.

  • embeddingOption — (string) The type of embedding for multi-vector output (only applicable to video). Valid values: visual-text (visual embeddings closely aligned with text embeddings), visual-image (visual embeddings closely aligned with image embeddings), audio (audio embeddings).

  • startSec — (double) The start offset of the clip in seconds. Not applicable to text and image embeddings.

  • endSec — (double) The end offset of the clip in seconds. Not applicable to text and image embeddings.

TwelveLabs Marengo Embed 2.7 request and response

The following examples show how to use the TwelveLabs Marengo Embed 2.7 model with different input types. Note that TwelveLabs Marengo Embed 2.7 uses the StartAsyncInvoke API for processing.
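Because the model is invoked asynchronously, a typical workflow submits the request with StartAsyncInvoke and keeps the returned invocation ARN to track the job. The following is a minimal sketch using the AWS SDK for Python (Boto3); the Region and bucket name are placeholders, not values from this page.

import boto3

# Bedrock Runtime client; the Region is a placeholder -- use one where the model is available.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Submit an asynchronous embedding request (text input shown here).
response = bedrock_runtime.start_async_invoke(
    modelId="twelvelabs.marengo-embed-2-7-v1:0",
    modelInput={
        "inputType": "text",
        "inputText": "Spiderman flies through a street and catches a car with his web",
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://your-bucket-name"  # placeholder output bucket
        }
    },
)

# The ARN identifies the job; results are written to the S3 location above when it completes.
invocation_arn = response["invocationArn"]
print(invocation_arn)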

Request

The following examples show request formats for the TwelveLabs Marengo Embed 2.7 model using the StartAsyncInvoke API.

Text input:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "text", "inputText": "Spiderman flies through a street and catches a car with his web" }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Image input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "image", "mediaSource": { "s3Location": { "uri": "s3://your-image-object-s3-path", "bucketOwner": "your-image-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Image input with base64 encoding:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "image", "mediaSource": { "base64String": "base_64_encoded_string_of_image" } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Video input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "video", "mediaSource": { "s3Location": { "uri": "s3://your-video-object-s3-path", "bucketOwner": "your-video-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Video input with base64 encoding and time range:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "video", "mediaSource": { "base64String": "base_64_encoded_string_of_video" }, "startSec": 0, "lengthSec": 13, "useFixedLengthSec": 5, "embeddingOption": ["visual-text", "audio"] }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Audio input with S3 location:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "audio", "mediaSource": { "s3Location": { "uri": "s3://your-audio-object-s3-path", "bucketOwner": "your-audio-object-s3-bucket-owner-account" } } }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }

Audio input with base64 encoding and time range:

{ "modelId": "twelvelabs.marengo-embed-2-7-v1:0", "modelInput": { "inputType": "audio", "mediaSource": { "base64String": "base_64_encoded_string_of_audio" }, "startSec": 0, "lengthSec": 13, "useFixedLengthSec": 10 }, "outputDataConfig": { "s3OutputDataConfig": { "s3Uri": "s3://your-bucket-name" } } }
Response

The following examples show response formats from the TwelveLabs Marengo Embed 2.7 model. Because this model uses StartAsyncInvoke, responses are not returned directly; they are delivered to the S3 output location specified in outputDataConfig.
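A common pattern is to poll GetAsyncInvoke until the job finishes and then read the output JSON from the S3 prefix given in outputDataConfig. The sketch below is an assumption-laden example, not documented behavior: the invocation ARN is a placeholder from the earlier request, and because the exact object key layout under the output prefix can vary, it lists the prefix and parses any JSON objects it finds.

import json
import time

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

invocation_arn = "your-invocation-arn"  # placeholder: returned by start_async_invoke

# Poll the asynchronous job until it leaves the InProgress state.
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    if job["status"] != "InProgress":
        break
    time.sleep(10)

if job["status"] == "Completed":
    # Results are written under the S3 URI from outputDataConfig; list that prefix
    # and parse the JSON objects found there (the key naming may vary).
    output_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    bucket, _, prefix = output_uri.removeprefix("s3://").partition("/")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in listing.get("Contents", []):
        if obj["Key"].endswith(".json"):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            embeddings = json.loads(body)
            print(embeddings)
else:
    print("Job failed:", job.get("failureMessage"))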

Text embedding response:

{ "embedding": [0.123, -0.456, 0.789, ...], "embeddingOption": null, "startSec": null, "endSec": null }

Image embedding response:

{ "embedding": [0.234, -0.567, 0.890, ...], "embeddingOption": null, "startSec": null, "endSec": null }

Video embedding response (single clip):

{ "embedding": [0.345, -0.678, 0.901, ...], "embeddingOption": "visual-text", "startSec": 0.0, "endSec": 5.0 }

Video embedding response (multiple clips with different embedding types):

[ { "embedding": [0.123, -0.456, 0.789, ...], "embeddingOption": "visual-text", "startSec": 0.0, "endSec": 5.0 }, { "embedding": [0.234, -0.567, 0.890, ...], "embeddingOption": "visual-text", "startSec": 5.0, "endSec": 10.0 }, { "embedding": [0.345, -0.678, 0.901, ...], "embeddingOption": "audio", "startSec": 0.0, "endSec": 10.0 } ]

Audio embedding response (multiple clips):

[ { "embedding": [0.456, -0.789, 0.012, ...], "embeddingOption": null, "startSec": 0.0, "endSec": 10.0 }, { "embedding": [0.567, -0.890, 0.123, ...], "embeddingOption": null, "startSec": 10.0, "endSec": 13.0 } ]