Cohere Embed v4

Cohere Embed v4 is a multimodal embedding model that accepts text and image inputs. It can process interleaved text and image content, which makes it well suited to document understanding, visual search, and multimodal retrieval applications. The model supports several embedding types, including float, int8, uint8, binary, and ubinary, with configurable output dimensions from 256 to 1536.

The model ID for Cohere Embed v4 is cohere.embed-v4. The code examples on this page invoke it in the versioned form cohere.embed-v4:0.

Additional usage notes

Context length: up to approximately 128,000 tokens are supported; for RAG, smaller chunks often improve retrieval quality and cost.
Image size: images larger than 2,458,624 pixels are downscaled to that size; images smaller than 3,136 pixels are upscaled.
Interleaved inputs: prefer inputs.content[] for page-like multimodal content, so that the text context (for example, the file name or entities) stays associated with the image.

Topics

Request and response
Request and response for different input types
Code examples

Request and response

Request

Content-type: application/json


{
  "input_type": "search_document | search_query | classification | clustering",
  "texts": ["..."],                      // optional; text-only
  "images": ["data:<mime>;base64,..."],  // optional; image-only
  "inputs": [
    { "content": [
        { "type": "text",      "text": "..." },
        { "type": "image_url", "image_url": "data:<mime>;base64,..." }
      ]
    }
  ],                                     // optional; mixed (interleaved) text+image
  "embedding_types": ["float" | "int8" | "uint8" | "binary" | "ubinary"],
  "output_dimension": 256 | 512 | 1024 | 1536,
  "max_tokens": 128000,
  "truncate": "NONE | LEFT | RIGHT"
}

Parameters

input_type (required): adds special tokens to distinguish use cases. Allowed values: search_document, search_query, classification, clustering. For search and RAG, embed your corpus with search_document and your queries with search_query.
texts (optional): array of strings to embed. Maximum of 96 per call. If you use texts, do not send images in the same call.
images (optional): array of base64-encoded images, given as data URIs, to embed. Maximum of 96 per call. Do not send texts and images together. (Use inputs to interleave them.)
inputs (optional; mixed/fused modality): a list in which each item has a content list of parts. Each part is either { "type": "text", "text": ... } or { "type": "image_url", "image_url": "data:<mime>;base64,..." }. Send interleaved page-like content here (for example, an image of a PDF page plus a caption or metadata). Maximum of 96 items.
embedding_types (optional): one or more of float, int8, uint8, binary, ubinary. If omitted, float embeddings are returned.
output_dimension (optional): selects the vector length. Allowed values: 256, 512, 1024, 1536 (defaults to 1536 if not specified).
max_tokens (optional): truncation budget per input object. The model supports up to approximately 128,000 tokens; for RAG, use smaller chunks where appropriate.
truncate (optional): how to handle inputs that are too long. LEFT drops tokens from the start, RIGHT drops them from the end, and NONE returns an error if the input exceeds the limit.
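
The constraints in the parameter list can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: build_request_body is a hypothetical helper name, and it assumes that a call carries exactly one of texts, images, or inputs (the list only states explicitly that texts and images must not be combined).

```python
import json

ALLOWED_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}
ALLOWED_EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary"}
ALLOWED_DIMENSIONS = {256, 512, 1024, 1536}
ALLOWED_TRUNCATE = {"NONE", "LEFT", "RIGHT"}
MAX_ITEMS_PER_CALL = 96


def build_request_body(input_type, texts=None, images=None, inputs=None,
                       embedding_types=None, output_dimension=None, truncate=None):
    """Validate the documented constraints and return the request body as a JSON string."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"Unsupported input_type: {input_type!r}")

    # Assumption: exactly one of texts, images, or inputs is sent per call.
    provided = {name: value
                for name, value in (("texts", texts), ("images", images), ("inputs", inputs))
                if value is not None}
    if len(provided) != 1:
        raise ValueError("Provide exactly one of texts, images, or inputs")

    name, items = next(iter(provided.items()))
    if not 1 <= len(items) <= MAX_ITEMS_PER_CALL:
        raise ValueError(f"{name} must contain between 1 and {MAX_ITEMS_PER_CALL} items")

    body = {"input_type": input_type, name: list(items)}

    if embedding_types is not None:
        unknown = set(embedding_types) - ALLOWED_EMBEDDING_TYPES
        if unknown:
            raise ValueError(f"Unsupported embedding_types: {sorted(unknown)}")
        body["embedding_types"] = list(embedding_types)

    if output_dimension is not None:
        if output_dimension not in ALLOWED_DIMENSIONS:
            raise ValueError(f"Unsupported output_dimension: {output_dimension}")
        body["output_dimension"] = output_dimension

    if truncate is not None:
        if truncate not in ALLOWED_TRUNCATE:
            raise ValueError(f"Unsupported truncate value: {truncate!r}")
        body["truncate"] = truncate

    return json.dumps(body)
```

The returned string can be passed as the body argument of invoke_model. Optional fields that are left unset are omitted from the body, so the service defaults described above apply.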

Limits and sizing

Items per request: up to 96 images. The original image file must be in png, jpeg, webp, or gif format and can be up to 5 MB in size.
Maximum request size: total payload of approximately 20 MB.
Maximum input tokens: 128,000 tokens. Image files are converted into tokens, and the total token count must be less than 128,000.
Images: up to 2,458,624 pixels before the image is downscaled; images smaller than 3,136 pixels are upscaled. Provide images as data:<mime>;base64,....
Token accounting (per inputs item):
    Tokens for an image input ≈ (image pixels ÷ 784) × 4
    Tokens for an interleaved text and image input = (image pixels ÷ 784) × 4 + (text tokens)
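
The token accounting formula can be turned into a rough client-side estimate. This is a sketch under stated assumptions: the function names are hypothetical, the pixel count is clamped to the documented resize bounds before the formula is applied (an interpretation of the resizing rule, not a documented algorithm), and the text token count must come from the caller because no tokenizer is included.

```python
MAX_IMAGE_PIXELS = 2_458_624  # larger images are downscaled to this size
MIN_IMAGE_PIXELS = 3_136      # smaller images are upscaled (assumed: up to this size)
PIXELS_PER_UNIT = 784
TOKENS_PER_UNIT = 4
MAX_INPUT_TOKENS = 128_000


def estimate_image_tokens(width, height):
    """Approximate tokens for one image: (pixels / 784) * 4, after resizing."""
    pixels = width * height
    # Assumption: resizing brings the pixel count into the documented range.
    effective_pixels = min(max(pixels, MIN_IMAGE_PIXELS), MAX_IMAGE_PIXELS)
    return round(effective_pixels / PIXELS_PER_UNIT * TOKENS_PER_UNIT)


def estimate_item_tokens(width, height, text_tokens=0):
    """Approximate tokens for one interleaved item: image tokens plus text tokens."""
    return estimate_image_tokens(width, height) + text_tokens


def fits_token_limit(width, height, text_tokens=0):
    """Return True if the estimate stays below the 128,000-token limit."""
    return estimate_item_tokens(width, height, text_tokens) < MAX_INPUT_TOKENS
```

Under these assumptions an image at the maximum size of 2,458,624 pixels works out to (2,458,624 ÷ 784) × 4 = 12,544 tokens, so a single page image stays well below the 128,000-token limit.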

Tip: For PDF files, convert each page to an image and send it through inputs together with the page metadata (for example, file_name or entities) in adjacent text parts.
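
The tip above can be sketched as a helper that builds one inputs item per page. The helper name build_page_item and the layout of the metadata string are illustrative assumptions. Rendering PDF pages to images is out of scope here, so the page image bytes are assumed to be available already.

```python
import base64


def build_page_item(image_bytes, mime_type, file_name, entities=()):
    """Return one 'inputs' element: page metadata as a text part, then the page image."""
    # Hypothetical metadata layout; any plain-text description can be used instead.
    metadata = f"file_name: {file_name}"
    if entities:
        metadata += "; entities: " + ", ".join(entities)

    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "content": [
            {"type": "text", "text": metadata},
            {"type": "image_url", "image_url": f"data:{mime_type};base64,{encoded}"},
        ]
    }


def build_inputs(pages, file_name):
    """Build the 'inputs' list from (image_bytes, mime_type) pairs, one per page."""
    return [build_page_item(image_bytes, mime_type, file_name)
            for image_bytes, mime_type in pages]
```

Keeping the metadata in a text part next to the image, rather than in a separate texts call, is what keeps the text context associated with the page image in the resulting embedding.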

Response

Content-type: application/json

If you requested a single embedding type (for example, only float):


{
  "id": "string",
  "embeddings": [[ /* length = output_dimension */ ]],
  "response_type": "embeddings_floats",
  "texts": ["..."],                      // present if text was provided
  "inputs": [ { "content": [ ... ] } ]   // present if 'inputs' was used
}

If you requested multiple embedding types (for example, ["float","int8"]):


{
  "id": "string",
  "embeddings": {
    "float": [[ ... ]],
    "int8":  [[ ... ]]
  },
  "response_type": "embeddings_by_type",
  "texts": ["..."],     // when text used
  "inputs": [ { "content": [ ... ] } ] // when 'inputs' used
}

The number of vectors returned matches the length of the texts array or the number of inputs items.
The length of each vector equals output_dimension (1536 by default).
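
Because the embeddings field is a bare list in one response shape and an object keyed by type in the other, client code may want to normalize it before use. The sketch below uses hypothetical helper names and inspects the value itself rather than response_type. It assumes that a bare list holds float vectors, as in the embeddings_floats shape shown above.

```python
def embeddings_by_type(response_body):
    """Return {embedding_type: [vectors]} for either documented response shape."""
    embeddings = response_body["embeddings"]
    if isinstance(embeddings, dict):
        # embeddings_by_type shape: already keyed by embedding type.
        return dict(embeddings)
    # embeddings_floats shape: a bare list, assumed to contain float vectors.
    return {"float": embeddings}


def count_vectors(response_body):
    """Return {embedding_type: number of vectors}, to compare with the number of items sent."""
    return {name: len(vectors)
            for name, vectors in embeddings_by_type(response_body).items()}
```

Comparing count_vectors with the number of texts or inputs items that were sent is a cheap check that every item received a vector.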

Request and response for different input types

A) Interleaved page (image + caption) with compact int8 vectors

Request


{
  "input_type": "search_document",
  "inputs": [
    {
      "content": [
        { "type": "text", "text": "Quarterly ARR growth chart; outlier in Q3." },
        { "type": "image_url", "image_url": "data:image/png;base64,{{BASE64_PAGE_IMG}}" }
      ]
    }
  ],
  "embedding_types": ["int8"],
  "output_dimension": 512,
  "truncate": "RIGHT",
  "max_tokens": 128000
}

Response (truncated)


{
  "id": "836a33cc-61ec-4e65-afaf-c4628171a315",
  "embeddings": { "int8": [[ 7, -3, ... ]] },
  "response_type": "embeddings_by_type",
  "inputs": [
    { "content": [
      { "type": "text", "text": "Quarterly ARR growth chart; outlier in Q3." },
      { "type": "image_url", "image_url": "data:image/png;base64,{{...}}" }
    ] }
  ]
}

B) Text-only corpus indexing (float by default, 1536 dimensions)

Request


{
  "input_type": "search_document",
  "texts": [
    "RAG system design patterns for insurance claims",
    "Actuarial loss triangles and reserving primer"
  ]
}

Response (example)


{
  "response_type": "embeddings_floats",
  "embeddings": [
    [0.0135, -0.0272, ...],   // length 1536
    [0.0047,  0.0189, ...]
  ]
}
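
Once document vectors like those in example B are stored, a query embedded with input_type set to search_query can be ranked against them. The sketch below uses cosine similarity, which is a common choice for embedding search but is an assumption here, since this page does not prescribe a similarity metric. The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(index, cosine_similarity(query_vector, vector))
              for index, vector in enumerate(document_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

The query and document vectors need the same output_dimension for the comparison to be meaningful.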

Code examples

Text input


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings using the Cohere Embed v4 model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text_embeddings(model_id, body, region_name):
    """
    Generate text embedding by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str): The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating text embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    model_id = 'cohere.embed-v4:0'
    text1 = "hello world"
    text2 = "this is a test"
    input_type = "search_document"
    embedding_types = ["float"]

    try:
        body = json.dumps({
            "texts": [
                text1,
                text2],
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_text_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

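        print("Embeddings")
        embeddings = response_body.get('embeddings')
        # 'embeddings' is a dict keyed by type in an embeddings_by_type response
        # and a plain list of vectors in an embeddings_floats response.
        if isinstance(embeddings, dict):
            for embedding_type, vectors in embeddings.items():
                print(f"\t{embedding_type} Embeddings:")
                print(f"\t{vectors}")
        else:
            print(f"\t{embeddings}")

        print("Texts")
        for i, text in enumerate(response_body.get('texts') or []):
            print(f"\tText {i}: {text}")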

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(
            f"Finished generating text embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
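
The example above sends two texts in one call. For a larger corpus, the limit of 96 texts per call means the input has to be split into batches. A minimal sketch, with hypothetical helper names, that produces one request body per batch:

```python
import json

MAX_TEXTS_PER_CALL = 96


def split_into_batches(items, batch_size=MAX_TEXTS_PER_CALL):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_batch_bodies(texts, input_type="search_document", embedding_types=("float",)):
    """Return one JSON request body per batch of texts."""
    return [
        json.dumps({
            "texts": batch,
            "input_type": input_type,
            "embedding_types": list(embedding_types),
        })
        for batch in split_into_batches(texts)
    ]
```

Each body can then be passed to generate_text_embeddings from the example above. The vectors in each response correspond to the texts of that batch, so concatenating the results batch by batch preserves the corpus order.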

Mixed modalities


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings for interleaved text and image input
using the Cohere Embed v4 model.
"""
import json
import logging
import boto3
import base64


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def get_base64_image_uri(image_file_path: str, image_mime_type: str):
    with open(image_file_path, "rb") as image_file:
        image_bytes = image_file.read()
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{image_mime_type};base64,{base64_image}"


def generate_embeddings(model_id, body, region_name):
    """
    Generate embeddings for text and image input by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str): The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating image embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    image_file_path = "image.jpg"
    image_mime_type = "image/jpeg"
    text = "hello world"

    model_id = 'cohere.embed-v4:0'
    input_type = "search_document"
    image_base64_uri = get_base64_image_uri(image_file_path, image_mime_type)
    embedding_types = ["int8","float"]

    try:
        body = json.dumps({
            "inputs": [
                {
                  "content": [
                    { "type": "text", "text": text },
                    # image_base64_uri already includes the data:<mime>;base64, prefix.
                    { "type": "image_url", "image_url": image_base64_uri }
                  ]
                }
              ],
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

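        print("Embeddings")
        embeddings = response_body.get('embeddings')
        # 'embeddings' is a dict keyed by type in an embeddings_by_type response
        # and a plain list of vectors in an embeddings_floats response.
        if isinstance(embeddings, dict):
            for embedding_type, vectors in embeddings.items():
                print(f"\t{embedding_type} Embeddings:")
                print(f"\t{vectors}")
        else:
            print(f"\t{embeddings}")

        print("inputs")
        for i, input_item in enumerate(response_body.get('inputs') or []):
            print(f"\tinput {i}: {input_item}")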

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(
            f"Finished generating embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
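
The example above hard-codes the MIME type of the image. The limits listed earlier (png, jpeg, webp, or gif, up to 5 MB per file) can be checked while the data URI is built. The following sketch uses a hypothetical helper, to_data_uri, derives the MIME type from the file name with the standard mimetypes module, and assumes that "5 MB" means 5 × 1024 × 1024 bytes.

```python
import base64
import mimetypes

ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" interpreted as 5 MiB


def to_data_uri(image_bytes, file_name):
    """Validate type and size, then return a data:<mime>;base64,... URI."""
    # Guess the MIME type from the file extension, for example .jpg -> image/jpeg.
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported image type for {file_name!r}: {mime_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError(f"{file_name!r} is larger than {MAX_IMAGE_BYTES} bytes")

    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the image_url value of an image part in inputs.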
