

# Overview of Amazon Titan models
<a name="titan-models"></a>

Amazon Titan foundation models (FMs) are a family of FMs pretrained by AWS on large datasets, making them powerful, general-purpose models built to support a variety of use cases. Use them as-is or privately customize them with your own data.

Amazon Titan includes the following models for Amazon Bedrock.
+ **Amazon Titan Text Embeddings V2**
+ **Amazon Titan Multimodal Embeddings G1**
+ **Amazon Titan Image Generator G1 v2**

**Topics**
+ [Amazon Titan Text Embeddings models](titan-embedding-models.md)
+ [Amazon Titan Multimodal Embeddings G1 model](titan-multiemb-models.md)
+ [Amazon Titan Image Generator G1 model](titan-image-models.md)

# Amazon Titan Text Embeddings models
<a name="titan-embedding-models"></a>

Amazon Titan Embeddings models include the Amazon Titan Text Embeddings V2 and Amazon Titan Text Embeddings G1 models.

**Note**  
Embedding models on Amazon Bedrock are throttled by Requests Per Minute (RPM), not Tokens Per Minute (TPM). When planning capacity or requesting quota increases for embedding models, use the RPM quota. For more information, see [Quotas for Amazon Bedrock](quotas.md).

Text embeddings represent meaningful vector representations of unstructured text such as documents, paragraphs, and sentences. You input a body of text and the output is a (1 x n) vector. You can use embedding vectors for a wide variety of applications.

The Amazon Titan Text Embeddings V2 model (`amazon.titan-embed-text-v2:0`) accepts up to 8,192 tokens or 50,000 characters and outputs a vector of 1,024 dimensions by default. The model is optimized for text retrieval tasks, but can also be used for additional tasks, such as semantic similarity and clustering.

Amazon Titan Text Embeddings is offered through latency-optimized endpoint invocation for generating vectors at low latency (recommended during the retrieval step), as well as throughput-optimized batch jobs for faster indexing. The actual similarity computation and retrieval are performed by your vector database, not by the embedding model. Amazon Titan Text Embeddings V2 supports long documents; however, for retrieval tasks, we recommend segmenting documents into logical segments, such as paragraphs or sections.

**Note**  
The Amazon Titan Text Embeddings V2 and V1 models do not support inference parameters such as `maxTokenCount` or `topP`.

**Amazon Titan Text Embeddings V2 model**
+ **Model ID** – `amazon.titan-embed-text-v2:0`
+ **Max input text tokens** – 8,192
+ **Max input text characters** – 50,000
+ **Languages** – English (100+ languages in preview)
+ **Output vector size** – 1,024 (default), 512, 256
+ **Inference types** – On-Demand, Provisioned Throughput
+ **Supported use cases** – RAG, document search, reranking, and classification
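As an illustration of the specifications above, the following sketch builds an `InvokeModel` request body for Titan Text Embeddings V2. The `dimensions` and `normalize` fields reflect the model's request shape as we understand it; verify them against the current inference-parameter reference before relying on them. The boto3 call is shown only as a comment, since it requires AWS credentials and model access.

```python
import json

# Request body for Titan Text Embeddings V2 (sketch; verify fields against
# the current model documentation before use).
body = json.dumps({
    "inputText": "Amazon Bedrock supports foundation models from multiple providers.",
    "dimensions": 512,   # one of 256, 512, 1024 (default: 1024)
    "normalize": True,   # unit-length vectors are typical for retrieval
})

# With boto3 and model access, the payload would be sent like this:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
# embedding = json.loads(response["body"].read())["embedding"]

request = json.loads(body)
print(request["dimensions"])
```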

**Note**  
Titan Text Embeddings V2 takes as input a non-empty string with up to 8,192 tokens or 50,000 characters. The characters-to-token ratio in English is 4.7 characters per token, on average. While Titan Text Embeddings V1 and Titan Text Embeddings V2 can accommodate up to 8,192 tokens, we recommend segmenting documents into logical segments (such as paragraphs or sections).
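The segmentation advice above can be sketched as a simple helper that packs paragraphs into chunks under a character budget. The `chunk_by_paragraph` name and the 80-character budget in the example are illustrative only; at the average of 4.7 characters per token, a real budget would be chosen well under the 50,000-character cap.

```python
def chunk_by_paragraph(text, max_chars=2000):
    """Pack paragraphs into chunks no longer than max_chars each.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about retrieval.\n\n" * 3) + "\n\nA closing section."
for chunk in chunk_by_paragraph(doc, max_chars=80):
    print(len(chunk))
```

Each chunk would then be embedded separately during the indexing step.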

The Amazon Titan Text Embeddings V2 model is optimized for English, with multilingual support for the following languages. Cross-language queries (such as providing a knowledge base in Korean and querying it in German) will return sub-optimal results.
+ Afrikaans
+ Albanian
+ Amharic
+ Arabic
+ Armenian
+ Assamese
+ Azerbaijani
+ Bashkir
+ Basque
+ Belarusian
+ Bengali
+ Bosnian
+ Breton
+ Bulgarian
+ Burmese
+ Catalan
+ Cebuano
+ Chinese
+ Corsican
+ Croatian
+ Czech
+ Danish
+ Dhivehi
+ Dutch
+ English
+ Esperanto
+ Estonian
+ Faroese
+ Finnish
+ French
+ Galician
+ Georgian
+ German
+ Gujarati
+ Haitian
+ Hausa
+ Hebrew
+ Hindi
+ Hungarian
+ Icelandic
+ Indonesian
+ Irish
+ Italian
+ Japanese
+ Javanese
+ Kannada
+ Kazakh
+ Khmer
+ Kinyarwanda
+ Kirghiz
+ Korean
+ Kurdish
+ Lao
+ Latin
+ Latvian
+ Lithuanian
+ Luxembourgish
+ Macedonian
+ Malagasy
+ Malay
+ Malayalam
+ Maltese
+ Maori
+ Marathi
+ Modern Greek
+ Mongolian
+ Nepali
+ Norwegian
+ Norwegian Nynorsk
+ Occitan
+ Oriya
+ Panjabi
+ Persian
+ Polish
+ Portuguese
+ Pushto
+ Romanian
+ Romansh
+ Russian
+ Sanskrit
+ Scottish Gaelic
+ Serbian
+ Sindhi
+ Sinhala
+ Slovak
+ Slovenian
+ Somali
+ Spanish
+ Sundanese
+ Swahili
+ Swedish
+ Tagalog
+ Tajik
+ Tamil
+ Tatar
+ Telugu
+ Thai
+ Tibetan
+ Turkish
+ Turkmen
+ Uighur
+ Ukrainian
+ Urdu
+ Uzbek
+ Vietnamese
+ Waray
+ Welsh
+ Western Frisian
+ Xhosa
+ Yiddish
+ Yoruba
+ Zulu

# Amazon Titan Multimodal Embeddings G1 model
<a name="titan-multiemb-models"></a>

Amazon Titan Foundation Models are pre-trained on large datasets, making them powerful, general-purpose models. Use them as-is, or customize them by fine-tuning the models with your own data for a particular task without annotating large volumes of data.

There are three types of Titan models: embeddings, text generation, and image generation.

Titan embeddings models translate text inputs (words, phrases, or potentially large units of text) into numerical representations (known as embeddings) that contain the semantic meaning of the text. While these models do not generate text, they are useful for applications like personalization and search. By comparing embeddings, you can produce more relevant and contextual results than with word matching. The Titan Multimodal Embeddings G1 model is used for use cases like searching images by text, by image for similarity, or by a combination of text and image. It translates the input image or text into an embedding that contains the semantic meaning of both the image and text in the same semantic space.

Titan Text models are generative LLMs for tasks such as summarization, text generation, classification, open-ended QnA, and information extraction. They are also trained on many different programming languages, as well as rich text formats such as tables, JSON, and .csv files, among other formats.

**Amazon Titan Multimodal Embeddings G1 model**
+ **Model ID** – `amazon.titan-embed-image-v1`
+ **Max input text tokens** – 256
+ **Languages** – English 
+ **Max input image size** – 25 MB
+ **Output vector size** – 1,024 (default), 384, 256
+ **Inference types** – On-Demand, Provisioned Throughput
+ **Supported use cases** – Search, recommendation, and personalization.

Titan Text Embeddings V1 takes as input a non-empty string with up to 8,192 tokens and returns a 1,024-dimensional embedding. The characters-to-token ratio in English is 4.7 characters per token, on average. Note on RAG use cases: while Titan Text Embeddings V2 can accommodate up to 8,192 tokens, we recommend segmenting documents into logical segments (such as paragraphs or sections).

## Embedding length
<a name="titanmm-embedding"></a>

Setting a custom embedding length is optional. The default embedding length is 1,024 dimensions, which works for most use cases. The embedding length can be set to 256, 384, or 1,024 dimensions. Larger embedding sizes create more detailed representations, but also increase the computational time; shorter embedding lengths are less detailed but improve the response time.

```
# EmbeddingConfig shape
# {
#     "outputEmbeddingLength": int  # optional; one of 256, 384, 1024 (default: 1024)
# }

# Updated API payload example
body = json.dumps({
    "inputText": "hi",
    "inputImage": image_string,
    "embeddingConfig": {
        "outputEmbeddingLength": 256
    }
})
```

## Fine-tuning
<a name="titanmm-finetuning"></a>
+ Input for Amazon Titan Multimodal Embeddings G1 fine-tuning consists of image-text pairs.
+ Image formats: PNG, JPEG
+ Input image size limit: 25 MB
+ Image dimensions: min: 256 px, max: 4,096 px
+ Max number of tokens in caption: 128
+ Training dataset size range: 1,000 - 500,000
+ Validation dataset size range: 8 - 50,000
+ Caption length in characters: 0 - 2,560
+ Maximum total pixels per image: 2048 x 2048 x 3
+ Aspect ratio (w/h): min: 0.25, max: 4

## Preparing datasets
<a name="titanmm-datasets"></a>

For the training dataset, create a `.jsonl` file with multiple JSON lines. Each JSON line contains both `image-ref` and `caption` attributes, similar to the [SageMaker Augmented Manifest format](https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html). A validation dataset is required. Auto-captioning is not currently supported.

```
   {"image-ref": "s3://bucket-1/folder1/0001.png", "caption": "some text"}
   {"image-ref": "s3://bucket-1/folder2/0002.png", "caption": "some text"}
   {"image-ref": "s3://bucket-1/folder1/0003.png", "caption": "some text"}
```

For both the training and validation datasets, create `.jsonl` files with multiple JSON lines.

The Amazon S3 paths must be in folders where you have granted Amazon Bedrock permission to access the data by attaching an IAM policy to your Amazon Bedrock service role. For more information on granting IAM policies for training data, see [Grant custom jobs access to your training data](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-model-customization).
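As a sanity check before submitting a job, a hypothetical helper like the following can validate each manifest line against the limits above (key presence, S3 URI, and the 2,560-character caption cap; token counting is model-specific and omitted here):

```python
import json

MAX_CAPTION_CHARS = 2560  # from the fine-tuning limits above

def validate_manifest_line(line):
    """Return a list of problems for one JSON line of the training manifest."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if "image-ref" not in record:
        problems.append("missing image-ref")
    elif not record["image-ref"].startswith("s3://"):
        problems.append("image-ref is not an S3 URI")
    if "caption" not in record:
        problems.append("missing caption")
    elif len(record["caption"]) > MAX_CAPTION_CHARS:
        problems.append("caption exceeds 2,560 characters")
    return problems

line = '{"image-ref": "s3://bucket-1/folder1/0001.png", "caption": "some text"}'
print(validate_manifest_line(line))  # an empty list means the line passed
```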

## Hyperparameters
<a name="titanmm-hyperparameters"></a>

You can adjust the following hyperparameters for the Multimodal Embeddings model. The default values work well for most use cases.
+ Learning rate - (min/max learning rate) – default: 5.00E-05, min: 5.00E-08, max: 1
+ Batch size - Effective batch size – default: 576, min: 256, max: 9,216 
+ Max epochs – default: "auto", min: 1, max: 100

# Amazon Titan Image Generator G1 model
<a name="titan-image-models"></a>

Amazon Titan Image Generator G1 is an image generation model that enables users to generate and edit images in versatile ways. Users can create images that match their text-based descriptions by simply inputting natural language prompts. Furthermore, they can upload and edit existing images, including applying text-based prompts without the need for a mask, or editing specific parts of an image using an image mask. The model also supports outpainting, which extends the boundaries of an image, and inpainting, which fills in missing areas. It offers the ability to generate variations of an image based on an optional text prompt, as well as instant customization options that allow users to transfer styles using reference images or combine styles from multiple references, all without requiring any fine-tuning.

Amazon Titan Image Generator G1 v2 adds several advanced capabilities. It allows users to leverage reference images to guide image generation, where the output image aligns with the layout and composition of the reference image while still following the textual prompt. It also includes an automatic background removal feature, which can remove backgrounds from images containing multiple objects without any user input. The model provides precise control over the color palette of generated images, allowing users to preserve a brand's visual identity without the requirement for additional fine-tuning. Additionally, the subject consistency feature enables users to fine-tune the model with reference images to preserve the chosen subject (e.g., pet, shoe or handbag) in generated images. This comprehensive suite of features empowers users to unleash their creative potential and bring their imaginative visions to life.

For more information on Amazon Titan Image Generator G1 model prompt engineering guidelines, see [Amazon Titan Image Generator Prompt Engineering Best Practices](https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Image+Generator+Prompt+Engineering+Guidelines.pdf).

To continue supporting best practices in the responsible use of AI, Titan Foundation Models (FMs) are built to detect and remove harmful content in the data, reject inappropriate content in the user input, and filter the models’ outputs that contain inappropriate content (such as hate speech, profanity, and violence). The Titan Image Generator FM adds an invisible watermark and [C2PA](https://c2pa.org/) metadata to all generated images.

You can use the watermark detection feature in the Amazon Bedrock console, or call the Amazon Bedrock watermark detection API (preview), to check whether an image contains a watermark from Titan Image Generator. You can also use sites like [Content Credentials Verify](https://contentcredentials.org/verify) to check if an image was generated by Titan Image Generator.

**Amazon Titan Image Generator v2** overview
+ **Model ID** – `amazon.titan-image-generator-v2:0`
+ **Max input characters** – 512 characters
+ **Max input image size** – 5 MB (only some specific resolutions are supported)
+ **Max image size using in/outpainting, background removal, image conditioning, color palette** – 1,408 x 1,408 px
+ **Max image size using image variation** – 4,096 x 4,096 px
+ **Languages** – English
+ **Output type** – image
+ **Supported image types** – JPEG, JPG, PNG
+ **Inference types** – On-Demand, Provisioned Throughput
+ **Supported use cases** – image generation, image editing, image variations, background removal, and color-guided content

## Features
<a name="titanimage-features"></a>
+ Text-to-image (T2I) generation – Input a text prompt and generate a new image as output. The generated image captures the concepts described by the text prompt.
+ Finetuning of a T2I model – Import several images to capture your own style and personalization and then fine tune the core T2I model. The fine-tuned model generates images that follow the style and personalization of a specific user.
+ Image editing options – Include inpainting, outpainting, generating variations, and automatic editing without an image mask.
+ Inpainting – Uses an image and a segmentation mask as input (either from the user or estimated by the model) and reconstructs the region within the mask. Use inpainting to remove masked elements and replace them with background pixels. 
+ Outpainting – Uses an image and a segmentation mask as input (either from the user or estimated by the model) and generates new pixels that seamlessly extend the region. Use precise outpainting to preserve the pixels of the masked image when extending the image to the boundaries. Use default outpainting to extend the pixels of the masked image to the image boundaries based on segmentation settings.
+ Image variation – Uses 1 to 5 images and an optional prompt as input. It generates a new image that preserves the content of the input image(s), but varies its style and background. 
+ Image conditioning – (V2 only) Uses an input reference image to guide image generation. The model generates an output image that aligns with the layout and composition of the reference image, while still following the textual prompt.
+ Subject consistency – (V2 only) Subject consistency allows users to fine-tune the model with reference images to preserve the chosen subject (for example, pet, shoe, or handbag) in generated images.
+ Color guided content – (V2 only) You can provide a list of hex color codes along with a prompt. A range of 1 to 10 hex codes can be provided. The image returned by Titan Image Generator G1 V2 will incorporate the color palette provided by the user.
+ Background removal – (V2 only) Automatically identifies multiple objects in the input image and removes the background. The output image has a transparent background.
+ Content provenance – Use sites like [Content Credentials Verify](https://contentcredentials.org/verify) to check if an image was generated by Titan Image Generator. This should indicate the image was generated unless the metadata has been removed.
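As an illustration of the color-guided content feature above, the sketch below assembles a request body for Titan Image Generator v2. The `COLOR_GUIDED_GENERATION` task type and field names follow the model's documented request shape as we understand it; verify them against the current inference-parameter reference before use.

```python
import json
import re

# 1 to 10 hex color codes, per the color-guided content feature
palette = ["#FF9900", "#232F3E"]
assert all(re.fullmatch(r"#[0-9A-Fa-f]{6}", c) for c in palette)

# Sketch of a color-guided generation request body (field names are our
# reading of the documented shape; confirm before sending to the API).
body = json.dumps({
    "taskType": "COLOR_GUIDED_GENERATION",
    "colorGuidedGenerationParams": {
        "text": "a minimalist product banner",
        "colors": palette,
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
    },
})
request = json.loads(body)
print(request["taskType"])
```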

**Note**  
If you are using a fine-tuned model, you cannot use the inpainting, outpainting, or color palette features of the API or the model.

## Parameters
<a name="titanimage-parameters"></a>

For information on Amazon Titan Image Generator G1 model inference parameters, see [Amazon Titan Image Generator G1 model inference parameters](model-parameters-titan-image.md).

## Fine-tuning
<a name="titanimage-finetuning"></a>

For more information on fine-tuning the Amazon Titan Image Generator G1 model, see the following pages.
+ [Prepare data for fine-tuning your models](model-customization-prepare.md)
+ [Amazon Titan Image Generator G1 models customization hyperparameters](custom-models-hp.md#cm-hp-titan-image)

**Amazon Titan Image Generator G1 model fine-tuning and pricing**

The model uses the following example formula to calculate the total price per job:

Total price = Steps x Batch size x Price per image seen

Minimum values (auto):
+ Minimum steps (auto) - 500
+ Minimum batch size - 8
+ Default learning rate - 0.00001
+ Price per image seen - 0.005
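Plugging the minimum values above into the formula gives a rough floor for a job's cost. The arithmetic below is illustrative only; consult the Amazon Bedrock pricing page for actual rates.

```python
def fine_tuning_cost(steps, batch_size, price_per_image_seen):
    """Total price = steps x batch size x price per image seen."""
    return steps * batch_size * price_per_image_seen

# Minimum values from above: 500 steps, batch size 8, 0.005 per image seen
cost = fine_tuning_cost(500, 8, 0.005)
print(cost)
```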

**Fine-tuning hyperparameter settings**

**Steps** – The number of times the model is exposed to each batch. There is no default step count. You must specify a number between 10 and 40,000, or the string value "Auto."

**Step settings - Auto** – Amazon Bedrock determines a reasonable value based on training information, choosing the number of steps automatically. Select this option to prioritize model performance over training cost. The automatically chosen step count is typically between 1,000 and 8,000, depending on your dataset. Job costs are impacted by the number of steps used to expose the model to the data. Refer to the pricing examples section of the pricing details to understand how job cost is calculated. (See the example table above for how step count relates to the number of images when Auto is selected.)

**Step settings - Custom** – You can enter the number of steps that Amazon Bedrock uses to expose your custom model to the training data. This value can be between 10 and 40,000. You can reduce the cost per image produced by the model by using a lower step count.

**Batch size** – The number of samples processed before model parameters are updated. This value is between 8 and 192 and must be a multiple of 8.

**Learning rate** – The rate at which model parameters are updated after each batch of training data. This is a float value between 0 and 1. The learning rate is set to 0.00001 by default. 
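The batch size constraint above (between 8 and 192, and a multiple of 8) is easy to get wrong; a hypothetical check:

```python
def validate_batch_size(batch_size):
    """Batch size must be between 8 and 192 and a multiple of 8."""
    return 8 <= batch_size <= 192 and batch_size % 8 == 0

print(validate_batch_size(64))   # valid
print(validate_batch_size(100))  # invalid: not a multiple of 8
```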

For more information on the fine-tuning procedure, see [Submit a model customization job.](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-submit.html)

## Output
<a name="titanimage-output"></a>

The Amazon Titan Image Generator G1 model uses the output image size and quality to determine how an image is priced. The model has two pricing segments based on image size (height x width): one for images less than or equal to 512 x 512, and another for images greater than 512 x 512 (up to 1024 x 1024).

For more information on Amazon Bedrock pricing, see [Amazon Bedrock Pricing.](https://aws.amazon.com/bedrock/pricing/)

## Watermark detection
<a name="titanimage-watermark"></a>

**Note**  
Watermark detection for the Amazon Bedrock console and API is available in public preview and only detects a watermark generated by Titan Image Generator G1. This feature is currently only available in the `us-west-2` and `us-east-1` Regions. Watermark detection is highly accurate for watermarks generated by Titan Image Generator G1; images that have been modified from the original may produce less accurate detection results.

This model adds an invisible watermark to all generated images to reduce the spread of misinformation, assist with copyright protection, and track content usage. Watermark detection, which checks for the existence of this watermark, helps you confirm whether an image was generated by the Titan Image Generator G1 model.

**Note**  
The Watermark Detection API is in preview and is subject to change. We recommend that you create a virtual environment to use the SDK. Because watermark detection APIs aren't available in the latest SDKs, we recommend that you uninstall the latest version of the SDK from the virtual environment before installing the version with the watermark detection APIs.

You can upload an image to detect whether a watermark from Titan Image Generator G1 is present. Use the console to detect a watermark from this model by following the steps below.

**To detect a watermark with Titan Image Generator G1:**

1. Open the Amazon Bedrock console at [Amazon Bedrock console](https://console.aws.amazon.com/bedrock).

1. Select **Overview** from the navigation pane in Amazon Bedrock. Choose the **Build and Test** tab.

1. In the **Safeguards** section, go to **Watermark detection** and choose **View watermark detection**.

1. Select **Upload image** and locate a file that is in JPG or PNG format. The maximum file size allowed is 5 MB.

1. Once uploaded, a thumbnail of the image is shown with its name, file size, and last modified date. Select X to delete or replace the image in the **Upload** section.

1. Select **Analyze** to begin watermark detection analysis.

1. The image is previewed under **Results**, which indicates whether a watermark is detected with **Watermark detected** below the image and a banner across the image. If no watermark is detected, the text below the image reads **Watermark NOT detected**.

1. To load the next image, select X in the thumbnail of the image in the **Upload** section and choose a new image to analyze.

## Prompt Engineering Guidelines
<a name="titanimage-prompt"></a>

**Mask prompt** – The model classifies pixels into concepts. You can give a text prompt that is used to determine which areas of the image to mask, based on the model's interpretation of the mask prompt. The prompt option can interpret more complex prompts and encode the resulting mask for the segmentation algorithm.

**Image mask** – You can also use an image mask to set the mask values. The image mask can be combined with a mask prompt to improve accuracy. The image mask file must conform to the following parameters:
+ Mask image values must be 0 (black) or 255 (white). The areas of the mask with a value of 0 will be regenerated from the user prompt and/or input image.
+ The `maskImage` field must be a base64-encoded image string.
+ The mask image must have the same dimensions as the input image (same height and width).
+ Only PNG or JPG files can be used for the input image and the mask image.
+ The mask image must only use black and white pixel values.
+ The mask image can only use the RGB channels (an alpha channel is not supported).
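The mask constraints above can be checked with a small helper. This sketch is hypothetical and operates on an already-decoded pixel matrix (rows of RGB tuples); decoding the base64-encoded PNG or JPG is out of scope here.

```python
def validate_mask(mask_pixels, input_width, input_height):
    """Check a decoded mask (rows of (R, G, B) tuples) against the rules above."""
    problems = []
    height = len(mask_pixels)
    width = len(mask_pixels[0]) if height else 0
    if (width, height) != (input_width, input_height):
        problems.append("mask dimensions do not match the input image")
    for row in mask_pixels:
        for r, g, b in row:  # RGB only; an alpha channel is not supported
            if (r, g, b) not in ((0, 0, 0), (255, 255, 255)):
                problems.append("mask contains a pixel that is not pure black or white")
                return problems
    return problems

# A 2 x 2 checkerboard mask that satisfies the constraints
mask = [[(0, 0, 0), (255, 255, 255)],
        [(255, 255, 255), (0, 0, 0)]]
print(validate_mask(mask, 2, 2))  # an empty list means the mask is valid
```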

For more information on Amazon Titan Image Generator prompt engineering, see [Amazon Titan Image Generator G1 models Prompt Engineering Best Practices](https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Image+Generator+Prompt+Engineering+Guidelines.pdf). 

For general prompt engineering guidelines, see [Prompt Engineering Guidelines](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html).