

# Vector ingestion

Vector ingestion helps you quickly ingest and index data into OpenSearch Service domains and OpenSearch Serverless collections. The service examines your domain or collection and creates an ingestion pipeline on your behalf to load your data into OpenSearch, so ingestion and indexing are managed for you.

You can accelerate and optimize the indexing process by enabling [GPU-acceleration for vector indexing](gpu-acceleration-vector-index.md) and [Auto-optimize](serverless-auto-optimize.md) features. With Vector ingestion, you don't need to manage the underlying infrastructure, patch software, or scale clusters to support your vector database indexing and ingestion. This allows you to quickly build your vector database to meet your needs.

## How it works


Vector ingestion examines your domain or collection and its index. You can manually configure your vector index fields or allow OpenSearch to use automatic configuration.

Vector ingestion uses OpenSearch Ingestion (OSI) as the data pipeline between Amazon S3 and OpenSearch. The service processes vectors in parallel to optimize ingestion speed while respecting the scaling limits of both OSI and OpenSearch.
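
The idea of parallel processing under a capacity limit can be illustrated with a minimal sketch. This is not how the service is implemented. The function names, the `batch_size` and `max_workers` parameters, and the `write_batch` callback are all hypothetical, and `max_workers` only stands in for whatever limit the pipeline and the destination can sustain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[dict], batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ingest_in_parallel(
    documents: Sequence[dict],
    write_batch: Callable[[Sequence[dict]], int],
    batch_size: int = 500,
    max_workers: int = 4,
) -> int:
    """Send batches concurrently with at most max_workers writers at a time.

    write_batch is a hypothetical callback that writes one batch and returns
    the number of documents it accepted. The return value of this function
    is the total across all batches.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(write_batch, batched(documents, batch_size)))
```

Capping the worker count is the simplest way to trade ingestion speed against pressure on the destination. With vector ingestion, the service manages this trade-off for you.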

## OpenSearch Vector ingestion pricing


At any specific time, you only pay for the number of vector ingestion OCUs that are allocated to a pipeline, regardless of whether there's data flowing through the pipeline. OpenSearch vector ingestion immediately accommodates your workloads by scaling pipeline capacity up or down based on usage.

For full pricing details, see [Amazon OpenSearch Service Pricing](https://aws.amazon.com/opensearch-service/pricing/).
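
Because billing follows allocated OCUs rather than data volume, a rough estimate multiplies allocated OCUs by the time they stay allocated. The sketch below assumes an hourly rate per OCU. The `price_per_ocu_hour` value is a placeholder and not an actual price, so check the pricing page for current rates.

```python
from typing import Iterable, Tuple


def ocu_hours(allocations: Iterable[Tuple[int, float]]) -> float:
    """Sum OCU-hours over (allocated_ocus, hours) intervals.

    An interval counts even if no data flowed through the pipeline,
    because the OCUs were still allocated during that time.
    """
    return sum(ocus * hours for ocus, hours in allocations)


def estimated_cost(
    allocations: Iterable[Tuple[int, float]],
    price_per_ocu_hour: float,
) -> float:
    """Estimate cost as OCU-hours multiplied by an assumed hourly rate."""
    return ocu_hours(allocations) * price_per_ocu_hour


# Hypothetical pipeline: 2 OCUs for 3 hours, then scaled up to 4 OCUs for 1.5 hours.
usage = [(2, 3.0), (4, 1.5)]
print(ocu_hours(usage))             # 12.0 OCU-hours
print(estimated_cost(usage, 0.25))  # 3.0 at a placeholder rate of 0.25 per OCU-hour
```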

## Prerequisites


Before using vector ingestion, ensure you have the following resources:
+ An Amazon S3 bucket that contains your OpenSearch JSON documents in Parquet or JSONL format
+ An OpenSearch resource, either a domain or a collection
+ OpenSearch version `2.19` or later (required for auto-optimize integration)
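
A quick local check of these prerequisites might look like the following sketch. Both helper functions are hypothetical, and the `.parquet` and `.jsonl` file extensions are an assumption about how the objects are named. The source only specifies the formats.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.19' or '2.19.0' into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum_version(version: str, minimum: str = "2.19") -> bool:
    """Compare versions numerically, so that '2.9' is older than '2.19'."""
    return parse_version(version) >= parse_version(minimum)


def has_supported_format(object_key: str) -> bool:
    """Check for the assumed file extensions of Parquet and JSONL objects."""
    return object_key.lower().endswith((".parquet", ".jsonl"))
```

Comparing integer tuples instead of strings matters here: a plain string comparison would rank `2.9` above `2.19`.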

## Create vector database


Use the vector ingestion job creation workflow to set up automated vector index tuning and accelerate large-scale index builds.

**Note**  
The procedural content in this section is subject to change as the user interface is finalized. The workflow may be updated in future releases to reflect the latest console experience.

**To create a vector ingestion job**

1. In the **Vector ingestion job details** section, for **Name**, enter a name for your ingestion job.

1. In the **Data source** section, configure the following:

   1. For **Amazon S3 URI**, enter the Amazon S3 bucket location containing your OpenSearch Service JSON documents.

   1. Choose **Browse Amazon S3** to select from available buckets, or choose **View** to preview the bucket contents.

   1. For **Content type**, select one of the following:
      + **Vectors** - Documents already contain vectors and don't require further vector embedding generation.
      + **Text, image, or audio** - Documents contain content such as text, images, or audio bytes that need to be encoded into vector embeddings.

1. In the **Data source permissions** section, configure access permissions:

   1. For **IAM role**, choose one of the following:
      + **Create a new role**
      + **Use an existing role**

   1. For **IAM role name**, enter a name for the role.

1. In the **Destination** section, configure the OpenSearch Service endpoint:

   1. For **Endpoint**, choose **Choose an option** to select from your compatible domains or collections in the current region.

   1. Choose **Next** to proceed with the selected endpoint.

1. Choose **Next** to continue to the next step, or choose **Cancel** to exit without saving.
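
To keep track of the values the console asks for, you could collect them in a small structure before you start. The following sketch mirrors the fields from the procedure above. It is not an API of the service, and the class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ContentType(Enum):
    VECTORS = "Vectors"
    TEXT_IMAGE_OR_AUDIO = "Text, image, or audio"


@dataclass(frozen=True)
class VectorIngestionJobDraft:
    """Hypothetical checklist of the console inputs for one ingestion job."""
    name: str
    s3_uri: str
    content_type: ContentType
    iam_role_name: str
    create_new_role: bool
    endpoint: str

    def problems(self) -> List[str]:
        """Return a list of obviously missing or malformed inputs."""
        found = []
        if not self.name.strip():
            found.append("Name is empty")
        if not (self.s3_uri.startswith("s3://") and len(self.s3_uri) > len("s3://")):
            found.append("Amazon S3 URI should start with s3:// and name a bucket")
        if not self.iam_role_name.strip():
            found.append("IAM role name is empty")
        if not self.endpoint.strip():
            found.append("Endpoint is not selected")
        return found

    @property
    def needs_embedding_generation(self) -> bool:
        """Only non-vector content has to be encoded into embeddings."""
        return self.content_type is ContentType.TEXT_IMAGE_OR_AUDIO
```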

## Related features


Vector ingestion works with the following Amazon OpenSearch Service features to optimize your vector database performance:

[GPU-acceleration for vector indexing](gpu-acceleration-vector-index.md)  
GPU-acceleration reduces the time needed to create, update, and delete vector indexes. When used with vector ingestion, you can significantly accelerate the ingestion and indexing process for large-scale vector databases.

[Auto-optimize](serverless-auto-optimize.md)  
Auto-optimize automatically discovers optimal trade-offs between search latency, quality, and memory requirements. Vector ingestion can apply auto-optimize recommendations during the ingestion process to ensure your vector indexes are optimally configured.

For best results, consider enabling both GPU-acceleration and Auto-optimize when using vector ingestion to build large-scale vector databases.

# Export Amazon S3 vector index to OpenSearch Service vector engine


This procedure creates a point-in-time export of your selected Amazon S3 vector index to OpenSearch Service. The OpenSearch Service vector engine provides a simple and scalable vector store with advanced search functionality.

**To export Amazon S3 vector index to OpenSearch Service vector engine**

1. In the **Source** section, verify the Amazon S3 vector index details:
   + **Amazon S3 vector index** - The name of your source index
   + **Amazon S3 vector index ARN** - The Amazon Resource Name of your index

1. In the **Service access** section, configure OpenSearch Service authorization:

   1. For **Choose a method to authorize OpenSearch Service**, select one of the following:
      + **Create and use a new service role**
      + **Use an existing service role**

   1. For **Service role name**, enter a name for the service role.
**Note**  
Service role name must be 1 to 64 characters. Valid characters are a-z, A-Z, 0-9, and periods (.).

   1. Choose **View permission details** to review the required permissions.

1. Expand **Additional settings - optional** to configure advanced options if needed.

1. In the **Export details** section, configure the following options:
   + **Automate OpenSearch Service vector collection creation** - OpenSearch Service collections are used to store vector data. Serverless compute capacity is measured in OpenSearch Compute Units (OCUs). By default, the Max OCU capacity is 50.
   + **Automate IAM role creation for service access** - This role is used by OpenSearch Service to read the Amazon S3 vector index and write to the OpenSearch Service collection.
   + **Automate OpenSearch Service ingestion pipeline creation** - OpenSearch Service ingestion pipelines are used to ingest data. As a best practice, an Amazon S3 bucket is also created as a dead-letter queue (DLQ) to capture and store failed events, so that you can access them for troubleshooting and analysis.

1. Choose **Export** to start the export process, or choose **Cancel** to exit without exporting.
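
The service role name rule from the note above (1 to 64 characters, using only a-z, A-Z, 0-9, and periods) can be checked before you enter the name. The following is a minimal sketch of that rule. The function name is hypothetical, and the console remains the authority on what it accepts.

```python
import re

# 1 to 64 characters, limited to the characters listed in the note.
_SERVICE_ROLE_NAME = re.compile(r"[A-Za-z0-9.]{1,64}")


def is_valid_service_role_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented rule."""
    return _SERVICE_ROLE_NAME.fullmatch(name) is not None


print(is_valid_service_role_name("OpenSearch.Export.Role1"))  # True
print(is_valid_service_role_name("my-export-role"))           # False: hyphens are not listed
print(is_valid_service_role_name("a" * 65))                   # False: longer than 64 characters
```

The same rule applies to the service role name in the import procedure.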

# Import Amazon S3 vector namespace to OpenSearch Service vector engine


Analyzing your vector data with OpenSearch Service requires a one-time OpenSearch Service collection and IAM permission setup.

**To import Amazon S3 vector namespace to OpenSearch Service vector engine**

1. In the **Source** section, configure the Amazon S3 vector index:

   1. For **Amazon S3 vector index ARN**, enter the ARN of your Amazon S3 vector index.
**Note**  
Must be in format arn:aws:iam::account-id:vector-bucket-name/\$1:index

1. In the **Service access** section, configure OpenSearch Service authorization:

   1. For **Choose a method to authorize OpenSearch Service**, select one of the following:
      + **Create and use a new service role**
      + **Use an existing service role**

   1. For **Service role name**, enter a name for the service role.
**Note**  
Service role name must be 1 to 64 characters. Valid characters are a-z, A-Z, 0-9, and periods (.).

   1. Choose **View permission details** to review the required permissions.

1. Expand **Additional settings - optional** to configure advanced options if needed.

1. In the **Import steps** section, configure the following automation options:
   + **Automate OpenSearch Service vector collection creation** - OpenSearch Service collections are used to store vector data. Serverless compute capacity is measured in OpenSearch Compute Units (OCUs). By default, the Max OCU capacity is 50.
   + **Automate IAM role creation for service access** - This role is used by OpenSearch Service to read the Amazon S3 vector index and write to the OpenSearch Service collection.
   + **Automate OpenSearch Service ingestion pipeline creation** - OpenSearch Service ingestion pipelines are used to ingest data. As a best practice, an Amazon S3 bucket is also created as a dead-letter queue (DLQ) to capture and store failed events, so that you can access them for troubleshooting and analysis.

1. Choose **Import** to start the import process, or choose **Cancel** to exit without importing.

# View vector ingestion jobs and import history


Vector ingestion jobs create a pipeline for vectorizing data sets, automating vector index tuning and accelerating large-scale index builds.

**To view vector ingestion jobs**

1. In the **Vector ingestion jobs** section, view the summary information:
   + **Jobs** - Total number of ingestion jobs
   + Choose **Create vector database** to create a new ingestion job

1. In the **Amazon S3 vectors imports** section, view the import summary:
   + **Total imports** - Number of completed imports
   + Choose **Import Amazon S3 vectors** to start a new import

1. In the **Vector ingestion jobs** table, monitor active jobs with the following information:
   + **Name** - The job name
   + **Status** - Current job status (e.g., Active)
   + **Data source** - Source location (e.g., s3://location)
   + **Destination** - Target destination
   + **Last updated** - Most recent update timestamp

1. Use the **Find vector ingestion job** search box to locate specific jobs.

1. To manage jobs, choose from the following actions:
   + Choose **Delete** to remove selected jobs
   + Choose **Create vector database** to create additional jobs

1. In the **Amazon S3 vectors import history** section, track import events:

   1. Use the **Date range** filter to specify a time period for import history.

   1. Use the **Status** dropdown to filter by import status (e.g., Any status).

   1. Use the **Find imports by Amazon S3 vector index na...** search box to locate specific imports.

   1. View import details including:
      + **Import initiated on (UTC+5:30)** - When the import started
      + **Import status** - Current status (In progress, Complete, Failed, Partially complete)
      + **Amazon S3 vector index ARN** - Source index identifier
      + **OpenSearch Service vector collection** - Destination collection

1. Choose **Import Amazon S3 vector** to start a new import process.
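
The date range, status, and search filters in the import history combine as a logical AND. The sketch below models that behavior on records you might have exported or noted yourself. The `ImportRecord` type and the `filter_imports` function are hypothetical, and only the four status values come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, List, Optional


class ImportStatus(Enum):
    IN_PROGRESS = "In progress"
    COMPLETE = "Complete"
    FAILED = "Failed"
    PARTIALLY_COMPLETE = "Partially complete"


@dataclass(frozen=True)
class ImportRecord:
    initiated_on: datetime   # when the import started
    status: ImportStatus     # current import status
    index_arn: str           # source Amazon S3 vector index ARN
    collection: str          # destination vector collection


def filter_imports(
    records: Iterable[ImportRecord],
    start: datetime,
    end: datetime,
    status: Optional[ImportStatus] = None,
    arn_contains: str = "",
) -> List[ImportRecord]:
    """Keep records inside [start, end] that match the optional filters.

    status=None behaves like the "Any status" choice, and an empty
    arn_contains string matches every ARN.
    """
    return [
        record
        for record in records
        if start <= record.initiated_on <= end
        and (status is None or record.status is status)
        and arn_contains in record.index_arn
    ]
```

Use either timezone-aware or naive `datetime` values consistently, because Python raises a `TypeError` when the two kinds are compared.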