# GPU-acceleration for vector indexing

GPU-acceleration reduces the time needed to index data into vector indexes, which helps you build large-scale vector databases faster and more efficiently. You can enable this feature on new or existing OpenSearch domains and OpenSearch Serverless collections.

With GPU-acceleration, you can increase vector indexing speed by up to 10X at a quarter of the indexing cost.

## Prerequisites


GPU-acceleration is supported on OpenSearch domains running OpenSearch version `3.1` or later, and on OpenSearch Serverless collections. For more information, see [Upgrading Amazon OpenSearch Service domains](version-migration.md) and the [UpdateDomainConfig](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_UpdateDomainConfig.html) and [UpdateCollection](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_UpdateCollection.html) API references.

## How it works


Vector indexes require significant compute resources to build data structures such as Hierarchical Navigable Small World (HNSW) graphs. When you enable GPU-acceleration on your domain or collection, OpenSearch automatically detects opportunities to accelerate your index builds and offloads them to GPU instances. OpenSearch Service manages the GPU instances on your behalf, assigning them to your domain or collection when needed. This means you don't manage utilization or pay for idle time.

You pay only for useful processing through OpenSearch Compute Units (OCU) - Vector Acceleration. Each Vector Acceleration OCU is a combination of approximately 8 GiB of CPU memory, 2 vCPUs, and 6 GiB of GPU memory. For more information, see [GPU Acceleration Pricing](#gpu-acceleration-pricing).

To enable GPU acceleration for your domain or collection, see [Enabling GPU-acceleration](gpu-acceleration-enabling.md).

## GPU Acceleration Pricing


AWS charges you when OpenSearch detects opportunities to accelerate your domain's or collection's index build workloads. Each Vector Acceleration OCU is a combination of approximately 8 GiB of CPU memory, 2 vCPUs, and 6 GiB of GPU memory.

AWS bills OCUs at per-second granularity. In your account statement, you'll see an entry for compute in OCU-hours.

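For example, when you use GPU-acceleration for one hour to create an index, using 2 vCPUs and 1 GiB of GPU memory, you're billed for 1 OCU, because that usage fits within a single OCU. If you use 9 GiB of CPU memory while using GPU-acceleration, you're billed for 2 OCUs, because 9 GiB exceeds the approximately 8 GiB of CPU memory that one OCU provides.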
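The billing example above can be sketched as a small calculation. This is only an illustration, not the billing algorithm that AWS uses: the helper names `estimate_ocus` and `estimate_ocu_hours` are hypothetical, and the model (take the most constrained resource and round up to a whole OCU) is an assumption that happens to reproduce the two cases described in this section.

```python
import math

# Approximate resources in one Vector Acceleration OCU, as quoted in this section.
OCU_CPU_MEMORY_GIB = 8
OCU_VCPUS = 2
OCU_GPU_MEMORY_GIB = 6


def estimate_ocus(cpu_memory_gib: float = 0, vcpus: float = 0, gpu_memory_gib: float = 0) -> int:
    """Estimate whole OCUs for a workload.

    Assumed model (not an AWS formula): the most constrained resource decides,
    and the result is rounded up to a whole OCU with a minimum of 1.
    """
    ratios = (
        cpu_memory_gib / OCU_CPU_MEMORY_GIB,
        vcpus / OCU_VCPUS,
        gpu_memory_gib / OCU_GPU_MEMORY_GIB,
    )
    return max(1, math.ceil(max(ratios)))


def estimate_ocu_hours(ocus: int, seconds: float) -> float:
    """Convert OCUs held for a number of seconds into OCU-hours."""
    return ocus * seconds / 3600


if __name__ == "__main__":
    print(estimate_ocus(vcpus=2, gpu_memory_gib=1))  # first case in the text
    print(estimate_ocus(cpu_memory_gib=9))           # second case in the text
    print(estimate_ocu_hours(2, 1800))               # 2 OCUs held for 30 minutes
```

Because billing is per second, a build that holds 2 OCUs for 30 minutes would appear as 1 OCU-hour under this model.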

OpenSearch Serverless adds OCUs in increments of 1 OCU based on the compute power and storage needed to support your collections. You can configure a maximum number of OCUs for your account to control costs.

**Note**  
The number of OCUs provisioned at any time can vary and isn't exact. Over time, the algorithm that OpenSearch and OpenSearch Serverless use will continue to improve to better minimize system usage.

For full pricing details, see [Amazon OpenSearch Service Pricing](https://aws.amazon.com/opensearch-service/pricing/).

## GPU-acceleration and write operations


GPU-acceleration is activated when OpenSearch's vector ingestion rate (MB/sec) falls within a configured range. On OpenSearch domains, you can [configure this range](https://docs.opensearch.org/3.2/vector-search/remote-index-build/#using-the-remote-index-build-service) through `index.knn.remote_index_build.size.min` and `index.knn.remote_index_build.size.max`. For example, with the default lower bound of 50 MB, writing 15,000 full-precision vectors with 768 dimensions between [refresh intervals](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html#bp-perf) triggers GPU-acceleration.
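As a rough illustration of adjusting the lower bound on a domain, the following sketch builds a request body for the index settings API (`PUT /<index>/_settings`). The helper name `remote_build_min_size_body` and the value `25mb` are hypothetical, and the sketch assumes the setting accepts OpenSearch byte-size strings such as `50mb`. Confirm the accepted values against the linked OpenSearch documentation before using it.

```python
import json


def remote_build_min_size_body(min_size: str) -> str:
    """Return a JSON body that sets the lower bound of the GPU-acceleration range.

    min_size is a byte-size string such as "50mb" (format assumed here).
    """
    return json.dumps({"index.knn.remote_index_build.size.min": min_size})


if __name__ == "__main__":
    # Hypothetical value: lower the threshold so smaller batches qualify.
    print(remote_build_min_size_body("25mb"))
```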

Data is written with the following API operations:
+ [Flush](https://docs.opensearch.org/latest/api-reference/index-apis/flush/)
+ [Bulk](https://docs.opensearch.org/latest/api-reference/document-apis/bulk/)
+ [Reindex](https://docs.opensearch.org/latest/api-reference/document-apis/reindex/)
+ [Index](https://docs.opensearch.org/latest/api-reference/index-apis/index/)
+ [Update](https://docs.opensearch.org/latest/api-reference/document-apis/update-document/)
+ [Delete](https://docs.opensearch.org/latest/api-reference/document-apis/delete-document/)
+ [Force Merge](https://docs.opensearch.org/latest/api-reference/index-apis/force-merge/)

GPU-acceleration is activated with both automatic and [manual](https://docs.opensearch.org/latest/api-reference/index-apis/force-merge/) segment merges.

## Supported index configurations


The [Faiss](https://docs.opensearch.org/latest/field-types/supported-field-types/knn-methods-engines/#faiss-engine) engine supports GPU-acceleration.

The following configurations do not support GPU-acceleration:
+ [Faiss product quantization](https://docs.opensearch.org/latest/vector-search/optimizing-storage/faiss-product-quantization/)
+ [Inverted File Index (IVF)](https://docs.opensearch.org/latest/field-types/supported-field-types/knn-methods-engines/#ivf-parameters)
+ [Non-Metric Space Library](https://docs.opensearch.org/latest/field-types/supported-field-types/knn-methods-engines/#nmslib-engine-deprecated)
+ [Lucene engine](https://docs.opensearch.org/latest/field-types/supported-field-types/knn-methods-engines/#lucene-engine)

## Supported AWS Regions


GPU-acceleration is available in the following AWS Regions:
+ US East (N. Virginia)
+ US West (Oregon)
+ Asia Pacific (Sydney)
+ Asia Pacific (Tokyo)
+ Europe (Ireland)

# Enabling GPU-acceleration

You can enable GPU-acceleration when creating or updating an OpenSearch domain or OpenSearch Serverless collection with the AWS Management Console, AWS CLI, or AWS SDK.

Once you enable GPU-acceleration on your domain or collection, this feature is enabled by default on all indexes. If you need to disable this feature at the index level, see [Creating GPU-accelerated vector indexes](gpu-acceleration-creating-indexes.md).

## Console


The following procedures enable GPU-acceleration for OpenSearch domains and OpenSearch Serverless collections using the OpenSearch Service management console.

------
#### [ Create new domain ]

To create an OpenSearch domain with GPU-acceleration enabled, see [Creating OpenSearch Service domains](createupdatedomains.md#createdomains).

------
#### [ Edit existing domain ]

1. Open the [OpenSearch Service](https://console.aws.amazon.com/aos/home) management console.

1. In the navigation pane, choose **Domains**.

1. Choose your domain name to open the domain details page.

1. Choose **Actions**, then **Edit domain**.

1. In the **Advanced features** section, select **Enable GPU acceleration**. Once this feature is enabled, your vector indexing operations are [accelerated](gpu-acceleration-vector-index.md#gpu-acceleration-write-operations).

1. Choose **Save changes**.

------
#### [ Create new collection ]

To create an OpenSearch Serverless collection with GPU-acceleration enabled, see [Tutorial: Getting started with Amazon OpenSearch Serverless](serverless-getting-started.md). During collection creation, ensure you select the **Vector search** collection type and enable GPU-acceleration in the vector search configuration.

------
#### [ Edit existing collection ]

1. Open the [OpenSearch Service](https://console.aws.amazon.com/aos/home) management console.

1. In the navigation pane, choose **Collections**.

1. Choose your collection name to open the collection details page.

1. In the **Deployment options** section, choose **Edit** for Vector GPU acceleration.

1. Enable or disable GPU acceleration.

1. Choose **Save changes**.

------

## AWS CLI


------
#### [ Create new domain ]

The following AWS CLI example creates an OpenSearch domain with GPU-acceleration enabled in US East (N. Virginia). Replace the *text* with that of your own configuration.

```
# --aiml-options enables GPU-acceleration. Replace the placeholder values before running.
aws opensearch create-domain \
    --domain-name my-domain \
    --engine-version OpenSearch_3.1 \
    --cluster-config "InstanceType=r6g.xlarge.search,InstanceCount=1,DedicatedMasterEnabled=true,DedicatedMasterCount=3,DedicatedMasterType=m6g.large.search" \
    --ebs-options "EBSEnabled=true,VolumeType=gp3,VolumeSize=2000" \
    --encryption-at-rest-options '{"Enabled":true}' \
    --aiml-options '{"ServerlessVectorAcceleration": {"Enabled": true}}' \
    --node-to-node-encryption-options '{"Enabled":true}' \
    --domain-endpoint-options '{"EnforceHTTPS":true,
        "TLSSecurityPolicy":"Policy-Min-TLS-1-0-2019-07"}' \
    --access-policies '{"Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/*"
        }]}' \
    --advanced-security-options '{
        "Enabled":true,
        "InternalUserDatabaseEnabled":true,
        "MasterUserOptions": {
            "MasterUserName":"USER_NAME",
            "MasterUserPassword":"PASSWORD"
        }}' \
    --region us-east-1
```

------
#### [ Edit existing domain ]

The following AWS CLI example enables GPU-acceleration for an existing OpenSearch domain. Replace the *text* with that of your own configuration.

```
aws opensearch update-domain-config \
    --domain-name my-domain \
    --cluster-config InstanceType=r7g.16xlarge.search,InstanceCount=3 \
    --aiml-options '{"ServerlessVectorAcceleration": {"Enabled": true}}'
```

------
#### [ Create new collection ]

The following AWS CLI example creates an OpenSearch Serverless collection with GPU-acceleration enabled in US East (N. Virginia). Replace the *text* with that of your own configuration.

```
aws opensearchserverless create-collection \
    --name "my-collection" \
    --type "VECTORSEARCH" \
    --description "My vector collection with GPU acceleration" \
    --vector-options '{"ServerlessVectorAcceleration": "ENABLED"}' \
    --region us-east-1
```

------
#### [ Edit existing collection ]

The following AWS CLI example enables GPU-acceleration for an existing OpenSearch Serverless collection. Replace the *text* with that of your own configuration.

```
aws opensearchserverless update-collection \
    --id 07tjusf2h91cunochc \
    --vector-options '{"ServerlessVectorAcceleration": "ENABLED"}' \
    --region us-east-1
```

------
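For the AWS SDK, a minimal sketch of the equivalent update requests in Python follows. It only builds the keyword arguments and does not call AWS. The parameter names are assumptions mirrored from the CLI options shown above (`--aiml-options` as `AIMLOptions`, `--vector-options` as `vectorOptions`), the `DISABLED` value is assumed as the counterpart of `ENABLED`, and the helper names are hypothetical. Check them against the SDK reference for your version.

```python
def domain_gpu_acceleration_kwargs(domain_name: str, enabled: bool = True) -> dict:
    """Build keyword arguments for an UpdateDomainConfig request (shape assumed from the CLI)."""
    return {
        "DomainName": domain_name,
        "AIMLOptions": {"ServerlessVectorAcceleration": {"Enabled": enabled}},
    }


def collection_gpu_acceleration_kwargs(collection_id: str, enabled: bool = True) -> dict:
    """Build keyword arguments for an UpdateCollection request (shape assumed from the CLI)."""
    state = "ENABLED" if enabled else "DISABLED"  # "DISABLED" is an assumed value
    return {
        "id": collection_id,
        "vectorOptions": {"ServerlessVectorAcceleration": state},
    }


if __name__ == "__main__":
    print(domain_gpu_acceleration_kwargs("my-domain"))
    print(collection_gpu_acceleration_kwargs("07tjusf2h91cunochc"))
    # With boto3 installed and credentials configured, these could be passed as:
    #   boto3.client("opensearch").update_domain_config(**domain_kwargs)
    #   boto3.client("opensearchserverless").update_collection(**collection_kwargs)
```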

# Creating GPU-accelerated vector indexes

After enabling GPU-acceleration on your domain or collection, create vector indexes that can take advantage of GPU processing.

**Note**  
When you create a domain with GPU-acceleration enabled, the `index.knn.remote_index_build.enabled` setting is `true` by default. You don't need to explicitly set this setting when creating indexes. For collections, you must explicitly specify a value for this setting.

------
#### [ Creating index with GPU-acceleration ]

The following example creates a vector index optimized for GPU processing. This index stores 768-dimensional vectors (common for text embeddings).

```
PUT my-vector-index
{
  "settings": {
    "index.knn": true,
    "index.knn.remote_index_build.enabled": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 768
      },
      "text": {
        "type": "text"
      }
    }
  }
}
```

Key configuration elements:
+ `"index.knn": true` - Enables k-nearest neighbor functionality
+ `"index.knn.remote_index_build.enabled": true` - Enables GPU processing for this index. On a domain with GPU-acceleration enabled, this setting defaults to `true` when omitted. For collections, you must explicitly specify a value for this setting.
+ `"dimension": 768` - Specifies vector size (adjust based on your embedding model)

------
#### [ Creating index without GPU-acceleration ]

The following example creates a vector index where GPU processing is disabled. This index stores 768-dimensional vectors (common for text embeddings).

```
PUT my-vector-index
{
  "settings": {
    "index.knn": true,
    "index.knn.remote_index_build.enabled": false
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 768
      },
      "text": {
        "type": "text"
      }
    }
  }
}
```

------
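The two request bodies above differ only in the `index.knn.remote_index_build.enabled` value. If you generate index definitions from code, a small builder along these lines can keep them consistent. This is a sketch: the function name `vector_index_body` is hypothetical, and the field names `vector_field` and `text` are taken from the examples above.

```python
import json


def vector_index_body(dimension: int, gpu_enabled: bool = True) -> dict:
    """Build a create-index body for a k-NN vector index.

    gpu_enabled sets index.knn.remote_index_build.enabled explicitly, which
    collections require and domains accept.
    """
    return {
        "settings": {
            "index.knn": True,
            "index.knn.remote_index_build.enabled": gpu_enabled,
        },
        "mappings": {
            "properties": {
                "vector_field": {"type": "knn_vector", "dimension": dimension},
                "text": {"type": "text"},
            }
        },
    }


if __name__ == "__main__":
    # Body to send with: PUT my-vector-index
    print(json.dumps(vector_index_body(768), indent=2))
```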

# Indexing vector data and force-merging

Once you've created a GPU-accelerated vector index on your domain or collection, you can add vector data and optimize your index using standard OpenSearch operations. GPU-acceleration automatically enhances both indexing performance and force-merge operations, making it faster to build and maintain large-scale vector search applications without requiring changes to your existing workflows.

## Indexing vector data


Index vector data as you normally would. GPU-acceleration applies automatically to indexing and force-merge operations. The following example demonstrates how to add vector documents to your index using the [bulk](https://docs.opensearch.org/latest/api-reference/document-apis/bulk/#index) API. Each document contains a vector field with numerical values and associated text content:

```
POST _bulk
{"index": {"_index": "my-vector-index"}}
{"vector_field": [0.1, 0.2, 0.3, ...], "text": "Sample document 1"}
{"index": {"_index": "my-vector-index"}}
{"vector_field": [0.4, 0.5, 0.6, ...], "text": "Sample document 2"}
```
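When the vectors come from application code, the bulk request body can be assembled programmatically. The sketch below builds the newline-delimited JSON that the bulk API expects, with one action line followed by one document line per vector. The function name `bulk_index_payload` is hypothetical, and sending the payload (HTTP client, authentication) is left out.

```python
import json
from typing import Iterable, Sequence, Tuple


def bulk_index_payload(index_name: str, docs: Iterable[Tuple[Sequence[float], str]]) -> str:
    """Build an NDJSON bulk body from (vector, text) pairs."""
    lines = []
    for vector, text in docs:
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"vector_field": list(vector), "text": text}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Short 3-dimensional vectors for illustration only; real vectors must
    # match the dimension configured in the index mapping.
    sample = [([0.1, 0.2, 0.3], "Sample document 1"), ([0.4, 0.5, 0.6], "Sample document 2")]
    print(bulk_index_payload("my-vector-index", sample), end="")
```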

## Force-merge operations


GPU-acceleration also applies to [force-merge](https://docs.opensearch.org/latest/api-reference/index-apis/force-merge/) operations, which can significantly reduce the time required to optimize vector indexes. Note that force-merge operations aren't supported on collections. The following example demonstrates how to optimize your vector index by consolidating all segments into a single segment:

```
POST my-vector-index/_forcemerge?max_num_segments=1
```

## Best practices


Follow these best practices to maximize the benefits of GPU-acceleration for your vector search workloads:
+ **Increase index clients** - To take full advantage of GPUs during the index build, increase the number of index clients that are ingesting data into OpenSearch. This allows for better parallelization and utilization of GPU resources.
+ **Adjust approximate threshold** - Set the `index.knn.advanced.approximate_threshold` setting so that OpenSearch doesn't build vector data structures for small segments, which improves overall ingestion speed. A value of 10,000 is a good starting point. For collections, you must explicitly specify a value for this setting.
+ **Optimize shard size** - Try creating shards that have at least 1 million documents. Shards with fewer than this number of documents may not see overall benefits from GPU-acceleration.
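The shard-size and threshold guidance above can be turned into a quick planning aid. This sketch is a rule of thumb, not a sizing formula from OpenSearch: the function name `max_primary_shards` and the constant name are hypothetical, and the code only applies the 1 million documents per shard and 10,000 threshold figures from the list above.

```python
# Starting point for the approximate threshold, as suggested in the list above.
APPROXIMATE_THRESHOLD_SETTINGS = {"index.knn.advanced.approximate_threshold": 10_000}


def max_primary_shards(total_documents: int, min_docs_per_shard: int = 1_000_000) -> int:
    """Largest shard count that still keeps at least min_docs_per_shard documents per shard.

    Returns 1 when the index is smaller than a single full shard.
    """
    return max(1, total_documents // min_docs_per_shard)


if __name__ == "__main__":
    # Upper bounds on shard count for a 10M-document and a 2.5M-document index.
    print(max_primary_shards(10_000_000))
    print(max_primary_shards(2_500_000))
```

Other factors, such as shard storage size and node count, also influence shard planning, so treat the result as an upper bound for GPU-acceleration purposes only.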