$vectorSearch - Amazon DocumentDB


$vectorSearch

New in version 8.0

Not supported by Elastic clusters.

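The $vectorSearch operator in Amazon DocumentDB lets you perform vector search, a method used in machine learning to find similar data points by comparing their vector representations using distance or similarity metrics. This capability combines the flexibility and rich querying of a JSON-based document database with the power of vector search, so you can build machine learning and generative AI use cases such as semantic search, product recommendation, and more.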

Parameters

- `<exact>` (optional): Flag that specifies whether to run an Exact Nearest Neighbor (ENN) or an Approximate Nearest Neighbor (ANN) search. The value can be one of the following:
  - false: run an ANN search
  - true: run an ENN search

  If omitted or set to false, `numCandidates` is required.
- `<index>`: Name of the vector search index to use.
- `<limit>`: Number of documents to return in the results.
- `<numCandidates>` (optional): Number of nearest neighbors to use during the search. Required if `exact` is false or omitted. The value must be less than or equal to (<=) 10000 and cannot be less than the number of documents to return (`limit`).
- `<path>`: Indexed vector type field to search.
- `<queryVector>`: Array of numbers that represents the query vector.
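The constraints above can be checked on the client before a query is sent. The following Python sketch is illustrative only: `build_vector_search_stage` is a hypothetical helper, not part of any driver or of Amazon DocumentDB, and it assumes the rules are exactly those listed in this section. Leaving `numCandidates` out of exact searches is a choice made by this sketch, not a documented requirement.

```python
from typing import Any, Dict, Optional, Sequence

MAX_NUM_CANDIDATES = 10000  # upper bound stated in the parameter list


def build_vector_search_stage(
    index: str,
    path: str,
    query_vector: Sequence[float],
    limit: int,
    num_candidates: Optional[int] = None,
    exact: bool = False,
) -> Dict[str, Any]:
    """Return a $vectorSearch stage after checking the listed constraints.

    Hypothetical helper for illustration; not a driver or service API.
    """
    stage: Dict[str, Any] = {
        "index": index,
        "limit": limit,
        "path": path,
        "queryVector": list(query_vector),
    }
    if exact:
        # ENN search: numCandidates is not required, so this sketch omits it.
        stage["exact"] = True
    else:
        # ANN search: numCandidates is required and bounded.
        if num_candidates is None:
            raise ValueError("numCandidates is required when exact is false or omitted")
        if num_candidates > MAX_NUM_CANDIDATES:
            raise ValueError("numCandidates must be <= 10000")
        if num_candidates < limit:
            raise ValueError("numCandidates must not be less than limit")
        stage["numCandidates"] = num_candidates
    return {"$vectorSearch": stage}


# ANN stage (default) and ENN stage for the same query vector.
ann_stage = build_vector_search_stage(
    "description_index", "description_vector", [0.1, 0.2, 0.3],
    limit=2, num_candidates=10,
)
enn_stage = build_vector_search_stage(
    "description_index", "description_vector", [0.1, 0.2, 0.3],
    limit=2, exact=True,
)
```

Either returned dictionary can be passed as a pipeline stage to an aggregation call; the index and field names used here match the example on this page.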

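Example (MongoDB Shell)

The following example demonstrates how to use the $vectorSearch operator to find similar product descriptions based on their vector representations.

Create sample documents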

db.products.insertMany([
  {
    _id: 1,
    name: "Product A",
    description: "A high-quality, eco-friendly product for your home.",
    description_vector: [0.2, 0.5, 0.8]
  },
  {
    _id: 2,
    name: "Product B",
    description: "An innovative and modern kitchen appliance.",
    description_vector: [0.7, 0.3, 0.9]
  },
  {
    _id: 3,
    name: "Product C",
    description: "A comfortable and stylish piece of furniture.",
    description_vector: [0.1, 0.2, 0.4]
  }
]);

Create vector search index

db.runCommand({
  createIndexes: "products",
  indexes: [{
    key: { "description_vector": "vector" },
    vectorOptions: {
      type: "hnsw",
      dimensions: 3,
      similarity: "cosine",
      m: 16,
      efConstruction: 64
    },
    name: "description_index"
  }]
});
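The same command document can be assembled in Python. This is a minimal sketch: `hnsw_index_command` is a hypothetical helper, its default values simply mirror the shell example above, and the `db` object mentioned in the final comment is assumed to be an already connected pymongo `Database` that this snippet does not create.

```python
from typing import Any, Dict


def hnsw_index_command(
    collection: str,
    field: str,
    name: str,
    dimensions: int,
    similarity: str = "cosine",
    m: int = 16,
    ef_construction: int = 64,
) -> Dict[str, Any]:
    """Build a createIndexes command document shaped like the shell example.

    Hypothetical helper for illustration; defaults copy the example values.
    """
    return {
        # The command name must be the first key of the document.
        "createIndexes": collection,
        "indexes": [
            {
                "key": {field: "vector"},
                "vectorOptions": {
                    "type": "hnsw",
                    "dimensions": dimensions,
                    "similarity": similarity,
                    "m": m,
                    "efConstruction": ef_construction,
                },
                "name": name,
            }
        ],
    }


# dimensions=3 matches the length of description_vector in the sample documents.
command = hnsw_index_command(
    "products", "description_vector", "description_index", dimensions=3
)

# With a connected pymongo Database object `db` (assumed, not created here):
# db.command(command)
```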

Query example

db.products.aggregate([
  {
    $vectorSearch: {
      index: "description_index",
      limit: 2,
      numCandidates: 10,
      path: "description_vector",
      queryVector: [0.1, 0.2, 0.3]
    }
  }
]);

Output

[
  {
    "_id": 1,
    "name": "Product A",
    "description": "A high-quality, eco-friendly product for your home.",
    "description_vector": [0.2, 0.5, 0.8]
  },
  {
    "_id": 3,
    "name": "Product C",
    "description": "A comfortable and stylish piece of furniture.",
    "description_vector": [0.1, 0.2, 0.4]
  }
]
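The order of these results can be reproduced by hand. The sketch below computes cosine similarity between the query vector and each sample vector in plain Python and sorts by it. It is a brute-force (exact) comparison done locally for illustration, not a description of how the HNSW index searches; an ANN search is approximate and will not necessarily match an exact ranking on larger data sets.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# description_vector values from the sample documents, keyed by _id.
vectors: Dict[int, List[float]] = {
    1: [0.2, 0.5, 0.8],
    2: [0.7, 0.3, 0.9],
    3: [0.1, 0.2, 0.4],
}
query = [0.1, 0.2, 0.3]

# Highest similarity first, then keep as many as `limit: 2` in the query.
ranked = sorted(
    vectors,
    key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
    reverse=True,
)
top_two = ranked[:2]
print(top_two)  # [1, 3]
```

The similarities are roughly 0.998 for Product A, 0.991 for Product C, and 0.907 for Product B, which agrees with the output shown above.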

Code examples

To view a code example for using the $vectorSearch command, choose the tab for the language that you want to use:

Node.js
const { MongoClient } = require('mongodb');

async function findSimilarProducts(queryVector) {
  const client = await MongoClient.connect('mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false');

  try {
    const db = client.db('test');
    const collection = db.collection('products');

    // Run an ANN search for the two documents closest to queryVector.
    const result = await collection.aggregate([
      {
        $vectorSearch: {
          index: "description_index",
          limit: 2,
          numCandidates: 10,
          path: "description_vector",
          queryVector: queryVector
        }
      }
    ]).toArray();

    console.log(result);
  } finally {
    // Close the connection even if the query fails.
    await client.close();
  }
}

findSimilarProducts([0.1, 0.2, 0.3]).catch(console.error);
Python
from pymongo import MongoClient


def find_similar_products(query_vector):
    client = MongoClient('mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')

    try:
        db = client.test
        collection = db.products

        # Run an ANN search for the two documents closest to query_vector.
        result = list(collection.aggregate([
            {
                '$vectorSearch': {
                    'index': "description_index",
                    'limit': 2,
                    'numCandidates': 10,
                    'path': "description_vector",
                    'queryVector': query_vector
                }
            }
        ]))

        print(result)
    finally:
        # Close the connection even if the query fails.
        client.close()


find_similar_products([0.1, 0.2, 0.3])