Vector search efficiently handles high-dimensional, unstructured data. It uses machine learning models to find similar results across any data type, so it can return relevant results even when users don’t know exactly what they’re looking for. Rapidly emerging as a key technology for modern applications, vector search helps developers build next-generation search and generative AI applications faster and more easily.
MongoDB Atlas Vector Search goes beyond approximate nearest neighbor (ANN) methods with the introduction of exact nearest neighbor (ENN) vector search. This capability guarantees retrieval of the closest vectors to your query, removing the accuracy limitations inherent in ANN. In sum, ENN vector search can bring a new level of precision to your search and generative AI applications, making it easier to benchmark results and move to production faster.
When exact nearest neighbor (ENN) vector search benefits developers
While ANN shines in searching across large datasets, ENN vector search offers advantages in specific scenarios:
Small-scale vector data: ENN vector search compares the query against every candidate vector, so its cost grows linearly with the size of the dataset. For datasets under 10,000 vectors, that cost stays low enough to make ENN a viable option, especially considering the added development complexity of tuning ANN parameters.
Recall benchmarking of ANN queries: ANN queries are fast, particularly as the scale of your indexed vectors increases, but it may not be easy to know whether the documents they retrieve are actually the closest vectors in your index. ENN provides that exact result set, which you can compare with your approximate result set using Jaccard similarity, recall, or rank-aware metrics. Building these quantitative benchmarks, and rerunning them as your data evolves, gives you much greater confidence that your ANN queries are accurate.
Multi-tenant architectures: Imagine a scenario with millions of vectors categorized by tenants. You might search for the closest vectors within a specific tenant (identified by a tenant ID). In cases where the overall vector collection is large (in the millions) but the number of vectors per tenant is small (a few thousand), ANN’s accuracy suffers when applying highly selective filters. ENN vector search thrives in this multi-tenant scenario, delivering precise results even with small result sets.
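As a rough illustration of the recall benchmarking described above, the sketch below compares the document IDs returned by an ENN query with those returned by an ANN query for the same query vector and limit. The function names and the sample IDs are hypothetical and exist only for this example; in practice the two ID lists would come from your own queries.

```python
def jaccard_similarity(exact_ids, approx_ids):
    """Size of the intersection divided by the size of the union of two ID sets."""
    exact, approx = set(exact_ids), set(approx_ids)
    if not exact and not approx:
        return 1.0  # two empty result sets are treated as identical
    return len(exact & approx) / len(exact | approx)


def recall(exact_ids, approx_ids):
    """Fraction of the exact (ENN) results that the approximate (ANN) query also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical document IDs for one query vector, both queries using limit = 5.
enn_ids = ["a", "b", "c", "d", "e"]
ann_ids = ["a", "b", "c", "d", "x"]

jaccard = jaccard_similarity(enn_ids, ann_ids)  # 4 shared IDs out of 6 distinct IDs
ann_recall = recall(enn_ids, ann_ids)           # 4 of the 5 exact results were found
```

Both metrics ignore ordering. If the position of each result matters for your application, a rank-aware metric would be a better fit than either of these.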
Example use cases
When the set of vectors to search is small, an exhaustive search completes within a reasonable timeframe. This makes the exact nearest neighbor approach a viable option for finding the most similar data points and improves confidence in accuracy for a number of use cases, such as:
Multi-tenant data service: You might be building a business that provides an agentic service, one that understands your customers’ data and takes actions on their behalf. When retrieving relevant proprietary data for that agent, it is critical to apply the right metadata filter and run ENN so that only documents belonging to the appropriate tenant IDs are retrieved.
Proof of concept development: A new recommendation engine, for instance, might have a limited library compared to established ones. Here, ENN vector search can be used to recommend products to a small set of early adopters. Since the data is limited, an exhaustive search becomes practical, ensuring the user gets the most relevant recommendations from the available options.
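To make the idea of an exhaustive search concrete, here is a minimal conceptual sketch in plain Python. It is not how Atlas implements ENN; it only shows the general technique of scoring every stored vector against the query and keeping the best matches. The catalog, its three-dimensional vectors, and the choice of cosine similarity are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_nearest_neighbors(query, vectors, limit):
    """Score every stored vector against the query and return the top `limit` IDs."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [doc_id for _, doc_id in scored[:limit]]


# Hypothetical product catalog with tiny, made-up embeddings.
catalog = {
    "hiking-boots": [0.9, 0.1, 0.0],
    "trail-shoes": [0.8, 0.2, 0.1],
    "espresso-maker": [0.0, 0.1, 0.9],
}

top = exact_nearest_neighbors([1.0, 0.0, 0.0], catalog, limit=2)
```

Because every vector is scored, the work grows linearly with the size of the catalog, which is why this approach suits small datasets.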
How ENN vector search works on MongoDB Atlas
The ENN vector search feature in Atlas integrates seamlessly with the existing $vectorSearch stage within your Atlas aggregation pipelines.
Its key characteristics include:
Guaranteed accuracy: Unlike ANN, ENN always returns the closest vectors to your query, adhering to the specified limit.
Eventual consistency: Similar to approximate vector search, ENN vector search follows an eventual consistency model.
Simplified configuration: Unlike approximate vector search, where tuning numCandidates is crucial, ENN vector search only requires specifying the desired limit of returned vectors.
Scalable recall evaluation: Atlas allows querying a large number of indexed vectors, facilitating the calculation of comprehensive recall sets for effective evaluation.
Fast query execution: ENN vector search can maintain sub-second latency for unfiltered queries over up to 10,000 documents. It can also provide low-latency responses for highly selective filters that narrow a larger collection down to 10,000 documents or fewer, ordered by vector relevance.
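The characteristics above might translate into a query along the following lines. This is a sketch under stated assumptions rather than a definitive implementation: the index name vector_index, the embedding and tenant_id fields, and the helper function itself are hypothetical, and the filter assumes tenant_id has been indexed as a filter field. The exact flag on $vectorSearch selects ENN, and numCandidates is left out because it applies only to approximate search.

```python
def build_enn_pipeline(query_vector, limit, tenant_id=None):
    """Build an aggregation pipeline that runs an exact (ENN) vector search."""
    stage = {
        "index": "vector_index",      # hypothetical Atlas Vector Search index name
        "path": "embedding",          # hypothetical field holding the vectors
        "queryVector": query_vector,
        "exact": True,                # request ENN instead of ANN
        "limit": limit,               # number of documents to return
    }
    if tenant_id is not None:
        # Restrict the search to one tenant (hypothetical field name).
        stage["filter"] = {"tenant_id": {"$eq": tenant_id}}
    return [
        {"$vectorSearch": stage},
        # Keep only the document ID and its similarity score.
        {"$project": {"_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_enn_pipeline([0.12, -0.07, 0.33], limit=10, tenant_id="tenant-42")

# With a PyMongo collection object, the pipeline would be run as:
# results = list(collection.aggregate(pipeline))
```

Swapping the exact flag for a numCandidates value would turn the same pipeline into an ANN query, which makes it straightforward to run both variants side by side when benchmarking recall.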
Build more with ENN vector search
ENN vector search can be a powerful tool when building a proof of concept for retrieval-augmented generation (RAG), semantic search, or recommendation systems powered by vector search. It simplifies the developer experience by minimizing overhead complexity and latency while giving you the flexibility to implement and benchmark precise retrieval.
To explore more use cases and build applications faster, start experimenting with ENN vector search.