
    Binary Quantization & Rescoring: 96% Less Memory, Faster Search

    February 26, 2025

    We are excited to share that several new vector quantization capabilities are now available in public preview in MongoDB Atlas Vector Search: support for binary quantized vector ingestion, automatic scalar quantization, and automatic binary quantization and rescoring.

    Together with our recently released support for scalar quantized vector ingestion, these capabilities will empower developers to scale semantic search and generative AI applications more cost-effectively. For a primer on vector quantization, check out our previous blog post.

    Enhanced developer experience with native quantization in Atlas Vector Search

Effective quantization methods—specifically scalar and binary quantization—can now be applied automatically in Atlas Vector Search. This makes it easier and more cost-effective for developers to use Atlas Vector Search to unlock a wide range of applications, particularly those requiring over a million vectors.

With the new “quantization” index definition parameter, developers can choose to use full-fidelity vectors by specifying “none,” or they can quantize vector embeddings by specifying the desired quantization type—“scalar” or “binary” (Figure 1). This native quantization capability supports vector embeddings from any model provider, as well as MongoDB’s BinData float32 vector subtype.

    Figure 1: New index definition parameters for specifying automatic quantization type in
    Atlas Vector Search

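As a sketch, an index definition with automatic binary quantization might look like the following. The field path, dimension count, and similarity function here are illustrative assumptions, not values from this post:

```python
# Example Atlas Vector Search index definition with automatic quantization.
# The field path, dimension count, and similarity function are illustrative.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "plot_embedding",   # hypothetical embedding field
            "numDimensions": 1024,
            "similarity": "cosine",
            "quantization": "binary",   # one of "none", "scalar", "binary"
        }
    ]
}
```

Switching between quantization types is a change to this single parameter; specifying “none” keeps full-fidelity vectors.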

Scalar quantization—converting each floating-point value into an integer—is generally used when it’s crucial to maintain search accuracy on par with full-precision vectors. Meanwhile, binary quantization—converting each floating-point value into a single bit, 0 or 1—is better suited to scenarios where storage and memory efficiency are paramount and a slight reduction in search accuracy is acceptable. If you’re interested in learning more about this process, check out our documentation.
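Conceptually, the two methods can be sketched in a few lines of Python. This is a toy illustration of the idea, not Atlas Vector Search’s internal implementation:

```python
def scalar_quantize(vec, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an int in [-128, 127]: 1 byte per
    component instead of 4."""
    return [round((v - lo) / (hi - lo) * 255) - 128 for v in vec]

def binary_quantize(vec):
    """Keep only the sign of each component: 1 bit instead of 32."""
    return [1 if v > 0 else 0 for v in vec]

vec = [0.12, -0.5, 0.98, -0.03]
print(scalar_quantize(vec))
print(binary_quantize(vec))  # [1, 0, 1, 0]
```

Scalar quantization preserves relative magnitudes at reduced precision, while binary quantization discards everything but direction, which is why it compresses further but loses more accuracy.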

    Binary quantization with rescoring: Balance cost and accuracy

Compared to scalar quantization, binary quantization further reduces memory usage, leading to lower costs and improved scalability—but also to a decline in search accuracy. To mitigate this, when “binary” is chosen in the “quantization” index parameter, Atlas Vector Search adds an automatic rescoring step: a subset of the top binary vector search results is re-ranked using their full-precision counterparts, ensuring that the final search results remain highly accurate despite the initial vector compression.
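The coarse-then-rescore idea can be sketched as a toy in-memory version; the function names and oversampling factor below are illustrative, not Atlas internals:

```python
def hamming(a, b):
    """Cheap distance between two bit vectors."""
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    """Full-precision similarity used only in the rescoring pass."""
    return sum(x * y for x, y in zip(a, b))

def search_with_rescoring(query, full_vectors, k=2, oversample=4):
    """Coarse pass on binary codes, then re-rank a small candidate
    set against the stored full-precision vectors."""
    sign = lambda v: [1 if x > 0 else 0 for x in v]
    q_bits = sign(query)
    codes = [sign(v) for v in full_vectors]
    # 1) cheap binary pass: keep k * oversample candidates by Hamming distance
    candidates = sorted(range(len(codes)),
                        key=lambda i: hamming(q_bits, codes[i]))[:k * oversample]
    # 2) rescoring pass: exact similarity computed only for the candidates
    return sorted(candidates,
                  key=lambda i: dot(query, full_vectors[i]),
                  reverse=True)[:k]
```

The expensive full-precision comparison runs only over the small candidate set, which is why rescoring recovers accuracy without giving back most of the memory and speed gains.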


    Empirical evidence demonstrates that incorporating a rescoring step when working with binary quantized vectors can dramatically enhance search accuracy, as shown in Figure 2 below.

    Figure 2: Combining binary quantization and rescoring helps retain search accuracy by up to 95%

(Chart: average recall over 50 queries and candidate counts; scalar quantization scores highest, compared with float ANN, binary + rescoring, and plain binary.)

And as Figure 3 shows, in our tests, binary quantization reduced the processing memory requirement by 96% while retaining up to 95% search accuracy and improving query performance.

    Figure 3: Improvements in Atlas Vector Search with the use of vector quantization


    It’s worth noting that even though the quantized vectors are used for indexing and search, their full-fidelity vectors are still stored on disk to support rescoring. Furthermore, retaining the full-fidelity vectors enables developers to perform exact vector search for experimental, high-precision use cases, such as evaluating the search accuracy of quantized vectors produced by different embedding model providers, as needed. For more on evaluating the accuracy of quantized vectors, please see our documentation.
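An exact (ENN) query is expressed by setting `exact` in the `$vectorSearch` aggregation stage; the index and field names below are assumptions for illustration:

```python
def exact_search_stage(query_vector, limit=10):
    """Build an exact (ENN) $vectorSearch stage, e.g. for evaluating the
    accuracy of quantized approximate search against ground truth."""
    return {
        "$vectorSearch": {
            "index": "vector_index",     # assumed index name
            "path": "plot_embedding",    # assumed embedding field
            "queryVector": query_vector,
            "exact": True,               # exhaustive search over full-fidelity vectors
            "limit": limit,
        }
    }
```

With `exact: True`, no `numCandidates` is needed, since every stored full-fidelity vector is compared against the query; this is slower but useful as a ground-truth baseline.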

    So how can developers make the most of vector quantization? Here are some example use cases that can be made more efficient and scaled effectively with quantized vectors:

    • Massive knowledge bases can be used efficiently and cost-effectively for analysis and insight-oriented use cases, such as content summarization and sentiment analysis. Unstructured data like customer reviews, articles, audio, and videos can be processed and analyzed at a much larger scale, at a lower cost and faster speed.

    • Using quantized vectors can enhance the performance of retrieval-augmented generation (RAG) applications. Efficient processing helps sustain query performance over large knowledge bases, and the cost-effectiveness advantage enables a more scalable, robust RAG system, which can result in better customer and employee experiences.

    • Developers can easily A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB’s flexible document model lets developers quickly deploy and compare embedding models’ results without the need to rebuild the index or provision an entirely new data model or set of infrastructure.

    • The relevance of search results or context for large language models (LLMs) can be improved by incorporating larger volumes of vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded within the same or different models.
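For instance, the flexible document model lets a single document carry embeddings from two candidate models side by side during prototyping; the field and provider names here are hypothetical:

```python
# One document, two candidate embeddings of the same source field.
# Field names and providers are hypothetical.
doc = {
    "description": "Wireless noise-cancelling headphones",
    "embedding_model_a": [0.12, -0.08, 0.33],  # e.g. from provider A
    "embedding_model_b": [0.40, 0.11, -0.27],  # e.g. from provider B
}
```

Each embedding field can be indexed as its own vector path, so both models can be queried and compared against the same data without rebuilding the collection.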

    To get started with vector quantization in Atlas Vector Search, see the following developer resources:

    • Documentation: Vector Quantization in Atlas Vector Search

    • Documentation: How to Measure the Accuracy of Your Query Results

    • Tutorial: How to Use Cohere’s Quantized Vectors to Build Cost-effective AI Apps With MongoDB

