
    Binary Quantization & Rescoring: 96% Less Memory, Faster Search

    February 26, 2025

    We are excited to share that several new vector quantization capabilities are now available in public preview in MongoDB Atlas Vector Search: support for binary quantized vector ingestion, automatic scalar quantization, and automatic binary quantization and rescoring.

    Together with our recently released support for scalar quantized vector ingestion, these capabilities will empower developers to scale semantic search and generative AI applications more cost-effectively. For a primer on vector quantization, check out our previous blog post.

    Enhanced developer experience with native quantization in Atlas Vector Search

Effective quantization methods—specifically scalar and binary quantization—can now be applied automatically in Atlas Vector Search. This makes it easier and more cost-effective for developers to use Atlas Vector Search to unlock a wide range of applications, particularly those requiring over a million vectors.

With the new “quantization” index definition parameter, developers can choose to use full-fidelity vectors by specifying “none,” or they can quantize vector embeddings by specifying the desired quantization type—“scalar” or “binary” (Figure 1). This native quantization capability supports vector embeddings from any model provider, as well as MongoDB’s BinData float32 vector subtype.

    Figure 1: New index definition parameters for specifying automatic quantization type in
    Atlas Vector Search

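As a sketch, an index definition with automatic binary quantization might look like the following. The field path, dimension count, and similarity function here are illustrative assumptions, not values from this post:

```python
# Example Atlas Vector Search index definition with automatic quantization.
# The field path, dimension count, and similarity function are illustrative.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "plot_embedding",   # hypothetical embedding field
            "numDimensions": 1024,
            "similarity": "cosine",
            "quantization": "binary",   # one of "none", "scalar", "binary"
        }
    ]
}
```

Switching between quantization types is a change to this single parameter; specifying “none” keeps full-fidelity vectors.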

Scalar quantization—converting each floating-point value into an integer—is generally used when it’s crucial to maintain search accuracy on par with full-precision vectors. Meanwhile, binary quantization—converting each floating-point value into a single bit, 0 or 1—is better suited to scenarios where storage and memory efficiency are paramount and a slight reduction in search accuracy is acceptable. If you’re interested in learning more about this process, check out our documentation.
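Conceptually, the two methods can be sketched in a few lines of Python. This is a toy illustration of the idea, not Atlas Vector Search’s internal implementation:

```python
def scalar_quantize(vec, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an int in [-128, 127]: 1 byte per
    component instead of 4."""
    return [round((v - lo) / (hi - lo) * 255) - 128 for v in vec]

def binary_quantize(vec):
    """Keep only the sign of each component: 1 bit instead of 32."""
    return [1 if v > 0 else 0 for v in vec]

vec = [0.12, -0.5, 0.98, -0.03]
print(scalar_quantize(vec))
print(binary_quantize(vec))  # [1, 0, 1, 0]
```

Scalar quantization preserves relative magnitudes at reduced precision, while binary quantization discards everything but direction, which is why it compresses further but loses more accuracy.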

    Binary quantization with rescoring: Balance cost and accuracy

Compared to scalar quantization, binary quantization further reduces memory usage, leading to lower costs and improved scalability—but also to a decline in search accuracy. To mitigate this, when “binary” is chosen in the “quantization” index parameter, Atlas Vector Search adds an automatic rescoring step: a subset of the top binary vector search results is re-ranked using their full-precision counterparts, ensuring that the final search results remain highly accurate despite the initial vector compression.
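The coarse-then-rescore idea can be sketched as a toy in-memory version; the function names and oversampling factor below are illustrative, not Atlas internals:

```python
def hamming(a, b):
    """Cheap distance between two bit vectors."""
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    """Full-precision similarity used only in the rescoring pass."""
    return sum(x * y for x, y in zip(a, b))

def search_with_rescoring(query, full_vectors, k=2, oversample=4):
    """Coarse pass on binary codes, then re-rank a small candidate
    set against the stored full-precision vectors."""
    sign = lambda v: [1 if x > 0 else 0 for x in v]
    q_bits = sign(query)
    codes = [sign(v) for v in full_vectors]
    # 1) cheap binary pass: keep k * oversample candidates by Hamming distance
    candidates = sorted(range(len(codes)),
                        key=lambda i: hamming(q_bits, codes[i]))[:k * oversample]
    # 2) rescoring pass: exact similarity computed only for the candidates
    return sorted(candidates,
                  key=lambda i: dot(query, full_vectors[i]),
                  reverse=True)[:k]
```

The expensive full-precision comparison runs only over the small candidate set, which is why rescoring recovers accuracy without giving back most of the memory and speed gains.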


    Empirical evidence demonstrates that incorporating a rescoring step when working with binary quantized vectors can dramatically enhance search accuracy, as shown in Figure 2 below.

    Figure 2: Combining binary quantization and rescoring helps retain search accuracy by up to 95%

(Chart: average recall over 50 queries and candidate counts; scalar quantization scores highest, compared with float ANN, binary + rescoring, and plain binary.)

And as Figure 3 shows, in our tests, binary quantization reduced the processing memory requirement by 96% while retaining up to 95% search accuracy and improving query performance.

    Figure 3: Improvements in Atlas Vector Search with the use of vector quantization


    It’s worth noting that even though the quantized vectors are used for indexing and search, their full-fidelity vectors are still stored on disk to support rescoring. Furthermore, retaining the full-fidelity vectors enables developers to perform exact vector search for experimental, high-precision use cases, such as evaluating the search accuracy of quantized vectors produced by different embedding model providers, as needed. For more on evaluating the accuracy of quantized vectors, please see our documentation.
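An exact (ENN) query is expressed by setting `exact` in the `$vectorSearch` aggregation stage; the index and field names below are assumptions for illustration:

```python
def exact_search_stage(query_vector, limit=10):
    """Build an exact (ENN) $vectorSearch stage, e.g. for evaluating the
    accuracy of quantized approximate search against ground truth."""
    return {
        "$vectorSearch": {
            "index": "vector_index",     # assumed index name
            "path": "plot_embedding",    # assumed embedding field
            "queryVector": query_vector,
            "exact": True,               # exhaustive search over full-fidelity vectors
            "limit": limit,
        }
    }
```

With `exact: True`, no `numCandidates` is needed, since every stored full-fidelity vector is compared against the query; this is slower but useful as a ground-truth baseline.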

    So how can developers make the most of vector quantization? Here are some example use cases that can be made more efficient and scaled effectively with quantized vectors:

    • Massive knowledge bases can be used efficiently and cost-effectively for analysis and insight-oriented use cases, such as content summarization and sentiment analysis. Unstructured data like customer reviews, articles, audio, and videos can be processed and analyzed at a much larger scale, at a lower cost and faster speed.

    • Using quantized vectors can enhance the performance of retrieval-augmented generation (RAG) applications. Efficient processing helps sustain query performance over large knowledge bases, and the cost-effectiveness advantage enables a more scalable, robust RAG system, which can result in better customer and employee experiences.

    • Developers can easily A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB’s flexible document model lets developers quickly deploy and compare embedding models’ results without the need to rebuild the index or provision an entirely new data model or set of infrastructure.

    • The relevance of search results or context for large language models (LLMs) can be improved by incorporating larger volumes of vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded within the same or different models.
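For instance, the flexible document model lets a single document carry embeddings from two candidate models side by side during prototyping; the field and provider names here are hypothetical:

```python
# One document, two candidate embeddings of the same source field.
# Field names and providers are hypothetical.
doc = {
    "description": "Wireless noise-cancelling headphones",
    "embedding_model_a": [0.12, -0.08, 0.33],  # e.g. from provider A
    "embedding_model_b": [0.40, 0.11, -0.27],  # e.g. from provider B
}
```

Each embedding field can be indexed as its own vector path, so both models can be queried and compared against the same data without rebuilding the collection.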

    To get started with vector quantization in Atlas Vector Search, see the following developer resources:

    • Documentation: Vector Quantization in Atlas Vector Search

    • Documentation: How to Measure the Accuracy of Your Query Results

    • Tutorial: How to Use Cohere’s Quantized Vectors to Build Cost-effective AI Apps With MongoDB

