    Balancing Accuracy and Speed in RAG Systems: Insights into Optimized Retrieval Techniques

    November 18, 2024

    Retrieval-augmented generation (RAG) has recently become popular because it addresses known weaknesses of Large Language Models (LLMs), such as hallucinations and outdated training data. A RAG pipeline consists of two components: a retriever and a reader. The retriever finds useful information in an external knowledge base, and that information is then included alongside the query in the prompt for the reader model. This approach has been used as an effective alternative to expensive fine-tuning because it helps reduce errors made by LLMs. However, it is unclear how much each part of a RAG pipeline contributes to performance on specific tasks.
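The retriever-plus-reader structure described above can be sketched in a few lines. This is a minimal illustration, not the pipeline from the paper: the token-overlap scoring, the `retrieve` and `build_prompt` helpers, the prompt wording, and the tiny knowledge base are all assumptions made for the example, and the reader (the LLM call) is left out.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def retrieve(query, documents, k=2):
    """Toy retriever (assumption): rank documents by token overlap with the query."""
    q = Counter(tokenize(query))
    scored = []
    for idx, doc in enumerate(documents):
        d = Counter(tokenize(doc))
        overlap = sum(min(q[t], d[t]) for t in q)
        # -idx keeps the original order among documents with equal overlap
        scored.append((overlap, -idx, doc))
    scored.sort(reverse=True)
    return [doc for _, _, doc in scored[:k]]

def build_prompt(query, retrieved):
    """Place the retrieved documents alongside the query in the reader's prompt."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer the question using the documents below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical knowledge base, for illustration only
KNOWLEDGE_BASE = [
    "Boulder is a city in Colorado.",
    "ColBERT is a multi-vector retrieval model.",
    "Approximate nearest neighbor search trades accuracy for speed.",
]

query = "What does approximate nearest neighbor search trade for speed?"
top_docs = retrieve(query, KNOWLEDGE_BASE, k=2)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a real pipeline the prompt would be passed to the reader model, and the retriever would typically be a dense embedding model rather than token overlap.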

    Current retrievers typically use dense vector embedding models, which perform better than older methods that rely on word frequencies. These models use nearest-neighbor search to find documents matching a query, and most dense retrievers encode each document as a single vector. More advanced multi-vector models such as ColBERT allow finer-grained interactions between document and query terms, potentially generalizing better to new datasets. However, searching dense vector embeddings is inefficient, especially with high-dimensional data, and it slows down queries against large databases. RAG pipelines therefore use approximate nearest neighbor (ANN) search, which sacrifices some accuracy for faster results. However, no clear guidance exists on how to configure ANN search to balance speed and accuracy.
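To make the speed and accuracy trade-off concrete, the sketch below compares exact nearest-neighbor search with a crude approximate search that scans only a few clusters of vectors. It is a toy under stated assumptions: the random vectors, the cluster-based index, the `nprobe` parameter name, and the choice of the first vectors as centroids are illustrative and do not reflect the ANN library or settings used in the paper. Search recall here means the fraction of the exact top-k results that the approximate search also returns.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, vectors, candidate_ids, k):
    """Return ids of the k candidates with the highest inner product with the query."""
    ranked = sorted(candidate_ids, key=lambda i: dot(query, vectors[i]), reverse=True)
    return ranked[:k]

def build_clusters(vectors, centroid_ids):
    """Assign every vector to its best-matching centroid (a crude inverted-file index)."""
    clusters = {c: [] for c in centroid_ids}
    for i, vec in enumerate(vectors):
        best = max(centroid_ids, key=lambda c: dot(vec, vectors[c]))
        clusters[best].append(i)
    return clusters

def ann_search(query, vectors, clusters, k, nprobe):
    """Scan only the nprobe clusters whose centroids best match the query."""
    probed = sorted(clusters, key=lambda c: dot(query, vectors[c]), reverse=True)[:nprobe]
    candidates = [i for c in probed for i in clusters[c]]
    return top_k(query, vectors, candidates, k), len(candidates)

def search_recall(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Synthetic data (assumption): random Gaussian vectors stand in for document embeddings
rng = random.Random(0)
DIM, N_DOCS, N_CLUSTERS, K = 16, 500, 10, 10
vectors = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DOCS)]
query = [rng.gauss(0, 1) for _ in range(DIM)]

clusters = build_clusters(vectors, centroid_ids=list(range(N_CLUSTERS)))
exact = top_k(query, vectors, range(N_DOCS), K)

for nprobe in (1, 3, N_CLUSTERS):
    approx, scanned = ann_search(query, vectors, clusters, K, nprobe)
    print(f"nprobe={nprobe:2d} scanned={scanned:3d} recall={search_recall(approx, exact):.2f}")
```

Probing fewer clusters scans fewer vectors, which is where the speedup comes from, at the cost of possibly missing some of the true nearest neighbors. Probing every cluster scans all vectors and recovers the exact result.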

    Researchers from the University of Colorado Boulder and Intel Labs studied how to optimize RAG pipelines for common tasks such as Question Answering (QA). To understand the impact of retrieval on downstream performance, they evaluated pipelines in which the retriever and LLM components were trained separately. This approach avoids the high resource costs of end-to-end training and clarifies the retriever’s contribution.

    The experiments evaluated two instruction-tuned LLMs, LLaMA and Mistral, in RAG pipelines without fine-tuning or further training. The evaluation focused on standard QA and attributed QA tasks: in both, models generated answers using retrieved documents, and in attributed QA the answers also included citations to specific documents. Dense retrieval models such as BGE-base and ColBERTv2 were used so that efficient ANN search over dense embeddings could be applied. The datasets were ASQA, QAMPARI, and Natural Questions (NQ), which are designed to assess retrieval and generation capabilities. Retrieval quality was measured with recall (retriever recall and search recall), QA accuracy with exact match recall, and citation quality with citation recall and precision computed using established frameworks. Confidence intervals were computed with bootstrapping to determine statistical significance across queries.
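As a rough illustration of two of the measurements mentioned above, the sketch below computes a simple exact match recall per query and a percentile bootstrap confidence interval over the per-query scores. The definitions are assumptions: the paper's exact normalization, resampling scheme, and number of resamples are not given in this article, and the example answers are made up.

```python
import random

def exact_match_recall(answer, gold_answers):
    """Fraction of gold answer strings found verbatim (case-insensitive) in the answer."""
    text = answer.lower()
    hits = sum(1 for gold in gold_answers if gold.lower() in text)
    return hits / len(gold_answers)

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-query scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the queries with replacement and record the mean of each resample
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical (generated answer, gold answers) pairs, for illustration only
examples = [
    ("Paris is the capital of France.", ["Paris"]),
    ("It was written by Jane Austen.", ["Jane Austen", "1813"]),
    ("I do not know.", ["Canberra"]),
]

scores = [exact_match_recall(answer, gold) for answer, gold in examples]
mean_score = sum(scores) / len(scores)
lower, upper = bootstrap_ci(scores)
print(f"mean={mean_score:.2f} 95% CI=({lower:.2f}, {upper:.2f})")
```

With only three queries the interval is very wide. In practice the resampling runs over all evaluation queries, and two configurations are compared by checking whether their intervals overlap.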

    The researchers found that retrieval generally improves performance, with ColBERT outperforming BGE by a small margin. Depending on the dataset, correctness was best with 5-10 retrieved documents for Mistral and 4-10 for LLaMA. Notably, adding a citation prompt significantly affected results only when the number of retrieved documents (k) exceeded 10. Citation precision was highest at a particular number of retrieved documents, and adding more led to too many citations. Including gold documents greatly improved QA performance, and lowering the search recall from 1.0 to 0.7 had only a small impact. The researchers therefore concluded that reducing the accuracy of the approximate nearest neighbor (ANN) search in the retriever has minimal effect on task performance. Adding noise to the retrieval results, in contrast, led to a decline in performance, and no configuration was found to surpass the gold-document ceiling.
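One way to picture what a search recall of 0.7 means for the reader is to take an exact top-k result and swap out 30% of it for other documents. The helper below does exactly that. It is a hypothetical simulation written for this example (`degrade_to_recall` and the integer document ids are assumptions) and is not necessarily how the researchers lowered search recall in their experiments.

```python
import random

def degrade_to_recall(exact_ids, all_ids, target_recall, seed=0):
    """Keep a target_recall share of the exact top-k and fill the rest with other ids."""
    rng = random.Random(seed)
    k = len(exact_ids)
    n_keep = round(target_recall * k)
    kept = rng.sample(exact_ids, n_keep)
    # Replacement documents are drawn from outside the exact top-k
    exact_set = set(exact_ids)
    pool = [i for i in all_ids if i not in exact_set]
    replacements = rng.sample(pool, k - n_keep)
    return kept + replacements

# Hypothetical ids: documents 0-9 are the exact top-10 out of 100 documents
exact_top_k = list(range(10))
all_doc_ids = list(range(100))

degraded = degrade_to_recall(exact_top_k, all_doc_ids, target_recall=0.7)
achieved = len(set(degraded) & set(exact_top_k)) / len(exact_top_k)
print(f"retrieved={sorted(degraded)} search recall={achieved:.1f}")
```

Feeding such a degraded list to the reader and comparing QA accuracy against the exact list is one way to estimate how much retrieval accuracy a pipeline can give up for speed.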

    In conclusion, this research provides useful insights for improving retrieval strategies in RAG pipelines and highlights the importance of the retriever for both performance and efficiency, especially on QA tasks. It also shows that injecting noisy documents alongside gold or retrieved documents degrades correctness relative to the gold ceiling. Future work can test how well these findings generalize to other settings, and the results can serve as a baseline for further research on RAG pipelines.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Balancing Accuracy and Speed in RAG Systems: Insights into Optimized Retrieval Techniques appeared first on MarkTechPost.

