
    Balancing Accuracy and Speed in RAG Systems: Insights into Optimized Retrieval Techniques

    November 18, 2024

    Retrieval-augmented generation (RAG) has become popular because it mitigates common failure modes of Large Language Models (LLMs), such as hallucinations and outdated training data. A RAG pipeline consists of two components: a retriever and a reader. The retriever finds relevant information in an external knowledge base, which is then included alongside the query in the prompt given to the reader model. This approach has served as an effective alternative to expensive fine-tuning, helping to reduce errors made by LLMs. However, it is unclear how much each part of a RAG pipeline contributes to its performance on specific tasks.
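The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy illustration, not the paper's setup: the word-overlap scorer, the document strings, and the prompt template are all invented for the example (real pipelines use dense embeddings and an LLM reader).

```python
import re

def tokens(text):
    """Lowercase word set for crude lexical matching (toy retriever)."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    q = tokens(query)
    return sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(query, documents, k=2):
    """Assemble the reader prompt: retrieved context followed by the query."""
    context = "\n".join(f"[{i + 1}] {d}"
                        for i, d in enumerate(retrieve(query, documents, k)))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Paris is the capital of France.",
    "The Eiffel Tower opened in 1889.",
    "Mount Everest is the tallest mountain.",
]
prompt = build_prompt("What is the capital of France?", docs)
```

The reader model then answers conditioned on the retrieved context rather than on its parameters alone, which is what lets RAG sidestep stale training data.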

    Current retrieval models typically use dense vector embeddings, which outperform older methods that rely on word frequencies. These models use nearest-neighbor search algorithms to find documents matching a query, with most dense retrievers encoding each document as a single vector. Advanced multi-vector models like ColBERT allow finer-grained interactions between document and query terms, potentially generalizing better to new datasets. However, dense vector search is inefficient, especially with high-dimensional data, slowing down searches over large databases. RAG pipelines therefore use approximate nearest neighbor (ANN) search, sacrificing some accuracy for faster results. However, no clear guidance exists on configuring ANN search to balance speed and accuracy.
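The speed-for-accuracy trade that ANN search makes can be seen in a minimal sketch. This is an assumed toy setup, not the indexes used in the paper: exact search scores every vector, while the approximate index hashes vectors by the sign pattern of a few random projections (a locality-sensitive-hashing scheme) and scans only the query's bucket, so it is faster but can miss the true nearest neighbor.

```python
import math
import random

random.seed(0)
DIM = 16

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def exact_search(query, vectors):
    """Brute force: compare the query against every vector (O(n) per query)."""
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))

class HyperplaneIndex:
    """Approximate index: bucket vectors by random-hyperplane sign hashes,
    then search only the bucket the query falls into."""
    def __init__(self, vectors, n_planes=4):
        self.planes = [[random.gauss(0, 1) for _ in range(DIM)]
                       for _ in range(n_planes)]
        self.vectors = vectors
        self.buckets = {}
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._hash(v), []).append(i)

    def _hash(self, v):
        return tuple(dot(v, p) >= 0 for p in self.planes)

    def search(self, query):
        candidates = self.buckets.get(self._hash(query), [])
        if not candidates:  # empty bucket: fall back to exact search
            return exact_search(query, self.vectors)
        return max(candidates, key=lambda i: cosine(query, self.vectors[i]))

vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]
index = HyperplaneIndex(vectors)
# Query is a slightly noised copy of document 42, so 42 is the true neighbor.
query = [v + random.gauss(0, 0.1) for v in vectors[42]]
```

Each added hyperplane halves the expected bucket size, speeding up search while raising the chance that the true neighbor lands in a different bucket; this is the knob the paper's "search recall" setting corresponds to.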

    A group of researchers from the University of Colorado Boulder and Intel Labs conducted detailed research on optimizing RAG pipelines for common tasks such as Question Answering (QA). Focusing on the impact of retrieval on downstream performance, they evaluated pipelines in which the retriever and LLM components were trained separately. This approach avoids the high resource costs of end-to-end training and isolates the retriever's contribution.

    Experiments evaluated two instruction-tuned LLMs, LLaMA and Mistral, in RAG pipelines without fine-tuning or further training. The evaluation focused on standard QA and attributed QA tasks, in which models generate answers from retrieved documents; attributed QA additionally requires specific document citations. Dense retrieval models such as BGE-base and ColBERTv2 were used, leveraging efficient ANN search over dense embeddings. The test datasets, ASQA, QAMPARI, and Natural Questions (NQ), were chosen to assess both retrieval and generation capabilities. Retrieval was measured by recall (retriever and search recall), QA accuracy by exact-match recall, and citation quality by citation recall and precision using established frameworks. Confidence intervals were computed by bootstrapping over queries to determine statistical significance.
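Bootstrapping over queries, as the evaluation above does, means resampling the per-query scores with replacement and reading off percentiles of the resampled means. A minimal sketch, with made-up exact-match scores (the paper's actual numbers and resample counts are not reproduced here):

```python
import random

random.seed(1)

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean of
    per-query scores: resample with replacement, collect the means,
    and take the alpha/2 and 1-alpha/2 quantiles."""
    means = []
    for _ in range(n_resamples):
        sample = random.choices(scores, k=len(scores))
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# 1 = exact-match hit, 0 = miss, one entry per evaluation query (toy data)
em_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
low, high = bootstrap_ci(em_scores)
```

Two pipeline configurations are then called significantly different when their intervals do not overlap, which is how claims like "lowering search recall has only a small impact" get statistical backing.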

    After evaluating performance, the researchers found that retrieval generally improves results, with ColBERT slightly outperforming BGE. Correctness peaked with 5-10 retrieved documents for Mistral and 4-10 for LLaMA, depending on the dataset. Notably, adding a citation prompt only significantly affected results when the number of retrieved documents (k) exceeded 10. Citation precision was highest at smaller values of k; retrieving more documents led to excessive citations. Including gold documents greatly improved QA performance, while lowering search recall from 1.0 to 0.7 had only a small impact. The researchers therefore concluded that reducing the accuracy of the approximate nearest neighbor (ANN) search in the retriever has minimal effect on task performance. Injecting noise into retrieval results, by contrast, degraded performance, and no configuration surpassed the gold-document ceiling.

    In conclusion, this research provided useful insights into retrieval strategies for RAG pipelines and highlighted the importance of the retriever in boosting performance and efficiency, especially for QA tasks. It also showed that injecting noisy documents alongside gold or retrieved documents degrades correctness relative to the gold ceiling. Future work can test the generality of these findings in other settings, and the results can serve as a baseline for further research on RAG pipelines.


    The post Balancing Accuracy and Speed in RAG Systems: Insights into Optimized Retrieval Techniques appeared first on MarkTechPost.
