
    Advancing Reliable Question Answering with the CRAG Benchmark

    June 11, 2024

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), particularly Question Answering (QA). However, hallucination remains a significant obstacle: LLMs may generate factually inaccurate or ungrounded responses. Studies reveal that even state-of-the-art models like GPT-4 struggle to answer questions accurately when they involve changing facts or less popular entities. Overcoming hallucination is crucial for developing reliable QA systems. Retrieval-Augmented Generation (RAG) has emerged as a promising approach to address LLMs’ knowledge deficiencies, but it faces challenges of its own: selecting the most relevant information, keeping latency low, and synthesizing answers to complex queries.

Researchers from Meta Reality Labs, FAIR, Meta, HKUST, and HKUST (GZ) proposed CRAG (Comprehensive RAG Benchmark), which is designed around five critical features: realism, richness, insightfulness, reliability, and longevity. It contains 4,409 diverse QA pairs across five domains, spanning simple fact-based questions and seven types of complex questions. The questions cover entities of varying popularity and facts of varying temporal dynamism, and were manually verified and paraphrased for realism and reliability. CRAG also provides mock APIs simulating retrieval from web pages (via the Brave Search API) and from mock knowledge graphs with 2.6 million entities, reflecting realistic retrieval noise. The benchmark offers three tasks to evaluate the web retrieval, structured querying, and summarization capabilities of RAG solutions.
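
To make the benchmark’s structure concrete, the sketch below models what a single CRAG-style QA record might look like. The field names and values are illustrative assumptions, not the benchmark’s actual schema; consult the paper and released data for the real format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical layout of one CRAG-style QA pair. Field names are
# illustrative assumptions; the released benchmark's schema may differ.
@dataclass
class CragExample:
    question: str
    answer: str                 # ground-truth answer, manually verified
    domain: str                 # one of the five domains
    question_type: str          # "simple" or one of seven complex types
    entity_popularity: str      # e.g. "head", "torso", or "tail" entity
    temporal_dynamism: str      # e.g. "real-time", "fast-changing", "static"
    web_results: List[dict] = field(default_factory=list)  # mock search pages

example = CragExample(
    question="Which team won the most recent Super Bowl?",
    answer="the Kansas City Chiefs",
    domain="sports",
    question_type="simple",
    entity_popularity="head",
    temporal_dynamism="fast-changing",
)
```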

CRAG comprises three tasks designed to evaluate different capabilities of RAG QA systems. All three share the same set of (question, answer) pairs but differ in the external data a system may retrieve to augment answer generation. Task 1 (Retrieval Summarization) provides up to five potentially relevant web pages per question to test answer generation. Task 2 (KG and Web Retrieval Augmentation) additionally provides mock APIs for accessing structured data from knowledge graphs (KGs), examining a system’s ability to query structured sources and synthesize information. Task 3 mirrors Task 2 but supplies 50 web pages instead of 5 as retrieval candidates, testing a system’s ability to rank and exploit a larger, noisier, but more comprehensive pool of information.
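
As a rough illustration of Task 1, the following sketch assembles a grounded prompt from the retrieved pages before calling a language model. It is a minimal sketch under the assumption that whole pages fit in the prompt; a real system would chunk, embed, and rank passages, and `llm` below stands in for any LLM client.

```python
from typing import List

def build_rag_prompt(question: str, pages: List[str], max_pages: int = 5) -> str:
    """Assemble a grounded prompt from retrieved web pages (Task 1 style)."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{page}" for i, page in enumerate(pages[:max_pages])
    )
    return (
        "Answer the question using only the sources below. If the sources "
        "do not contain the answer, reply 'I don't know'.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Usage (llm is a placeholder for any LLM client):
# answer = llm.generate(build_rag_prompt(question, retrieved_pages))
```

Instructing the model to abstain when the sources are silent matters here, because hallucination-aware evaluations like CRAG’s treat a wrong answer as worse than no answer.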

    The results and comparisons demonstrate the effectiveness of the proposed CRAG benchmark. While advanced language models like GPT-4 achieve only around 34% accuracy on CRAG, incorporating straightforward RAG improves accuracy to 44%. However, even state-of-the-art industry RAG solutions answer only 63% of questions without hallucination, struggling with facts of higher dynamism, lower popularity, or greater complexity. These evaluations highlight that CRAG has an appropriate level of difficulty and enables insights from its diverse data. The evaluations also underscore the research gaps towards developing fully trustworthy question-answering systems, making CRAG a valuable benchmark for driving further progress in this field.
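
The accuracy and hallucination figures above imply a scoring scheme in which a wrong answer is worse than an abstention. The sketch below implements such a three-way score; it is a simplified assumption in the spirit of CRAG’s evaluation, and the paper’s exact rubric may grade answers more finely.

```python
def truthfulness_score(verdicts: list) -> float:
    """Average a list of per-question verdicts into one score.

    Simplified hallucination-aware scoring (an assumption, not CRAG's
    exact rubric): correct answers earn +1, abstentions earn 0, and
    hallucinated answers earn -1, so guessing is worse than abstaining.
    """
    points = {"correct": 1.0, "missing": 0.0, "hallucinated": -1.0}
    return sum(points[v] for v in verdicts) / len(verdicts)

print(truthfulness_score(["correct", "correct", "missing", "hallucinated"]))  # 0.25
```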

In this study, the researchers introduce CRAG, a comprehensive benchmark intended to propel RAG research for question-answering systems. Through rigorous empirical evaluations, CRAG exposes shortcomings in existing RAG solutions and offers valuable insights for future improvements. The benchmark’s creators plan to continuously expand CRAG with multi-lingual questions, multi-modal inputs, multi-turn conversations, and more, so that it keeps pace with emerging challenges and new research needs in this rapidly progressing field. The benchmark provides a robust foundation for advancing reliable, grounded language generation.

Check out the Paper. All credit for this research goes to the researchers of this project. This article appeared first on MarkTechPost.
