
    RAG-Check: A Novel AI Framework for Hallucination Detection in Multi-Modal Retrieval-Augmented Generation Systems

    January 12, 2025

    Large Language Models (LLMs) have revolutionized generative AI, showing remarkable capabilities in producing human-like responses. However, these models face a critical challenge known as hallucination: the tendency to generate incorrect or irrelevant information. This poses significant risks in high-stakes applications such as medical evaluations, insurance claim processing, and autonomous decision-making systems, where accuracy is paramount. The hallucination problem extends beyond text-only models to vision-language models (VLMs), which process images together with text queries. Despite the development of robust VLMs such as LLaVA, InstructBLIP, and VILA, these systems still struggle to generate accurate responses from image inputs and user queries.

    Existing research has introduced several methods to address hallucination in language models. For text-based systems, FactScore improved accuracy by breaking long statements into atomic units for easier verification. Lookback Lens developed an attention-score analysis approach to detect context hallucination, while MARS implemented a weighting scheme that focuses on the crucial components of a statement. For RAG systems specifically, RAGAS and LlamaIndex emerged as evaluation tools, with RAGAS focusing on response accuracy and relevance using human evaluators, while LlamaIndex employs GPT-4 for faithfulness assessment. However, no existing work provides hallucination scores specifically for multi-modal RAG systems, where the context includes multiple pieces of multi-modal data.

    Researchers from the University of Maryland (College Park, MD) and NEC Laboratories America (Princeton, NJ) have proposed RAG-Check, a comprehensive method for evaluating multi-modal RAG systems. It consists of three key components designed to assess both relevance and accuracy. The first component is a neural network that evaluates the relevancy of each retrieved piece of data to the user query. The second component is an algorithm that segments the RAG output and categorizes the resulting spans as scorable (objective) or non-scorable (subjective). The third component is another neural network that evaluates the correctness of the objective spans against the raw context, which can include both text and images, the latter converted to a text-based format through VLMs.
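
The way the three components fit together can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Span`, `evaluate_rag_output`, `relevancy_fn`, `split_fn`, and `correctness_fn` are hypothetical, and the `toy_*` functions are simple word-matching stand-ins for the neural networks and the span-categorization algorithm described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Span:
    text: str
    objective: bool  # True = scorable (objective), False = non-scorable (subjective)


def evaluate_rag_output(
    query: str,
    contexts: List[str],
    response: str,
    relevancy_fn: Callable[[str, str], float],
    split_fn: Callable[[str], List[Span]],
    correctness_fn: Callable[[str, List[str]], float],
) -> Tuple[List[float], List[Tuple[str, float]]]:
    """Hypothetical wiring of the three components (assumed interface)."""
    # Component 1: relevancy of each retrieved piece of context to the query.
    relevancy = [relevancy_fn(query, c) for c in contexts]
    # Component 2: segment the response and label each span.
    spans = split_fn(response)
    # Component 3: score only the objective spans against the raw context.
    correctness = [
        (s.text, correctness_fn(s.text, contexts)) for s in spans if s.objective
    ]
    return relevancy, correctness


# Toy stand-ins, for illustration only (assumptions, not the paper's models).
def toy_relevancy(query: str, context: str) -> float:
    # Fraction of query words that also appear in the context.
    q, c = set(query.lower().split()), set(context.lower().split())
    return len(q & c) / len(q) if q else 0.0


def toy_split(response: str) -> List[Span]:
    # Split on periods; treat sentences containing "I think" as subjective.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    return [Span(s, objective="i think" not in s.lower()) for s in sentences]


def toy_correctness(span: str, contexts: List[str]) -> float:
    # 1.0 if the span appears verbatim in any context piece, else 0.0.
    return 1.0 if any(span.lower() in c.lower() for c in contexts) else 0.0


if __name__ == "__main__":
    rel, cor = evaluate_rag_output(
        "where is the cat",
        ["the cat sits on a red mat", "a dog runs in the park"],
        "The cat sits on a red mat. I think it looks happy.",
        toy_relevancy, toy_split, toy_correctness,
    )
    print(rel)
    print(cor)
```

Passing the scorers in as callables keeps the pipeline independent of any particular model, so the toy functions could be swapped for trained relevancy and correctness models without changing the control flow.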

    The RAG-Check architecture uses two primary evaluation metrics, the Relevancy Score (RS) and the Correctness Score (CS), each covering a different aspect of RAG system performance. To evaluate selection mechanisms, the system analyzes the relevancy scores of the top 5 retrieved images across a test set of 1,000 questions, which gives insight into the effectiveness of different retrieval methods. For context generation, the architecture allows flexible integration of various model combinations: either separate VLMs (like LLaVA or GPT-4) paired with LLMs (such as LLaMA or GPT-3.5), or unified MLLMs like GPT-4. This flexibility makes it possible to compare different model architectures and their impact on the quality of the generated response.
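
As a rough illustration of the selection-mechanism evaluation, the sketch below averages the relevancy scores of the top 5 retrieved items per question and then averages across questions. The article does not state how the scores are aggregated, so the simple mean is an assumption, as are the function name `mean_top_k_relevancy`, the input layout (scores listed in retrieval rank order), and the score range of 0 to 1.

```python
from statistics import mean
from typing import Dict, List


def mean_top_k_relevancy(
    scores_per_question: Dict[str, List[float]], k: int = 5
) -> float:
    """Average relevancy of the top-k retrieved items, averaged over questions.

    Assumes each list holds relevancy scores in retrieval rank order
    (best-ranked item first). Questions with no retrieved items are skipped;
    an entirely empty input raises statistics.StatisticsError.
    """
    per_question = [
        mean(scores[:k]) for scores in scores_per_question.values() if scores
    ]
    return mean(per_question)


if __name__ == "__main__":
    # Hypothetical scores for two questions (made-up numbers).
    example = {
        "q1": [0.9, 0.7, 0.5, 0.3, 0.1, 0.0],  # sixth item is ignored when k=5
        "q2": [0.2, 0.4],                      # fewer than k items retrieved
    }
    print(mean_top_k_relevancy(example))
```

Averaging per question first gives every question equal weight, regardless of how many items the retriever returned for it.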

    The evaluation results show significant performance variation across RAG system configurations. When CLIP models are used as vision encoders with cosine similarity for image selection, average relevancy scores range from 30% to 41%. Using the RS model to evaluate query-image pairs instead raises relevancy scores dramatically, to between 71% and 89.5%, though at the cost of a 35-fold increase in computational requirements on an A100 GPU. For context generation and error rates, GPT-4o emerges as the superior configuration, outperforming the other setups by 20%. The remaining RAG configurations show comparable performance, with accuracy rates between 60% and 68%.
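
The baseline selection step, ranking images by cosine similarity between a query embedding and image embeddings, can be sketched in a few lines. This is a generic illustration under stated assumptions: the vectors here are made-up placeholders for embeddings that a CLIP-style encoder would produce, and the function names `cosine_similarity` and `top_k_by_cosine` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vec: Sequence[float],
    image_vecs: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (image index, similarity) pairs for the k most similar images."""
    scored = [
        (i, cosine_similarity(query_vec, v)) for i, v in enumerate(image_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Placeholder 2-D "embeddings"; real encoders produce much longer vectors.
    query = [1.0, 0.0]
    images = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(top_k_by_cosine(query, images, k=2))
```

Cosine similarity only compares two precomputed vectors, which is why it is cheap; a learned relevancy model that scores each query-image pair jointly has to run a forward pass per pair, consistent with the higher computational cost reported above.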

    In conclusion, the researchers introduce RAG-Check, a novel evaluation framework for multi-modal RAG systems that addresses the critical challenge of hallucination detection across multiple image and text inputs. The framework's three-component architecture, comprising relevancy scoring, span categorization, and correctness assessment, enables a finer-grained evaluation of RAG performance. The results reveal that while the RS model substantially raises relevancy scores, from at most 41% with CLIP-based selection to as high as 89.5%, it comes with increased computational costs. Among the configurations tested, GPT-4o emerged as the most effective model for context generation, highlighting the potential of unified multi-modal language models to improve RAG system accuracy and reliability.


    All credit for this research goes to the researchers of this project.

    The post RAG-Check: A Novel AI Framework for Hallucination Detection in Multi-Modal Retrieval-Augmented Generation Systems appeared first on MarkTechPost.
