
    RAG-Check: A Novel AI Framework for Hallucination Detection in Multi-Modal Retrieval-Augmented Generation Systems

    January 12, 2025

Large Language Models (LLMs) have revolutionized generative AI, showing remarkable capabilities in producing human-like responses. However, these models face a critical challenge known as hallucination: the tendency to generate incorrect or irrelevant information. This issue poses significant risks in high-stakes applications such as medical evaluations, insurance claim processing, and autonomous decision-making systems, where accuracy is paramount. The hallucination problem extends beyond text-based models to vision-language models (VLMs) that process images and text queries. Despite the development of robust VLMs such as LLaVA, InstructBLIP, and VILA, these systems still struggle to generate accurate responses grounded in image inputs and user queries.

Existing research has introduced several methods to address hallucination in language models. For text-based systems, FactScore improved accuracy by breaking long statements into atomic units for better verification. Lookback Lens developed an attention-score analysis approach to detect context hallucination, while MARS implemented a weighted system focusing on crucial statement components. For RAG systems specifically, RAGAS and LlamaIndex emerged as evaluation tools: RAGAS focuses on response accuracy and relevance using human evaluators, while LlamaIndex employs GPT-4 for faithfulness assessment. However, no existing work provides hallucination scores specifically for multi-modal RAG systems, where the contexts include multiple pieces of multi-modal data.

Researchers from the University of Maryland, College Park, MD, and NEC Laboratories America, Princeton, NJ, have proposed RAG-Check, a comprehensive method to evaluate multi-modal RAG systems. It consists of three key components designed to assess both relevance and accuracy. The first component involves a neural network that evaluates the relevancy of each retrieved piece of data to the user query. The second component implements an algorithm that segments and categorizes the RAG output into scorable (objective) and non-scorable (subjective) spans. The third component utilizes another neural network to evaluate the correctness of objective spans against the raw context, which can include both text and images converted to a text-based format through VLMs.
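The three components above can be sketched as a simple pipeline. This is an illustrative sketch only: the function names and the scoring-model interfaces are hypothetical stand-ins, not the authors' actual implementation.

```python
# Hypothetical sketch of the RAG-Check three-component pipeline.
# rs_model, span_classifier, and cs_model stand in for the trained
# neural networks / segmentation algorithm described in the paper.

def rag_check(query, retrieved_contexts, response,
              rs_model, span_classifier, cs_model):
    # 1) Relevancy: score each retrieved piece of context against the query.
    relevancy = [rs_model(query, ctx) for ctx in retrieved_contexts]

    # 2) Segmentation: split the RAG output into spans and keep only the
    #    objective (scorable) ones; subjective spans are not scored.
    spans = [s for s in span_classifier(response) if s["objective"]]

    # 3) Correctness: check each objective span against the raw context
    #    (images are assumed to have been converted to text via a VLM).
    correctness = [cs_model(s["text"], retrieved_contexts) for s in spans]
    return relevancy, correctness
```

Plugging in dummy scorers makes the data flow concrete: each retrieved context yields one relevancy score, and each objective span yields one correctness score.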

The RAG-Check architecture uses two primary evaluation metrics, the Relevancy Score (RS) and the Correctness Score (CS), to evaluate different aspects of RAG system performance. For evaluating selection mechanisms, the system analyzes the relevancy scores of the top 5 retrieved images across a test set of 1,000 questions, providing insights into the effectiveness of different retrieval methods. For context generation, the architecture allows flexible integration of various model combinations: either separate VLMs (such as LLaVA or GPT-4) paired with LLMs (such as LLaMA or GPT-3.5), or unified MLLMs like GPT-4. This flexibility enables a comprehensive evaluation of different model architectures and their impact on response-generation quality.
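The evaluation of selection mechanisms described above amounts to averaging the RS over the top-5 retrieved images for each question, then over the question set. A minimal sketch, assuming a `retrieve` function and an `rs_model` scorer that are not part of the original description:

```python
# Illustrative computation of the average top-k relevancy score used to
# compare selection mechanisms; `retrieve` and `rs_model` are assumed
# callables, not part of the published RAG-Check code.

def average_topk_relevancy(questions, retrieve, rs_model, k=5):
    """Mean RS over the top-k retrieved images, averaged over questions."""
    per_question = []
    for q in questions:
        images = retrieve(q)[:k]                       # top-k candidates
        scores = [rs_model(q, img) for img in images]  # RS per image
        per_question.append(sum(scores) / len(scores))
    return sum(per_question) / len(per_question)
```

In the paper's setup, `questions` would be the 1,000-question test set and `k=5`, matching the top-5 analysis reported.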

The evaluation results demonstrate significant performance variations across different RAG system configurations. When using CLIP models as vision encoders with cosine similarity for image selection, the average relevancy scores ranged from 30% to 41%. However, using the RS model to evaluate query-image pairs dramatically improved relevancy scores to between 71% and 89.5%, though at the cost of a 35-fold increase in computational requirements on an A100 GPU. GPT-4o emerged as the superior configuration for context generation and error rates, outperforming other setups by 20%. The remaining RAG configurations showed comparable performance, with accuracy rates between 60% and 68%.
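The CLIP-based baseline in the comparison above selects images by cosine similarity between the query embedding and each image embedding. A minimal sketch of that selection step, with NumPy arrays standing in for real CLIP encoder outputs:

```python
import numpy as np

# Minimal sketch of CLIP-style image selection by cosine similarity,
# as in the baseline configuration; the embeddings here are stand-ins
# for actual CLIP text/image encoder outputs.

def cosine_top_k(query_emb, image_embs, k=5):
    """Return indices of the k images most similar to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q                      # cosine similarity per image
    return np.argsort(sims)[::-1][:k]    # best-first indices
```

The RS model replaces this single dot product with a learned scorer over each query-image pair, which explains both the relevancy gain and the roughly 35x higher compute cost reported.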

In conclusion, the researchers introduced RAG-Check, a novel evaluation framework for multi-modal RAG systems that addresses the critical challenge of hallucination detection across multiple images and text inputs. The framework’s three-component architecture, comprising relevancy scoring, span categorization, and correctness assessment, shows significant improvements in performance evaluation. The results reveal that while the RS model substantially enhances relevancy scores from 41% to 89.5%, it comes with increased computational costs. Among the configurations tested, GPT-4o emerged as the most effective model for context generation, highlighting the potential of unified multi-modal language models in improving RAG system accuracy and reliability.


Check out the Paper. All credit for this research goes to the researchers of this project.


    The post RAG-Check: A Novel AI Framework for Hallucination Detection in Multi-Modal Retrieval-Augmented Generation Systems appeared first on MarkTechPost.

