
    THRONE: Advancing the Evaluation of Hallucinations in Vision-Language Models

    May 12, 2024

    Understanding and mitigating hallucinations in vision-language models (VLVMs) is an emerging field of research that addresses the generation of coherent but factually incorrect responses by these advanced AI systems. As VLVMs increasingly integrate text and visual inputs to generate responses, the accuracy of these outputs becomes crucial, especially in settings where precision is paramount, such as medical diagnostics or autonomous driving.

    Hallucinations in VLVMs typically manifest as plausible yet incorrect details generated about an image. These inaccuracies pose significant risks, potentially misinforming decisions in critical applications. The challenge lies in detecting these errors and developing methods to mitigate them effectively, ensuring the reliability of VLVM outputs.

    Most existing benchmarks for evaluating hallucinations in VLVMs focus on responses to constrained query formats, such as yes/no questions about specific objects or attributes within an image. These benchmarks often fail to measure more complex, open-ended hallucinations that can occur in varied real-world applications. As a result, there is a significant gap in the ability to fully understand and mitigate the broader spectrum of hallucinations that VLVMs can produce.

    Researchers from the University of Oxford and AWS AI Labs introduced a new framework called THRONE (Text-from-image Hallucination Recognition with Object-probes for open-ended Evaluation) to address this gap. THRONE is designed to assess Type I hallucinations: those that occur in response to open-ended prompts requiring detailed image descriptions. Unlike previous methods, THRONE uses publicly available language models to evaluate the hallucinations in free-form responses generated by various VLVMs, offering a more comprehensive and rigorous approach.
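The core bookkeeping of such an open-ended evaluation can be sketched as follows. THRONE's actual pipeline uses public language models as judges to decide whether each object class is mentioned; in this illustrative sketch, simple substring matching stands in for that judge, and all function names and data are hypothetical:

```python
# Minimal sketch of open-ended (Type I) hallucination scoring.
# A real evaluator would use a language model, not substring matching,
# to decide which object classes a free-form description mentions.

def mentioned_objects(description: str, object_classes: list[str]) -> set[str]:
    """Return the object classes that the free-form description mentions."""
    text = description.lower()
    return {cls for cls in object_classes if cls.lower() in text}

def score_response(description: str, ground_truth: list[str],
                   object_classes: list[str]):
    """Split mentioned objects into correct, hallucinated, and omitted."""
    mentioned = mentioned_objects(description, object_classes)
    true_pos = mentioned & set(ground_truth)    # objects really in the image
    false_pos = mentioned - set(ground_truth)   # hallucinated objects
    false_neg = set(ground_truth) - mentioned   # objects the model omitted
    return true_pos, false_pos, false_neg

classes = ["dog", "cat", "bicycle", "car"]
truth = ["dog", "bicycle"]
resp = "A dog rides in a basket on a bicycle next to a parked car."
tp, fp, fn = score_response(resp, truth, classes)
# "car" is mentioned but absent from the image: a hallucinated object.
```

From these per-response sets, corpus-level precision and recall follow by aggregating true and false positives across all evaluated images.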

    THRONE leverages multiple metrics to quantitatively measure hallucinations across different VLVMs. For example, it employs precision and recall metrics alongside a class-wise F0.5 score, which emphasizes precision twice as much as recall. This scoring is particularly relevant in scenarios where false positives (incorrect but plausible responses) are more detrimental than false negatives.
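The F0.5 score is an instance of the standard F-beta family, where beta < 1 shifts weight toward precision. A minimal implementation of that formula:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    With beta = 0.5, precision counts twice as heavily as recall,
    penalizing plausible-but-wrong mentions more than omissions.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both terms vanish
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A model with precision 0.8 and recall 0.5:
score = f_beta(0.8, 0.5)  # ~0.714, closer to the precision than the recall
```

Note how the same precision/recall pair under F1 (beta = 1) would give 0.615; the F0.5 weighting rewards the model's higher precision.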

    An evaluation of THRONE’s effectiveness revealed insightful data about the prevalence and characteristics of hallucinations in current VLVMs. Despite the framework’s advanced approach, the results indicate that many VLVMs still struggle with a high rate of hallucinations. For instance, the framework found that some of the evaluated models produce responses in which about 20% of the objects mentioned are hallucinations. This high rate of inaccuracies underscores the persistent challenge of reducing hallucinations and improving the reliability of VLVM outputs.
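That 20% statistic can be read as the fraction of all mentioned objects that are false positives; a toy computation (not THRONE's exact bookkeeping) makes the interpretation concrete:

```python
def hallucination_rate(hallucinated: int, correct: int) -> float:
    """Fraction of all mentioned objects that are hallucinated."""
    mentioned = hallucinated + correct
    return hallucinated / mentioned if mentioned else 0.0

# A model that mentions 100 objects across its responses,
# 20 of which are not actually present in the images:
rate = hallucination_rate(hallucinated=20, correct=80)  # 0.2
```

Equivalently, this rate is one minus the model's object-level precision, which is why a precision-weighted score such as F0.5 is a natural headline metric here.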

    In conclusion, the THRONE framework represents a significant step forward in evaluating hallucinations in vision-language models, particularly addressing the complex issue of Type I hallucinations in free-form responses. While existing benchmarks have struggled to effectively measure these more nuanced errors, THRONE utilizes a novel combination of publicly available language models and a robust metric system, including precision, recall, and class-wise F0.5 scores. Despite these advances, the high rate of detected hallucinations, around 20% in some models, underscores the ongoing challenges and the necessity for further research to enhance the accuracy and reliability of VLVMs in practical applications.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post THRONE: Advancing the Evaluation of Hallucinations in Vision-Language Models appeared first on MarkTechPost.
