    Apple Researchers Present KGLens: A Novel AI Method Tailored for Visualizing and Evaluating the Factual Knowledge Embedded in LLMs

    August 12, 2024

Large Language Models (LLMs) have gained significant attention for their versatility, but their factualness remains a critical concern. Studies have revealed that LLMs can produce nonfactual, hallucinated, or outdated information, undermining their reliability. Current evaluation methods, such as fact-checking and fact-QA, face several challenges. Fact-checking struggles to assess the factualness of generated content, while fact-QA encounters difficulties scaling up evaluation data due to expensive annotation processes. Both approaches also risk data contamination from web-crawled pretraining corpora. Moreover, LLMs often respond inconsistently to the same fact when it is presented in different forms, a challenge that existing evaluation datasets are not equipped to address.

Existing attempts to evaluate LLMs’ knowledge primarily rely on specific datasets but face challenges like data leakage, static content, and limited metrics. Knowledge graphs (KGs) offer advantages in customization, evolving knowledge, and reduced test set leakage. Methods like LAMA and LPAQA use KGs for evaluation but struggle with unnatural question formats and impracticality for large KGs. KaRR overcomes some of these issues but remains inefficient for large graphs and lacks generalizability. Current approaches prioritize accuracy over reliability, failing to address LLMs’ inconsistent responses to the same fact. Moreover, no existing work visualizes LLMs’ knowledge using KGs, presenting an opportunity for improvement. These limitations highlight the need for more comprehensive and efficient methods to evaluate and understand LLMs’ knowledge retention and accuracy.

    Researchers from Apple introduced KGLENS, an innovative knowledge probing framework that has been developed to measure knowledge alignment between KGs and LLMs and identify LLMs’ knowledge blind spots. The framework employs a Thompson sampling-inspired method with a parameterized knowledge graph (PKG) to probe LLMs efficiently. KGLENS features a graph-guided question generator that converts KGs into natural language using GPT-4, designing two types of questions (fact-checking and fact-QA) to reduce answer ambiguity. Human evaluation shows that 97.7% of generated questions are sensible to annotators.
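To make the two question types concrete, here is a minimal sketch of turning a KG edge into a judgment (Yes/No) question and a generation (Wh) question. KGLENS delegates this step to GPT-4; the string templates, the example edge, and the alias formatting below are simplifying assumptions for illustration only.

```python
def edge_to_questions(subject, relation, obj, aliases=None):
    """Convert one (subject, relation, object) edge into the two question
    types described in the article: a Yes/No question for judgment and a
    Wh-question for generation. Aliases are appended to reduce ambiguity."""
    alias_note = f" (also known as {', '.join(aliases)})" if aliases else ""
    yes_no = f"Is {obj} the {relation} of {subject}{alias_note}? Answer Yes or No."
    wh = f"What is the {relation} of {subject}{alias_note}?"
    return yes_no, wh

# Example: probing the edge (France, capital, Paris) with an entity alias.
yn, wh = edge_to_questions("France", "capital", "Paris",
                           aliases=["the French Republic"])
```

In the real framework the choice between the two forms is controlled by the graph structure rather than emitted as a pair, and GPT-4 produces far more natural phrasings than these templates.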

KGLENS efficiently probes LLMs’ knowledge using a PKG combined with a Thompson sampling-inspired method. The framework initializes a PKG in which each edge is augmented with a beta distribution indicating the LLM’s potential deficiency on that edge. It then samples edges according to their probability, generates questions from those edges, and examines the LLM through a question-answering task. The PKG is updated based on the results, and the process iterates until convergence. The framework’s graph-guided question generator converts KG edges into natural language questions using GPT-4, creating two types of questions: Yes/No questions for judgment and Wh-questions for generation, with the question type controlled by the graph structure. Entity aliases are included to reduce ambiguity.
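The probing loop above can be sketched in a few lines. The edge names, the `llm_answers_correctly` stub, the fixed iteration budget in place of a convergence test, and the exact update rule are illustrative assumptions, not the paper’s implementation; only the beta-distribution bookkeeping and Thompson-style sampling follow the description.

```python
import random

# Parameterized knowledge graph (PKG): each edge carries a Beta(alpha, beta)
# distribution modelling the LLM's chance of failing on that fact.
pkg = {
    ("France", "capital", "Paris"): {"alpha": 1, "beta": 1},
    ("Water", "boiling_point", "100 C"): {"alpha": 1, "beta": 1},
}

def llm_answers_correctly(edge):
    # Placeholder for generating a question from this edge, querying the
    # LLM, and verifying the answer; here the outcome is simulated.
    return random.random() < 0.7

def probe(pkg, iterations=100):
    for _ in range(iterations):
        # Thompson sampling: draw a failure probability from each edge's
        # beta distribution and probe the edge with the highest draw,
        # i.e. the most likely knowledge blind spot.
        sampled = {
            e: random.betavariate(p["alpha"], p["beta"]) for e, p in pkg.items()
        }
        edge = max(sampled, key=sampled.get)
        # Update the posterior: a wrong answer is evidence of a deficiency
        # (alpha), a correct answer is evidence of knowledge (beta).
        if llm_answers_correctly(edge):
            pkg[edge]["beta"] += 1
        else:
            pkg[edge]["alpha"] += 1
    return pkg
```

After enough iterations, edges with high `alpha` relative to `beta` mark the facts the model repeatedly gets wrong, which is what KGLENS surfaces as blind spots.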

    For answer verification, KGLENS instructs LLMs to generate specific response formats and employs GPT-4 to check the correctness of responses for Wh-questions. The framework’s efficiency is evaluated through various sampling methods, demonstrating its effectiveness in identifying LLMs’ knowledge blind spots across diverse topics and relationships.
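The two verification paths can be sketched as follows. The function names, the surface-matching shortcut for Wh-answers (where KGLENS actually calls GPT-4 as a judge), and the response formats are assumptions for illustration.

```python
def verify_yes_no(response: str, expected: bool) -> bool:
    # LLMs are instructed to reply in a fixed format, so for judgment
    # questions a normalized prefix check on the response suffices.
    normalized = response.strip().lower()
    return normalized.startswith("yes") == expected

def verify_wh(response: str, gold: str, gold_aliases=()) -> bool:
    # Stand-in for the GPT-4 correctness check on free-form Wh-answers:
    # accept the response if the gold entity or any alias appears verbatim.
    candidates = [gold, *gold_aliases]
    return any(c.lower() in response.lower() for c in candidates)
```

A substring match is a crude proxy; the point of using a judge model in the real framework is precisely to handle paraphrases and partial answers that no surface check catches.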

    KGLENS evaluation across various LLMs reveals that the GPT-4 family consistently outperforms other models. GPT-4, GPT-4o, and GPT-4-turbo show comparable performance, with GPT-4o being more cautious with personal information. A significant gap exists between GPT-3.5-turbo and GPT-4, with GPT-3.5-turbo sometimes performing worse than legacy LLMs due to its conservative approach. Legacy models like Babbage-002 and Davinci-002 show only slight improvement over random guessing, highlighting the progress in recent LLMs. The evaluation provides insights into different error types and model behaviors, demonstrating the varying capabilities of LLMs in handling diverse knowledge domains and difficulty levels.

    KGLENS introduces an efficient method for evaluating factual knowledge in LLMs using a Thompson sampling-inspired approach with parameterized Knowledge Graphs. The framework outperforms existing methods in revealing knowledge blind spots and demonstrates adaptability across various domains. Human evaluation confirms its effectiveness, achieving 95.7% accuracy. KGLENS and its assessment of KGs will be made available to the research community, fostering collaboration. For businesses, this tool facilitates the development of more reliable AI systems, enhancing user experiences and improving model knowledge. KGLENS represents a significant advancement in creating more accurate and dependable AI applications.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Apple Researchers Present KGLens: A Novel AI Method Tailored for Visualizing and Evaluating the Factual Knowledge Embedded in LLMs appeared first on MarkTechPost.
