    Apple Researchers Present KGLens: A Novel AI Method Tailored for Visualizing and Evaluating the Factual Knowledge Embedded in LLMs

    August 12, 2024

Large Language Models (LLMs) have gained significant attention for their versatility, but their factualness remains a critical concern. Studies have revealed that LLMs can produce nonfactual, hallucinated, or outdated information, undermining their reliability. Current evaluation methods, such as fact-checking and fact-QA, face several challenges. Fact-checking struggles to assess the factualness of generated content, while fact-QA encounters difficulties scaling up evaluation data due to expensive annotation processes. Both approaches also risk data contamination from web-crawled pretraining corpora. Moreover, LLMs often respond inconsistently to the same fact when it is presented in different forms, a challenge that existing evaluation datasets are not equipped to address.

Existing attempts to evaluate LLMs’ knowledge primarily rely on specific datasets but face challenges like data leakage, static content, and limited metrics. Knowledge graphs (KGs) offer advantages in customization, evolving knowledge, and reduced test set leakage. Methods like LAMA and LPAQA use KGs for evaluation but struggle with unnatural question formats and impracticality for large KGs. KaRR overcomes some of these issues but remains inefficient for large graphs and lacks generalizability. Current approaches prioritize accuracy over reliability, failing to address LLMs’ inconsistent responses to the same fact. Moreover, no existing work visualizes LLMs’ knowledge using KGs, presenting an opportunity for improvement. These limitations highlight the need for more comprehensive and efficient methods to evaluate and understand LLMs’ knowledge retention and accuracy.

    Researchers from Apple introduced KGLENS, an innovative knowledge probing framework that has been developed to measure knowledge alignment between KGs and LLMs and identify LLMs’ knowledge blind spots. The framework employs a Thompson sampling-inspired method with a parameterized knowledge graph (PKG) to probe LLMs efficiently. KGLENS features a graph-guided question generator that converts KGs into natural language using GPT-4, designing two types of questions (fact-checking and fact-QA) to reduce answer ambiguity. Human evaluation shows that 97.7% of generated questions are sensible to annotators.
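To make the two question types concrete, here is a minimal sketch of turning a KG edge into a judgment (Yes/No) question and a generation (Wh) question. KGLENS delegates this step to GPT-4; the string templates, the example edge, and the alias formatting below are simplifying assumptions for illustration only.

```python
def edge_to_questions(subject, relation, obj, aliases=None):
    """Convert one (subject, relation, object) edge into the two question
    types described in the article: a Yes/No question for judgment and a
    Wh-question for generation. Aliases are appended to reduce ambiguity."""
    alias_note = f" (also known as {', '.join(aliases)})" if aliases else ""
    yes_no = f"Is {obj} the {relation} of {subject}{alias_note}? Answer Yes or No."
    wh = f"What is the {relation} of {subject}{alias_note}?"
    return yes_no, wh

# Example: probing the edge (France, capital, Paris) with an entity alias.
yn, wh = edge_to_questions("France", "capital", "Paris",
                           aliases=["the French Republic"])
```

In the real framework the choice between the two forms is controlled by the graph structure rather than emitted as a pair, and GPT-4 produces far more natural phrasings than these templates.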

KGLENS efficiently probes LLMs’ knowledge using a PKG combined with a Thompson sampling-inspired method. The framework initializes a PKG in which each edge is augmented with a beta distribution indicating the LLM’s potential deficiency on that edge. It then samples edges according to their probability, generates questions from those edges, and examines the LLM through a question-answering task. The PKG is updated based on the results, and the process iterates until convergence. The framework’s graph-guided question generator converts KG edges into natural language questions using GPT-4, creating two types of questions: Yes/No questions for judgment and Wh-questions for generation, with the question type controlled by the graph structure. Entity aliases are included to reduce ambiguity.
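The probing loop above can be sketched in a few lines. The edge names, the `llm_answers_correctly` stub, the fixed iteration budget in place of a convergence test, and the exact update rule are illustrative assumptions, not the paper’s implementation; only the beta-distribution bookkeeping and Thompson-style sampling follow the description.

```python
import random

# Parameterized knowledge graph (PKG): each edge carries a Beta(alpha, beta)
# distribution modelling the LLM's chance of failing on that fact.
pkg = {
    ("France", "capital", "Paris"): {"alpha": 1, "beta": 1},
    ("Water", "boiling_point", "100 C"): {"alpha": 1, "beta": 1},
}

def llm_answers_correctly(edge):
    # Placeholder for generating a question from this edge, querying the
    # LLM, and verifying the answer; here the outcome is simulated.
    return random.random() < 0.7

def probe(pkg, iterations=100):
    for _ in range(iterations):
        # Thompson sampling: draw a failure probability from each edge's
        # beta distribution and probe the edge with the highest draw,
        # i.e. the most likely knowledge blind spot.
        sampled = {
            e: random.betavariate(p["alpha"], p["beta"]) for e, p in pkg.items()
        }
        edge = max(sampled, key=sampled.get)
        # Update the posterior: a wrong answer is evidence of a deficiency
        # (alpha), a correct answer is evidence of knowledge (beta).
        if llm_answers_correctly(edge):
            pkg[edge]["beta"] += 1
        else:
            pkg[edge]["alpha"] += 1
    return pkg
```

After enough iterations, edges with high `alpha` relative to `beta` mark the facts the model repeatedly gets wrong, which is what KGLENS surfaces as blind spots.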

    For answer verification, KGLENS instructs LLMs to generate specific response formats and employs GPT-4 to check the correctness of responses for Wh-questions. The framework’s efficiency is evaluated through various sampling methods, demonstrating its effectiveness in identifying LLMs’ knowledge blind spots across diverse topics and relationships.
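The two verification paths can be sketched as follows. The function names, the surface-matching shortcut for Wh-answers (where KGLENS actually calls GPT-4 as a judge), and the response formats are assumptions for illustration.

```python
def verify_yes_no(response: str, expected: bool) -> bool:
    # LLMs are instructed to reply in a fixed format, so for judgment
    # questions a normalized prefix check on the response suffices.
    normalized = response.strip().lower()
    return normalized.startswith("yes") == expected

def verify_wh(response: str, gold: str, gold_aliases=()) -> bool:
    # Stand-in for the GPT-4 correctness check on free-form Wh-answers:
    # accept the response if the gold entity or any alias appears verbatim.
    candidates = [gold, *gold_aliases]
    return any(c.lower() in response.lower() for c in candidates)
```

A substring match is a crude proxy; the point of using a judge model in the real framework is precisely to handle paraphrases and partial answers that no surface check catches.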

    KGLENS evaluation across various LLMs reveals that the GPT-4 family consistently outperforms other models. GPT-4, GPT-4o, and GPT-4-turbo show comparable performance, with GPT-4o being more cautious with personal information. A significant gap exists between GPT-3.5-turbo and GPT-4, with GPT-3.5-turbo sometimes performing worse than legacy LLMs due to its conservative approach. Legacy models like Babbage-002 and Davinci-002 show only slight improvement over random guessing, highlighting the progress in recent LLMs. The evaluation provides insights into different error types and model behaviors, demonstrating the varying capabilities of LLMs in handling diverse knowledge domains and difficulty levels.

    KGLENS introduces an efficient method for evaluating factual knowledge in LLMs using a Thompson sampling-inspired approach with parameterized Knowledge Graphs. The framework outperforms existing methods in revealing knowledge blind spots and demonstrates adaptability across various domains. Human evaluation confirms its effectiveness, achieving 95.7% accuracy. KGLENS and its assessment of KGs will be made available to the research community, fostering collaboration. For businesses, this tool facilitates the development of more reliable AI systems, enhancing user experiences and improving model knowledge. KGLENS represents a significant advancement in creating more accurate and dependable AI applications.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Apple Researchers Present KGLens: A Novel AI Method Tailored for Visualizing and Evaluating the Factual Knowledge Embedded in LLMs appeared first on MarkTechPost.
