Large Language Models (LLMs) have gained significant attention for their versatility, but their factual reliability remains a critical concern. Studies have shown that LLMs can produce nonfactual, hallucinated, or outdated information, undermining their reliability. Current evaluation methods, such as fact-checking and fact-QA, face several challenges. Fact-checking struggles to assess the factuality of generated content, while fact-QA is difficult to scale because of expensive annotation. Both approaches also risk data contamination from web-crawled pretraining corpora. Moreover, LLMs often respond inconsistently to the same fact when it is presented in different forms, a challenge that existing evaluation datasets are not equipped to address.
Existing attempts to evaluate LLMs’ knowledge primarily rely on specific datasets, which suffer from data leakage, static content, and limited metrics. Knowledge graphs (KGs) offer advantages in customization, evolving knowledge, and reduced test-set leakage. Methods like LAMA and LPAQA use KGs for evaluation but produce unnatural question formats and are impractical for large KGs. KaRR overcomes some of these issues but remains inefficient for large graphs and lacks generalizability. Current approaches also prioritize accuracy over reliability, failing to address LLMs’ inconsistent responses to the same fact, and no existing work visualizes LLMs’ knowledge using KGs. These limitations highlight the need for more comprehensive and efficient methods to evaluate and understand LLMs’ knowledge retention and accuracy.
Researchers from Apple introduced KGLENS, a knowledge probing framework developed to measure knowledge alignment between KGs and LLMs and to identify LLMs’ knowledge blind spots. The framework employs a Thompson-sampling-inspired method with a parameterized knowledge graph (PKG) to probe LLMs efficiently. KGLENS features a graph-guided question generator that converts KG edges into natural language using GPT-4, producing two types of questions (fact-checking and fact-QA) designed to reduce answer ambiguity. Human evaluation shows that 97.7% of the generated questions are sensible to annotators.
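To make the question generation concrete, here is a minimal sketch of how a KG edge might be mapped to the two question styles. The entity names, the out-degree heuristic, and the templates are illustrative assumptions; in KGLENS itself the natural-language questions are produced by GPT-4 rather than by fixed templates.

```python
# Hypothetical sketch: turning a KG edge into the two KGLENS question styles.
# The out-degree rule and templates below are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Edge:
    subject: str                 # e.g. "Tim Cook"
    relation: str                # e.g. "the chief executive officer of"
    obj: str                     # e.g. "Apple Inc."
    subject_aliases: tuple = ()  # aliases are included to reduce answer ambiguity
    out_degree: int = 1          # how many objects share this (subject, relation)

def to_question(edge: Edge) -> dict:
    """Pick the question type from the graph structure (assumed heuristic):
    one-to-many edges become Yes/No judgments, one-to-one edges become Wh-questions."""
    aliases = f" (also known as {', '.join(edge.subject_aliases)})" if edge.subject_aliases else ""
    if edge.out_degree > 1:
        return {"type": "yes/no",
                "question": f"Is {edge.subject}{aliases} {edge.relation} {edge.obj}?",
                "answer": "Yes"}
    return {"type": "wh",
            "question": f"Who or what is {edge.subject}{aliases} {edge.relation}?",
            "answer": edge.obj}

print(to_question(Edge("Tim Cook", "the chief executive officer of", "Apple Inc.",
                       subject_aliases=("Timothy Donald Cook",))))
```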
KGLENS probes LLMs’ knowledge efficiently using a PKG and a Thompson-sampling-inspired method. The framework initializes a PKG in which each edge is augmented with a beta distribution indicating the LLM’s potential deficiency on that edge. It then samples edges according to these distributions, generates questions from the sampled edges, and examines the LLM through a question-answering task. The PKG is updated based on the results, and the process iterates until convergence. The framework’s graph-guided question generator converts KG edges into natural language questions using GPT-4, creating two types of questions: Yes/No questions for judgment and Wh-questions for generation, with the question type controlled by the graph structure. Entity aliases are included to reduce ambiguity.
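The probing loop described above can be sketched as follows. This is a self-contained toy version under stated assumptions: the three example edges, the ask_llm stub, and the fixed number of rounds are placeholders, and the batch-selection and convergence details of the actual framework may differ.

```python
# Minimal sketch of a Thompson-sampling-style probing loop over a parameterized KG.
import random

# Parameterized KG: each edge carries a Beta(alpha, beta) over the LLM's failure rate.
pkg = {
    ("Tim Cook", "CEO of", "Apple Inc."): {"alpha": 1.0, "beta": 1.0},
    ("Cupertino", "located in", "California"): {"alpha": 1.0, "beta": 1.0},
    ("Swift", "designed by", "Chris Lattner"): {"alpha": 1.0, "beta": 1.0},
}

def ask_llm(edge) -> bool:
    """Stub: generate a question for the edge, query the LLM, verify the answer.
    Returns True if the LLM answered correctly. Replaced here by a coin flip."""
    return random.random() < 0.7

def probe(pkg, rounds=50, batch_size=2):
    for _ in range(rounds):
        # Thompson sampling: draw a failure probability for every edge and
        # probe the edges most likely to expose a knowledge blind spot.
        draws = {e: random.betavariate(p["alpha"], p["beta"]) for e, p in pkg.items()}
        batch = sorted(draws, key=draws.get, reverse=True)[:batch_size]
        for edge in batch:
            # Posterior update: correct answers lower the edge's failure estimate.
            if ask_llm(edge):
                pkg[edge]["beta"] += 1
            else:
                pkg[edge]["alpha"] += 1
    return pkg

for edge, params in probe(pkg).items():
    failure = params["alpha"] / (params["alpha"] + params["beta"])
    print(f"{edge}: estimated error rate ~ {failure:.2f}")
```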
For answer verification, KGLENS instructs LLMs to generate specific response formats and employs GPT-4 to check the correctness of responses for Wh-questions. The framework’s efficiency is evaluated through various sampling methods, demonstrating its effectiveness in identifying LLMs’ knowledge blind spots across diverse topics and relationships.
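For the Wh-question case, the verification step might look roughly like the sketch below. The grading prompt and the CORRECT/INCORRECT response format are assumptions for illustration; the OpenAI client call is standard library usage, not the authors’ code.

```python
# Hedged sketch of answer verification: Yes/No answers can be string-matched,
# while free-form Wh-answers are judged by GPT-4 against the reference answer.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def verify_wh_answer(question: str, gold_answer: str, model_answer: str) -> bool:
    prompt = (
        "You are grading a question-answering model.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"Model answer: {model_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("CORRECT")
```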
KGLENS evaluation across various LLMs reveals that the GPT-4 family consistently outperforms other models. GPT-4, GPT-4o, and GPT-4-turbo show comparable performance, with GPT-4o being more cautious with personal information. A significant gap exists between GPT-3.5-turbo and GPT-4, with GPT-3.5-turbo sometimes performing worse than legacy LLMs due to its conservative approach. Legacy models like Babbage-002 and Davinci-002 show only slight improvement over random guessing, highlighting the progress in recent LLMs. The evaluation provides insights into different error types and model behaviors, demonstrating the varying capabilities of LLMs in handling diverse knowledge domains and difficulty levels.
KGLENS introduces an efficient method for evaluating factual knowledge in LLMs using a Thompson-sampling-inspired approach with parameterized knowledge graphs. The framework outperforms existing methods in revealing knowledge blind spots and adapts across various domains. Human evaluation confirms its effectiveness, with the framework achieving 95.7% accuracy. KGLENS and its assessment of KGs will be made available to the research community, fostering collaboration. For businesses, the tool supports the development of more reliable AI systems, improving user experiences and model knowledge. KGLENS represents a significant step toward more accurate and dependable AI applications.
Check out the Paper. All credit for this research goes to the researchers of this project.