University of Oxford study identifies when AI hallucinations are more likely to occur

A University of Oxford study developed a means of testing when language models are “unsure” of their output or hallucinating.

AI “hallucinations” refer to a phenomenon where large language models (LLMs) generate fluent and plausible responses that are not grounded in truth or consistent across conversations.

In other words, an LLM is said to be hallucinating when it produces content that appears convincing on the surface but is fabricated or inconsistent with previous statements.

Hallucinations are tough, if not impossible, to separate from AI models. AI developers like OpenAI, Google, and Anthropic have all admitted that hallucinations will likely remain a byproduct of interacting with AI.

As Dr. Sebastian Farquhar, one of the study’s authors, explains in a blog post, “LLMs are highly capable of saying the same thing in many different ways, which can make it difficult to tell when they are certain about an answer and when they are literally just making something up.”

The Cambridge Dictionary even added an AI-related definition to the word “hallucinate” in 2023 and named it its Word of the Year.

The question this University of Oxford study sought to answer is: what’s really going on under the hood when an LLM hallucinates? And how can we detect when it’s likely to happen?

The researchers aimed to address the problem of hallucinations by developing a novel method to detect exactly when an LLM is likely to generate fabricated or inconsistent information.

The study, published in Nature, introduces a concept called “semantic entropy,” which measures the uncertainty of an LLM’s output at the level of meaning rather than just the specific words or phrases used.

By computing the semantic entropy of an LLM’s responses, the researchers can estimate the model’s confidence in its outputs and identify instances when it’s likely to hallucinate.

Knowing when a model is likely to hallucinate makes it possible to flag unreliable outputs before anyone acts on them.

In high-stakes applications like finance or law, such detection would enable users to shut down the model or probe its responses for accuracy before using them in the real world.

Semantic entropy in LLMs

Semantic entropy, as defined by the study, measures the uncertainty or inconsistency in the meaning of an LLM’s responses. It helps detect when an LLM might be hallucinating or generating unreliable information.
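One way to write this idea down, as a sketch rather than the study's own notation: suppose the answers sampled for a prompt x are grouped into a set C of meaning clusters, and p(c | x) is the share of samples that fall into cluster c. The symbols SE, C, and p are assumptions introduced here for illustration. The entropy over meanings is then the ordinary Shannon entropy taken over clusters instead of over word sequences:

```latex
\mathrm{SE}(x) = -\sum_{c \in C} p(c \mid x) \, \log p(c \mid x)
```

If every sample lands in the same cluster, p(c | x) = 1 and the score is 0. If N samples land in N different clusters, the score reaches its maximum of log N.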

Here’s how it works:

The researchers prompt the LLM to generate several possible responses to the same question. This is achieved by feeding the question to the LLM multiple times, each time with a different random seed or slight variation in the input.
The method then examines the responses and groups those with the same underlying meaning, even if they use different words or phrasing.
If the LLM is confident about the answer, its responses should have similar meanings, resulting in a low semantic entropy score. This suggests that the LLM clearly and consistently understands the information.
However, if the LLM is uncertain or confused, its responses will have a wider variety of meanings, some of which might be inconsistent or unrelated to the question. This results in a high semantic entropy score, indicating that the LLM may hallucinate or generate unreliable information.
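The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the function names are hypothetical, the sampled answers are hard-coded instead of coming from a model, and `toy_same_meaning` is a deliberately crude stand-in for a real equivalence judge, which the article does not specify.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers: each answer joins the first cluster whose
    first member it is judged equivalent to, else it starts a new cluster."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(clusters: List[List[str]]) -> float:
    """Shannon entropy (in nats) over the share of samples in each cluster."""
    total = sum(len(c) for c in clusters)
    return sum(-(len(c) / total) * math.log(len(c) / total) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in: two answers 'mean the same' if their final
    word matches, ignoring case and a trailing full stop."""
    def last_word(s: str) -> str:
        return s.lower().rstrip(".").split()[-1]
    return last_word(a) == last_word(b)


# Answers that differ in wording but agree in meaning form one cluster.
confident = cluster_by_meaning(
    ["Paris", "It's Paris.", "The capital of France is Paris."],
    toy_same_meaning,
)
# Answers that disagree in meaning form one cluster each.
scattered = cluster_by_meaning(["Paris", "Lyon", "Marseille"], toy_same_meaning)

print(len(confident), semantic_entropy(confident))
print(len(scattered), semantic_entropy(scattered))
```

The first set collapses into a single cluster and scores zero, while the second splits into three clusters and scores log 3, the maximum for three samples. In practice the equivalence check carries most of the weight, so the toy rule here would need replacing with something that actually compares meanings.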

To evaluate semantic entropy’s effectiveness, the researchers applied it to a diverse set of question-answering tasks.

This involved benchmarks like trivia questions, reading comprehension, word problems, and biographies.

Across the board, semantic entropy outperformed existing methods for detecting when an LLM was likely to generate an incorrect or inconsistent answer.

Figure: Semantic entropy clusters answers with shared meanings before calculating entropy, making it suitable for language tasks where different answers can mean the same thing. Low semantic entropy indicates the LLM’s confidence in the meaning. For longer passages, the text is decomposed into factoids, questions are generated that could yield each factoid, and the LLM generates multiple answers. Semantic entropy, including the original factoid, is computed for each question’s answers. High average semantic entropy suggests confabulation (essentially hallucinated facts stated as real), while low entropy, despite varying wording, indicates a likely true factoid. Source: Nature (open access)

In simpler terms, semantic entropy measures how “confused” an LLM’s output is.

You can see in the above diagram how some prompts push the LLM to generate a confabulated (inaccurate) response, for example by producing a day and month of birth when this wasn’t provided in the initial information.

The LLM will likely provide reliable information if the meanings are closely related and consistent. But if the meanings are scattered and inconsistent, it’s a red flag that the LLM might be hallucinating or generating inaccurate information.

By calculating the semantic entropy of an LLM’s responses, researchers can detect when the model will likely produce unreliable or inconsistent information, even if the generated text seems fluent and plausible on the surface.
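As a rough sketch of how such a score might be turned into a decision, the snippet below flags a question whenever the entropy over its meaning clusters exceeds a cutoff. The function names and the threshold of 0.7 are assumptions chosen for illustration only; the article gives no cutoff, and a real one would have to be tuned on labelled data.

```python
import math
from typing import Sequence


def entropy_from_counts(cluster_sizes: Sequence[int]) -> float:
    """Entropy (in nats) given how many sampled answers fell in each cluster."""
    total = sum(cluster_sizes)
    return sum(-(n / total) * math.log(n / total) for n in cluster_sizes if n > 0)


def should_flag(cluster_sizes: Sequence[int], threshold: float = 0.7) -> bool:
    """Flag the answer for review when meanings are too scattered.
    The default threshold is a hypothetical value, not one from the study."""
    return entropy_from_counts(cluster_sizes) > threshold


# Ten samples, all with the same meaning: not flagged.
print(should_flag([10]))
# Ten samples split 9/1: mostly consistent, not flagged at this threshold.
print(should_flag([9, 1]))
# Ten samples split 4/3/3 across three meanings: flagged.
print(should_flag([4, 3, 3]))
```

A flagged answer could then be withheld, shown with a warning, or routed to a human for checking, depending on how costly a wrong answer is in the application.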

Implications

This work can help explain hallucinations and make LLMs more reliable and trustworthy.

By providing a way to detect when an LLM is uncertain or prone to hallucination, semantic entropy paves the way for deploying these AI tools in high-stakes domains where factual accuracy is critical, like healthcare, law, and finance.

Erroneous results can potentially have catastrophic impacts in these areas, as shown by some failed predictive policing and healthcare systems.

However, it’s important to remember that hallucination is just one type of error that LLMs can make.

As Dr. Farquhar notes, “If an LLM makes consistent mistakes, this new method won’t catch that. The most dangerous failures of AI come when a system does something bad but is confident and systematic. There is still a lot of work to do.”

Nevertheless, the Oxford team’s semantic entropy method represents a major step forward in our ability to understand and mitigate the limitations of AI language models.

Providing an objective means to detect hallucinations brings us closer to a future where we can harness AI’s potential while ensuring it remains a reliable and trustworthy tool in the service of humanity.

The post University of Oxford study identifies when AI hallucinations are more likely to occur appeared first on DailyAI.
