    Evaluating LLM Trustworthiness: Insights from Harmoniticity Analysis Research from VISA Team

    May 2, 2024

Large Language Models (LLMs) often answer with confidence even when they are wrong, which raises concerns about their reliability, especially on factual questions. Although hallucination is widespread in LLM-generated content, there is no established method for assessing how trustworthy a given response is. Users have no “trustworthiness score” that tells them whether a response can be relied on without further research or verification. Ideally, an LLM would yield predominantly high trust scores, reducing the need for extensive user verification.

LLM evaluation has become pivotal in assessing model performance and resilience to input variations, which is crucial for real-world applications. The FLASK method evaluates LLMs’ consistency across stylistic inputs, emphasizing alignment skills for precise model evaluation. Concerns over vulnerabilities in model-graded evaluations raise doubts about their reliability, and the difficulty of maintaining performance across rephrased instructions has prompted methods to improve zero-shot robustness. The PromptBench framework systematically evaluates LLMs’ resilience to adversarial prompts, stressing the need to understand how models respond to input changes. Recent studies add noise to prompts to assess LLM robustness, proposing unified frameworks and privacy-preserving prompt-learning techniques. The vulnerability of LLMs to noisy inputs, especially in high-stakes scenarios, underscores the importance of consistent predictions. Methods for measuring LLM confidence, such as black-box and reflection-based approaches, are gaining momentum, and the NLP literature’s long-documented sensitivity to perturbations keeps input-robustness studies relevant.

Researchers from VISA introduce an approach to assess the real-time robustness of any black-box LLM, in terms of both stability and explainability. The method measures the local deviation from harmoniticity, denoted γ, offering a model-agnostic and unsupervised way to evaluate response robustness. Human annotation experiments establish a positive correlation between γ and false or misleading answers. In addition, stochastic gradient ascent along the gradient of γ efficiently reveals adversarial prompts, demonstrating the method’s effectiveness. The work extends Harmonic Robustness, the authors’ prior method for measuring the robustness of predictive machine learning models, to LLMs.

The researchers present an algorithm for computing γ, a measure of robustness, for any input to an LLM. The method calculates the angle between the average output embedding of perturbed inputs and the original output embedding. Human annotation experiments demonstrate the correlation between γ and false or misleading answers. Examples illustrate the stability of GPT-4 outputs under perturbations, with γ = 0 for perfectly stable answers; slight grammatical variations produce small, non-zero γ values that still indicate trustworthy responses. Larger variations drive γ upward, suggesting decreased trustworthiness, though a high γ does not always mean the answer is incorrect. Empirical measurement across models and domains is proposed to clarify the correlation between γ and trustworthiness.
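
The description above is enough to sketch the computation. The snippet below is a minimal illustration under stated assumptions, not the paper’s reference implementation; llm_answer, embed, and perturb are hypothetical callables standing in for the black-box LLM, an embedding model, and a prompt-perturbation routine:

```python
import numpy as np

def gamma_score(prompt, llm_answer, embed, perturb, n=10):
    """Sketch of γ: the angle between the original output embedding and
    the mean output embedding over n perturbed copies of the prompt.
    llm_answer, embed, and perturb are hypothetical stand-ins."""
    v0 = embed(llm_answer(prompt))                        # original output embedding
    vs = [embed(llm_answer(perturb(prompt))) for _ in range(n)]
    v_mean = np.mean(vs, axis=0)                          # average perturbed output embedding
    cos = np.dot(v0, v_mean) / (np.linalg.norm(v0) * np.linalg.norm(v_mean))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))      # angle in radians; 0 = perfectly stable
```

With γ computable this way, the adversarial-prompt search mentioned earlier can be approximated by a simple stochastic hill climb. This is a crude, derivative-free stand-in for the paper’s stochastic gradient ascent, since a black-box LLM exposes no true gradient of γ:

```python
def find_high_gamma_prompt(prompt, steps=20, candidates=5):
    """Greedy random search: keep whichever perturbed prompt yields the
    highest γ so far. A stand-in for the paper's gradient-based search."""
    best, best_gamma = prompt, gamma_score(prompt, llm_answer, embed, perturb)
    for _ in range(steps):
        for _ in range(candidates):
            cand = perturb(best)
            g = gamma_score(cand, llm_answer, embed, perturb)
            if g > best_gamma:
                best, best_gamma = cand, g
    return best, best_gamma
```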

The researchers then measure the correlation between γ and trustworthiness across various LLMs and question-answer (QA) corpora. Five leading LLMs (GPT-4, ChatGPT, Claude 2.1, Mixtral-8x7B, and Smaug-72B) and two older, smaller models (Llama2-7B and MPT-7B) are evaluated on three QA corpora spanning different domains: Web QA, TruthfulQA, and Programming QA. Human annotators rate the truthfulness and relevance of LLM answers on a 5-point scale, with Fleiss’ Kappa indicating consistent inter-annotator agreement. γ values below 0.05 generally correspond to trustworthy responses, while increasing γ tends to correlate with decreased quality, although the exact relationship is model- and domain-dependent. Larger LLMs exhibit lower γ values, suggesting higher trustworthiness, with GPT-4 generally leading in quality and certified trustworthiness.
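
Given such a score, the 0.05 cutoff reported above suggests a simple decision rule. The sketch below reuses the hypothetical gamma_score helper from earlier; both the example question and the portability of the threshold are assumptions, since the study notes the γ-quality relationship varies by model and domain:

```python
# Illustrative decision rule using the empirical 0.05 cutoff from the study.
# gamma_score and its helper callables are the hypothetical sketches above.
gamma = gamma_score("Who created the first version of PHP?", llm_answer, embed, perturb)
if gamma < 0.05:
    print(f"γ = {gamma:.3f}: response is likely trustworthy")
else:
    print(f"γ = {gamma:.3f}: flag the response for manual verification")
```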

In conclusion, this study presents a practical approach to assessing LLM response robustness using γ, offering insight into trustworthiness. Correlating γ with human annotations yields a usable metric for evaluating LLM reliability across models and domains. Across all models and domains tested, human ratings confirm that γ → 0 indicates trustworthiness, and the low-γ leaders among the tested models are GPT-4, ChatGPT, and Smaug-72B.

Check out the Paper. All credit for this research goes to the researchers of this project.

Source: MarkTechPost