    Evaluating LLM Trustworthiness: Insights from Harmoniticity Analysis Research from VISA Team

    May 2, 2024

Large Language Models (LLMs) often answer with confidence even when they are wrong, which raises concerns about their reliability, especially on factual questions. Although hallucination is widespread in LLM-generated content, there is no established method for assessing how trustworthy a given response is. Users have no “trustworthiness score” that tells them whether a response can be relied on without further research or verification. Ideally, an LLM would yield predominantly high trust scores, reducing the need for extensive user verification.

LLM evaluation has become pivotal in assessing model performance and resilience to input variations, which is crucial for real-world applications. The FLASK method evaluates LLMs’ consistency across stylistic inputs, emphasizing alignment skills for precise model evaluation. Concerns over vulnerabilities in model-graded evaluations raise doubts about their reliability, and the difficulty of maintaining performance across rephrased instructions has prompted methods to improve zero-shot robustness. The PromptBench framework systematically evaluates LLMs’ resilience to adversarial prompts, stressing the need to understand how models respond to input changes. Recent studies add noise to prompts to assess LLM robustness, proposing unified frameworks and privacy-preserving prompt-learning techniques. The vulnerability of LLMs to noisy inputs, especially in high-stakes scenarios, underscores the importance of consistent predictions. Methods for measuring LLM confidence, such as black-box and reflection-based approaches, are gaining momentum, and the NLP literature’s long-documented sensitivity to perturbations keeps input-robustness studies relevant.

Researchers from VISA introduce an approach to assess the real-time robustness of any black-box LLM, in terms of both stability and explainability. The method measures the local deviation from harmoniticity, denoted γ, offering a model-agnostic and unsupervised way to evaluate response robustness. Human annotation experiments establish a positive correlation between γ and false or misleading answers. In addition, stochastic gradient ascent along the gradient of γ efficiently reveals adversarial prompts, demonstrating the method’s effectiveness. The work extends Harmonic Robustness, the authors’ prior method for measuring the robustness of predictive machine learning models, to LLMs.

The researchers present an algorithm for computing γ, a measure of robustness, for any input to an LLM. The method calculates the angle between the average output embedding of perturbed inputs and the original output embedding. Human annotation experiments demonstrate the correlation between γ and false or misleading answers. Examples illustrate the stability of GPT-4 outputs under perturbations, with γ = 0 for perfectly stable answers; slight grammatical variations produce small, non-zero γ values that still indicate trustworthy responses. Larger variations drive γ upward, suggesting decreased trustworthiness, though a high γ does not always mean the answer is incorrect. Empirical measurement across models and domains is proposed to clarify the correlation between γ and trustworthiness.
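
The description above is enough to sketch the computation. The snippet below is a minimal illustration under stated assumptions, not the paper’s reference implementation; llm_answer, embed, and perturb are hypothetical callables standing in for the black-box LLM, an embedding model, and a prompt-perturbation routine:

```python
import numpy as np

def gamma_score(prompt, llm_answer, embed, perturb, n=10):
    """Sketch of γ: the angle between the original output embedding and
    the mean output embedding over n perturbed copies of the prompt.
    llm_answer, embed, and perturb are hypothetical stand-ins."""
    v0 = embed(llm_answer(prompt))                        # original output embedding
    vs = [embed(llm_answer(perturb(prompt))) for _ in range(n)]
    v_mean = np.mean(vs, axis=0)                          # average perturbed output embedding
    cos = np.dot(v0, v_mean) / (np.linalg.norm(v0) * np.linalg.norm(v_mean))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))      # angle in radians; 0 = perfectly stable
```

With γ computable this way, the adversarial-prompt search mentioned earlier can be approximated by a simple stochastic hill climb. This is a crude, derivative-free stand-in for the paper’s stochastic gradient ascent, since a black-box LLM exposes no true gradient of γ:

```python
def find_high_gamma_prompt(prompt, steps=20, candidates=5):
    """Greedy random search: keep whichever perturbed prompt yields the
    highest γ so far. A stand-in for the paper's gradient-based search."""
    best, best_gamma = prompt, gamma_score(prompt, llm_answer, embed, perturb)
    for _ in range(steps):
        for _ in range(candidates):
            cand = perturb(best)
            g = gamma_score(cand, llm_answer, embed, perturb)
            if g > best_gamma:
                best, best_gamma = cand, g
    return best, best_gamma
```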

The researchers then measure the correlation between γ and trustworthiness across various LLMs and question-answer (QA) corpora. Five leading LLMs (GPT-4, ChatGPT, Claude 2.1, Mixtral-8x7B, and Smaug-72B) and two older, smaller models (Llama2-7B and MPT-7B) are evaluated on three QA corpora spanning different domains: Web QA, TruthfulQA, and Programming QA. Human annotators rate the truthfulness and relevance of LLM answers on a 5-point scale, with Fleiss’ Kappa indicating consistent inter-annotator agreement. γ values below 0.05 generally correspond to trustworthy responses, while increasing γ tends to correlate with decreased quality, although the exact relationship is model- and domain-dependent. Larger LLMs exhibit lower γ values, suggesting higher trustworthiness, with GPT-4 generally leading in quality and certified trustworthiness.
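
Given such a score, the 0.05 cutoff reported above suggests a simple decision rule. The sketch below reuses the hypothetical gamma_score helper from earlier; both the example question and the portability of the threshold are assumptions, since the study notes the γ-quality relationship varies by model and domain:

```python
# Illustrative decision rule using the empirical 0.05 cutoff from the study.
# gamma_score and its helper callables are the hypothetical sketches above.
gamma = gamma_score("Who created the first version of PHP?", llm_answer, embed, perturb)
if gamma < 0.05:
    print(f"γ = {gamma:.3f}: response is likely trustworthy")
else:
    print(f"γ = {gamma:.3f}: flag the response for manual verification")
```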

In conclusion, this study presents a practical approach to assessing LLM response robustness using γ, offering insight into trustworthiness. Correlating γ with human annotations yields a usable metric for evaluating LLM reliability across models and domains. Across all models and domains tested, human ratings confirm that γ → 0 indicates trustworthiness, and the low-γ leaders among the tested models are GPT-4, ChatGPT, and Smaug-72B.

Check out the Paper. All credit for this research goes to the researchers of this project.

Source: MarkTechPost