
    CMU Researchers Propose QueRE: An AI Approach to Extract Useful Features from an LLM

    January 17, 2025

    Large Language Models (LLMs) have become integral to various artificial intelligence applications, demonstrating capabilities in natural language processing, decision-making, and creative tasks. However, critical challenges remain in understanding and predicting their behaviors. Treating LLMs as black boxes complicates efforts to assess their reliability, particularly in contexts where errors can have significant consequences. Traditional approaches often rely on internal model states or gradients to interpret behaviors, which are unavailable for closed-source, API-based models. This limitation raises an important question: how can we effectively evaluate LLM behavior with only black-box access? The problem is further compounded by adversarial influences and potential misrepresentation of models through APIs, highlighting the need for robust and generalizable solutions.

    To address these challenges, researchers at Carnegie Mellon University have developed QueRE (Question Representation Elicitation). This method is tailored for black-box LLMs and extracts low-dimensional, task-agnostic representations by querying models with follow-up prompts about their outputs. These representations, based on probabilities associated with elicited responses, are used to train predictors of model performance. Notably, QueRE performs comparably to or even better than some white-box techniques in reliability and generalizability.

    Unlike methods dependent on internal model states or full output distributions, QueRE relies on accessible outputs, such as top-k probabilities available through most APIs. When such probabilities are unavailable, they can be approximated through sampling. QueRE’s features also enable evaluations such as detecting adversarially influenced models and distinguishing between architectures and sizes, making it a versatile tool for understanding and utilizing LLMs.
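
    To make the black-box access pattern concrete, the sketch below estimates the probability that a model answers “yes” to a follow-up question by repeated sampling, the fallback described above for APIs that expose no token probabilities. The `query_llm` client, prompt format, and sample count are illustrative assumptions, not the authors’ exact implementation.

```python
# Minimal sketch: approximate P("yes") for a follow-up question by sampling,
# for black-box APIs that expose neither logits nor top-k probabilities.
# `query_llm` is a hypothetical client: (prompt, temperature) -> completion text.

def estimate_yes_probability(query_llm, context, followup, n_samples=20):
    """Monte Carlo estimate of the probability of an affirmative answer."""
    yes_count = 0
    for _ in range(n_samples):
        reply = query_llm(
            f"{context}\n\n{followup}\nAnswer with 'yes' or 'no'.",
            temperature=1.0,  # sample rather than take the argmax completion
        )
        if reply.strip().lower().startswith("yes"):
            yes_count += 1
    return yes_count / n_samples
```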

    Technical Details and Benefits of QueRE

    QueRE operates by constructing feature vectors derived from elicitation questions posed to the LLM. For a given input and the model’s response, these questions assess aspects such as confidence and correctness. Questions like “Are you confident in your answer?” or “Can you explain your answer?” elicit probabilities that reflect the model’s assessment of its own output, and the probability of an affirmative answer to each question supplies one dimension of the feature vector.
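
    Building on the sampling helper in the previous sketch, a QueRE-style feature vector can be assembled by posing several such follow-up questions and stacking the elicited probabilities. The question list here is illustrative rather than the paper’s exact prompt set.

```python
# Illustrative elicitation questions (the paper's exact prompts may differ).
ELICITATION_QUESTIONS = [
    "Are you confident in your answer?",
    "Can you explain your answer?",
    "Is your answer likely to be correct?",
]

def quere_features(query_llm, question, model_answer):
    """Return a low-dimensional vector of elicited 'yes' probabilities."""
    context = f"Question: {question}\nModel answer: {model_answer}"
    return [
        estimate_yes_probability(query_llm, context, followup)
        for followup in ELICITATION_QUESTIONS
    ]
```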

    The extracted features are then used to train linear predictors for various tasks:

    1. Performance Prediction: Evaluating whether a model’s output is correct at an instance level.
    2. Adversarial Detection: Identifying when responses are influenced by malicious prompts.
    3. Model Differentiation: Distinguishing between different architectures or configurations, such as identifying smaller models misrepresented as larger ones.

    By relying on low-dimensional representations, QueRE supports strong generalization across tasks. Its simplicity ensures scalability and reduces the risk of overfitting, making it a practical tool for auditing and deploying LLMs in diverse applications.
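
    Below is a minimal sketch of such a downstream probe, assuming QueRE feature vectors and binary correctness labels have already been collected into the placeholder files shown; logistic regression stands in for the linear predictors described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: (n_examples, n_elicitation_questions) matrix of elicited probabilities.
# y: 1 if the model's answer was judged correct, 0 otherwise.
# The file names are placeholders for however the features were stored.
X = np.load("quere_features.npy")
y = np.load("correctness_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", probe.score(X_test, y_test))
```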

    Results and Insights

    Experimental evaluations demonstrate QueRE’s effectiveness across several dimensions. In predicting LLM performance on question-answering (QA) tasks, QueRE consistently outperformed baselines relying on internal states. For instance, on open-ended QA benchmarks like SQuAD and Natural Questions (NQ), QueRE achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) exceeding 0.95. Similarly, it excelled in detecting adversarially influenced models, outperforming other black-box methods.
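
    For reference, an AUROC of this kind can be computed for one’s own probe with a standard scoring call; the `probe`, `X_test`, and `y_test` names carry over from the sketch above.

```python
from sklearn.metrics import roc_auc_score

# Score the probe's predicted probability of correctness on held-out data.
scores = probe.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, scores))
```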

    QueRE also proved robust and transferable. Its features were successfully applied to out-of-distribution tasks and different LLM configurations, validating its adaptability. The low-dimensional representations facilitated efficient training of simple models, ensuring computational feasibility and robust generalization bounds.

    Another notable result was QueRE’s ability to use random sequences of natural language as elicitation prompts. These sequences often matched or exceeded the performance of structured queries, highlighting the method’s flexibility and potential for diverse applications without extensive manual prompt engineering.

    Conclusion

    QueRE offers a practical and effective approach to understanding and optimizing black-box LLMs. By transforming elicitation responses into actionable features, QueRE provides a scalable and robust framework for predicting model behavior, detecting adversarial influences, and differentiating architectures. Its success in empirical evaluations suggests it is a valuable tool for researchers and practitioners aiming to enhance the reliability and safety of LLMs.

    As AI systems evolve, methods like QueRE will play a crucial role in ensuring transparency and trustworthiness. Future work could explore extending QueRE’s applicability to other modalities or refining its elicitation strategies for enhanced performance. For now, QueRE represents a thoughtful response to the challenges posed by modern AI systems.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post CMU Researchers Propose QueRE: An AI Approach to Extract Useful Features from an LLM appeared first on MarkTechPost.
