Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 16, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 16, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 16, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 16, 2025

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025

      Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

      May 16, 2025

      Microsoft might kill the Surface Laptop Studio as production is quietly halted

      May 16, 2025

      Minecraft licensing robbed us of this controversial NFL schedule release video

      May 16, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The power of generators

      May 16, 2025
      Recent

      The power of generators

      May 16, 2025

      Simplify Factory Associations with Laravel’s UseFactory Attribute

      May 16, 2025

      This Week in Laravel: React Native, PhpStorm Junie, and more

      May 16, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025
      Recent

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025

      Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

      May 16, 2025

      Microsoft might kill the Surface Laptop Studio as production is quietly halted

      May 16, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Rethinking QA Dataset Design: How Popular Knowledge Enhances LLM Accuracy?

    Rethinking QA Dataset Design: How Popular Knowledge Enhances LLM Accuracy?

    July 4, 2024

    Large language models (LLMs) have gained significant attention for their ability to store vast amounts of factual knowledge within their weights during pretraining. This capability has led to promising results in knowledge-intensive tasks, particularly factual question-answering. However, a critical challenge persists: LLMs often generate plausible but incorrect responses to queries, undermining their reliability. This inconsistency in factual accuracy poses a significant hurdle in the widespread adoption and trust of LLMs for knowledge-based applications. Researchers are grappling with the challenge of improving the factuality of LLM outputs while maintaining their versatility and generative capabilities. The problem is further complicated by the observation that even when LLMs possess the correct information, they may still produce inaccurate answers, suggesting underlying issues in knowledge retrieval and application.

    Researchers have attempted various approaches to improve factuality in LLMs. Some studies focus on the impact of unfamiliar examples during fine-tuning, revealing that these can potentially worsen factuality due to overfitting. Other approaches examine the reliability of factual knowledge, showing LLMs often underperform on obscure information. Techniques to enhance factuality include manipulating attention mechanisms, using unsupervised internal probes, and developing methods for LLMs to abstain from answering uncertain questions. Some researchers have introduced fine-tuning techniques to encourage LLMs to refuse questions outside their knowledge boundaries. Also,  studies have investigated LLM mechanisms and training dynamics, examining how facts are stored and extracted, and analyzing pretraining dynamics of syntax acquisition and attention patterns. Despite these efforts, challenges in achieving consistent factual accuracy persist.

    In this study, researchers from the Department of Machine Learning, at Carnegie Mellon University and the Department of Computer Science, at Stanford University found that the impact of fine-tuning examples on LLMs depends critically on how well the facts are encoded in the pre-trained model. Fine-tuning on well-encoded facts significantly improves factuality, while using less well-encoded facts can harm performance. This phenomenon occurs because LLMs can either use memorized knowledge or rely on general “shortcuts” to answer questions. The composition of fine-tuning data determines which mechanism is amplified. Well-known facts reinforce the use of memorized knowledge, while less familiar facts encourage shortcut usage. This insight provides a new perspective on improving LLM factuality through strategic selection of fine-tuning data.

    The method utilizes a synthetic setup to study the impact of fine-tuning data on LLM factuality. This setup simulates a simplified token space for subjects, relations, and answers, with different formatting between pretraining and downstream tasks. Pretraining samples are drawn from a Zipf distribution for subjects and a uniform distribution for relations. Key findings reveal that fine-tuning popular facts significantly improves factuality, with effects amplified for less popular entities. The study examines the influence of the Zipf distribution parameter and pretraining steps on this phenomenon. These observations lead to the concept of “fact salience,” representing how well a model knows a fact, which influences fine-tuning behavior and downstream performance. This synthetic approach allows for a controlled investigation of pretraining processes that would be impractical with real large language models.

    Experimental results across multiple datasets (PopQA, Entity-Questions, and MMLU) and models (Llama-7B and Mistral) consistently show that fine-tuning on less popular or less confident examples underperforms compared to using popular knowledge. This performance gap widens for less popular test points, supporting the hypothesis that less popular facts are more sensitive to fine-tuning choices. Surprisingly, even randomly selected subsets outperform fine-tuning on the least popular knowledge, suggesting that including some popular facts can mitigate the negative impact of less popular ones. Also, training on a smaller subset of the most popular facts often performs comparably or better than using the entire dataset. These findings indicate that careful selection of fine-tuning data, focusing on well-known facts, can lead to improved factual accuracy in LLMs, potentially allowing for more efficient and effective training processes.

    The study provides significant insights into improving language model factuality through strategic QA dataset composition. Contrary to intuitive assumptions, finetuning on well-known facts consistently enhances overall factuality. This finding, observed across various settings and supported by a conceptual model, challenges conventional approaches to QA dataset design. The research opens new avenues for improving language model performance, suggesting potential benefits in regularization techniques to overcome attention imbalance, curriculum learning strategies, and the development of synthetic data for efficient knowledge extraction. These findings provide a foundation for future work aimed at enhancing the factual accuracy and reliability of language models in diverse applications.

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

    Join our Telegram Channel and LinkedIn Group.

    If you like our work, you will love our newsletter..

    Don’t Forget to join our 46k+ ML SubReddit

    The post Rethinking QA Dataset Design: How Popular Knowledge Enhances LLM Accuracy? appeared first on MarkTechPost.

    Source: Read More 

    Hostinger
    Facebook Twitter Reddit Email Copy Link
    Previous ArticleHow to extract a number from response body in jmeter?
    Next Article CMU Researchers Propose XEUS: A Cross-lingual Encoder for Universal Speech trained in 4000+ Languages

    Related Posts

    Security

    Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

    May 16, 2025
    Common Vulnerabilities and Exposures (CVEs)

    CVE-2022-4363 – Wholesale Market WooCommerce CSRF Vulnerability

    May 16, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    CVE-2025-4127 – “WP SEO Structured Data Schema Stored Cross-Site Scripting Vulnerability”

    Common Vulnerabilities and Exposures (CVEs)

    Improving Robustness Against Bias in Social Science Machine Learning: The Promise of Instruction-Based Models

    Development

    CVE-2025-4467 – SourceCodester Online Student Clearance System SQL Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Blockchain-Powered Digital Twins: Driving Efficiency & Innovation in Operations

    Web Development

    Highlights

    Development

    Massive Data Breach in Tamil Nadu: 600,000 Migrant Workers’ Data Allegedly Leaked on Dark Web

    June 7, 2024

    A massive data breach has allegedly been reported in the Indian state of Tamil Nadu,…

    $577 Million Cryptocurrency Fraud: Two Estonians Admit Role in Global Ponzi Scheme

    February 17, 2025

    Budget 2025 on the Horizon: Will India Take the Leap in Enhancing Cybersecurity and Data Privacy?

    January 20, 2025

    CVE-2025-4209 – “Apache HTTP Server Command Injection Vulnerability”

    May 15, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.