    Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement Finetuning

    June 6, 2025

    Reinforcement finetuning (RFT) uses reward signals to guide a large language model toward desirable behavior. The method sharpens the model’s ability to produce logical and structured outputs by reinforcing correct responses. Yet a challenge persists: ensuring that these models also know when not to respond, particularly when faced with incomplete or misleading questions that have no definite answer.
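
    To make the mechanics concrete, here is a minimal sketch of the kind of verifiable, exact-match reward often used when reinforcement finetuning is applied to math problems. The function names and matching logic are illustrative assumptions, not the paper’s implementation; note how a refusal earns the same zero reward as a wrong answer, which is the failure mode discussed below.

    # A minimal sketch (not the paper's code) of a verifiable,
    # exact-match RFT reward for math problems.

    def extract_final_answer(completion: str) -> str:
        """Naively treat the last non-empty line as the final answer."""
        lines = [line.strip() for line in completion.strip().splitlines() if line.strip()]
        return lines[-1] if lines else ""

    def standard_rft_reward(completion: str, reference_answer: str) -> float:
        """Reward 1.0 for a matching final answer and 0.0 otherwise.
        A refusal such as "I don't know" scores 0.0 exactly like a wrong
        answer, so training pressure pushes the model away from refusing."""
        return 1.0 if extract_final_answer(completion) == reference_answer.strip() else 0.0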

    The problem arises when language models, after reinforcement finetuning, begin to lose their ability to refuse to answer unclear or ambiguous queries. Instead of signaling uncertainty, the models tend to produce confidently stated but incorrect responses. This phenomenon, identified in the paper as the “hallucination tax,” highlights a growing risk. As models are trained to perform better, they may also become more likely to hallucinate answers in situations where silence would be more appropriate. This is especially hazardous in domains that require high trust and precision.

    Tools currently used to train large language models often overlook the importance of refusal behavior. Reinforcement finetuning frameworks tend to reward only correct answers and penalize incorrect ones, ignoring cases where the valid response is no answer at all. Because the reward never reinforces refusal, the result is overconfident models: the paper shows that refusal rates dropped to near zero across multiple models after standard RFT, demonstrating that current training fails to address hallucination properly.

    Researchers from the University of Southern California developed the Synthetic Unanswerable Math (SUM) dataset. SUM introduces implicitly unanswerable math problems by modifying existing questions, for example by removing key information or introducing logical inconsistencies. The researchers used DeepScaleR as the base dataset and employed the o3-mini model to generate high-quality unanswerable questions. The synthetic dataset aims to teach models to recognize when a problem lacks sufficient information and to respond accordingly.
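
    The paper’s actual generation prompt is not reproduced here, but a hypothetical sketch of the rewriting step might look like the following, using the OpenAI Python client as a stand-in for the o3-mini calls; the instruction text and function name are invented for illustration.

    # Hypothetical sketch of SUM-style data generation: ask a strong model
    # to rewrite a solvable problem so that key information is missing or
    # logically inconsistent, making it implicitly unanswerable.
    from openai import OpenAI

    client = OpenAI()

    REWRITE_INSTRUCTION = (
        "Rewrite the following math problem so that it becomes unanswerable, "
        "for example by removing a key quantity or introducing a logical "
        "inconsistency. Keep the wording natural and plausible.\n\n"
        "Problem: {problem}"
    )

    def make_unanswerable(problem: str) -> str:
        response = client.chat.completions.create(
            model="o3-mini",
            messages=[{"role": "user", "content": REWRITE_INSTRUCTION.format(problem=problem)}],
        )
        return response.choices[0].message.content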

    SUM’s core technique is to mix answerable and unanswerable problems during training. Questions are modified to become ambiguous or unsolvable while remaining plausible, and the training prompts instruct models to say “I don’t know” for unanswerable inputs. By mixing SUM data into the reinforcement finetuning set at a ratio of just 10%, models begin to leverage inference-time reasoning to evaluate uncertainty. This allows them to refuse answers more appropriately without impairing their performance on solvable problems; a sketch of such a refusal-aware setup follows.
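
    Here is a minimal sketch of such a setup, assuming a 10% mixing ratio and a refusal string matched verbatim; the field names, mixing helper, and reward logic are illustrative assumptions rather than the paper’s code.

    import random

    REFUSAL = "I don't know"

    def mix_training_data(answerable, unanswerable, sum_ratio=0.10):
        """Blend SUM-style unanswerable items in so that they make up
        roughly sum_ratio (10% here) of the resulting training set."""
        n_extra = int(len(answerable) * sum_ratio / (1.0 - sum_ratio))
        mixed = answerable + random.sample(unanswerable, min(n_extra, len(unanswerable)))
        random.shuffle(mixed)
        return mixed

    def refusal_aware_reward(completion: str, example: dict) -> float:
        """Credit correct answers on answerable items and explicit
        refusals on unanswerable ones; anything else earns no reward."""
        stripped = completion.strip()
        final = stripped.splitlines()[-1].strip() if stripped else ""
        if example["answerable"]:
            return 1.0 if final == example["reference_answer"] else 0.0
        return 1.0 if REFUSAL.lower() in final.lower() else 0.0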

    Performance analysis shows significant improvements. After training with SUM, the Qwen2.5-7B model increased its refusal rate from 0.01 to 0.73 on the SUM benchmark and from 0.01 to 0.81 on the UMWP benchmark. On the SelfAware dataset, refusal accuracy rose dramatically from 0.01 to 0.94. Llama-3.1-8B-Instruct showed a similar trend, with refusal rates improving from 0.00 to 0.75 on SUM and from 0.01 to 0.79 on UMWP. Despite these gains in refusal behavior, accuracy on answerable datasets such as GSM8K and MATH-500 remained stable, with most changes between 0.00 and -0.05. This minimal drop indicates that refusal training can be introduced without major sacrifices in task performance.

    This study outlines a clear trade-off between improved reasoning and trustworthiness. Reinforcement finetuning, while powerful, tends to suppress cautious behavior. The SUM dataset corrects this by teaching models to recognize what they cannot solve. With only a small addition to training data, language models become better at identifying the boundaries of their knowledge. This approach marks a significant step in making AI systems not just smarter but also more careful and honest.


    Check out the Paper and Dataset on Hugging Face. All credit for this research goes to the researchers of this project.

    The post Teaching AI to Say ‘I Don’t Know’: A New Dataset Mitigates Hallucinations from Reinforcement Finetuning appeared first on MarkTechPost.
