    Researchers from University of Waterloo and CMU Introduce Critique Fine-Tuning (CFT): A Novel AI Approach for Enhancing LLM Reasoning with Structured Critique Learning

    February 3, 2025

    Traditional approaches to training language models heavily rely on supervised fine-tuning, where models learn by imitating correct responses. While effective for basic tasks, this method limits a model’s ability to develop deep reasoning skills. As artificial intelligence applications continue to evolve, there is a growing demand for models that can generate responses and critically evaluate their own outputs to ensure accuracy and logical consistency.

    A serious limitation of traditional training methods is that they are built on imitating reference responses, which discourages models from analyzing those responses critically. As a result, imitation-based techniques fail to develop the logical depth required for intricate reasoning problems, and generated outputs often merely sound correct rather than being correct. Moreover, simply increasing dataset size does not automatically improve response quality, which makes scaling up a costly and inefficient strategy for training large models. These challenges point to the need for methods that improve reasoning itself rather than merely adding data and computation.

    Existing solutions attempt to mitigate these issues using reinforcement learning and instruction tuning. Reinforcement learning with human feedback has shown promising results but requires large-scale computational resources. Another approach involves self-critique, where models assess their outputs for errors, but this often lacks consistency. Despite these advancements, most training techniques still focus on optimizing performance through sheer data volume rather than improving fundamental reasoning capabilities, which limits their effectiveness in complex problem-solving scenarios.

    A research team from the University of Waterloo, Carnegie Mellon University, and the Vector Institute proposed Critique Fine-Tuning (CFT) as an alternative to conventional supervised fine-tuning. This approach shifts the focus from imitation-based learning to critique-based learning, where models are trained to assess and refine responses rather than replicate them. To achieve this, researchers constructed a dataset of 50,000 critique samples using GPT-4o, enabling models to identify response flaws and suggest improvements. This method is particularly effective for domains requiring structured reasoning, such as mathematical problem-solving.
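
    The paper does not prescribe a specific file format here, but conceptually each training example pairs a query and a candidate solution with a GPT-4o-generated critique. The following is a minimal sketch of what such a record might look like; the field names and prompt wording are illustrative assumptions, not the authors' released schema.

```python
# Hypothetical structure of a single CFT training example.
# Field names are illustrative; the released dataset may differ.
critique_example = {
    "query": "Solve for x: 2x + 6 = 14.",
    "candidate_response": "Subtracting 6 gives 2x = 8, so x = 3.",
    "critique": (
        "The setup is correct, but the final step is wrong: "
        "2x = 8 implies x = 4, not x = 3. The response should divide "
        "both sides by 2 and report x = 4."
    ),
}

# During fine-tuning, the model conditions on the query and the candidate
# response, and is trained to produce the critique text.
prompt = (
    f"Question: {critique_example['query']}\n"
    f"Proposed answer: {critique_example['candidate_response']}\n"
    "Critique the proposed answer:"
)
target = critique_example["critique"]
```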

    The CFT methodology revolves around training models using structured critique datasets instead of conventional question-response pairs. During training, models are presented with a query and an initial response, followed by a critique that evaluates the response’s accuracy and logical coherence. By optimizing the model to generate critiques, researchers encourage a deeper analytical process that enhances reasoning capabilities. Unlike traditional fine-tuning, where models are rewarded for simply reproducing correct answers, CFT prioritizes identifying mistakes and suggesting improvements, leading to more reliable and explainable outputs.
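
    In effect, this reduces to a standard next-token prediction objective in which the loss is taken over the critique, conditioned on the query and the initial response. Below is a minimal, hedged sketch of a single training step using Hugging Face Transformers; the checkpoint, prompt template, and label-masking choices are assumptions for illustration, not the published recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM works for this sketch.
model_name = "Qwen/Qwen2.5-Math-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Context = query + candidate response; target = critique.
context = (
    "Question: Solve for x: 2x + 6 = 14.\n"
    "Proposed answer: 2x = 8, so x = 3.\n"
    "Critique the proposed answer:\n"
)
critique = "The division is wrong: 2x = 8 implies x = 4, not x = 3."

context_ids = tokenizer(context, return_tensors="pt").input_ids
critique_ids = tokenizer(
    critique, return_tensors="pt", add_special_tokens=False
).input_ids

input_ids = torch.cat([context_ids, critique_ids], dim=1)

# Compute the loss only on critique tokens: positions covering the
# query and candidate response are ignored via the -100 label.
labels = input_ids.clone()
labels[:, : context_ids.shape[1]] = -100

outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss  # cross-entropy over the critique tokens only
loss.backward()
```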

    Experimental results demonstrate that CFT-trained models consistently outperform those trained with conventional methods. The researchers evaluated their approach across multiple mathematical reasoning benchmarks, including MATH, Minerva-Math, and OlympiadBench, where CFT-trained models showed a 4–10% performance improvement over their supervised fine-tuned counterparts. Specifically, Qwen2.5-Math-CFT, trained on only 50,000 examples, matches and sometimes surpasses competing models trained on more than 2 million samples. The framework also yielded a 7.0% accuracy improvement on the MATH benchmark and 16.6% on Minerva-Math compared to standard fine-tuning. These gains highlight the efficiency of critique-based learning, which achieves strong results with far fewer training samples and far less computation.

    The findings from this study emphasize the advantages of critique-based learning in language model training. By shifting from response imitation to critique generation, the researchers have introduced a method that enhances model accuracy and fosters deeper reasoning skills. The ability to critically assess and refine responses, rather than merely reproduce them, allows models to handle complex reasoning tasks more effectively. This research offers a promising direction for improving AI training methodologies while reducing computational costs. Future work could refine the approach by integrating additional critique mechanisms to enhance model reliability and generalization across diverse problem-solving domains.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
