    Researchers from University of Waterloo and CMU Introduce Critique Fine-Tuning (CFT): A Novel AI Approach for Enhancing LLM Reasoning with Structured Critique Learning

    February 3, 2025

    Traditional approaches to training language models heavily rely on supervised fine-tuning, where models learn by imitating correct responses. While effective for basic tasks, this method limits a model’s ability to develop deep reasoning skills. As artificial intelligence applications continue to evolve, there is a growing demand for models that can generate responses and critically evaluate their own outputs to ensure accuracy and logical consistency.

    A serious limitation of traditional training methods is that they rely on imitating reference responses, which discourages models from critically analyzing what they produce. As a result, imitation-based techniques often lack logical depth on intricate reasoning problems, and their outputs merely sound correct rather than being well reasoned. More importantly, increasing dataset size does not automatically improve response quality, which makes training large models on ever more data inefficient. These challenges highlight the need for methods that improve reasoning itself rather than simply adding computation.

    Existing solutions attempt to mitigate these issues using reinforcement learning and instruction tuning. Reinforcement learning with human feedback has shown promising results but requires large-scale computational resources. Another approach involves self-critique, where models assess their outputs for errors, but this often lacks consistency. Despite these advancements, most training techniques still focus on optimizing performance through sheer data volume rather than improving fundamental reasoning capabilities, which limits their effectiveness in complex problem-solving scenarios.

    A research team from the University of Waterloo, Carnegie Mellon University, and the Vector Institute proposed Critique Fine-Tuning (CFT) as an alternative to conventional supervised fine-tuning. This approach shifts the focus from imitation-based learning to critique-based learning, where models are trained to assess and refine responses rather than replicate them. To achieve this, researchers constructed a dataset of 50,000 critique samples using GPT-4o, enabling models to identify response flaws and suggest improvements. This method is particularly effective for domains requiring structured reasoning, such as mathematical problem-solving.
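
    To make the data layout concrete, the sketch below shows what a single critique sample and its training pair might look like. The field names and prompt template are illustrative assumptions, not the paper's exact schema; in the paper, the critiques themselves are produced by GPT-4o over existing question-response pairs.

```python
# Hypothetical layout of one CFT sample: a query, a candidate response, and a
# critique of that response (the names here are illustrative, not the paper's).
cft_example = {
    "query": "Solve for x: 2x + 6 = 14.",
    "candidate_response": "Subtract 6 from both sides: 2x = 8, so x = 3.",
    "critique": (
        "The first step is correct (2x = 8), but the division is wrong: "
        "8 / 2 = 4, not 3. The final answer should be x = 4."
    ),
}

def to_training_pair(example: dict) -> tuple[str, str]:
    """Format a sample as (input, target): the model reads the query plus the
    candidate response and is trained to generate the critique."""
    prompt = (
        f"Question:\n{example['query']}\n\n"
        f"Proposed answer:\n{example['candidate_response']}\n\n"
        "Critique the proposed answer:\n"
    )
    return prompt, example["critique"]
```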

    The CFT methodology revolves around training models using structured critique datasets instead of conventional question-response pairs. During training, models are presented with a query and an initial response, followed by a critique that evaluates the response’s accuracy and logical coherence. By optimizing the model to generate critiques, researchers encourage a deeper analytical process that enhances reasoning capabilities. Unlike traditional fine-tuning, where models are rewarded for simply reproducing correct answers, CFT prioritizes identifying mistakes and suggesting improvements, leading to more reliable and explainable outputs.
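
    A minimal sketch of how such an objective could be implemented with a standard causal language model follows, assuming the common convention of masking the prompt tokens out of the loss so that only the critique tokens are optimized. The model name, prompt strings, and hyperparameters are placeholders, not the paper's exact training setup.

```python
# Sketch of a single CFT optimization step with a causal LM (Hugging Face
# transformers). Only the critique tokens contribute to the loss; the query
# and candidate response act purely as conditioning context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-7B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def critique_loss(prompt: str, critique: str) -> torch.Tensor:
    """Cross-entropy over the critique tokens only; prompt tokens are masked."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    critique_ids = tokenizer(critique + tokenizer.eos_token,
                             add_special_tokens=False,
                             return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, critique_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # prompt positions ignored by the loss
    return model(input_ids=input_ids, labels=labels).loss

# One training step on a single (prompt, critique) pair.
prompt = ("Question:\nSolve for x: 2x + 6 = 14.\n\n"
          "Proposed answer:\n2x = 8, so x = 3.\n\n"
          "Critique the proposed answer:\n")
critique = "2x = 8 is correct, but 8 / 2 = 4, so the answer should be x = 4."
loss = critique_loss(prompt, critique)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```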

    Experimental results demonstrate that CFT-trained models consistently outperform those trained using conventional methods. The researchers evaluated their approach across multiple mathematical reasoning benchmarks, including MATH, Minerva-Math, and OlympiadBench. Models trained using CFT showed a 4–10% performance improvement over their supervised fine-tuned counterparts. Notably, Qwen2.5-Math-CFT, trained on just 50,000 examples, matched and sometimes exceeded competing models trained on more than 2 million samples. In addition, the framework yielded a 7.0% improvement in accuracy on the MATH benchmark and 16.6% on Minerva-Math compared to standard fine-tuning techniques. These gains underscore the efficiency of critique-based learning, which achieves strong results with far fewer training samples and less compute.

    The findings from this study emphasize the advantages of critique-based learning in language model training. By shifting from response imitation to critique generation, researchers have introduced a method that enhances model accuracy and fosters deeper reasoning skills. The ability to critically assess and refine responses rather than merely reproduce them allows models to handle complex reasoning tasks more effectively. This research offers a promising direction for improving artificial intelligence training methodologies while reducing computational costs. Future work could refine the approach by integrating additional critique mechanisms to enhance model reliability and generalization across diverse problem-solving domains.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
