
    O1-Pruner: Streamlining Long-Thought Reasoning in Language Models

    January 24, 2025

Large language models (LLMs) have demonstrated impressive capabilities, particularly on reasoning tasks. Models like OpenAI’s O1 use “long-thought reasoning,” breaking complex problems into manageable steps and refining solutions iteratively. While this approach enhances problem-solving, it comes at a cost: extended output sequences increase computational time and energy use. These inefficiencies raise concerns about scalability and the practical usability of such models in real-world applications. Addressing this issue is essential for making LLMs more efficient and broadly applicable.

Researchers from Sun Yat-sen University, China Agricultural University, Tsinghua University, the University of Oxford, Didi Chuxing, and NTU propose Length-Harmonizing Fine-Tuning (O1-Pruner). This technique seeks to reduce the inefficiencies of reasoning models while maintaining accuracy. The primary focus is on optimizing token usage, a significant bottleneck in current models. O1-Pruner uses reinforcement learning (RL) techniques to encourage the generation of shorter reasoning paths without sacrificing precision.

    The process begins with evaluating baseline performance through pre-sampling. A customized RL-style loss function then fine-tunes the model’s reasoning length, ensuring that the length of generated solutions is proportional to the complexity of the problem. By aligning reasoning length with task difficulty, O1-Pruner reduces computational cost without compromising quality.
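The pre-sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `reference_model.generate`, the problem dictionaries, and the solution fields are hypothetical placeholders.

```python
def presample_baseline(reference_model, problems, k=8):
    """Estimate a per-problem baseline by sampling k solutions per
    problem from the reference model, recording mean reasoning length
    and empirical accuracy."""
    baselines = {}
    for prob in problems:
        # Each generated solution is assumed to expose its token list
        # and final answer (hypothetical interface).
        solutions = [reference_model.generate(prob["question"]) for _ in range(k)]
        lengths = [len(s["tokens"]) for s in solutions]
        correct = [s["answer"] == prob["answer"] for s in solutions]
        baselines[prob["id"]] = {
            "ref_length": sum(lengths) / k,    # mean solution length
            "ref_accuracy": sum(correct) / k,  # fraction of correct samples
        }
    return baselines
```

These per-problem baselines give the fine-tuning stage a yardstick: a candidate solution can be rewarded for being shorter, or penalized for being less accurate, than what the reference model typically produces.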

    Technical Details and Benefits of O1-Pruner

    At the heart of O1-Pruner is the Length-Harmonizing Fine-Tuning approach, which balances reasoning length and accuracy. The key steps include:

    1. Reference Model Sampling: A reference model evaluates reasoning quality and length by generating multiple solutions for each problem, creating a performance benchmark.
    2. Reward Function Design: This involves two components:
      • Length Reward: Encourages solutions that are shorter than those of the reference model.
      • Accuracy Reward: Ensures that shorter reasoning paths do not compromise correctness.
    3. Reinforcement Learning Framework: Proximal Policy Optimization (PPO) is used to train the model efficiently, and off-policy training further streamlines the workflow.
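The two-part reward above can be sketched as a single scalar signal. The linear combination and the weight `lam` are illustrative assumptions for exposition, not the paper's exact formulation:

```python
def length_harmonizing_reward(sol_len, is_correct, ref_len, ref_acc, lam=2.0):
    """Combine a length reward (solutions shorter than the reference
    baseline score positively) with an accuracy reward that penalizes
    falling below the reference model's accuracy."""
    length_reward = (ref_len - sol_len) / ref_len    # > 0 when shorter than baseline
    accuracy_reward = float(is_correct) - ref_acc    # > 0 when beating baseline accuracy
    return length_reward + lam * accuracy_reward
```

With `lam > 1`, accuracy changes dominate length savings, so the policy only earns a net positive reward for brevity when correctness is preserved, which matches the design intent described above.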

    The benefits of O1-Pruner include:

    • Improved Efficiency: Reduces redundant computations, leading to faster inference.
    • Accuracy Preservation: Ensures that shorter solutions maintain or even enhance accuracy.
    • Task Adaptability: Dynamically adjusts reasoning depth based on problem complexity, making it applicable to a variety of tasks.

    Results and Insights

    Experiments on mathematical reasoning benchmarks such as MATH, GSM8K, and GaoKao showcase O1-Pruner’s effectiveness. For example:

    • The Marco-o1-7B model, fine-tuned with O1-Pruner, achieved a 40.5% reduction in solution length while improving accuracy to 76.8%.
    • The QwQ-32B-Preview model demonstrated a 34.7% reduction in solution length alongside a slight accuracy increase to 89.3%.

    Inference time also improved significantly. On the MATH dataset:

    • Marco-o1-7B reduced its inference time from 2 minutes to just over 1 minute.
    • QwQ-32B-Preview decreased from 6 minutes to approximately 4 minutes.

    These results highlight O1-Pruner’s ability to balance accuracy and efficiency. Its superior performance, as measured by the Accuracy-Efficiency Score (AES), establishes it as a stronger alternative to methods such as Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
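As an illustration, a score of this kind can be computed as below. The functional form and the weights `alpha` and `beta` are assumptions made for this sketch; the exact AES weighting used in the paper may differ.

```python
def accuracy_efficiency_score(base_len, new_len, base_acc, new_acc,
                              alpha=1.0, beta=3.0):
    """Illustrative accuracy-efficiency score: reward the relative
    reduction in solution length, and weight relative accuracy changes
    more heavily (beta > alpha) so accuracy losses are hard to offset."""
    length_gain = (base_len - new_len) / base_len  # fraction of length saved
    acc_delta = (new_acc - base_acc) / base_acc    # relative accuracy change
    return alpha * length_gain + beta * acc_delta
```

Under such a metric, a 40% length reduction with unchanged accuracy scores well, while even a small accuracy drop quickly erases the gain, which is the trade-off the AES is meant to capture.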

    Conclusion

    O1-Pruner demonstrates that efficient reasoning in LLMs is achievable without compromising accuracy. By harmonizing reasoning length with problem complexity, it addresses the computational inefficiencies inherent in long-thought reasoning. This work lays the groundwork for further advancements in optimizing reasoning models, enabling their application in diverse, real-world scenarios where efficiency and accuracy are equally critical.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


    The post O1-Pruner: Streamlining Long-Thought Reasoning in Language Models appeared first on MarkTechPost.

