
    Adaptive Inference Budget Management in Large Language Models through Constrained Policy Optimization

    February 10, 2025

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, particularly in mathematical problem-solving and coding. Research has shown a strong correlation between the length of a reasoning chain and accuracy on the problem being solved. However, these models face a significant challenge: while extended reasoning improves problem-solving, it often produces inefficient solutions, because models tend to generate unnecessarily long reasoning chains even for simple questions that could be answered directly. This one-size-fits-all approach to reasoning length creates computational inefficiency and reduces the practical utility of these systems in real-world applications.

Various methodologies have emerged to enhance LLMs’ reasoning capabilities, with Chain-of-Thought (CoT) being a foundational approach that improves problem-solving by breaking reasoning down into discrete steps. Building upon CoT, researchers have developed more complex techniques such as extended CoT with additional steps, self-reflection mechanisms, multi-turn reasoning, and multi-agent debate systems. Recent developments have focused on scaling up reasoning length, as demonstrated by models like OpenAI-o1 and DeepSeek-R1. However, these models generate extensive reasoning chains regardless of a problem’s complexity, which increases computational costs and enlarges the carbon footprint of inference.

Researchers from Meta AI and The University of Illinois Chicago have proposed an approach that addresses these inefficiencies by automatically adjusting the length of reasoning traces based on query complexity. While previous heuristic methods have attempted to improve token efficiency, this research takes a reinforcement learning (RL) perspective. Instead of explicitly modeling response lengths or balancing intrinsic and extrinsic rewards, the researchers developed a grouping methodology that categorizes responses into distinct groups based on their characteristics, creating a framework that covers the entire response space while maintaining efficiency.
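As a toy illustration of the grouping idea (not code from the paper), sampled responses can be bucketed into a regular-CoT group and an extended group; the token threshold used here is a hypothetical value chosen purely for the example:

```python
def group_response(response_tokens, threshold=512):
    """Bucket a sampled response into the regular-CoT or extended group.

    The 512-token threshold is a hypothetical illustration; the paper's
    grouping is based on response characteristics, not a fixed cutoff.
    """
    return "extended" if len(response_tokens) > threshold else "regular"

# A short answer lands in the regular group, a long trace in the extended one.
short_group = group_response(["The", "answer", "is", "42"])
long_group = group_response(["step"] * 1000)
```

Once responses are partitioned this way, each group can be assigned its own inference cost, which is what makes the budget constraint in the next step expressible.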

The proposed methodology employs a sequence-level notation that simplifies the complex transition probabilities and intermediate rewards by treating each response as a single unit. The architecture divides responses into two primary groups, one for regular-length Chain-of-Thought responses and one for extended responses, each with a distinct inference cost. The system operates through a bi-level optimization framework, where resource-allocation constraints are defined within a convex polytope that limits the probability mass of each group. The algorithm proceeds iteratively, solving the upper-level problem through gradient updates while solving the lower-level optimization directly at each iteration.
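For the two-group case described above, the budget constraint admits a simple closed form. The sketch below is an illustration under assumed reward and cost numbers, not the paper's optimizer: it places as much probability mass on the costly extended group as the inference budget permits.

```python
def allocate_two_groups(r_short, r_long, c_short, c_long, budget):
    """Closed-form mass allocation over two response groups.

    Maximizes expected reward p_s*r_short + p_l*r_long subject to
    p_s + p_l = 1 and an expected-cost cap p_s*c_short + p_l*c_long <= budget.
    All numbers here are hypothetical; this is an illustrative sketch only.
    """
    if r_long <= r_short:
        # Extended reasoning is not worth its extra cost: all mass on short.
        return 1.0, 0.0
    # Put as much mass on the long group as the budget allows.
    p_long = (budget - c_short) / (c_long - c_short)
    p_long = max(0.0, min(1.0, p_long))
    return 1.0 - p_long, p_long

# Assumed numbers: extended CoT costs 4x a short response, budget is 2 units.
p_short, p_long = allocate_two_groups(
    r_short=0.6, r_long=0.9, c_short=1.0, c_long=4.0, budget=2.0
)
```

With these assumed numbers the budget permits one third of the mass on the extended group, matching the intuition that harder queries should receive longer traces only as often as the budget allows.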

The experimental results demonstrate significant performance improvements across different implementations of the proposed methodology. The supervised fine-tuning (SFT) constructions, SVSFT and ASV-SFT-1, achieve enhanced pass@1 metrics, though at the cost of increased inference requirements. More notably, the ASV-IuB-q+ formulations with parameters set at 50% and 75% show remarkable efficiency, delivering improvements of 4.14% at 2.16 times the inference budget and 5.74% at 4.32 times, respectively, matching the performance of SCoRe, a leading RL-based self-correction method. The findings also reveal a noteworthy limitation of prompting-based and SFT-based methods in both absolute improvement and efficiency, suggesting that self-correction capabilities emerge more effectively through RL.


In conclusion, the researchers introduced IBPO, a constrained policy optimization framework that overcomes inefficiencies in LLM reasoning through a weighted supervised fine-tuning update mechanism. Built upon the CGPO framework, the approach determines optimal weights at each iteration by solving an integer linear program. While the system shows effective constraint adherence and dynamic inference-budget allocation in mathematical reasoning tasks, computational resource limitations can be addressed through sample accumulation across multiple steps. Future research directions include expanding the framework’s applicability across different LLM applications and scaling up experimental implementations to test its full potential in various contexts.


Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Adaptive Inference Budget Management in Large Language Models through Constrained Policy Optimization appeared first on MarkTechPost.
