
    This AI Paper Introduces ARM and Ada-GRPO: Adaptive Reasoning Models for Efficient and Scalable Problem-Solving

    May 31, 2025

    Reasoning tasks are a fundamental aspect of artificial intelligence, encompassing areas like commonsense understanding, mathematical problem-solving, and symbolic reasoning. These tasks often involve multiple steps of logical inference, which large language models (LLMs) attempt to mimic through structured approaches such as chain-of-thought (CoT) prompting. However, as LLMs grow in size and complexity, they tend to produce longer outputs across all tasks, regardless of difficulty, leading to significant inefficiencies. The field has been striving to balance the depth of reasoning with computational cost while also ensuring that models can adapt their reasoning strategies to meet the unique needs of each problem.

    A key issue with current reasoning models is their inability to tailor the reasoning process to different task complexities. Most models, including well-known ones such as OpenAI’s o1 and DeepSeek-R1, apply a uniform strategy, typically relying on Long CoT for every task. This causes the “overthinking” problem, where models generate unnecessarily verbose explanations for simple tasks. This not only wastes resources but also degrades accuracy, since excessive reasoning can introduce irrelevant information. Approaches such as prompt-guided generation and token-budget estimation have attempted to mitigate the issue, but they depend on predefined assumptions that are not always reliable across diverse tasks.

    Attempts to address these issues include GRPO (Group Relative Policy Optimization), length-penalty mechanisms, and rule-based prompt controls. While GRPO enables models to learn different reasoning strategies by rewarding correct answers, it leads to “format collapse,” where models increasingly rely on Long CoT and crowd out more efficient formats such as Short CoT or Direct Answer. Length-penalty techniques such as THINKPRUNE control output length during training or inference, but often at the cost of reduced accuracy, especially on complex problem-solving tasks. These solutions struggle to achieve a consistent trade-off between reasoning effectiveness and efficiency, highlighting the need for an adaptive approach.
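
    For context, GRPO dispenses with a learned critic by scoring each sampled response relative to the other responses drawn for the same prompt. The sketch below (not the authors’ code) illustrates that group-relative advantage computation, assuming a simple binary correctness reward:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages in the spirit of GRPO: each sampled response
    is scored against the mean and spread of its own group, so no separate
    value model (critic) is required."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: four responses to one prompt, rewarded 1.0 if the final answer
# is correct and 0.0 otherwise. Correct responses receive positive advantage.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```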

    A team of researchers from Fudan University and Ohio State University introduced the Adaptive Reasoning Model (ARM), which dynamically adjusts reasoning formats based on task difficulty. ARM supports four distinct reasoning styles: Direct Answer for simple tasks, Short CoT for concise reasoning, Code for structured problem-solving, and Long CoT for deep multi-step reasoning. It operates in an Adaptive Mode by default, automatically selecting the appropriate format, and also provides Instruction-Guided and Consensus-Guided Modes for explicit control or aggregation across formats. The key innovation lies in its training process, which utilizes Ada-GRPO, an extension of GRPO that introduces a format diversity reward mechanism. This prevents the dominance of Long CoT and ensures that ARM continues to explore and use simpler reasoning formats when appropriate.
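
    The Consensus-Guided Mode in particular aggregates answers across formats. The paper’s exact aggregation rule is not reproduced here; the sketch below shows one plausible scheme, where the cheaper formats are tried first and Long CoT is invoked only when they disagree. The generate(question, fmt) helper is hypothetical:

```python
from collections import Counter

EFFICIENT_FORMATS = ["direct_answer", "short_cot", "code"]

def consensus_guided_answer(question, generate):
    """Illustrative consensus-style aggregation: query the cheaper reasoning
    formats first and escalate to Long CoT only when they fail to agree.
    `generate(question, fmt)` is a hypothetical helper returning the model's
    final answer when prompted to use the given reasoning format."""
    answers = [generate(question, fmt) for fmt in EFFICIENT_FORMATS]
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes == len(EFFICIENT_FORMATS):    # all efficient formats agree
        return top_answer
    return generate(question, "long_cot")  # otherwise pay for deep reasoning
```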

    The ARM methodology is built on a two-stage framework. First, the model undergoes Supervised Fine-Tuning (SFT) with 10.8K questions, each annotated across four reasoning formats, sourced from datasets like AQuA-Rat and generated with tools such as GPT-4o and DeepSeek-R1. This stage teaches the model the structure of each reasoning format but does not instill adaptiveness. The second stage applies Ada-GRPO, where the model receives scaled rewards for using less frequent formats, such as Direct Answer or Short CoT. A decaying factor ensures that this reward gradually shifts back to accuracy as training progresses, preventing long-term bias toward inefficient exploration. This structure enables ARM to avoid format collapse and dynamically match reasoning strategies to task difficulty, achieving a balance of efficiency and performance.
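
    A minimal sketch of such reward shaping appears below: correct answers produced in rarer formats receive an amplified reward, and the amplification decays over training so that accuracy dominates again later on. The specific scaling and linear decay schedule here are illustrative assumptions, not the paper’s exact formula:

```python
from collections import Counter

def ada_grpo_style_rewards(rewards, formats, step, total_steps):
    """Format-diversity reward shaping in the spirit of Ada-GRPO (illustrative).
    rewards: per-response correctness rewards within one sampled group.
    formats: the reasoning format each response in the group used."""
    group_size = len(rewards)
    freq = Counter(formats)                   # how often each format occurs in the group
    decay = 1.0 - step / total_steps          # diversity bonus fades as training progresses
    shaped = []
    for r, fmt in zip(rewards, formats):
        rarity = group_size / freq[fmt]       # > 1 for under-used formats
        scale = 1.0 + (rarity - 1.0) * decay
        shaped.append(r * scale)
    return shaped

# Example: in a group dominated by Long CoT, a correct Short CoT answer is boosted.
print(ada_grpo_style_rewards(
    rewards=[1, 1, 1, 0],
    formats=["long_cot", "long_cot", "short_cot", "long_cot"],
    step=100, total_steps=1000))
```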

    ARM demonstrated impressive results across various benchmarks, including commonsense, mathematical, and symbolic reasoning tasks. It reduced token usage by an average of 30%, with reductions as high as 70% for simpler tasks, compared to models relying solely on Long CoT. ARM achieved a 2x training speedup over GRPO-based models, accelerating model development without sacrificing accuracy. For example, ARM-7B achieved 75.9% accuracy on the challenging AIME’25 task while using 32.5% fewer tokens. ARM-14B achieved 85.6% accuracy on OpenBookQA and 86.4% accuracy on the MATH dataset, with a token usage reduction of over 30% compared to Qwen2.5SFT+GRPO models. These numbers demonstrate ARM’s ability to maintain competitive performance while delivering significant efficiency gains.

    Overall, the Adaptive Reasoning Model addresses the persistent inefficiency of reasoning models by enabling the adaptive selection of reasoning formats based on task difficulty. The introduction of Ada-GRPO and the multi-format training framework ensures that models no longer waste resources on overthinking. Instead, ARM provides a flexible and practical solution for balancing accuracy and computational cost in reasoning tasks, making it a promising approach for scalable and efficient large language models.


    Check out the Paper, Models on Hugging Face, and Project Page. All credit for this research goes to the researchers of this project.

    The post This AI Paper Introduces ARM and Ada-GRPO: Adaptive Reasoning Models for Efficient and Scalable Problem-Solving appeared first on MarkTechPost.
