
    Do Reasoning Models Really Need Transformers?: Researchers from TogetherAI, Cornell, Geneva, and Princeton Introduce M1—A Hybrid Mamba-Based AI that Matches SOTA Performance at 3x Inference Speed

    April 18, 2025

    Effective reasoning is crucial for solving complex problems in fields such as mathematics and programming, and LLMs have demonstrated significant improvements through long chain-of-thought (CoT) reasoning. However, transformer-based models face limitations due to their quadratic computational complexity in sequence length and linearly growing memory requirements, making it challenging to process long sequences efficiently. While CoT reasoning and adaptive compute allocation have helped boost model performance, these techniques also increase computational costs. Generating multiple outputs and selecting the best one has likewise been explored as a way to enhance reasoning accuracy, but such methods still depend on transformer-based architectures, which struggle to scale in large-batch, long-context settings.
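    To make the scaling argument concrete, here is a toy back-of-envelope sketch (not from the paper; the hidden size and state size below are illustrative assumptions) comparing the decode-time cost of attention with a KV cache against a fixed-state linear RNN/SSM:

```python
def attention_decode_cost(seq_len: int, d_model: int) -> tuple[int, int]:
    # Decoding with a KV cache: memory grows linearly with context length,
    # and total attention work over the sequence grows quadratically.
    kv_cache_floats = 2 * seq_len * d_model                            # keys + values
    total_flops = sum(2 * t * d_model for t in range(1, seq_len + 1))  # ~O(n^2 * d)
    return kv_cache_floats, total_flops


def recurrent_decode_cost(seq_len: int, d_state: int, d_model: int) -> tuple[int, int]:
    # A linear RNN / SSM keeps a fixed-size state, so memory is constant in
    # context length and total work grows only linearly with it.
    state_floats = d_state * d_model
    total_flops = seq_len * 2 * d_state * d_model
    return state_floats, total_flops


if __name__ == "__main__":
    # Illustrative sizes only (hypothetical hidden size 2048, SSM state 16).
    for n in (1_000, 10_000, 100_000):
        print(n, attention_decode_cost(n, 2048), recurrent_decode_cost(n, 16, 2048))
```

    The gap between the two curves is what makes long-context, large-batch generation so much cheaper for recurrent architectures.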

    To address these challenges, alternatives to the transformer architecture have been explored, including RNN-based models, state space models (SSMs), and linear attention mechanisms, which offer more efficient memory usage and faster inference. Hybrid models combining self-attention with subquadratic layers have also been developed to improve inference-time scaling. Moreover, knowledge distillation techniques, which transfer capabilities from large models to smaller ones, have shown promise in maintaining reasoning performance while reducing model size. Research into cross-architecture distillation, such as transferring knowledge from transformers to RNNs or SSMs, is ongoing to achieve high reasoning capabilities in smaller, more efficient models.

    Researchers from TogetherAI, Cornell University, the University of Geneva, and Princeton University present M1, a hybrid linear RNN reasoning model built on the Mamba architecture that enables memory-efficient inference. M1 is trained through a combination of distillation, supervised fine-tuning (SFT), and reinforcement learning (RL). Experimental results on the AIME and MATH benchmarks show that M1 outperforms previous linear RNN models and matches the performance of DeepSeek R1 distilled transformer models. Additionally, M1 achieves a 3x inference speedup over transformers of the same size, which makes test-time techniques such as self-consistency and verification far more affordable and further boosts reasoning accuracy, making it a powerful model for large-scale inference.
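    For context, self-consistency simply samples several reasoning chains and takes a majority vote over their final answers. The sketch below is a minimal illustration of that idea (the sampled answers are hypothetical), not the authors' implementation:

```python
from collections import Counter


def self_consistency(final_answers: list[str]) -> str:
    # Majority vote over the final answers extracted from independently
    # sampled reasoning chains; ties go to the answer seen first.
    return Counter(final_answers).most_common(1)[0][0]


# Hypothetical usage: sample k chains of thought from the model, extract each
# chain's final answer, then vote.
sampled_answers = ["42", "41", "42", "42", "7"]
print(self_consistency(sampled_answers))  # -> "42"
```

    Because every extra vote costs another full generation, this strategy only becomes practical when per-sample inference is cheap, which is exactly where M1's speedup matters.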

    The M1 model is built through a three-stage process: distillation, SFT, and RL. First, a pretrained transformer model is distilled into the Mamba architecture, with a modified handling of the linear projections and additional parameters for better performance. In the SFT stage, the model is fine-tuned on math problem datasets, first with general datasets and then with reasoning-focused datasets from the R1 model series. Finally, RL is applied using GRPO (Group Relative Policy Optimization), which strengthens the model's reasoning ability by training with group-relative advantage estimates and encouraging diversity in its responses, further boosting performance.
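    As a rough illustration of the RL stage, GRPO scores each sampled response relative to the other responses for the same prompt. The snippet below sketches that group-relative advantage computation under a simple correct/incorrect reward; it reflects the general GRPO recipe, not the paper's exact training code or reward shaping:

```python
import numpy as np


def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Group-relative advantages: each response's reward is normalized against
    # the mean and standard deviation of the rewards in its own group
    # (all responses sampled for the same prompt).
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Hypothetical example: six sampled solutions to one math problem, rewarded
# 1.0 when the final answer is correct and 0.0 otherwise.
print(grpo_advantages(np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])))
```

    Responses that beat their group's average get positive advantages and are reinforced, while below-average responses are pushed down, without needing a separate value model.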

    The experiment uses Llama-3.2-3B-Instruct models as the target for distillation, with the Mamba layers using an SSM state size of 16. The evaluation spans a range of math benchmarks, including MATH500, AIME25, and OlympiadBench, assessing model performance in terms of coverage and accuracy. The pass@k metric is used for coverage, indicating the probability that at least one correct solution appears among k generated samples. The model's performance is compared with that of various state-of-the-art models, yielding competitive results, particularly on reasoning tasks. Inference speed and test-time scaling are also evaluated, demonstrating M1's efficiency in large-batch generation and longer sequence contexts.
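    The pass@k coverage metric mentioned above is typically computed with the standard unbiased estimator shown below; this is a common formulation rather than code taken from the paper, and the sample counts in the example are made up:

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of pass@k: the probability that at least one of k
    # samples drawn (without replacement) from n generated solutions, of
    # which c are correct, is correct.
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))


# Hypothetical example: 64 samples per problem, 20 of them correct; pass@8.
print(pass_at_k(n=64, c=20, k=8))
```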

    In conclusion, M1 is a hybrid reasoning model based on the Mamba architecture, designed to overcome the scalability limits of transformer models. By employing distillation and fine-tuning techniques, M1 achieves performance comparable to state-of-the-art reasoning models. It offers more than 3x faster inference than similarly sized transformer models, especially at large batch sizes, making resource-intensive strategies like self-consistency more feasible. M1 outperforms linear RNN models and matches DeepSeek R1's performance on benchmarks such as AIME and MATH. It also demonstrates superior accuracy under fixed time budgets, making it a strong, efficient alternative to transformer-based architectures for mathematical reasoning tasks.


    Here is the Paper.

    The post Do Reasoning Models Really Need Transformers?: Researchers from TogetherAI, Cornell, Geneva, and Princeton Introduce M1—A Hybrid Mamba-Based AI that Matches SOTA Performance at 3x Inference Speed appeared first on MarkTechPost.
