
MiniMax AI Releases MiniMax-M1: A 456B Parameter Hybrid Model for Long-Context and Reinforcement Learning (RL) Tasks

    June 19, 2025

    The Challenge of Long-Context Reasoning in AI Models

Large reasoning models are designed not only to understand language but also to work through multi-step problems that demand sustained attention and contextual comprehension. As expectations of AI grow, especially in real-world software development settings, researchers have sought architectures that can handle longer inputs and sustain deep, coherent reasoning chains without incurring overwhelming computational costs.

    Computational Constraints with Traditional Transformers

The primary difficulty in expanding these reasoning capabilities lies in the excessive computational load that comes with longer generation lengths. Traditional transformer-based models employ a softmax attention mechanism, which scales quadratically with input size, limiting their ability to handle long input sequences or extended chains of thought efficiently. The problem is even more pressing in real-time or cost-sensitive applications, where inference expenses are significant.
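The scaling gap can be made concrete with a back-of-the-envelope FLOP estimate. The sketch below uses standard big-O-style formulas for attention cost (the constants and head dimension are illustrative assumptions, not measurements of any specific model):

```python
# Rough FLOP-count sketch of why softmax attention becomes the bottleneck
# at long sequence lengths. These are standard asymptotic estimates, not
# measurements of MiniMax-M1 or any other model.

def softmax_attention_flops(n: int, d: int) -> int:
    """Quadratic in sequence length n: every token attends to every token."""
    return 2 * n * n * d  # QK^T score matrix plus the weighted sum over values

def linear_attention_flops(n: int, d: int) -> int:
    """Linear in n: keys/values are folded into a d x d state incrementally."""
    return 2 * n * d * d

d = 128  # per-head dimension (illustrative)
for n in (1_000, 100_000, 1_000_000):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n:>9,}: softmax costs {ratio:,.0f}x more than linear")
```

At a million tokens the quadratic term dominates by several thousandfold, which is why linear-style mechanisms become attractive at long context lengths.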

    Existing Alternatives and Their Limitations

    Efforts to address this issue have yielded a range of methods, including sparse attention and linear attention variants. Some teams have experimented with state-space models and recurrent networks as alternatives to traditional attention structures. However, these innovations have seen limited adoption in the most competitive reasoning models due to either architectural complexity or a lack of scalability in real-world deployments. Even large-scale systems, such as Tencent’s Hunyuan-T1, which utilizes a novel Mamba architecture, remain closed-source, thereby restricting wider research engagement and validation.

    Introduction of MiniMax-M1: A Scalable Open-Weight Model

Researchers at MiniMax AI introduced MiniMax-M1, a new open-weight, large-scale reasoning model that combines a mixture-of-experts (MoE) architecture with lightning attention. Built as an evolution of the MiniMax-Text-01 model, MiniMax-M1 contains 456 billion parameters, with 45.9 billion activated per token. It supports context lengths of up to 1 million tokens—eight times the capacity of DeepSeek R1. The model addresses compute scalability at inference time, consuming only 25% of the FLOPs required by DeepSeek R1 at a generation length of 100,000 tokens. It was trained using large-scale reinforcement learning on a broad range of tasks, from mathematics and coding to software engineering, marking a shift toward practical, long-context AI models.

    Hybrid-Attention with Lightning Attention and Softmax Blocks

    To optimize this architecture, MiniMax-M1 employs a hybrid attention scheme where every seventh transformer block uses traditional softmax attention, followed by six blocks using lightning attention. This significantly reduces computational complexity while preserving performance. The lightning attention itself is I/O-aware, adapted from linear attention, and is particularly effective at scaling reasoning lengths to hundreds of thousands of tokens. For reinforcement learning efficiency, the researchers introduced a novel algorithm called CISPO. Instead of clipping token updates as traditional methods do, CISPO clips importance sampling weights, enabling stable training and consistent token contributions, even in off-policy updates.

    The CISPO Algorithm and RL Training Efficiency

The CISPO algorithm proved essential in overcoming the training instability encountered in hybrid architectures. In comparative studies using the Qwen2.5-32B baseline, CISPO achieved a 2x speedup over DAPO. Leveraging this, the full reinforcement learning cycle for MiniMax-M1 was completed in just three weeks on 512 H800 GPUs, at a rental cost of approximately $534,700. The model was trained on a diverse dataset comprising 41 logic tasks generated via the SynLogic framework, along with real-world software engineering environments derived from SWE-bench. These environments used execution-based rewards to guide performance, yielding stronger outcomes on practical coding tasks.
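The distinction between clipping token updates and clipping importance-sampling weights can be illustrated numerically. The sketch below contrasts a PPO-style per-token objective with a CISPO-style one, per the description above; the epsilon values and objective form are illustrative assumptions, not the paper's hyperparameters:

```python
import numpy as np

# Hedged sketch contrasting PPO-style token clipping with CISPO-style
# importance-weight clipping. Epsilon values are illustrative assumptions.

def ppo_token_weight(ratio, advantage, eps=0.2):
    # PPO takes the pessimistic min of the raw and clipped objectives;
    # tokens whose clipped branch is selected get no gradient through the
    # ratio, so their updates are effectively suppressed.
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

def cispo_token_weight(ratio, advantage, eps_low=0.2, eps_high=0.2):
    # CISPO clips the importance-sampling weight itself and treats it as a
    # constant (stop-gradient in practice), so every token still contributes
    # a policy-gradient term, just with a bounded weight.
    w = np.clip(ratio, 1 - eps_low, 1 + eps_high)
    return w * advantage

ratios = np.array([0.5, 1.0, 3.0])  # off-policy ratios for three tokens
adv = np.ones(3)
print(ppo_token_weight(ratios, adv))    # [0.5 1.  1.2]
print(cispo_token_weight(ratios, adv))  # [0.8 1.  1.2]
```

Because the clipped weight multiplies a log-probability gradient rather than gating it, off-policy tokens keep contributing to the update, which is the stability property the text attributes to CISPO.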

    Benchmark Results and Comparative Performance

    MiniMax-M1 delivered compelling benchmark results. Compared to DeepSeek-R1 and Qwen3-235B, it excelled in software engineering, long-context processing, and agentic tool use. Although it trailed the latest DeepSeek-R1-0528 in math and coding contests, it surpassed both OpenAI o3 and Claude 4 Opus in long-context understanding benchmarks. Furthermore, it outperformed Gemini 2.5 Pro in the TAU-Bench agent tool use evaluation.

    Conclusion: A Scalable and Transparent Model for Long-Context AI

    MiniMax-M1 presents a significant step forward by offering both transparency and scalability. By addressing the dual challenge of inference efficiency and training complexity, the research team at MiniMax AI has set a precedent for open-weight reasoning models. This work not only brings a solution to compute constraints but also introduces practical methods for scaling language model intelligence into real-world applications.



The post MiniMax AI Releases MiniMax-M1: A 456B Parameter Hybrid Model for Long-Context and Reinforcement Learning (RL) Tasks appeared first on MarkTechPost.

