    Xiaomi introduced MiMo-7B: A Compact Language Model that Outperforms Larger Models in Mathematical and Code Reasoning through Rigorous Pre-Training and Reinforcement Learning

    May 2, 2025

    With rising demand for AI systems that can handle tasks involving multi-step logic, mathematical proofs, and software development, researchers have turned their attention toward enhancing models’ reasoning potential. This capability, once believed to be exclusive to human intelligence, is now actively being pursued in smaller-scale models to make them more efficient and widely deployable. As reasoning-based tasks continue to expand in relevance, encompassing academic problem-solving, automated theorem proving, algorithm design, and complex software debugging, language models are expected to become more than general-purpose conversational agents. Increasingly, they are expected to act as domain-specific problem solvers that can assist professionals and researchers alike.

    One challenge in building reasoning-focused models is achieving strong, simultaneous performance in mathematics and programming while maintaining a relatively small model size. Most competitive results in these domains are achieved by models with approximately 32 billion parameters or more. These large models are often used because smaller ones struggle with generalization and reward optimization in reinforcement learning tasks, particularly when it comes to code-based problem-solving. Sparse reward feedback, limited high-quality data, and weak base model architecture make it difficult to develop compact yet powerful models. Additionally, the data used to train these models is not always curated with reasoning in mind, often resulting in training inefficiencies and limited gains in problem-solving abilities.

    To address reasoning challenges, several models, including OpenAI’s o-series, DeepSeek R1, and Claude 3.7, have been introduced, leveraging massive parameter counts and complex reinforcement learning strategies. These models employ techniques such as step-by-step planning and backtracking to enhance reasoning, particularly in algorithmic thinking and math-related tasks. However, they heavily depend on post-training stages and underplay the importance of high-quality pre-training data. Many also rely on fixed template-based reward systems that are prone to reward hacking. Code generation benchmarks often reveal that these models perform inconsistently in challenging tasks due to shallow pretraining foundations and ineffective reward signal modeling during fine-tuning.

    A research team from Xiaomi introduced the MiMo-7B family of language models with a focused approach to overcoming these barriers. The innovation lies in treating both pre-training and post-training as equally critical phases for developing reasoning capabilities. The base model, MiMo-7B-Base, was trained from scratch using a dataset comprising 25 trillion tokens. This dataset was constructed with a three-stage mixture strategy that progressively increased the share of mathematical and programming content. An additional multiple-token prediction (MTP) objective was introduced during pre-training to improve both performance and inference speed. For post-training, the team developed a curated dataset of 130,000 verifiable math and programming problems, each tagged with difficulty scores. Reinforcement learning was then applied using a difficulty-driven reward framework, allowing more nuanced and effective feedback during training. This resulted in two major variants: MiMo-7B-RL and MiMo-7B-RL-Zero.
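
    Multiple-token prediction is easiest to see in code. Below is a minimal sketch, written in PyTorch under assumed details (head count, offsets, and loss weighting are illustrative and not taken from the MiMo implementation), of an auxiliary objective in which extra linear heads predict tokens several positions ahead and their cross-entropy losses are averaged:

        import torch.nn.functional as F

        def mtp_loss(hidden, heads, input_ids, ignore_index=-100):
            # hidden:    (batch, seq, dim) final hidden states from the decoder
            # heads:     list of nn.Linear(dim, vocab); the k-th extra head
            #            predicts the token (k + 1) positions ahead
            # input_ids: (batch, seq) token ids used to build the shifted targets
            total = hidden.new_zeros(())
            for k, head in enumerate(heads, start=1):
                offset = k + 1                      # the standard next-token head is offset 1
                logits = head(hidden[:, :-offset])  # predict the token at position t + offset
                targets = input_ids[:, offset:]
                total = total + F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)),
                    targets.reshape(-1),
                    ignore_index=ignore_index,
                )
            return total / max(len(heads), 1)

    In practice an auxiliary loss like this is added to the standard next-token loss with a small weight, and the extra heads can be reused for speculative decoding at inference time, which is one way an MTP objective can improve generation speed as well as quality.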

    The pre-training methodology started by extracting reasoning-heavy content from web pages, academic papers, and books using a custom HTML extraction tool designed to preserve math equations and code snippets. Unlike generic pipelines, this extractor retained structural elements critical to problem-solving domains. The team then enhanced the PDF parsing tools to interpret scientific and programming content accurately. To prevent data duplication, global deduplication was applied using URL-based and MinHash techniques. The training corpus was filtered using small language models fine-tuned to tag content quality, replacing outdated heuristic-based filters that often removed valuable reasoning examples. High-quality synthetic reasoning data was also generated from advanced models and added in the final stage of training. Under the three-stage mixture, math and code data rose to 70% of the training mix in stage two, and an additional 10% of synthetic reasoning content was added in stage three. The maximum context length was also extended from 8,192 to 32,768 tokens so the model could handle long-form reasoning problems.
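
    As one concrete illustration of the deduplication step, the sketch below uses the datasketch library's MinHash and MinHashLSH to drop near-duplicate documents; the shingle size, number of permutations, and 0.8 similarity threshold are assumptions for illustration, not values reported for MiMo's pipeline:

        from datasketch import MinHash, MinHashLSH

        def shingles(text, n=5):
            # Overlapping n-word shingles approximate document content for Jaccard similarity.
            tokens = text.split()
            return {" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}

        def dedup(docs, threshold=0.8, num_perm=128):
            # docs: iterable of (doc_id, text) pairs; returns the pairs kept after filtering.
            lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
            kept = []
            for doc_id, text in docs:
                mh = MinHash(num_perm=num_perm)
                for s in shingles(text):
                    mh.update(s.encode("utf-8"))
                if lsh.query(mh):  # a near-duplicate is already indexed
                    continue
                lsh.insert(doc_id, mh)
                kept.append((doc_id, text))
            return kept

    At the scale of a 25-trillion-token corpus this would run as a distributed job rather than a single loop, but the core idea of hashing shingled content and querying an LSH index before admitting a document is the same.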

    In the reinforcement learning stage, the research team engineered a seamless rollout engine to accelerate training and validation. This infrastructure incorporated asynchronous reward computation and early termination mechanisms to reduce GPU idle time, resulting in 2.29 times faster training and 1.96 times faster validation. The model’s policy was optimized using fine-grained rewards derived from the difficulty of test cases, addressing the sparse reward issue in programming benchmarks. Data re-sampling techniques were introduced to maintain training stability and increase rollout sampling efficiency. These strategies collectively enabled the MiMo-7B variants to learn effectively, even from cold-start states where no pre-fine-tuned initialization is available.
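
    The effect of fine-grained, test-difficulty-driven rewards can be sketched in a few lines. Assuming each programming problem ships unit tests annotated with a difficulty score (the weighting scheme below is illustrative, not MiMo's exact formulation), a rollout earns partial credit in proportion to the difficulty-weighted tests it passes instead of a sparse all-or-nothing signal:

        def graded_code_reward(test_results, difficulties):
            # test_results: list of booleans, one per unit test (True = passed)
            # difficulties: list of positive floats, higher = harder test case
            # Returns a dense reward in [0, 1] rather than a sparse pass/fail signal.
            total = sum(difficulties)
            if total == 0:
                return 0.0
            earned = sum(d for passed, d in zip(test_results, difficulties) if passed)
            return earned / total

    A dense signal like this gives the policy gradient something to climb even when a solution passes only the easier tests, which is exactly the regime where small models tend to stall under binary rewards.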

    Performance evaluation revealed that MiMo-7B-Base achieved a score of 75.2 on the Big-Bench Hard (BBH) task, surpassing other open-source 7B models. It also performed well on SuperGPQA, which includes graduate-level reasoning questions. The post-trained MiMo-7B-RL scored 55.4 on the AIME 2025 benchmark, surpassing OpenAI’s o1-mini by 4.7 points. On code generation tasks, it outperformed much larger models like DeepSeek-R1-Zero-32B and Qwen2.5-32B-RL-Zero on both LiveCodeBench v5 and v6. These results demonstrate that a properly optimized 7B model can rival or even outperform models with more than four times the number of parameters.

    The MiMo-7B project serves as a concrete demonstration of how pre-training, data quality, and reinforcement learning infrastructure contribute to the final reasoning capability of a language model. By rethinking the pipeline from data extraction to reward computation, the Xiaomi research team achieved compact yet powerful models suitable for real-world applications in mathematics, coding, and logic. Their approach highlights the untapped potential of small models and challenges the assumption that size alone determines intelligence or versatility.

    Key Takeaways from the Research on MiMo-7B:  

    1. MiMo-7B was trained on a massive dataset of 25 trillion tokens, targeting reasoning tasks through the use of structured data mixtures.  
    2. 130,000 math and code problems were used in RL training, each annotated with difficulty scores to enable effective reward shaping.  
    3. Three-stage pre-training raised math and coding content to 70%, followed by 10% synthetic problem-solving data.  
    4. A seamless rollout engine increased RL training speed by 2.29 times and validation by 1.96 times.  
    5. MiMo-7B-RL achieved 55.4 on AIME 2025, outperforming OpenAI o1-mini by 4.7 points.  
    6. MiMo-7B models are publicly available and include all checkpoints: base, SFT, and RL variants (a minimal loading sketch follows this list).  
    7. The model’s success shows that small, well-designed models can rival or exceed the performance of 32B models in reasoning tasks.  
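
    For readers who want to try the released checkpoints, here is a minimal loading sketch using Hugging Face transformers; the repository id below is an assumption based on the public release and should be verified against the project's GitHub page:

        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "XiaomiMiMo/MiMo-7B-RL"  # assumed hub id; verify on the GitHub page
        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

        prompt = "Prove that the sum of two even integers is even."
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=256)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))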

    Check out the Paper and GitHub Page.

    The post Xiaomi introduced MiMo-7B: A Compact Language Model that Outperforms Larger Models in Mathematical and Code Reasoning through Rigorous Pre-Training and Reinforcement Learning appeared first on MarkTechPost.
