    Xiaomi introduced MiMo-7B: A Compact Language Model that Outperforms Larger Models in Mathematical and Code Reasoning through Rigorous Pre-Training and Reinforcement Learning

    May 2, 2025

    With rising demand for AI systems that can handle tasks involving multi-step logic, mathematical proofs, and software development, researchers have turned their attention toward enhancing models’ reasoning potential. This capability, once believed to be exclusive to human intelligence, is now actively being pursued in smaller-scale models to make them more efficient and widely deployable. As reasoning-based tasks expand in relevance, encompassing academic problem-solving, automated theorem proving, algorithm design, and complex software debugging, language models are expected to become more than general-purpose conversational agents: they are increasingly expected to act as domain-specific problem solvers that can assist professionals and researchers alike.

    One challenge in building reasoning-focused models is achieving strong, simultaneous performance in mathematics and programming while keeping the model relatively small. Most competitive results in these domains come from models with approximately 32 billion parameters or more. These large models are favored because smaller ones struggle with generalization and reward optimization in reinforcement learning tasks, particularly for code-based problem-solving. Sparse reward feedback, limited high-quality data, and weak base model architectures make it difficult to develop compact yet powerful models. Additionally, the data used to train these models is not always curated with reasoning in mind, often resulting in training inefficiencies and limited gains in problem-solving ability.

    To address reasoning challenges, several models, including OpenAI’s o-series, DeepSeek R1, and Claude 3.7, have been introduced, leveraging massive parameter counts and complex reinforcement learning strategies. These models employ techniques such as step-by-step planning and backtracking to enhance reasoning, particularly in algorithmic thinking and math-related tasks. However, they heavily depend on post-training stages and underplay the importance of high-quality pre-training data. Many also rely on fixed template-based reward systems that are prone to reward hacking. Code generation benchmarks often reveal that these models perform inconsistently in challenging tasks due to shallow pretraining foundations and ineffective reward signal modeling during fine-tuning.

    A research team from Xiaomi introduced the MiMo-7B family of language models with a focused approach to overcoming these barriers. The innovation lies in treating both pre-training and post-training as equally critical phases for developing reasoning capabilities. The base model, MiMo-7B-Base, was trained from scratch using a dataset comprising 25 trillion tokens. This dataset was constructed with a three-stage mixture strategy that progressively increased the share of mathematical and programming content. An additional multiple-token prediction (MTP) objective was introduced during pre-training to improve both performance and inference speed. For post-training, the team developed a curated dataset of 130,000 verifiable math and programming problems, each tagged with difficulty scores. Reinforcement learning was then applied using a difficulty-driven reward framework, allowing more nuanced and effective feedback during training. This resulted in two major variants: MiMo-7B-RL and MiMo-7B-RL-Zero.
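
    The paper’s exact reward formula is not reproduced here, but the difficulty-driven idea can be illustrated with a minimal sketch: each verifiable problem carries an annotated difficulty score, and the reward for a correct or partially correct solution is scaled accordingly, with per-test-case partial credit densifying an otherwise sparse signal. The names and weighting below are illustrative assumptions, not the MiMo-7B implementation.

        # Minimal sketch of difficulty-driven reward shaping (illustrative assumptions only).
        from dataclasses import dataclass

        @dataclass
        class Problem:
            prompt: str
            difficulty: float  # annotated offline, 0.0 = easy, 1.0 = hard

        def shaped_reward(problem: Problem, passed_tests: int, total_tests: int) -> float:
            """Combine per-test pass rate with problem difficulty.

            Partial credit from individual test cases provides dense feedback,
            and harder problems contribute a larger reward when solved.
            """
            if total_tests == 0:
                return 0.0
            pass_rate = passed_tests / total_tests        # dense, per-test feedback
            return pass_rate * (1.0 + problem.difficulty)  # scale by annotated difficulty

        # Example: a hard problem with 3 of 4 hidden tests passing
        p = Problem(prompt="...", difficulty=0.8)
        print(shaped_reward(p, passed_tests=3, total_tests=4))  # 0.75 * 1.8 = 1.35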

    The pre-training methodology started by extracting reasoning-heavy content from web pages, academic papers, and books using a custom HTML extraction tool designed to preserve math equations and code snippets. Unlike generic pipelines, this extractor retained structural elements critical to problem-solving domains. The team then enhanced the PDF parsing tools to interpret scientific and programming content accurately. To prevent data duplication, global deduplication was applied using URL-based and MinHash techniques. The training corpus was filtered using small language models fine-tuned to tag content quality, replacing outdated heuristic-based filters that often removed valuable reasoning examples. High-quality synthetic reasoning data was also generated from advanced models and added in the final stage of training. This three-stage approach resulted in a final training mix comprising 70% math and code data in stage two and an additional 10% of synthetic content in stage three. The maximum context length was extended from 8,192 to 32,768 tokens, ensuring the model could handle long-form reasoning problems.
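
    MinHash-based deduplication, mentioned above, can be sketched in a few lines: each document is reduced to a signature of per-hash minima over its shingles, and the fraction of matching minima between two signatures approximates their Jaccard similarity. The shingle size and number of hash functions below are assumptions for illustration; the actual pipeline parameters are not detailed in the article.

        # Minimal sketch of MinHash near-duplicate detection (parameters are assumptions).
        import hashlib

        def minhash_signature(text: str, num_hashes: int = 64, shingle_size: int = 5) -> list[int]:
            tokens = text.split()
            shingles = {" ".join(tokens[i:i + shingle_size])
                        for i in range(max(1, len(tokens) - shingle_size + 1))}
            # For each hash function (seeded by its index), keep the minimum hash over all shingles.
            return [min(int(hashlib.sha1(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
                    for seed in range(num_hashes)]

        def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
            # The fraction of matching minima approximates the true Jaccard similarity.
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        doc_a = "def gcd(a, b): return a if b == 0 else gcd(b, a % b)  # Euclid's algorithm"
        doc_b = "def gcd(a, b): return a if b == 0 else gcd(b, a % b)  # Euclidean algorithm"
        similarity = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
        print(f"estimated Jaccard similarity: {similarity:.2f}")  # near-duplicates score high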

    In the reinforcement learning stage, the research team engineered a seamless rollout engine to accelerate training and validation. This infrastructure incorporated asynchronous reward computation and early termination mechanisms to reduce GPU idle time, resulting in 2.29 times faster training and 1.96 times faster validation. The model’s policy was optimized using fine-grained rewards derived from the difficulty of test cases, addressing the sparse reward issue in programming benchmarks. Data re-sampling techniques were introduced to maintain training stability and increase rollout sampling efficiency. These strategies collectively enabled the MiMo-7B variants to learn effectively, even from cold-start states where no pre-fine-tuned initialization is available.
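
    As a rough illustration of the idea behind the rollout engine, reward computation (for example, running a candidate solution against test cases) can be overlapped with generation so the accelerator is not idle while rewards are scored. The sketch below uses a thread pool and stand-in functions; it is a schematic of asynchronous reward computation under assumed names, not Xiaomi’s actual infrastructure, and it omits the early-termination and re-sampling logic.

        # Schematic of overlapping generation with asynchronous reward scoring (illustrative only).
        from concurrent.futures import ThreadPoolExecutor

        def generate_rollout(prompt: str) -> str:
            return f"<model completion for {prompt!r}>"          # stand-in for GPU sampling

        def score_rollout(completion: str) -> float:
            return 1.0 if "completion" in completion else 0.0    # stand-in for test-case verification

        def rollout_loop(prompts: list[str]) -> list[float]:
            with ThreadPoolExecutor(max_workers=4) as pool:
                pending = []
                for prompt in prompts:
                    completion = generate_rollout(prompt)                     # generation step
                    pending.append(pool.submit(score_rollout, completion))    # reward scored asynchronously
                return [future.result() for future in pending]                # collect rewards when ready

        print(rollout_loop(["problem_1", "problem_2", "problem_3"]))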

    Performance evaluation revealed that MiMo-7B-Base achieved a score of 75.2 on the BIG-Bench Hard (BBH) benchmark, surpassing other open-source 7B models. It also performed well on SuperGPQA, which includes graduate-level reasoning questions. The post-trained MiMo-7B-RL scored 55.4 on the AIME 2025 benchmark, surpassing OpenAI’s o1-mini by 4.7 points. On code generation tasks, it outperformed much larger models such as DeepSeek-R1-Zero-32B and Qwen2.5-32B-RL-Zero on both LiveCodeBench v5 and v6. These results demonstrate that a properly optimized 7B model can rival or even outperform models with more than four times the number of parameters.

    The MiMo-7B project serves as a concrete demonstration of how pre-training, data quality, and reinforcement learning infrastructure contribute to the final reasoning capability of a language model. By rethinking the pipeline from data extraction to reward computation, the Xiaomi research team achieved compact yet powerful models suitable for real-world applications in mathematics, coding, and logic. Their approach highlights the untapped potential of small models and challenges the assumption that size alone determines intelligence or versatility.

    Key Takeaways from the Research on MiMo-7B:  

    1. MiMo-7B was trained on a massive dataset of 25 trillion tokens, targeting reasoning tasks through the use of structured data mixtures.  
    2. 130,000 math and code problems were used in RL training, each annotated with difficulty scores to enable effective reward shaping.  
    3. Three-stage pre-training raised math and coding content to 70%, followed by 10% synthetic problem-solving data.  
    4. A seamless rollout engine increased RL training speed by 2.29 times and validation by 1.96 times.  
    5. MiMo-7B-RL achieved 55.4 on AIME 2025, outperforming OpenAI o1-mini by 4.7 points.  
    6. MiMo-7B models are publicly available and include all checkpoints: base, SFT, and RL variants.  
    7. The model’s success shows that small, well-designed models can rival or exceed the performance of 32B models in reasoning tasks.  

    Check out the Paper and GitHub Page.
