Developing Large Language Models (LLMs) with up to trillions of parameters is costly and resource-intensive, prompting interest in Small Language Models (SLMs) as a more efficient alternative. Despite their capabilities, LLMs remain difficult to work with: their training dynamics are poorly understood because experiments at that scale are prohibitively expensive, and deploying such large models on devices like PCs or smartphones is often impractical or inefficient.
Recent interest in SLMs has led to the emergence of innovative models such as the Phi series, TinyLlama, MobileLLM, and Gemma. While these models have enriched the SLM field, they still struggle in two key areas: replicating the comprehensive abilities of LLMs and establishing transparent, scalable training methods that could benefit the development of both SLMs and LLMs.
Researchers from the Department of Computer Science and Technology at Tsinghua University and Modelbest Inc. introduce MiniCPM, a family of SLMs with 1.2B and 2.4B non-embedding parameters that rival 7B-13B LLMs in performance. Their approach emphasizes scalability in both the model and data dimensions for future LLM research. They use extensive model wind tunnel experiments for stable model scaling and introduce a Warmup-Stable-Decay (WSD) learning rate scheduler for data scaling, which facilitates continuous training and domain adaptation. This method also enables efficient study of the data-model scaling law and yields variants such as MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K.
The Cosine Learning Rate Scheduler (LRS) is widely used for adjusting the learning rate during training: after a warmup stage, it gradually reduces the learning rate along a cosine curve, with a key hyperparameter T marking the step at which the rate first reaches its minimum. Cosine LRS performs best when T equals the total number of training steps S; both T < S and T > S yield suboptimal results, since T = S balances a long phase of training at a high learning rate with a thorough decay phase, helping the model find good global and local optima. As an alternative, the authors propose the Warmup-Stable-Decay (WSD) LRS, which explicitly divides training into warmup, stable, and decay stages to retain these benefits while allowing training to continue from the stable stage.
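To make the distinction concrete, below is a minimal Python sketch of a WSD-style schedule alongside a standard cosine schedule. The function names, the exponential decay curve, and all hyperparameters are illustrative assumptions for this article, not the paper's exact implementation.

```python
import math

def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay (WSD) schedule (illustrative sketch).

    - Warmup: learning rate rises linearly from 0 to max_lr.
    - Stable: learning rate is held constant at max_lr.
    - Decay:  learning rate is annealed toward min_lr (an exponential
      curve is assumed here; the paper's exact decay function may differ).
    """
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        return max_lr
    t = step - warmup_steps - stable_steps
    frac = min(1.0, t / max(1, decay_steps))
    return min_lr + (max_lr - min_lr) * (0.5 ** (10 * frac))  # hypothetical decay shape


def cosine_lr(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Standard cosine schedule with T = total_steps, shown for comparison."""
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Example: a 10,000-step run with 500 warmup steps, 8,500 stable steps,
# and a 1,000-step decay at the end.
lrs = [wsd_lr(s, max_lr=1e-3, warmup_steps=500, stable_steps=8500, decay_steps=1000)
       for s in range(10_000)]
```

The practical consequence, as the article notes, is that a checkpoint saved during the constant-rate stable stage can be resumed with additional or domain-specific data without restarting the whole schedule, which is what enables the continuous training, domain adaptation, and inexpensive scaling-law experiments described above.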
Observations show that, on average, MiniCPM-2.4B ranks highest among SLMs. It performs comparably to Mistral-7B-v0.1 on English benchmarks and surpasses it significantly on Chinese ones. MiniCPM-2.4B outperforms Llama2-13B in most areas except MMLU, BBH, and HellaSwag, while MiniCPM-1.2B outperforms Llama2-7B except on HellaSwag. In general, BBH is harder for SLMs than knowledge-oriented datasets are, suggesting that reasoning ability depends more on model size than knowledge does. Phi-2 matches MiniCPM's performance on academic-oriented datasets, possibly because its training data emphasizes educational contexts.
In conclusion, this paper introduces MiniCPM, featuring two SLMs with 2.4B and 1.2B non-embedding parameters, respectively, that outperform larger models. The scalable training methodology shows promise along both the model and data dimensions, with inspiring potential applications in LLM development. The WSD scheduler enables continuous training and facilitates efficient study of scaling laws. The MiniCPM family also includes DPO, long-context, and MoE versions, and future work aims to analyze the loss decrease during the decay stage and to further enhance MiniCPM's capability by scaling up model and data size.
Check out the Paper. All credit for this research goes to the researchers of this project.