
    Meet Baichuan-M1: A New Series of Large Language Models Trained on 20T Tokens with a Dedicated Focus on Enhancing Medical Capabilities

    February 21, 2025

While LLMs have advanced remarkably in general-purpose applications, their development for specialized fields like medicine remains limited. The complexity of medical knowledge and the scarcity of high-quality, domain-specific data make building capable medical LLMs challenging. Although models like GPT-4 and DeepSeek-R1 have demonstrated impressive capabilities across industries, their adaptation to the medical domain is hindered by intricate medical terminology, diverse sub-disciplines, and constantly evolving literature. Unlike general applications, medical AI must interpret highly technical language and produce precise, contextually relevant responses, which general-purpose LLMs struggle to achieve.

One major obstacle to building effective medical LLMs is the limited accessibility of high-quality training data, restricted by privacy concerns and regulatory barriers. Medical data spans structured and unstructured sources, including clinical notes, textbooks, and research articles, which makes comprehensive model training difficult. Approaches like fine-tuning general LLMs on medical datasets and applying transfer learning have been explored, but they often fail to capture the full depth of medical knowledge. As a result, such models may perform well on specific tasks yet lack the nuanced understanding needed for complex medical inquiries, highlighting the need for more refined training strategies.

    Researchers at Baichuan Inc. introduced Baichuan-M1, a specialized large language model series designed specifically for medical applications. Unlike traditional models that refine existing architectures through additional pretraining or post-training, Baichuan-M1 is built from scratch with a strong focus on medical expertise. Trained on 20 trillion tokens, including both general and medical-specific data, the model balances broad language understanding with domain-specific precision. It excels in general tasks like coding and mathematics and in medical applications such as diagnostics and treatment recommendations. With an optimized Transformer architecture, Baichuan-M1 sets a new benchmark for AI-driven advancements in healthcare.

The model architecture follows Llama and similar frameworks, incorporating pre-norm RMSNorm, SwiGLU activations in the FFN layers, and rotary position embeddings. To optimize inference efficiency, the design interleaves global and sliding-window attention, increasing the head dimension to 256 for the global layers. Additionally, temporal short convolutions over the key-value states enhance in-context learning. The model employs a hybrid tokenizer for medical and general text, a curriculum-based training strategy with progressively increasing data complexity, and adaptive gradient clipping for stability. Supervised fine-tuning refines both general reasoning and medical-specific tasks, ensuring robust language understanding, medical reasoning, and long-document handling while maintaining inference efficiency.
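To make the interleaved attention pattern concrete, here is a minimal PyTorch sketch of causal versus sliding-window masks. The layer schedule, window size, and sequence length below are illustrative placeholders, not Baichuan-M1's published values.

```python
# Minimal sketch of interleaved global / sliding-window attention masks.
# Schedule, window size, and sequence length are hypothetical placeholders.
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Standard causal mask for global-attention layers: each query may
    attend to every key at or before its own position."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return j <= i

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask restricted to the last `window` tokens, as used by the
    cheaper sliding-window layers."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

num_layers, seq_len, window = 8, 12, 4
for layer in range(num_layers):
    is_global = layer % 4 == 3  # hypothetical schedule: every 4th layer global
    mask = causal_mask(seq_len) if is_global else sliding_window_mask(seq_len, window)
    # In the real model this mask would feed an attention call, e.g.
    # torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask),
    # with the head dimension raised to 256 on the global layers.
    print(f"layer {layer}: {'global' if is_global else f'window={window}'}, "
          f"positions visible to last token: {int(mask[-1].sum())}")
```

The payoff of this pattern is that most layers pay attention cost proportional to the window rather than the full sequence, while the occasional global layer preserves long-range information flow.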

Baichuan-M1-14B-Base's code and mathematical abilities were evaluated against the Qwen2.5 series across various benchmarks. Code generation was tested with the EvalPlus framework and BigCodeBench, while mathematical proficiency was assessed on the MATH and CMATH datasets. Although the 14B-Instruct variant still lags behind proprietary models like Claude-3.5-Sonnet and GPT-4o, the gap has narrowed significantly. The results show that Baichuan-M1-14B-Base performs competitively on certain tasks, demonstrating its strengths in code generation and mathematical reasoning relative to other advanced models.
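For intuition about how such benchmark scores are produced, the toy harness below scores predictions by normalized exact match. Real harnesses go much further, so treat this only as a sketch of the general shape: EvalPlus and BigCodeBench execute generated code against test suites, and MATH-style graders normalize answers symbolically.

```python
# Toy exact-match scorer; NOT the EvalPlus / BigCodeBench / MATH pipelines.
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching references after stripping
    whitespace and case -- the crudest form of answer checking."""
    normalize = lambda s: "".join(s.split()).lower()
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["x = 3", "42", "7/2"]
refs  = ["x=3",   "42", "3.5"]  # "7/2" vs "3.5": why real graders normalize symbolically
print(f"accuracy = {exact_match_accuracy(preds, refs):.2f}")  # 0.67
```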

In conclusion, traditional methods for adapting LLMs to specialized fields rely on fine-tuning existing models. However, experiments suggest that further training of pre-existing models struggles to deliver domain-specific improvements without sacrificing general performance. In the medical domain, fine-tuning a general model with domain-specific data may therefore be less effective than training from scratch. Baichuan-M1 was developed with this approach, using 20 trillion tokens to build medical expertise while maintaining general capabilities. Open-sourcing Baichuan-M1-14B enables further research, though challenges remain in rare-disease diagnosis and real-world deployment. Its continued evolution could significantly advance AI-driven medical decision-making.
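Since the 14B checkpoints are open-sourced, a minimal inference sketch with Hugging Face transformers might look like the following. The repository id, dtype, and prompt here are assumptions for illustration; consult the official model card for exact usage.

```python
# A minimal inference sketch, assuming the weights are published on the
# Hugging Face Hub under the id below -- verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan-M1-14B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 14B params: bf16 plus a large GPU or offloading
    device_map="auto",
    trust_remote_code=True,
)

prompt = ("A 45-year-old presents with chest pain radiating to the left arm. "
          "What is the differential diagnosis?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```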


Check out the Paper, Baichuan-M1-14B-Base, and Baichuan-M1-14B-Instruct. All credit for this research goes to the researchers of this project.

Source: MarkTechPost