
    Huawei Introduces Pangu Ultra MoE: A 718B-Parameter Sparse Language Model Trained Efficiently on Ascend NPUs Using Simulation-Driven Architecture and System-Level Optimization

    May 10, 2025

    Sparse large language models (LLMs) based on the Mixture of Experts (MoE) framework have gained traction for their ability to scale efficiently by activating only a subset of parameters per token. This dynamic sparsity lets MoE models retain high representational capacity while limiting the computation spent on each token. However, as complexity grows and model sizes approach trillions of parameters, training these models efficiently requires algorithmic innovation and tightly integrated hardware-software optimization. These challenges are especially relevant when deploying models on non-standard AI accelerators such as Ascend NPUs, which require specific architectural alignment to deliver optimal performance.
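
    To make the routing mechanism concrete, the sketch below shows a minimal top-k MoE layer in PyTorch: a small router scores every token, and only the top-scoring experts run for that token. This is a generic illustration; the hidden size, expert count, and top-k value are illustrative defaults, not Pangu Ultra MoE's actual configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TopKMoELayer(nn.Module):
            """Minimal top-k MoE feed-forward layer; sizes are illustrative only."""
            def __init__(self, hidden=512, num_experts=8, top_k=2):
                super().__init__()
                self.top_k = top_k
                self.router = nn.Linear(hidden, num_experts)      # gating network
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                  nn.Linear(4 * hidden, hidden))
                    for _ in range(num_experts))

            def forward(self, x):                                  # x: [tokens, hidden]
                probs = F.softmax(self.router(x), dim=-1)          # [tokens, experts]
                weights, idx = torch.topk(probs, self.top_k, dim=-1)
                out = torch.zeros_like(x)
                for e, expert in enumerate(self.experts):
                    hit = (idx == e)                               # tokens that picked expert e
                    rows = hit.any(dim=-1)
                    if rows.any():
                        w = (weights * hit).sum(dim=-1, keepdim=True)[rows]
                        out[rows] += w * expert(x[rows])           # only selected tokens are computed
                return out

        tokens = torch.randn(16, 512)
        print(TopKMoELayer()(tokens).shape)                        # torch.Size([16, 512])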

    A major technical challenge lies in the inefficient utilization of hardware resources while training sparse LLMs. Because only a portion of the parameters is active for each token, workloads across devices become unbalanced, leading to synchronization delays and underused processing power. This imbalance also affects memory utilization, as different experts process different numbers of tokens and sometimes exceed their capacity. These inefficiencies are compounded at large scale, such as across thousands of AI chips, where communication and memory-management bottlenecks significantly hinder throughput. The inability to fully harness the computational promise of sparsity in practice restricts the deployment of such models on hardware systems like Ascend NPUs.
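
    A toy illustration of the problem (all numbers invented): when routing is skewed, some experts receive far more tokens than an even share and overflow any fixed per-expert capacity, while others sit mostly idle.

        import collections
        import random

        num_experts, num_tokens = 8, 1024
        capacity = num_tokens // num_experts                  # an even per-expert share
        random.seed(0)
        # Skewed routing: expert 0 is five times more likely to be chosen.
        assignments = random.choices(range(num_experts),
                                     weights=[5, 1, 1, 1, 1, 1, 1, 1], k=num_tokens)
        counts = collections.Counter(assignments)
        for e in range(num_experts):
            overflow = max(0, counts[e] - capacity)
            print(f"expert {e}: {counts[e]:4d} tokens, over capacity by {overflow}")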

    Several strategies have been proposed to tackle these challenges. These include auxiliary losses to balance token distribution across experts and drop-and-pad strategies that limit expert overload by discarding tokens exceeding capacity. However, these techniques either reduce model performance or introduce inefficiencies in memory and computation. Other efforts include heuristic expert placement and traditional communication patterns like All-to-All dispatching, but these often fail to scale well or maintain high throughput. Moreover, standard memory-saving techniques like recomputation are usually coarse-grained, targeting whole layers instead of specific operations, leading to increased runtime without proportional memory savings.
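
    As an example of the auxiliary-loss idea, here is a sketch of one common load-balancing formulation in the style of the Switch Transformer: the loss is minimized when the fraction of tokens dispatched to each expert and the router's mean probability mass per expert are both uniform. The article does not specify which formulation, if any, Pangu Ultra MoE uses.

        import torch
        import torch.nn.functional as F

        def load_balance_loss(router_logits, top1_idx):
            """router_logits: [tokens, experts]; top1_idx: [tokens], chosen expert per token."""
            num_experts = router_logits.shape[-1]
            probs = F.softmax(router_logits, dim=-1)
            # f_i: fraction of tokens actually dispatched to expert i
            dispatch_frac = F.one_hot(top1_idx, num_experts).float().mean(dim=0)
            # P_i: mean router probability assigned to expert i
            mean_prob = probs.mean(dim=0)
            # Uniform dispatch and uniform probabilities give the minimum value of 1.0.
            return num_experts * torch.sum(dispatch_frac * mean_prob)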

    Researchers from the Pangu team at Huawei Cloud introduced a highly structured and optimized training approach for large MoE models tailored to Ascend NPUs. They developed Pangu Ultra MoE, a sparse LLM with 718 billion parameters, focusing on aligning model architecture and system design with the capabilities of the Ascend hardware. Their approach begins with a simulation-based model configuration process that evaluates thousands of architecture variants using metrics grounded in actual hardware behavior. These simulations inform design decisions before any physical training is undertaken, thus saving substantial computational resources and enabling informed tuning of model hyperparameters.
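
    The sketch below illustrates the general shape of such a simulation-driven search: enumerate candidate architectures and rank them with a cost model before committing any hardware to training. The estimate_throughput() cost model here is a made-up placeholder; Huawei's simulator is grounded in measured Ascend NPU behavior, which is not public.

        from dataclasses import dataclass
        from itertools import product

        @dataclass(frozen=True)
        class MoEConfig:
            layers: int
            hidden: int
            experts: int

        def estimate_throughput(cfg):
            """Placeholder analytic cost model trading compute against communication.
            Purely illustrative; not the metric used by the Pangu team."""
            compute = cfg.layers * cfg.hidden ** 2
            comm = cfg.layers * cfg.hidden * cfg.experts
            return compute / (compute + 5 * comm)

        candidates = [MoEConfig(l, h, e)
                      for l, h, e in product((48, 61, 72), (6144, 7680, 8192), (128, 256))]
        best = max(candidates, key=estimate_throughput)
        print(best)        # the variant the toy simulator ranks highest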

    The simulation method analyzes combinations of parameters such as the number of layers, hidden size, and expert count using a five-dimensional parallelism strategy that includes Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, Data Parallelism, and Context Parallelism. The final model configuration adopted by Huawei included 256 experts, a hidden size of 7680, and 61 transformer layers. To further optimize performance, the researchers integrated an Adaptive Pipe Overlap mechanism to mask communication costs and used hierarchical All-to-All communication to reduce inter-node data transfer. They employed fine-grained recomputation, such as recomputing only key-value vectors in attention modules, and introduced tensor swapping to dynamically offload activation memory to host devices.
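
    The recomputation idea can be sketched as follows: checkpoint only the key-value projection inside an attention block so that its activations are recomputed during the backward pass instead of being stored, while the rest of the block keeps its activations as usual. The module layout below is invented for illustration; only the "recompute the key-value vectors" idea comes from the article.

        import torch
        import torch.nn as nn
        from torch.utils.checkpoint import checkpoint

        class SelectiveAttention(nn.Module):
            """Single-head attention with fine-grained recomputation of K/V only."""
            def __init__(self, hidden=512):
                super().__init__()
                self.q = nn.Linear(hidden, hidden)
                self.kv = nn.Linear(hidden, 2 * hidden)        # key/value projection
                self.out = nn.Linear(hidden, hidden)

            def _kv(self, x):
                return self.kv(x)                              # activations recomputed in backward

            def forward(self, x):                              # x: [batch, seq, hidden]
                q = self.q(x)                                  # stored as usual
                kv = checkpoint(self._kv, x, use_reentrant=False)
                k, v = kv.chunk(2, dim=-1)
                attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
                return self.out(attn @ v)

        x = torch.randn(2, 16, 512, requires_grad=True)
        SelectiveAttention()(x).sum().backward()               # gradients flow; K/V recomputed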

    Pangu Ultra MoE achieved a Model Flops Utilization (MFU) of 30.0% and processed tokens at a rate of 1.46 million per second using 6,000 Ascend NPUs. The baseline MFU was 18.9% with 0.61 million tokens per second on 4,000 NPUs. The researchers also introduced dynamic expert placement strategies, improving device-level load balance and achieving a relative 10% MFU improvement. The model performed competitively on benchmark evaluations, attaining 81.3% on AIME2024, 97.4% on MATH500, 94.8% on CLUEWSC, and 91.5% on MMLU. In the healthcare domain, it outperformed DeepSeek R1 by scoring 87.1% on MedQA and 80.8% on MedMCQA, confirming its strength in domain-specific applications.
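
    For readers unfamiliar with the metric, MFU is the achieved training FLOPs divided by the cluster's theoretical peak. The sketch below shows the standard back-of-the-envelope formula; every input in the example call is a hypothetical placeholder, not a figure from the paper.

        def model_flops_utilization(tokens_per_sec, active_params, num_npus, peak_flops_per_npu):
            # Common approximation: ~6 FLOPs per activated parameter per token
            # covers the forward and backward passes of matmul-dominated training.
            achieved_flops = 6.0 * active_params * tokens_per_sec
            peak_flops = num_npus * peak_flops_per_npu
            return achieved_flops / peak_flops

        # All inputs below are hypothetical placeholders, chosen only to exercise the formula.
        mfu = model_flops_utilization(tokens_per_sec=1.0e6,
                                      active_params=40e9,
                                      num_npus=4096,
                                      peak_flops_per_npu=3.0e14)
        print(f"MFU = {mfu:.1%}")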

    This study illustrates how the Pangu team at Huawei effectively tackled the core difficulties of training massive MoE models on specialized hardware. Their systematic architecture search, efficient communication techniques, and tailored memory optimizations represent a strong framework for scalable AI training. The work demonstrates practical ways to unlock the performance potential of sparse models and sets a direction for future system-aware AI design.


    Check out the Paper. All credit for this research goes to the researchers of this project.

