
    DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and Inference

    February 25, 2025

    Large language models that use the Mixture-of-Experts (MoE) architecture have enabled significant increases in model capacity without a corresponding rise in computation. However, this approach also introduces challenges—especially when it comes to communication between GPUs. In MoE models, only a subset of experts is active for any given token, so efficiently exchanging data among devices is critical. Traditional methods for all-to-all communication can create bottlenecks that increase latency and underutilize GPU resources. In latency-sensitive settings, such as real-time inference, even small delays can affect overall performance. Moreover, while low-precision operations (such as FP8) help reduce memory usage, they require careful optimization to maintain model quality. These issues underscore the need for a communication library tailored to the specific demands of expert parallelism.

    DeepSeek AI has recently introduced DeepEP, a communication library specifically designed for MoE models and expert parallelism (EP). DeepEP addresses the inefficiencies inherent in how tokens are dispatched and aggregated across GPUs. The library provides high-throughput, low-latency all-to-all GPU kernels—commonly referred to as MoE dispatch and combine kernels—that streamline data exchange during both training and inference. Notably, DeepEP supports low-precision operations (including FP8), aligning with techniques detailed in the DeepSeek-V3 paper. This release responds directly to the challenges of scaling MoE architectures in both intranode and internode environments.
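To make the dispatch and combine terminology concrete, the following is a minimal single-device PyTorch reference for what these kernels compute. It is an illustrative sketch of the semantics only; the function and variable names are ours, not DeepEP's API, which performs the same routing as fused all-to-all kernels across GPUs.

```python
# Single-device reference for MoE "dispatch" (route tokens to their top-k
# experts) and "combine" (weighted sum of expert outputs back per token).
# Illustrative only; DeepEP implements this routing as fused multi-GPU kernels.
import torch

def dispatch_combine(tokens, router_logits, experts, top_k=2):
    """tokens: [n, d]; router_logits: [n, n_experts]; experts: callables d -> d."""
    weights, idx = router_logits.softmax(-1).topk(top_k, dim=-1)  # [n, k] each
    out = torch.zeros_like(tokens)
    for e, expert in enumerate(experts):
        sel = (idx == e)                              # which top-k slots chose expert e
        rows = sel.any(-1).nonzero(as_tuple=True)[0]  # tokens routed to expert e
        if rows.numel() == 0:
            continue
        y = expert(tokens[rows])                      # "dispatch": expert sees its tokens
        w = (weights[rows] * sel[rows]).sum(-1, keepdim=True)
        out[rows] += w * y                            # "combine": weighted scatter-back
    return out

# e.g. dispatch_combine(torch.randn(8, 16), torch.randn(8, 4),
#                       [torch.nn.Linear(16, 16) for _ in range(4)])
```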

    Technical Overview and Benefits

DeepEP offers two primary types of kernels designed to meet different operational needs; a sketch of the all-to-all exchange they implement follows the list:

    • Normal Kernels: These kernels are optimized for scenarios that require high throughput, such as during the pre-filling phase of inference or training. They efficiently forward data across GPUs by taking advantage of both NVLink and RDMA networking technologies. For instance, tests on Hopper GPUs with NVLink have shown throughput around 153 GB/s for intranode communication, while internode tests using CX7 InfiniBand (approximately 50 GB/s bandwidth) achieve stable performance near 43–47 GB/s. By maximizing available bandwidth, these kernels reduce communication overhead during token dispatch and result combining.
    • Low-Latency Kernels: For inference tasks where responsiveness is crucial, DeepEP provides low-latency kernels that rely solely on RDMA. These kernels are tailored to handle small batches—common in real-time applications—with reported latencies as low as 163 microseconds for dispatch operations involving eight experts. The design also incorporates a hook-based communication-computation overlapping technique that allows data transfers to occur concurrently with computation, without consuming GPU streaming multiprocessors (SMs).
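Both kernel families ultimately implement a variable-sized all-to-all exchange across expert-parallel ranks. The sketch below shows that communication pattern with plain torch.distributed collectives; it assumes an initialized process group and tokens pre-sorted by destination rank, and is a reference for the pattern only, not DeepEP's interface.

```python
# Sketch of the variable-sized all-to-all exchange behind MoE dispatch,
# written with plain torch.distributed collectives rather than DeepEP's
# fused NVLink/RDMA kernels. Assumes an initialized process group; with
# the NCCL backend, all tensors must live on the GPU.
import torch
import torch.distributed as dist

def dispatch_tokens(local_tokens, send_counts):
    """local_tokens: [n, d] tokens sorted by destination rank;
    send_counts[r] = how many of them go to rank r."""
    send = torch.tensor(send_counts, device=local_tokens.device)
    recv_counts = torch.empty_like(send)
    # First exchange the per-rank token counts, then the tokens themselves.
    dist.all_to_all_single(recv_counts, send)
    recv = local_tokens.new_empty(int(recv_counts.sum()), local_tokens.shape[-1])
    dist.all_to_all_single(
        recv, local_tokens,
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts,
    )
    return recv  # "combine" runs the same exchange in reverse after the experts
```

DeepEP's contribution is performing this exchange in fused kernels that saturate NVLink within a node and RDMA across nodes, rather than paying the overhead of separate general-purpose collectives.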

    DeepEP further offers flexibility through adaptive configurations. Users can adjust parameters such as the number of SMs in use or set environment variables (for example, NVSHMEM_IB_SL) to manage traffic isolation. Adaptive routing, which is currently supported in the low-latency kernels, helps distribute network traffic evenly under heavy loads, thereby improving robustness.
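As a minimal configuration sketch, the NVSHMEM service-level variable named above can be set before the communication layer initializes; the value shown is an assumption and depends on how the cluster's InfiniBand service levels are provisioned.

```python
# Hypothetical configuration sketch: NVSHMEM_IB_SL maps RDMA traffic onto a
# dedicated InfiniBand service level for traffic isolation. The value "3" is
# an assumption; pick whatever service level your cluster reserves for
# expert-parallel traffic. Set it before initializing the process group.
import os

os.environ["NVSHMEM_IB_SL"] = "3"
```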

    Performance Insights and Practical Outcomes

    The performance metrics for DeepEP are noteworthy. In typical tests using normal kernels, intranode communication can achieve throughput up to 153 GB/s, and internode setups maintain around 43–47 GB/s over RDMA. Low-latency kernels are particularly effective in production scenarios; for a batch of 128 tokens processed with eight experts, dispatch latency can be as low as 163 microseconds. Such improvements mean that the overall inference process becomes more efficient, allowing for larger batch sizes and smoother overlap between computation and communication.

    In practical terms, these optimizations lead to faster response times in inference decoding and improved throughput in training scenarios. The inclusion of FP8 support not only lowers the memory footprint but also facilitates quicker data transfers, which is essential when deploying models in environments where resources are limited.
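A quick way to see why FP8 matters for communication is to compare payload sizes. The sketch below uses PyTorch's float8_e4m3fn dtype (available in recent PyTorch releases) and ignores the per-block scaling factors a production FP8 pipeline would also transmit.

```python
# Why FP8 helps communication: one byte per element instead of two.
# Uses PyTorch's float8_e4m3fn dtype (PyTorch >= 2.1); a production FP8
# pipeline would also transmit per-block scaling factors, ignored here.
import torch

x = torch.randn(128, 4096, dtype=torch.bfloat16)   # a batch of token activations
x_fp8 = x.to(torch.float8_e4m3fn)                   # lossy cast, for illustration

print(x.element_size(), x_fp8.element_size())  # 2 bytes vs 1 byte per element
print(x.nbytes / x_fp8.nbytes)                 # 2.0x smaller payload on the wire
```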

    Conclusion

DeepEP is a thoughtful contribution to the field of large-scale language model deployment. By addressing key communication bottlenecks in MoE architectures, it enables more efficient training and inference. Its dual-kernel approach—with one set designed for high throughput and another for low latency—offers flexibility for a range of applications. Built with support for low-precision operations and equipped with mechanisms for adaptive configuration, DeepEP gives researchers and developers a practical tool for further optimizing expert parallelism.

    In summary, DeepSeek AI’s release of DeepEP represents a careful, well-engineered solution that balances performance with resource efficiency. Its design helps pave the way for more scalable and responsive AI models, supporting both academic research and real-world applications in a cost-effective manner.


Check out the GitHub Page. All credit for this research goes to the researchers of this project.


Source: MarkTechPost
