
    DeepSeek AI Researchers Propose Expert-Specialized Fine-Tuning (ESFT) to Reduce Memory by up to 90% and Time by up to 30%

    July 6, 2024

    Natural language processing is advancing rapidly, with much of the effort focused on adapting large language models (LLMs) to specific tasks. Because these models often contain billions of parameters, customizing them is a significant challenge. The goal is to fine-tune them for downstream tasks without prohibitive computational cost, which calls for parameter-efficient fine-tuning (PEFT) methods that maximize performance while minimizing resource usage.

    A central problem in this domain is how resource-intensive it is to customize LLMs for specific tasks. Traditional fine-tuning updates all model parameters, which incurs high computational cost and can lead to overfitting. Given the scale of modern LLMs, especially sparse architectures that distribute work across multiple specialized experts, more efficient fine-tuning techniques are needed. The challenge is to optimize performance while keeping the computational burden manageable.

    Existing PEFT methods for dense-architecture LLMs include low-rank adaptation (LoRA) and P-Tuning. These approaches either add a small number of new parameters or selectively update existing ones. LoRA, for instance, learns a low-rank update to the frozen weight matrices, which sharply reduces the number of trainable parameters. However, these techniques were designed for dense models and do not fully exploit sparse-architecture LLMs, where different tasks activate different subsets of parameters, making traditional methods less effective.
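
    To make the low-rank idea concrete, the following is a minimal PyTorch-style sketch of a LoRA-augmented linear layer; the class name, rank, and scaling values are illustrative assumptions, not the implementation used in the paper or in any particular library.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained weight plus a trainable low-rank update (illustrative sketch)."""
        def __init__(self, in_features, out_features, rank=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(in_features, out_features, bias=False)
            self.base.weight.requires_grad_(False)            # the pretrained weight stays frozen
            self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x):
            # Only lora_A and lora_B receive gradients; the base projection is fixed.
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    Because lora_B starts at zero, the layer initially behaves exactly like the frozen base layer, and only the small low-rank matrices are trained.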

    DeepSeek AI and Northwestern University researchers have introduced a novel method called Expert-Specialized Fine-Tuning (ESFT) tailored for sparse-architecture LLMs, specifically those using a mixture-of-experts (MoE) architecture. This method aims to fine-tune only the most relevant experts for a given task while freezing the other experts and model components. By doing so, ESFT enhances tuning efficiency and maintains the specialization of the experts, which is crucial for optimal performance. The ESFT method capitalizes on the MoE architecture’s inherent ability to assign different tasks to experts, ensuring that only the necessary parameters are updated.
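
    As a rough illustration of the freezing strategy, assuming a mixture-of-experts layer that exposes its experts as a module list, the selection might be applied as in the sketch below; the attribute name (experts) and the chosen expert indices are hypothetical placeholders, not DeepSeek's actual API.

    def freeze_all_but_selected_experts(moe_layer, selected_expert_ids):
        """Freeze the router and every expert except the task-relevant ones (sketch)."""
        for p in moe_layer.parameters():
            p.requires_grad_(False)                    # freeze the gate and all experts
        for idx in selected_expert_ids:                # re-enable only the selected experts
            for p in moe_layer.experts[idx].parameters():
                p.requires_grad_(True)

    # Hypothetical usage: train only experts 3 and 7 in every MoE layer of the model.
    # for layer in model.moe_layers:
    #     freeze_all_but_selected_experts(layer, {3, 7})

    Since the optimizer only needs to keep state for the parameters that remain trainable, restricting updates to a few experts is where much of the storage and memory saving comes from.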

    In more detail, ESFT computes an affinity score between each expert and the task-specific data, then selects the subset of experts with the highest relevance. Only these selected experts are fine-tuned; the rest of the model remains unchanged. This selective approach sharply reduces the cost of fine-tuning: compared with full-parameter fine-tuning, ESFT cuts storage requirements by up to 90% and training time by up to 30%. The experimental results show that this efficiency comes without compromising the model's overall performance.
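
    One way to picture the selection step, under the assumption that the router's probabilities are available: average each expert's routing probability over a sample of task data and keep the top-scoring experts. The function below is a simplified sketch, and the paper's exact affinity metric may differ.

    import torch

    def select_experts_by_affinity(gate_probs, top_k=2):
        """gate_probs: routing probabilities of shape (num_tokens, num_experts),
        collected by running task-specific data through the model (sketch)."""
        affinity = gate_probs.mean(dim=0)              # average routing weight per expert
        return torch.topk(affinity, k=top_k).indices.tolist()

    # Toy example: 6 experts, routing probabilities for 4 tokens.
    probs = torch.softmax(torch.randn(4, 6), dim=-1)
    print(select_experts_by_affinity(probs, top_k=2))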

    Across a range of downstream tasks, ESFT not only matched but often surpassed traditional full-parameter fine-tuning. In math and code tasks, for example, it delivered significant gains while preserving a high degree of expert specialization. ESFT also maintained general-task performance better than other PEFT methods such as LoRA, making it a versatile and powerful tool for LLM customization.

    In conclusion, the research introduces Expert-Specialized Fine-Tuning (ESFT) as a solution to the resource-intensive fine-tuning of large language models. By selectively tuning the relevant experts, ESFT optimizes both performance and efficiency, leveraging the expert specialization of sparse-architecture LLMs to achieve strong results at reduced computational cost. The research demonstrates that ESFT significantly improves training efficiency, reduces storage and training time, and maintains high performance across a variety of tasks, making it a promising approach for customizing large language models.

    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
