    This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference

    May 31, 2025

    Large language models (LLMs), with billions of parameters, power many AI-driven services across industries. However, their massive size and complex architectures make their computational costs during inference a significant challenge. As these models evolve, optimizing the balance between computational efficiency and output quality has become a crucial area of research.

    The core challenge lies in how LLMs handle inference. Every time an input is processed, the entire model is activated, which consumes extensive computational resources. This full activation is unnecessary for most tasks, as only a small subset of neurons contributes meaningfully to the final output. Existing sparse activation methods attempt to address this by selectively deactivating less important neurons. However, these approaches often focus only on the magnitude of hidden states while ignoring the role that weight matrices play in propagating errors through the network. This oversight leads to high approximation errors and degrades model performance, particularly at higher sparsity levels.

    Sparse activation techniques have included methods like Mixture-of-Experts (MoE) used in models such as GPT-4 and Mistral, which rely on additional training to learn which experts to activate for each input. Other approaches, such as TEAL and CATS, reduce computation by pruning neurons according to the magnitude of their hidden activations. These methods often struggle to balance sparsity and accuracy, as they can mistakenly deactivate important neurons or retain those with minimal influence. Moreover, they require model-specific threshold tuning, making them less flexible across different architectures.
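
The failure mode of magnitude-only pruning can be illustrated with a minimal NumPy sketch. This is not the TEAL or CATS implementation; the helper `magnitude_topk_mask`, the toy matrix `W`, and the input `x` are all hypothetical and chosen only to show how a small activation that feeds a large-norm weight column can be dropped by mistake.

```python
import numpy as np


def magnitude_topk_mask(x: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask keeping the k entries of x with the largest |x_i|.

    Hypothetical helper for illustration: it looks only at the hidden
    state and ignores the weights that will multiply it.
    """
    keep = np.argsort(-np.abs(x))[:k]
    mask = np.zeros(x.shape, dtype=bool)
    mask[keep] = True
    return mask


# Toy layer y = W @ x. Column 0 of W has a much larger norm than the others.
W = np.array([[10.0, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 0.1]])
x = np.array([0.5, 1.0, 0.9])  # x[0] is the smallest activation

mask = magnitude_topk_mask(x, k=2)  # keeps x[1] and x[2], drops x[0]
y_dense = W @ x
y_sparse = W @ (x * mask)

# Dropping x[0] removes the dominant term 10 * 0.5, so the error is 5.0.
error = np.linalg.norm(y_dense - y_sparse)
print(mask, error)
```

In this toy case the pruned input is the one that matters most for the output, which is the kind of error a criterion based on hidden-state magnitude alone cannot see.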

    Researchers from Microsoft, Renmin University of China, New York University, and the South China University of Technology proposed a new method called WINA (Weight Informed Neuron Activation) to address these issues. WINA introduces a training-free sparse activation technique that uses both hidden state magnitudes and column-wise ℓ2 norms of weight matrices to determine which neurons to activate during inference. By considering the combined impact of input magnitudes and weight importance, WINA creates a more effective sparsification strategy that adapts to different layers of the model without requiring retraining or fine-tuning.

    The WINA method is built on a simple yet powerful idea: neurons that have strong activations and large weight magnitudes are more likely to influence downstream computations. To operationalize this, WINA calculates the element-wise product of hidden states and weight norms, selecting the top-K components based on this combined metric. This strategy allows WINA to construct a sparse sub-network that preserves the most important signals while ignoring redundant activations. The method also includes a tensor transformation step that enforces column-wise orthogonality in weight matrices, ensuring theoretical error bounds translate effectively to real-world performance. By combining these steps, WINA maintains a tight approximation error while delivering significant computational savings.
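
The selection rule and the orthogonality step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `wina_mask` and `orthogonalize_columns` are hypothetical, the layer is taken to be a plain `y = W @ x`, and the SVD-based transform is one standard way to obtain mutually orthogonal columns, which may differ in detail from the paper's procedure.

```python
import numpy as np


def wina_mask(x: np.ndarray, W: np.ndarray, k: int) -> np.ndarray:
    """Keep the k inputs with the largest |x_i| * ||W[:, i]||_2."""
    col_norms = np.linalg.norm(W, axis=0)  # l2 norm of each column of W
    scores = np.abs(x) * col_norms         # element-wise combined score
    keep = np.argsort(-scores)[:k]
    mask = np.zeros(x.shape, dtype=bool)
    mask[keep] = True
    return mask


def orthogonalize_columns(W: np.ndarray):
    """Return (W @ V, V), where W = U @ diag(s) @ V.T is the SVD of W.

    W @ V equals U @ diag(s), so its columns are mutually orthogonal.
    The layer output is unchanged if the input is rotated to V.T @ x.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt.T
    return W @ V, V


# Toy layer: a small activation x[0] feeds a large-norm column of W.
W = np.array([[10.0, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 0.1]])
x = np.array([0.5, 1.0, 0.9])

mask = wina_mask(x, W, k=2)  # scores are about [5, 0.1, 0.09]: keeps 0 and 1
error = np.linalg.norm(W @ x - W @ (x * mask))  # about 0.09

# Orthogonality check on a random matrix (shapes are arbitrary assumptions).
rng = np.random.default_rng(0)
W_r = rng.standard_normal((6, 4))
x_r = rng.standard_normal(4)
W_t, V = orthogonalize_columns(W_r)
gram = W_t.T @ W_t  # diagonal up to rounding error
same_output = np.allclose(W_t @ (V.T @ x_r), W_r @ x_r)
print(mask, error, same_output)
```

On the toy input, weighting each activation by its column norm keeps the small activation that drives the largest output component, so only the least influential input is dropped. The transform leaves the layer output unchanged because `V` is orthogonal, so rotating the input by `V.T` cancels the rotation applied to `W`.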

    The research team evaluated WINA on several large language models, including Qwen-2.5-7B, LLaMA-2-7B, LLaMA-3-8B, and Phi-4-14B, across various tasks and sparsity levels. WINA outperformed TEAL and CATS across all tested models and sparsity settings. For example, on Qwen-2.5-7B at 65% sparsity, WINA achieved up to 2.94% higher average performance than TEAL and 1.41% better than TEAL-Transform. On LLaMA-3-8B, WINA delivered gains of 1.06% at 50% sparsity and 2.41% at 65% sparsity. Even at high sparsity levels, WINA retained stronger performance on reasoning-intensive tasks like GSM8K and ARC Challenge. WINA also delivered consistent computational savings, reducing floating-point operations by up to 63.7% on LLaMA-2-7B and 62.7% on Phi-4-14B.

    In summary, WINA offers a robust, training-free solution for sparse activation in large language models by combining hidden state magnitudes with weight matrix norms. This approach addresses the limitations of prior methods, such as TEAL, resulting in lower approximation errors, improved accuracy, and significant computational savings. The research team’s work represents an important step forward in developing more efficient LLM inference methods that can adapt to diverse models without requiring additional training.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post This AI Paper from Microsoft Introduces WINA: A Training-Free Sparse Activation Framework for Efficient Large Language Model Inference appeared first on MarkTechPost.
