    ShiftAddLLM: Accelerating Pretrained LLMs through Post-Training Shift-and-Add Reparameterization: Creating Efficient Multiplication-Free Models

    June 13, 2024

    Deploying large language models (LLMs) on resource-constrained devices presents significant challenges due to their extensive parameters and reliance on dense multiplication operations. This results in high memory demands and latency bottlenecks, hindering their practical application in real-world scenarios. For instance, models like GPT-3 require immense computational resources, making them unsuitable for many edge and cloud environments. Overcoming these challenges is crucial for the advancement of AI, as it would enable the efficient deployment of powerful LLMs, thereby broadening their applicability and impact.

    Current methods to enhance the efficiency of LLMs include pruning, quantization, and attention optimization. Pruning techniques reduce model size by removing less significant parameters, but this often leads to accuracy loss. Quantization, particularly post-training quantization (PTQ), reduces the bit-width of weights and activations to lower memory and computation demands. However, existing PTQ methods either require significant retraining or lead to accuracy degradation due to quantization errors. Additionally, these methods still rely heavily on costly multiplication operations, limiting their effectiveness in reducing latency and energy consumption.
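For reference, the round-to-nearest flavor of post-training quantization that these methods build on can be sketched in a few lines of NumPy. This is an illustrative simplification (symmetric, per-tensor scaling), not any specific method from the paper:

```python
import numpy as np

def ptq_quantize(w, bits=3):
    # Symmetric round-to-nearest PTQ: map float weights onto a signed
    # integer grid of the given bit-width using one per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q, scale

def ptq_dequantize(q, scale):
    # Recover approximate float weights; the gap is the quantization
    # error that low-bit PTQ must keep from degrading accuracy.
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
q, scale = ptq_quantize(w, bits=3)
w_hat = ptq_dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by scale / 2
```

Note that the dequantized weights are still floats, so inference with them still pays for dense multiplications; this is the cost ShiftAddLLM targets.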

Researchers from Google, Intel, and Georgia Institute of Technology propose ShiftAddLLM, a method that accelerates pre-trained LLMs through post-training shift-and-add reparameterization. This approach replaces traditional multiplications with hardware-friendly shift and add operations. Specifically, it quantizes weight matrices into binary matrices with group-wise scaling factors, and then reparameterizes the multiplications into (1) shifts between activations and the scaling factors and (2) queries and adds driven by the binary matrices. This method addresses the limitations of existing quantization techniques by minimizing both weight and activation reparameterization errors through a multi-objective optimization framework. The approach significantly reduces memory usage and latency while maintaining or improving model accuracy.
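To make the idea concrete, here is a minimal NumPy sketch of shift-and-add reparameterization: weights are greedily decomposed into ±1 binary matrices with power-of-two scaling factors, so a matrix-vector product needs only sign-selected additions and bit shifts (modeled here with `np.ldexp`). This is an illustrative toy under those assumptions, not the paper's actual algorithm, which uses group-wise scaling factors and multi-objective optimization:

```python
import numpy as np

def shift_add_reparameterize(w, num_bases=4):
    # Greedy binary decomposition: W ~ sum_i 2**s_i * B_i, B_i in {-1, +1}.
    # Snapping each scaling factor to a power of two turns the scaling
    # step into a bit shift on integer hardware.
    residual = w.copy()
    bases, shifts = [], []
    for _ in range(num_bases):
        b = np.where(residual >= 0, 1.0, -1.0)
        alpha = np.abs(residual).mean()
        s = int(np.round(np.log2(alpha)))
        bases.append(b)
        shifts.append(s)
        residual = residual - np.ldexp(b, s)
    return bases, shifts

def mult_free_matvec(bases, shifts, x):
    # W @ x without multiplications: each binary base contributes
    # sign-selected adds of x, then a power-of-two shift scales the sum.
    y = np.zeros(bases[0].shape[0])
    for b, s in zip(bases, shifts):
        adds = np.where(b > 0, x, -x).sum(axis=1)  # only additions
        y += np.ldexp(adds, s)                     # scaling as a shift
    return y
```

The multiplication-free path produces the same result as multiplying by the reconstructed weight matrix; the accuracy question is how well that reconstruction tracks the original weights, which is exactly what the paper's reparameterization-error objective controls.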

    ShiftAddLLM employs a multi-objective optimization method to align weight and output activation objectives, minimizing overall reparameterization errors. The researchers introduced an automated bit allocation strategy, optimizing the bit-widths for weights in each layer based on their sensitivity to reparameterization. This strategy ensures that more sensitive layers receive higher-bit representations, thus avoiding accuracy loss while maximizing efficiency. The proposed method is validated across five LLM families and eight tasks, showing average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the best existing quantized LLMs. Additionally, ShiftAddLLM achieves over 80% reductions in memory and energy consumption.
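The sensitivity-driven bit allocation described above can be sketched as a simple greedy budget allocator. The function name and greedy criterion here are hypothetical simplifications; the paper derives per-layer sensitivities from reparameterization errors and solves the allocation more carefully:

```python
def allocate_bits(sensitivities, avg_bit_budget=3.0, choices=(2, 3, 4)):
    # Hypothetical greedy allocator: start every layer at the lowest
    # bit-width, then upgrade layers from most to least sensitive as
    # long as the average bit-width stays within the budget.
    n = len(sensitivities)
    bits = [min(choices)] * n
    order = sorted(range(n), key=lambda i: sensitivities[i], reverse=True)
    for i in order:
        for b in sorted(c for c in choices if c > bits[i]):
            if (sum(bits) - bits[i] + b) / n <= avg_bit_budget:
                bits[i] = b  # more sensitive layers get more bits first
            else:
                break
    return bits
```

For example, with layer sensitivities `[5.0, 1.0, 3.0]` and a 3-bit average budget, the most sensitive layer ends up at 4 bits and the least sensitive at 2, matching the intuition that higher-bit representations should go where reparameterization hurts most.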

The experimental results demonstrate the effectiveness of ShiftAddLLM, with significant improvements in perplexity scores reported across various models and tasks. For example, ShiftAddLLM achieves perplexity reductions of 5.63, 38.47, and 5136.13 compared to OPTQ, LUT-GEMM, and AWQ at 3 bits, respectively. In 2-bit settings, where most baselines fail, ShiftAddLLM maintains low perplexity and achieves an average reduction of 22.74 perplexity points over the most competitive baseline, QuIP. The method also shows better accuracy-latency trade-offs, with up to 103830.45 perplexity reduction and up to 60.1% latency reductions. The paper's key results table compares the perplexity scores and latencies of the various methods, highlighting ShiftAddLLM's superior performance on both metrics.

    In conclusion, the researchers present ShiftAddLLM, a significant advancement in the efficient deployment of LLMs. The method reparameterizes weight matrices into shift-and-add operations, drastically reducing computational costs while maintaining high accuracy. This innovation is achieved through a multi-objective optimization strategy and an automated bit allocation approach. ShiftAddLLM offers substantial improvements in memory and energy efficiency, demonstrating its potential to make advanced LLMs more accessible and practical for a wider range of applications. This work represents a critical step forward in addressing the deployment challenges of large-scale AI models.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post ShiftAddLLM: Accelerating Pretrained LLMs through Post-Training Shift-and-Add Reparameterization: Creating Efficient Multiplication-Free Models appeared first on MarkTechPost.
