    ShiftAddLLM: Accelerating Pretrained LLMs through Post-Training Shift-and-Add Reparameterization: Creating Efficient Multiplication-Free Models

    June 13, 2024

    Deploying large language models (LLMs) on resource-constrained devices presents significant challenges due to their extensive parameters and reliance on dense multiplication operations. This results in high memory demands and latency bottlenecks, hindering their practical application in real-world scenarios. For instance, models like GPT-3 require immense computational resources, making them unsuitable for many edge and cloud environments. Overcoming these challenges is crucial for the advancement of AI, as it would enable the efficient deployment of powerful LLMs, thereby broadening their applicability and impact.

    Current methods to enhance the efficiency of LLMs include pruning, quantization, and attention optimization. Pruning techniques reduce model size by removing less significant parameters, but this often leads to accuracy loss. Quantization, particularly post-training quantization (PTQ), reduces the bit-width of weights and activations to lower memory and computation demands. However, existing PTQ methods either require significant retraining or lead to accuracy degradation due to quantization errors. Additionally, these methods still rely heavily on costly multiplication operations, limiting their effectiveness in reducing latency and energy consumption.
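For reference, the round-to-nearest flavor of post-training quantization that these methods build on can be sketched in a few lines of NumPy. This is an illustrative simplification (symmetric, per-tensor scaling), not any specific method from the paper:

```python
import numpy as np

def ptq_quantize(w, bits=3):
    # Symmetric round-to-nearest PTQ: map float weights onto a signed
    # integer grid of the given bit-width using one per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q, scale

def ptq_dequantize(q, scale):
    # Recover approximate float weights; the gap is the quantization
    # error that low-bit PTQ must keep from degrading accuracy.
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
q, scale = ptq_quantize(w, bits=3)
w_hat = ptq_dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by scale / 2
```

Note that the dequantized weights are still floats, so inference with them still pays for dense multiplications; this is the cost ShiftAddLLM targets.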

Researchers from Google, Intel, and Georgia Institute of Technology propose ShiftAddLLM, a method that accelerates pre-trained LLMs through post-training shift-and-add reparameterization. This approach replaces traditional multiplications with hardware-friendly shift and add operations. Specifically, it quantizes weight matrices into binary matrices with group-wise scaling factors, and then reparameterizes the multiplications into (1) shifts between activations and the scaling factors and (2) queries and adds driven by the binary matrices. This method addresses the limitations of existing quantization techniques by minimizing both weight and activation reparameterization errors through a multi-objective optimization framework. The approach significantly reduces memory usage and latency while maintaining or improving model accuracy.
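To make the idea concrete, here is a minimal NumPy sketch of shift-and-add reparameterization: weights are greedily decomposed into ±1 binary matrices with power-of-two scaling factors, so a matrix-vector product needs only sign-selected additions and bit shifts (modeled here with `np.ldexp`). This is an illustrative toy under those assumptions, not the paper's actual algorithm, which uses group-wise scaling factors and multi-objective optimization:

```python
import numpy as np

def shift_add_reparameterize(w, num_bases=4):
    # Greedy binary decomposition: W ~ sum_i 2**s_i * B_i, B_i in {-1, +1}.
    # Snapping each scaling factor to a power of two turns the scaling
    # step into a bit shift on integer hardware.
    residual = w.copy()
    bases, shifts = [], []
    for _ in range(num_bases):
        b = np.where(residual >= 0, 1.0, -1.0)
        alpha = np.abs(residual).mean()
        s = int(np.round(np.log2(alpha)))
        bases.append(b)
        shifts.append(s)
        residual = residual - np.ldexp(b, s)
    return bases, shifts

def mult_free_matvec(bases, shifts, x):
    # W @ x without multiplications: each binary base contributes
    # sign-selected adds of x, then a power-of-two shift scales the sum.
    y = np.zeros(bases[0].shape[0])
    for b, s in zip(bases, shifts):
        adds = np.where(b > 0, x, -x).sum(axis=1)  # only additions
        y += np.ldexp(adds, s)                     # scaling as a shift
    return y
```

The multiplication-free path produces the same result as multiplying by the reconstructed weight matrix; the accuracy question is how well that reconstruction tracks the original weights, which is exactly what the paper's reparameterization-error objective controls.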

    ShiftAddLLM employs a multi-objective optimization method to align weight and output activation objectives, minimizing overall reparameterization errors. The researchers introduced an automated bit allocation strategy, optimizing the bit-widths for weights in each layer based on their sensitivity to reparameterization. This strategy ensures that more sensitive layers receive higher-bit representations, thus avoiding accuracy loss while maximizing efficiency. The proposed method is validated across five LLM families and eight tasks, showing average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the best existing quantized LLMs. Additionally, ShiftAddLLM achieves over 80% reductions in memory and energy consumption.
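The sensitivity-driven bit allocation described above can be sketched as a simple greedy budget allocator. The function name and greedy criterion here are hypothetical simplifications; the paper derives per-layer sensitivities from reparameterization errors and solves the allocation more carefully:

```python
def allocate_bits(sensitivities, avg_bit_budget=3.0, choices=(2, 3, 4)):
    # Hypothetical greedy allocator: start every layer at the lowest
    # bit-width, then upgrade layers from most to least sensitive as
    # long as the average bit-width stays within the budget.
    n = len(sensitivities)
    bits = [min(choices)] * n
    order = sorted(range(n), key=lambda i: sensitivities[i], reverse=True)
    for i in order:
        for b in sorted(c for c in choices if c > bits[i]):
            if (sum(bits) - bits[i] + b) / n <= avg_bit_budget:
                bits[i] = b  # more sensitive layers get more bits first
            else:
                break
    return bits
```

For example, with layer sensitivities `[5.0, 1.0, 3.0]` and a 3-bit average budget, the most sensitive layer ends up at 4 bits and the least sensitive at 2, matching the intuition that higher-bit representations should go where reparameterization hurts most.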

The experimental results demonstrate the effectiveness of ShiftAddLLM, with significant improvements in perplexity scores reported across various models and tasks. For example, ShiftAddLLM achieves perplexity reductions of 5.63, 38.47, and 5136.13 compared to OPTQ, LUT-GEMM, and AWQ at 3 bits, respectively. In 2-bit settings, where most baselines fail, ShiftAddLLM maintains low perplexity and achieves an average reduction of 22.74 perplexity points over the most competitive baseline, QuIP. The method also shows better accuracy-latency trade-offs, with up to 103830.45 perplexity reduction and up to 60.1% latency reductions. The paper's key results table compares the perplexity scores and latencies of the various methods, highlighting ShiftAddLLM's superior performance on both metrics.

    In conclusion, the researchers present ShiftAddLLM, a significant advancement in the efficient deployment of LLMs. The method reparameterizes weight matrices into shift-and-add operations, drastically reducing computational costs while maintaining high accuracy. This innovation is achieved through a multi-objective optimization strategy and an automated bit allocation approach. ShiftAddLLM offers substantial improvements in memory and energy efficiency, demonstrating its potential to make advanced LLMs more accessible and practical for a wider range of applications. This work represents a critical step forward in addressing the deployment challenges of large-scale AI models.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post ShiftAddLLM: Accelerating Pretrained LLMs through Post-Training Shift-and-Add Reparameterization: Creating Efficient Multiplication-Free Models appeared first on MarkTechPost.
