
    LLMs No Longer Require Powerful Servers: Researchers from MIT, KAUST, ISTA, and Yandex Introduce a New AI Approach to Rapidly Compress Large Language Models without a Significant Loss of Quality

    April 11, 2025
    • HIGGS, an innovative method for compressing large language models, was developed in collaboration between teams at Yandex Research, MIT, KAUST, and ISTA.
    • HIGGS makes it possible to compress LLMs without additional data or resource-intensive parameter optimization.
    • Unlike other compression methods, HIGGS does not require specialized hardware or powerful GPUs. Models can be quantized directly on a smartphone or laptop in just a few minutes with no significant loss of quality.
    • The method has already been used to quantize popular LLaMA 3.1 and 3.2-family models, as well as DeepSeek and Qwen-family models.

    The Yandex Research team, together with researchers from the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and the King Abdullah University of Science and Technology (KAUST), developed a method to rapidly compress large language models without a significant loss of quality.

    Previously, deploying large language models on mobile devices or laptops required a quantization process that took anywhere from hours to weeks and had to be run on industrial servers to maintain good quality. Now, quantization can be completed in a matter of minutes directly on a smartphone or laptop, without industry-grade hardware or powerful GPUs.

    HIGGS lowers the barrier to entry for testing and deploying new models on consumer-grade devices, such as home PCs and smartphones, by removing the need for industrial computing power.

    The innovative compression method furthers the company’s commitment to making large language models accessible to everyone, from major players, SMBs, and non-profit organizations to individual contributors, developers, and researchers. Last year, Yandex researchers collaborated with major science and technology universities to introduce two novel LLM compression methods: Additive Quantization of Large Language Models (AQLM) and PV-Tuning. Combined, these methods can reduce model size by up to 8 times while maintaining 95% response quality.

    Breaking Down LLM Adoption Barriers

    Large language models require substantial computational resources, which makes them inaccessible and cost-prohibitive for most organizations. This is also the case for open-source models, like the popular DeepSeek R1, which can't be easily deployed even on the most advanced servers designed for model training and other machine learning tasks.

    As a result, access to these powerful models has traditionally been limited to a select few organizations with the necessary infrastructure and computing power, despite their public availability. 

    However, HIGGS can pave the way for broader accessibility. Developers can now reduce a model's size without sacrificing quality and run it on more affordable devices. For example, the method can be used to compress LLMs like DeepSeek R1, with 671B parameters, and Llama 4 Maverick, with 400B parameters, which previously could only be quantized (compressed) with a significant loss in quality. This quantization technique unlocks new ways to use LLMs across various fields, especially in resource-constrained environments. Startups and independent developers can now leverage compressed models to build innovative products and services while cutting costs on expensive equipment.

    Yandex is already using HIGGS to accelerate prototyping, product development, and idea testing, as compressed models enable faster experimentation than their full-scale counterparts.

    About the Method 

    HIGGS (Hadamard Incoherence with Gaussian MSE-optimal GridS) compresses large language models without requiring additional data or gradient descent methods, making quantization more accessible and efficient for a wide range of applications and devices. This is particularly valuable when there’s a lack of suitable data for calibrating the model. The method offers a balance between model quality, size, and quantization complexity, making it possible to use the models on a wide range of devices like smartphones and consumer laptops.
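    To make the core idea concrete, below is a minimal, illustrative sketch of the two steps the method's name points to: a sign-randomized Hadamard rotation that makes weight entries approximately Gaussian (the "incoherence" step), followed by snapping each value to the MSE-optimal grid for a standard Gaussian. This is a simplified scalar version written for this article, not the official implementation; the block size, 2-bit width, and Lloyd-Max grid levels are illustrative assumptions.

```python
# Illustrative sketch of HIGGS-style data-free quantization (NOT the
# official implementation). Step 1: a sign-randomized Hadamard rotation
# makes weight entries approximately i.i.d. Gaussian. Step 2: each value
# is snapped to the MSE-optimal (Lloyd-Max) grid for a standard Gaussian.
import numpy as np
from scipy.linalg import hadamard

def hadamard_rotate(w: np.ndarray, block: int = 64, seed: int = 0) -> np.ndarray:
    """Rotate each block of weights with a sign-randomized Hadamard matrix."""
    rng = np.random.default_rng(seed)
    H = hadamard(block) / np.sqrt(block)          # orthonormal Hadamard matrix
    signs = rng.choice([-1.0, 1.0], size=block)   # random diagonal sign flips
    return (w.reshape(-1, block) * signs) @ H

def quantize_gaussian_grid(x: np.ndarray) -> np.ndarray:
    """Snap standardized values to the MSE-optimal 2-bit grid for N(0, 1)."""
    grid = np.array([-1.5104, -0.4528, 0.4528, 1.5104])  # Lloyd-Max levels
    scale = x.std() + 1e-12                              # per-tensor scale
    idx = np.abs(x[..., None] / scale - grid).argmin(axis=-1)
    return grid[idx] * scale

weights = np.random.randn(4096)       # stand-in for one layer's weights
rotated = hadamard_rotate(weights)    # entries now approximately Gaussian
quantized = quantize_gaussian_grid(rotated)
print("relative MSE:", np.mean((rotated - quantized) ** 2) / rotated.var())
```

    Because the grid is chosen in advance to be optimal for a Gaussian, no calibration dataset or gradient-based tuning is needed, which is what makes the approach data-free.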

    HIGGS was tested on the LLaMA 3.1 and 3.2-family models, as well as on Qwen-family models. Experiments show that HIGGS outperforms other data-free quantization methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantization), in terms of quality-to-size ratio.

    Developers and researchers can already access the method on Hugging Face or explore the research paper, which is available on arXiv. At the end of this month, the team will present their paper at NAACL, one of the world’s top conferences on AI. 
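    For illustration, loading a model with HIGGS quantization could look like the sketch below. It assumes the HiggsConfig integration available in recent Hugging Face transformers releases, which additionally depends on FLUTE inference kernels; the model name and bit width are placeholders, so consult the Hugging Face documentation for exact requirements.

```python
# Hedged usage sketch, assuming the HIGGS integration in recent
# Hugging Face transformers releases; the model name and bit width
# are illustrative placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer, HiggsConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example model (assumption)

# HIGGS is data-free, so quantization happens at load time with no
# calibration dataset.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=HiggsConfig(bits=4),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantization in one sentence:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```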

    Continuous Commitment to Advancing Science and Optimization

    This is one of several papers Yandex Research presented on large language model quantization. For example, the team presented AQLM and PV-Tuning, two methods of LLM compression that can reduce a company’s computational budget by up to 8 times without significant loss in AI response quality. The team also built a service that lets users run an 8B model on a regular PC or smartphone via a browser-based interface, even without high computing power.

    Beyond LLM quantization, Yandex has open-sourced several tools that optimize resources used in LLM training. For example, the YaFSDP library accelerates LLM training by as much as 25% and reduces GPU resources for training by up to 20%. 

    Earlier this year, Yandex developers open-sourced Perforator, a tool for continuous real-time monitoring and analysis of servers and apps. Perforator highlights code inefficiencies and provides actionable insights, helping companies reduce infrastructure costs by up to 20%. Depending on company size, this could translate into savings of millions or even billions of dollars per year.


    Check out the paper. All credit for this research goes to the researchers of this project. Note: the Yandex team provided the resources for this article and financially supported its publication. This post appeared first on MarkTechPost.
