    LLMs No Longer Require Powerful Servers: Researchers from MIT, KAUST, ISTA, and Yandex Introduce a New AI Approach to Rapidly Compress Large Language Models without a Significant Loss of Quality

    April 11, 2025
    • HIGGS, a new method for compressing large language models, was developed jointly by teams at Yandex Research, MIT, KAUST and ISTA.
    • HIGGS makes it possible to compress LLMs without additional data or resource-intensive parameter optimization.
    • Unlike other compression methods, HIGGS does not require specialized hardware or powerful GPUs. Models can be quantized directly on a smartphone or laptop in just a few minutes with no significant quality loss.
    • The method has already been used to quantize popular LLaMA 3.1 and 3.2-family models, as well as DeepSeek and Qwen-family models. 

    The Yandex Research team, together with researchers from the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA) and the King Abdullah University of Science and Technology (KAUST), developed a method to rapidly compress large language models without a significant loss of quality.

    Previously, deploying large language models on mobile devices or laptops required a quantization process that took anywhere from hours to weeks and had to run on industrial servers to maintain good quality. Now, quantization can be completed in a matter of minutes right on a smartphone or laptop, without industry-grade hardware or powerful GPUs.

    By removing the need for industrial computing power, HIGGS lowers the barrier to entry for testing and deploying new models on consumer-grade devices such as home PCs and smartphones.

    The new compression method furthers Yandex’s commitment to making large language models accessible to everyone, from major players, SMBs, and non-profit organizations to individual contributors, developers, and researchers. Last year, Yandex researchers collaborated with major science and technology universities to introduce two novel LLM compression methods: Additive Quantization of Large Language Models (AQLM) and PV-Tuning. Combined, these methods can reduce model size by up to 8 times while maintaining 95% of response quality.

    Breaking Down LLM Adoption Barriers

    Large language models require substantial computational resources, which makes them inaccessible and cost-prohibitive for most. This is also the case for open-source models, like the popular DeepSeek R1, which can’t be easily deployed on even the most advanced servers designed for model training and other machine learning tasks.  

    As a result, access to these powerful models has traditionally been limited to a select few organizations with the necessary infrastructure and computing power, despite their public availability. 

    However, HIGGS can pave the way for broader accessibility. Developers can now reduce model size without a significant loss of quality and run the compressed models on more affordable devices. For example, this method can be used to compress LLMs like DeepSeek R1 with 671B parameters and Llama 4 Maverick with 400B parameters, which previously could only be quantized (compressed) with a significant loss in quality. This quantization technique unlocks new ways to use LLMs across various fields, especially in resource-constrained environments. Now, startups and independent developers can leverage compressed models to build innovative products and services, while cutting costs on expensive equipment.
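
    To give a rough sense of why bit width matters for models of this size, the sketch below estimates the storage needed for the weights alone. The 16-bit baseline and the 4-bit and 2-bit targets are assumptions chosen for illustration, not figures from the article, and the estimate ignores quantization metadata (scales, codebooks), activations, and the KV cache.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts as quoted in the article; the bit widths are illustrative assumptions.
models = {"DeepSeek R1": 671e9, "Llama 4 Maverick": 400e9}

for name, num_params in models.items():
    for bits in (16, 4, 2):
        size = weight_footprint_gb(num_params, bits)
        print(f"{name}: {bits}-bit weights take about {size:,.1f} GB")
```

    Under these assumptions the footprint scales linearly with bits per weight, so going from 16 bits to 4 bits cuts weight storage by a factor of four.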

    Yandex is already using HIGGS to speed up prototyping, product development, and idea testing, since compressed models can be tested faster than their full-scale counterparts.

    About the Method 

    HIGGS (Hadamard Incoherence with Gaussian MSE-optimal GridS) compresses large language models without requiring additional data or gradient descent methods, making quantization more accessible and efficient for a wide range of applications and devices. This is particularly valuable when there’s a lack of suitable data for calibrating the model. The method offers a balance between model quality, size, and quantization complexity, making it possible to use the models on a wide range of devices like smartphones and consumer laptops.
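
    The name hints at two ingredients: a Hadamard rotation and a grid that minimizes mean squared error for Gaussian values. The sketch below is a simplified, data-free illustration of that idea and is not the authors' implementation. It makes several assumptions: a scalar grid fitted with Lloyd's algorithm on sampled N(0, 1) values, one scale for the whole vector, random sign flips before the rotation, and hypothetical helper names (fwht, gaussian_grid, quantize, dequantize).

```python
import bisect
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def gaussian_grid(levels, samples=5000, iters=30, seed=0):
    """Approximate an MSE-optimal scalar grid for N(0, 1) with Lloyd's algorithm."""
    rng = random.Random(seed)
    data = sorted(rng.gauss(0.0, 1.0) for _ in range(samples))
    # Start from evenly spaced quantiles of the sample.
    grid = [data[(2 * k + 1) * samples // (2 * levels)] for k in range(levels)]
    for _ in range(iters):
        mids = [(a + b) / 2 for a, b in zip(grid, grid[1:])]
        sums, counts = [0.0] * levels, [0] * levels
        for x in data:
            k = bisect.bisect(mids, x)  # index of the nearest grid point
            sums[k] += x
            counts[k] += 1
        grid = [s / c if c else g for s, c, g in zip(sums, counts, grid)]
    return grid


def quantize(weights, grid, signs):
    """Rotate the weights, normalize to unit RMS, and snap each value to the grid."""
    rotated = fwht([w * s for w, s in zip(weights, signs)])
    scale = math.sqrt(sum(v * v for v in rotated) / len(rotated)) or 1.0
    mids = [(a + b) / 2 for a, b in zip(grid, grid[1:])]
    codes = [bisect.bisect(mids, v / scale) for v in rotated]
    return codes, scale


def dequantize(codes, scale, grid, signs):
    """Look up grid values, undo the rotation, then undo the sign flips."""
    rotated = [grid[c] * scale for c in codes]
    return [v * s for v, s in zip(fwht(rotated), signs)]


rng = random.Random(1)
n = 256
weights = [rng.gauss(0.0, 0.02) for _ in range(n)]
weights[3], weights[100] = 0.5, -0.4  # a couple of outliers
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

grid = gaussian_grid(levels=16)  # 16 levels = 4 bits per weight
codes, scale = quantize(weights, grid, signs)
restored = dequantize(codes, scale, grid, signs)

num = sum((a - b) ** 2 for a, b in zip(weights, restored))
den = sum(a * a for a in weights)
print(f"relative squared error at 4 bits: {num / den:.4f}")
```

    No calibration data or gradient steps appear anywhere in the sketch: the grid depends only on the Gaussian assumption, and the rotation spreads outliers across all coordinates so that the rotated values look closer to Gaussian.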

    HIGGS was tested on the LLaMA 3.1 and 3.2-family models, as well as on Qwen-family models. Experiments show that HIGGS outperforms other data-free quantization methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantization), in terms of quality-to-size ratio.

    Developers and researchers can already access the method on Hugging Face or explore the research paper, which is available on arXiv. At the end of this month, the team will present their paper at NAACL, one of the world’s top conferences on AI. 

    Continuous Commitment to Advancing Science and Optimization

    This is one of several papers Yandex Research has presented on large language model quantization. For example, the team presented AQLM and PV-Tuning, two methods of LLM compression that can reduce a company’s computational budget by up to 8 times without significant loss in AI response quality. The team also built a service that lets users run an 8B model on a regular PC or smartphone via a browser-based interface, even without high computing power.

    Beyond LLM quantization, Yandex has open-sourced several tools that optimize resources used in LLM training. For example, the YaFSDP library accelerates LLM training by as much as 25% and reduces GPU resources for training by up to 20%. 

    Earlier this year, Yandex developers open-sourced Perforator, a tool for continuous real-time monitoring and analysis of servers and apps. Perforator highlights code inefficiencies and provides actionable insights, which helps companies reduce infrastructure costs by up to 20%. Depending on company size, this could translate to savings of millions or even billions of dollars per year.


    Check out the paper. All credit for this research goes to the researchers of this project. Note: Thanks to the Yandex team for the thought leadership and resources for this article. The Yandex team has financially supported this content.

    The post LLMs No Longer Require Powerful Servers: Researchers from MIT, KAUST, ISTA, and Yandex Introduce a New AI Approach to Rapidly Compress Large Language Models without a Significant Loss of Quality appeared first on MarkTechPost.
