
    Intel Labs Explores Low-Rank Adapters and Neural Architecture Search for LLM Compression

    February 1, 2025

Large language models (LLMs) have become indispensable for natural language processing applications such as machine translation, text summarization, and conversational AI. However, their growing size and complexity bring significant computational and memory costs. As these models scale, their resource demands make them difficult to deploy in environments with limited computational capacity.

The primary obstacle with LLMs lies in their massive computational requirements. Training and fine-tuning models with billions of parameters is resource-intensive, which limits their accessibility. Existing efficiency methods, such as parameter-efficient fine-tuning (PEFT), provide some relief but often compromise performance. The open challenge is to reduce computational demands substantially while preserving accuracy and effectiveness in real-world scenarios, which has driven research into tuning methods that do not require extensive computational resources.

    Researchers at Intel Labs and Intel Corporation have introduced an approach integrating low-rank adaptation (LoRA) with neural architecture search (NAS) techniques. This method seeks to address the limitations of traditional fine-tuning approaches while enhancing efficiency and performance. The research team developed a framework that optimizes memory consumption and computational speed by leveraging structured low-rank representations. The technique involves a weight-sharing super-network that dynamically adjusts substructures to enhance training efficiency. This integration allows the model to be fine-tuned effectively while maintaining a minimal computational footprint.
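To make the low-rank idea concrete, the sketch below shows a minimal LoRA-style linear layer in PyTorch. It is illustrative only and not Intel's implementation; the module name, rank, and scaling values are assumptions.

```python
# Minimal LoRA-style layer (illustrative sketch, not Intel's code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        # Frozen pretrained weight: only the low-rank factors A and B are trained.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * (x A^T) B^T: the low-rank update rides on the frozen path.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

For intuition, with a 4096x4096 projection and rank 16, the adapter adds roughly 131k trainable parameters versus about 16.8M for the full matrix, which is where the memory savings of low-rank fine-tuning come from.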

The methodology introduced by Intel Labs is centered on LoNAS (Low-rank Neural Architecture Search), which employs elastic LoRA adapters for model fine-tuning. Unlike conventional approaches that require full fine-tuning of LLMs, LoNAS selectively activates model substructures, reducing redundancy. The key innovation lies in the elasticity of the adapters, which adjust dynamically based on model requirements, and in heuristic sub-network searches that further streamline the fine-tuning process. By training only the relevant low-rank structures, the technique balances computational efficiency against performance while maintaining high inference speed.
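A rough sketch of what an "elastic" adapter and a heuristic sub-network search could look like is shown below. The factors are allocated at a maximum rank and a sub-network activates only a prefix of the rank dimension, which captures the weight-sharing idea; the class names and the random-sampling heuristic are assumptions and not the LoNAS implementation.

```python
# Elastic LoRA adapter plus a toy sub-network search (illustrative sketch only).
import random
import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, max_rank=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.active_rank = max_rank  # sub-networks shrink this at search time

    def set_rank(self, r: int):
        self.active_rank = r

    def forward(self, x):
        r = self.active_rank
        # Only the first r rank components participate, so smaller sub-networks
        # reuse a slice of the same shared (super-network) parameters.
        return self.base(x) + x @ self.lora_A[:r].T @ self.lora_B[:, :r].T

def heuristic_search(model, eval_fn, rank_choices=(4, 8, 16, 32), trials=20):
    """Sample per-layer rank configurations and keep the best-scoring sub-network."""
    adapters = [m for m in model.modules() if isinstance(m, ElasticLoRALinear)]
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = [random.choice(rank_choices) for _ in adapters]
        for adapter, r in zip(adapters, cfg):
            adapter.set_rank(r)
        score = eval_fn(model)  # hypothetical callback, e.g. validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Random sampling stands in here for the heuristic sub-network selection the article describes; the point is only that candidate substructures can be evaluated without retraining, because they share the super-network's weights.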

Performance evaluation of the proposed method highlights its improvements over conventional techniques. Experimental results indicate that LoNAS achieves an inference speedup of up to 1.4x while reducing model parameters by approximately 80%. When applied to fine-tuning LLaMA-7B on a unified commonsense reasoning dataset of 15k examples, LoNAS reached an average accuracy of 65.8%. A comparison of LoNAS configurations showed that the heuristic subnet achieved an inference speedup of 1.23x, while searched subnet configurations yielded speedups of 1.28x and 1.41x. Further, applying the approach to Mistral-7B-v0.3 on GSM8K increased accuracy from 44.1% to 50.1% while maintaining efficiency across model sizes. These findings confirm that the methodology enhances LLM performance while reducing computational requirements.

Further improvements to the framework include Shears, an advanced fine-tuning strategy that builds on LoNAS. Shears uses neural low-rank adapter search (NLS) to restrict elasticity to the adapter rank, reducing unnecessary computation. The approach applies sparsity to the base model using predefined metrics, ensuring that fine-tuning remains efficient. This strategy has been particularly effective at maintaining model accuracy while reducing the number of active parameters. Another extension, SQFT, incorporates sparsity and low numerical precision for enhanced fine-tuning: using quantization-aware techniques, it ensures that sparse models can be fine-tuned without losing efficiency. These refinements highlight the adaptability of LoNAS and its potential for further optimization.
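As a rough illustration of the Shears idea, the sketch below sparsifies the frozen base weights with a simple magnitude criterion before adapter fine-tuning, so that only the elastic low-rank adapters remain dense and trainable. The pruning metric, threshold, and function names are assumptions, not the predefined metrics used in the research.

```python
# Magnitude-based sparsification of a frozen base layer (illustrative sketch only).
import torch

@torch.no_grad()
def apply_magnitude_sparsity(linear: torch.nn.Linear, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of a frozen base layer in place."""
    w = linear.weight
    k = int(w.numel() * sparsity)
    if k == 0:
        return
    threshold = w.abs().flatten().kthvalue(k).values
    w.mul_((w.abs() > threshold).to(w.dtype))  # keep only the large-magnitude weights

# Usage sketch: prune every frozen base projection, then fine-tune only the
# elastic adapters (whose rank is what NLS searches over).
# for module in model.modules():
#     if isinstance(module, ElasticLoRALinear):
#         apply_magnitude_sparsity(module.base, sparsity=0.5)
```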

    Integrating LoRA and NAS offers a transformative approach to large language model optimization. By leveraging structured low-rank representations, the research demonstrates that computational efficiency can be significantly improved without compromising performance. The study conducted by Intel Labs confirms that combining these techniques reduces the burden of fine-tuning while ensuring model integrity. Future research could explore further optimizations, including enhanced sub-network selection and more efficient heuristic strategies. This approach sets a precedent for making LLMs more accessible and deployable in diverse environments, paving the way for more efficient AI models.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


    The post Intel Labs Explores Low-Rank Adapters and Neural Architecture Search for LLM Compression appeared first on MarkTechPost.
