    Adam-mini: A Memory-Efficient Optimizer Revolutionizing Large Language Model Training with Reduced Memory Usage and Enhanced Performance

    July 2, 2024

Research in this area focuses on optimization algorithms for training large language models (LLMs), which are essential for understanding and generating human language. These models underpin a wide range of applications in natural language processing and artificial intelligence. Because training LLMs requires significant computational resources and memory, making these optimizers more efficient is a high-priority problem for researchers.

    The primary problem addressed by this paper is the high memory demand of the optimization algorithms used to train large language models. Specifically, the Adam optimizer, the field's standard because of its strong performance, must store optimizer states, namely first-order and second-order momentum values, for every parameter. These states alone take roughly twice as much memory as the model itself, a significant burden. As a result, training large models becomes expensive and less accessible to researchers with limited resources. Alternative methods such as Adafactor attempt to reduce memory usage but often compromise performance, highlighting the need for more efficient solutions.
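
    To make the state overhead concrete, here is a minimal, illustrative per-tensor Adam step (hyperparameter defaults are assumptions, not values from the paper): every parameter tensor carries a first-moment buffer `m` and a second-moment buffer `v` of the same shape, which is exactly the 2x memory overhead described above.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v have the same shape as param, so the optimizer state alone
    # is roughly twice the model size
    m = beta1 * m + (1 - beta1) * grad          # first-order momentum
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-order momentum
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# In practice m and v start as np.zeros_like(param), one pair per parameter tensor.
```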

    The Adam optimizer is widely used for training LLMs because it handles a broad range of model sizes and tasks effectively. However, its requirement for extensive memory to store optimizer states, particularly the first-order and second-order momentums, poses a considerable challenge. For instance, training a 7-billion-parameter model with Adam requires about 56 GB per card for these states alone, and roughly 86 GB once gradients are included. This makes training prohibitively expensive, even with advanced graphics cards like the A100-80GB. CPU offloading and sharding are often employed to cope with this memory requirement, but they increase latency and slow down training.
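
    A quick back-of-the-envelope calculation, assuming fp32 optimizer states at 4 bytes per value (the article's exact figures may reflect a slightly different parameter count or precision), reproduces the order of magnitude of these numbers:

```python
params = 7e9                              # 7B-parameter model
bytes_per_value = 4                       # fp32
states = 2 * params * bytes_per_value     # first + second moments (m and v)
grads = params * bytes_per_value

print(f"optimizer states: {states / 1e9:.0f} GB")               # ~56 GB
print(f"states + gradients: {(states + grads) / 1e9:.0f} GB")   # ~84 GB, near the ~86 GB cited
```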

    Researchers from The Chinese University of Hong Kong, Shenzhen, Shenzhen Research Institute of Big Data, Duke University, and Stanford University introduced Adam-mini, an optimizer designed to achieve similar or better performance than Adam while reducing memory usage by 45% to 50%. Adam-mini accomplishes this by partitioning model parameters into blocks based on the Hessian structure of transformers. Each block is then assigned a single high-quality learning rate, significantly reducing the number of learning rates from billions to a manageable number. This approach allows Adam-mini to maintain or even improve performance with a fraction of the memory required by Adam.

    Adam-mini works by leveraging the near-block-diagonal structure of transformers’ Hessians, partitioning parameters into blocks such as the Query, Key, Value, and MLP layers. For each block, a single effective learning rate is calculated using the average of Adam’s second-order momentum values within that block. This reduces the memory footprint and simplifies learning-rate assignment. For example, during pre-training of Llama2-7B on two A800-80GB GPUs, Adam-mini achieved a throughput of 5572.19 tokens per second, compared to 3725.59 tokens per second with AdamW, a 49.6% increase. This efficiency translates into a 33% reduction in wall-clock time to process the same number of tokens.
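
    As an illustration only (not the authors' implementation), the following sketch applies that idea: parameters are grouped into blocks (for example, one per Q/K/V projection or MLP matrix), the per-coordinate first moment is kept as in Adam, and each block shares a single second-moment scalar derived from the mean of its squared gradients. The block layout, names, and hyperparameters here are assumptions.

```python
import numpy as np

def adam_mini_step(blocks, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """blocks: list of dicts holding 'param' and 'grad' arrays, an 'm' array, and a scalar 'v'."""
    for b in blocks:
        g = b["grad"]
        b["m"] = beta1 * b["m"] + (1 - beta1) * g
        # One second-moment scalar per block, from the mean of the squared
        # gradients in that block, instead of one value per coordinate as in Adam.
        b["v"] = beta2 * b["v"] + (1 - beta2) * float(np.mean(g ** 2))
        m_hat = b["m"] / (1 - beta1 ** t)
        v_hat = b["v"] / (1 - beta2 ** t)
        b["param"] -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

    Storing one scalar per block rather than a per-coordinate second-moment tensor is what removes roughly half of Adam's optimizer-state memory in this kind of scheme.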

    The researchers validated Adam-mini’s performance across language models ranging from 125 million to 7 billion parameters, covering pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). The optimizer performed on par with or better than AdamW, with notable improvements in memory efficiency and training speed. In supervised fine-tuning and reinforcement learning tasks, for instance, Adam-mini consistently outperformed AdamW, achieving higher evaluation scores and faster convergence.

    In conclusion, the Adam-mini optimizer addresses the significant memory inefficiencies of traditional optimization methods like Adam by introducing a novel partitioning strategy based on the Hessian structure of models. This innovative approach results in substantial memory savings and improved training efficiency, making it a valuable tool for researchers working with large-scale language models. By reducing the memory footprint by up to 50% and increasing throughput by nearly 50%, Adam-mini not only enhances the feasibility of training large models but also encourages broader participation from researchers with limited GPU resources.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    Source: MarkTechPost
