    ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning

    November 9, 2024

    Adam is one of the most widely used adaptive optimizers in deep learning, but it is not guaranteed to converge unless the hyperparameter β2 is chosen in a problem-dependent way. Earlier fixes have their own limits. AMSGrad requires the impractical assumption of uniformly bounded gradient noise, which does not hold when the noise is Gaussian, as in variational autoencoders and diffusion models. Other methods, such as AdaShift, address convergence only in limited scenarios and are not effective for general problems. Recent studies suggest that Adam can converge when β2 is tuned per task, but that tuning is complex and problem-specific, which motivates the search for a universal solution.

    Researchers from The University of Tokyo introduced ADOPT, a new adaptive gradient method that achieves the optimal O(1/√T) convergence rate without requiring a specific choice of β2 or the bounded-noise assumption. ADOPT addresses Adam’s non-convergence with two changes: it excludes the current gradient from the second-moment estimate, and it changes the order of the momentum and normalization updates. Experiments across diverse tasks, including image classification, generative modeling, language processing, and reinforcement learning, show ADOPT’s superior performance over Adam and its variants. The method also converges reliably in challenging cases where Adam and AMSGrad struggle.
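
    The two changes described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `adopt_minimize`, the `grad_fn(theta, t)` callback, and the default values of `lr`, `beta1`, `beta2`, and `eps` are assumptions made for the example, and the official code may differ in details.

```python
import numpy as np


def adopt_minimize(grad_fn, theta0, steps=1000, lr=1e-3,
                   beta1=0.9, beta2=0.9999, eps=1e-6):
    """Sketch of an ADOPT-style loop (hypothetical helper, not the official API).

    grad_fn(theta, t) is assumed to return a stochastic gradient at theta.
    """
    theta = np.asarray(theta0, dtype=float).copy()

    # Assumption: the second-moment estimate starts from the first gradient.
    g = np.asarray(grad_fn(theta, 0), dtype=float)
    v = g * g
    m = np.zeros_like(theta)

    for t in range(1, steps + 1):
        g = np.asarray(grad_fn(theta, t), dtype=float)

        # Change 1: normalize with the PREVIOUS second moment, so the
        # scaling factor does not contain the current gradient.
        normalized = g / np.maximum(np.sqrt(v), eps)

        # Change 2: momentum is accumulated AFTER normalization
        # (Adam accumulates momentum first and normalizes afterwards).
        m = beta1 * m + (1.0 - beta1) * normalized
        theta = theta - lr * m

        # Only now is the current gradient folded into the second moment.
        v = beta2 * v + (1.0 - beta2) * g * g

    return theta
```

    On a noiseless quadratic whose gradient is simply `theta`, a call such as `adopt_minimize(lambda th, t: th, [1.0, -2.0], steps=2000, lr=0.01)` moves both coordinates toward zero. That is only a sanity check; it says nothing about the noisy settings the paper targets.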

    The study focuses on minimizing an objective function of a parameter vector using first-order stochastic optimization methods. Rather than the exact gradient, these methods rely on an estimate known as the stochastic gradient. Since the function may be nonconvex, the goal is to find a stationary point, where the gradient is zero. Standard convergence analyses in this setting make several key assumptions: the function is bounded below, the stochastic gradient is an unbiased estimate of the gradient, the function is smooth, and the variance of the stochastic gradient is uniformly bounded. For adaptive methods like Adam, an additional assumption about the gradient variance is often made to simplify convergence proofs. The researchers work under this set of assumptions to investigate how adaptive gradient methods converge without relying on the stricter assumption that the gradient noise itself remains bounded.
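
    For readers who prefer symbols, the setting and the four standard assumptions listed above are commonly written as follows. The notation (θ, ξ, g_t, f_inf, L, σ) is assumed here for illustration and may not match the paper's exactly.

```latex
\begin{align*}
&\text{Problem:} && \min_{\theta \in \mathbb{R}^d} f(\theta),
   \qquad f(\theta) = \mathbb{E}_{\xi}\!\left[ f(\theta; \xi) \right],
   \qquad g_t = \nabla_\theta f(\theta_{t-1}; \xi_t) \\
&\text{Goal (stationary point):} && \nabla f(\theta) = 0 \\
&\text{Bounded below:} && f(\theta) \ge f_{\inf} > -\infty \\
&\text{Unbiased gradient:} && \mathbb{E}\!\left[ g_t \mid \theta_{t-1} \right] = \nabla f(\theta_{t-1}) \\
&\text{Smoothness:} && \left\| \nabla f(\theta) - \nabla f(\theta') \right\| \le L \left\| \theta - \theta' \right\| \\
&\text{Bounded variance:} && \mathbb{E}\!\left[ \left\| g_t - \nabla f(\theta_{t-1}) \right\|^2 \right] \le \sigma^2
\end{align*}
```

    The bounded-variance condition constrains the noise only on average. The stricter assumption mentioned above would require the noise itself to stay bounded, which Gaussian noise does not satisfy.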

    Prior research suggests that while basic stochastic gradient descent often converges in nonconvex settings, adaptive gradient methods like Adam are widely used in deep learning due to their flexibility. However, Adam sometimes fails to converge, even in convex cases. A modified version called AMSGrad was developed to address this: it keeps the scaling factor applied to the learning rate non-decreasing by taking a running maximum of the second-moment estimate. Still, AMSGrad’s convergence guarantee rests on the stronger assumption of uniformly bounded gradient noise, which is not valid in all scenarios, such as in certain generative models. The researchers therefore propose a new adaptive gradient update that aims to ensure reliable convergence without stringent assumptions about gradient noise, addressing both Adam’s convergence limitations and its dependence on hyperparameter choices.
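
    The running-maximum idea behind AMSGrad can be illustrated with a single update step. This is a simplified sketch under stated assumptions: `amsgrad_step` is a hypothetical helper, the default hyperparameters are illustrative, and bias correction is omitted for brevity.

```python
import numpy as np


def amsgrad_step(theta, g, m, v, v_hat,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified AMSGrad-style update (hypothetical helper, no bias correction)."""
    # First- and second-moment estimates, updated as in Adam.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g

    # Running maximum: v_hat can never decrease, even when v does.
    v_hat = np.maximum(v_hat, v)

    # The step is scaled by the maximum, so the scaling factor is non-decreasing.
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat
```

    After one large gradient followed by several small ones, `v` decays while `v_hat` stays at its peak, which is what makes the scaling factor monotone.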

    ADOPT is evaluated across a range of tasks to compare its performance and robustness with Adam and AMSGrad. On a toy problem, ADOPT converges where Adam does not, especially when gradient noise is high. Tests with an MLP on the MNIST dataset and a ResNet on CIFAR-10 show that ADOPT converges faster and more stably. ADOPT also outperforms Adam on Swin Transformer-based ImageNet classification, NVAE generative modeling, and GPT-2 pretraining under noisy gradient conditions, and it yields improved MMLU benchmark scores when finetuning the LLaMA-7B language model.

    The study addresses the theoretical limitations of adaptive gradient methods like Adam, which need specific hyperparameter settings to converge. To resolve this, the authors introduce ADOPT, an optimizer that achieves optimal convergence rates across various tasks without problem-specific tuning. ADOPT overcomes Adam’s limitations by altering the momentum update order and excluding the current gradient from second-moment calculations, ensuring stability across tasks like image classification, NLP, and generative modeling. The work bridges theory and application in adaptive optimization, although future research may explore more relaxed assumptions to generalize ADOPT’s effectiveness further.


    The post ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning appeared first on MarkTechPost.
