ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning

Adam is widely used in deep learning as an adaptive optimization algorithm, but it can fail to converge unless the hyperparameter β2 is chosen in a problem-dependent way. Attempts to fix this, like AMSGrad, require the impractical assumption of uniformly bounded gradient noise, which doesn't hold in cases with Gaussian noise, as seen in variational autoencoders and diffusion models. Other methods, such as AdaShift, address convergence in limited scenarios but aren't effective for general problems. Recent studies suggest Adam can converge when β2 is tuned per task, though this approach is complex and problem-specific, which motivates the search for a universal solution.

Researchers from The University of Tokyo introduced ADOPT, a new adaptive gradient method that achieves the optimal O(1/√T) convergence rate without requiring a specific choice of β2 or the bounded noise assumption. ADOPT addresses Adam's non-convergence by excluding the current gradient from the second moment estimate and changing the order of the momentum and normalization updates. Experiments across diverse tasks, such as image classification, generative modeling, language processing, and reinforcement learning, show ADOPT's superior performance over Adam and its variants. The method also converges reliably in challenging cases, including scenarios where Adam and AMSGrad struggle.
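The two changes described above can be sketched in a few lines of plain Python. This is a minimal illustration based only on the description in this article, not the authors' reference implementation: the function names `adopt_init` and `adopt_step`, the initialization of the second moment from the first gradient, and the default values for `lr`, `beta1`, `beta2`, and `eps` are all assumptions made for the example.

```python
import math

def adopt_init(grad):
    # Assumed initialization: momentum starts at 0, second moment at g_0^2.
    return [0.0 for _ in grad], [g * g for g in grad]

def adopt_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    """One ADOPT-style update on lists of floats (hypothetical helper)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # 1) Normalize with the PREVIOUS second moment, so the current
        #    gradient is excluded from its own scaling factor.
        g_hat = g / max(math.sqrt(v_i), eps)
        # 2) Apply momentum AFTER normalization (Adam does the reverse).
        m_i = beta1 * m_i + (1.0 - beta1) * g_hat
        new_theta.append(p - lr * m_i)
        new_m.append(m_i)
        # 3) Only now fold the current gradient into the second moment.
        new_v.append(beta2 * v_i + (1.0 - beta2) * g * g)
    return new_theta, new_m, new_v

# Usage on f(x) = 0.5 * x^2, whose gradient is simply x.
theta = [5.0]
m, v = adopt_init(theta)
for _ in range(300):
    theta, m, v = adopt_step(theta, theta, m, v, lr=0.1)
print(theta)  # the parameter moves toward the minimizer at 0
```

For comparison, Adam first updates both moment estimates with the current gradient and then divides the momentum by the square root of the updated second moment, so the current gradient appears in both the numerator and the denominator of the step.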

This study focuses on minimizing an objective function that depends on a parameter vector by using first-order stochastic optimization methods. Rather than working with the exact gradient, these methods rely on an estimate known as the stochastic gradient. Since the function may be nonconvex, the goal is to find a stationary point where the gradient is zero. Standard convergence analyses in this area generally make several key assumptions: the function is bounded below, the stochastic gradient is an unbiased estimate of the gradient, the function is smooth (its gradient is Lipschitz continuous), and the variance of the stochastic gradient is uniformly bounded. For adaptive methods like Adam, an additional assumption about the gradient variance is often made to simplify convergence proofs. The researchers investigate how adaptive gradient methods converge under these assumptions, without relying on the stricter assumption that the gradient noise itself remains bounded.
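Written out, the standard assumptions listed above take roughly the following form. The notation is an assumption made for illustration and may differ from the paper's: f is the objective, θ the parameter vector, g_t the stochastic gradient at step t, and f_inf, L, and σ are constants.

```latex
% Standard assumptions for nonconvex stochastic optimization (illustrative notation)
\begin{align*}
&\text{(bounded below)}     && f(\theta) \ge f_{\inf} > -\infty \quad \text{for all } \theta, \\
&\text{(unbiased)}          && \mathbb{E}\,[\, g_t \mid \theta_{t-1} \,] = \nabla f(\theta_{t-1}), \\
&\text{(smooth)}            && \|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y, \\
&\text{(bounded variance)}  && \mathbb{E}\,\big[\, \|g_t - \nabla f(\theta_{t-1})\|^2 \,\big] \le \sigma^2 .
\end{align*}
% The stricter bounded-noise assumption requires the bound to hold for every sample:
\[
\|g_t - \nabla f(\theta_{t-1})\| \le \sigma \quad \text{almost surely.}
\]
```

The last condition is the one that fails for Gaussian noise: a Gaussian has finite variance, so the bounded-variance assumption holds, but individual samples can be arbitrarily large, so no almost-sure bound exists.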

Prior research suggests that while basic stochastic gradient descent often converges in nonconvex settings, adaptive gradient methods like Adam are widely used in deep learning due to their flexibility. However, Adam can fail to converge, even in convex cases. A modified version called AMSGrad was developed to address this: it keeps the second-moment estimate non-decreasing by taking a running maximum, so the effective per-coordinate step size never increases. Still, AMSGrad's convergence guarantee rests on the stronger assumption of uniformly bounded gradient noise, which is not valid in all scenarios, such as in certain generative models. Therefore, the researchers propose a new adaptive gradient update that aims to ensure reliable convergence without stringent assumptions about gradient noise, addressing Adam's convergence limitations and its dependence on problem-specific hyperparameter choices.
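The maximum-based update that AMSGrad uses can be illustrated with a small sketch. The helper name `second_moment_traces` and the example gradients are hypothetical, and the code tracks only the second-moment estimate for a scalar parameter rather than a full optimizer step.

```python
def second_moment_traces(grads, beta2=0.999):
    """Return (Adam-style moving average, AMSGrad-style running max)."""
    v, v_hat = 0.0, 0.0
    ema, running_max = [], []
    for g in grads:
        # Adam-style exponential moving average of squared gradients.
        v = beta2 * v + (1.0 - beta2) * g * g
        # AMSGrad keeps the largest value seen so far, so the denominator
        # of the update can never shrink.
        v_hat = max(v_hat, v)
        ema.append(v)
        running_max.append(v_hat)
    return ema, running_max

ema, running_max = second_moment_traces([1.0, 0.0, 0.0, 2.0], beta2=0.5)
print(ema)          # [0.5, 0.25, 0.125, 2.0625]
print(running_max)  # [0.5, 0.5, 0.5, 2.0625]
```

In the moving average, the estimate decays after the first gradient, which lets the step size grow again. The running maximum holds its value until a larger one arrives.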

ADOPT is evaluated across a range of tasks to verify its performance and robustness compared to Adam and AMSGrad. On a toy problem, ADOPT converges where Adam does not, especially when gradient noise is high. Experiments with an MLP on MNIST and a ResNet on CIFAR-10 show that ADOPT achieves faster and more stable convergence. ADOPT also outperforms Adam in Swin Transformer-based ImageNet classification, NVAE generative modeling, and GPT-2 pretraining under noisy gradient conditions, and it yields improved scores on the MMLU benchmark when finetuning the LLaMA-7B language model.

The study addresses the theoretical limitations of adaptive gradient methods like Adam, which need specific hyperparameter settings to converge. To resolve this, the authors introduce ADOPT, an optimizer that achieves optimal convergence rates across various tasks without problem-specific tuning. ADOPT overcomes Adam's limitations by altering the momentum update order and excluding the current gradient from second-moment calculations, ensuring stability across tasks like image classification, NLP, and generative modeling. The work bridges theory and application in adaptive optimization, although future research may explore more relaxed assumptions to generalize ADOPT's effectiveness further.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post ADOPT: A Universal Adaptive Gradient Method for Reliable Convergence without Hyperparameter Tuning appeared first on MarkTechPost.
