Recently, there has been growing interest in improving the generalization of deep networks by regulating the sharpness of the loss landscape. Sharpness-Aware Minimization (SAM) has gained popularity for its strong performance across a range of benchmarks, and in particular for its handling of random label noise, where it outperforms SGD by significant margins. These gains persist even under under-parameterization and may grow with larger datasets. Understanding SAM’s behavior, especially in the early phases of learning, is therefore crucial to optimizing its performance.
While SAM’s underlying mechanisms remain elusive, several studies have attempted to shed light on the significance of per-example regularization in 1-SAM, the variant that applies SAM’s adversarial perturbation to each example individually. Some researchers demonstrated that in sparse regression, 1-SAM exhibits a bias towards sparser weights than naive SAM. Prior studies also differentiate the two by highlighting differences in how they regularize “flatness.” Recent research links naive SAM to generalization, underscoring the importance of understanding SAM’s behavior beyond convergence.
Carnegie Mellon University researchers provide a study that investigates, at a mechanistic level, why 1-SAM is more robust to label noise than SGD. By decomposing each example’s gradient into a logit-scale term and a network-Jacobian term, the research identifies the key mechanisms behind improved early-stopping test accuracy. In linear models, SAM’s explicit up-weighting of low-loss points proves beneficial, especially in the presence of mislabeled examples. In deep networks, empirical findings suggest that SAM’s label noise robustness originates primarily from its Jacobian term, a fundamentally different mechanism from the logit-scale term. In addition, analyzing Jacobian-only SAM (J-SAM) reveals a decomposition into SGD with ℓ2 regularization, offering insight into its performance improvement. These findings underscore the importance of the optimization trajectory, rather than sharpness properties at convergence, in achieving SAM’s label noise robustness.
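To make the decomposition concrete, here is a minimal sketch for binary logistic regression, where the per-example gradient factors exactly into a scalar logit-scale term and the network Jacobian (which, for a linear model, is just the input). The function names and the linear-model setting are illustrative assumptions, not the paper’s exact code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_gradient(w, x, y):
    """Decompose the per-example logistic-loss gradient into a
    'logit scale' scalar and a 'network Jacobian' term.

    For a linear model f(x; w) = w.x with label y in {0, 1}:
        grad = (sigmoid(w.x) - y) * x
    The scalar (sigmoid(w.x) - y) shrinks toward zero on low-loss
    (confidently correct) examples; x is df/dw, the Jacobian.
    """
    logit_scale = sigmoid(w @ x) - y   # scalar, small when the loss is small
    jacobian = x                       # df/dw for a linear model
    return logit_scale * jacobian
```

In a deep network the Jacobian term is no longer just the input, which is why the paper finds the two terms drive robustness through fundamentally different mechanisms.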
In experiments on toy Gaussian data with label noise, SAM achieves significantly higher early-stopping test accuracy than SGD. Analyzing SAM’s update reveals that its adversarial weight perturbation up-weights the gradient signal from low-loss points, thereby maintaining a high contribution from clean examples during the early training epochs. This preference for clean data yields higher test accuracy before the model overfits to the noise. The study further examines the role of SAM’s logit-scale term, showing how it effectively up-weights gradients from low-loss points and consequently improves overall performance. This preference is established through both mathematical proofs and empirical observations, highlighting how 1-SAM’s per-example updates differ from naive SAM’s.
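The following sketch shows what a 1-SAM step might look like for the same logistic-regression setting: each example’s gradient is re-evaluated at its own adversarially perturbed weights before averaging. The hyperparameters `rho` and `lr` are illustrative placeholders, not values from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_sam_step(w, X, Y, rho=0.05, lr=0.1):
    """A minimal sketch of a 1-SAM (per-example SAM) update for
    binary logistic regression.

    Each example's gradient is computed at weights perturbed in its
    own ascent direction, w + rho * g_i / ||g_i||.  The perturbation
    inflates the logit-scale term of low-loss (typically clean)
    examples the most in relative terms, keeping their contribution
    to the average gradient high early in training.
    """
    grads = []
    for x, y in zip(X, Y):
        g = (sigmoid(w @ x) - y) * x                  # plain SGD gradient
        eps = rho * g / (np.linalg.norm(g) + 1e-12)   # per-example ascent step
        g_sam = (sigmoid((w + eps) @ x) - y) * x      # gradient at perturbed weights
        grads.append(g_sam)
    return w - lr * np.mean(grads, axis=0)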
The researchers then simplify SAM’s Jacobian-based regularization to ℓ2 penalties on the last-layer weights and the last hidden layer’s activations, applied during standard SGD training. This regularized objective is evaluated on CIFAR10 with a ResNet18 architecture; due to instability issues with batch normalization under 1-SAM, they replace it with layer normalization. Comparing SGD, 1-SAM, L-SAM, J-SAM, and regularized SGD, they find that while regularized SGD does not match SAM’s test accuracy, the gap narrows significantly, from 17% to 9%, under label noise. In noise-free settings, however, regularized SGD improves only marginally, while SAM maintains an 8% advantage over SGD. This suggests that, although it does not fully explain SAM’s generalization benefits, similar regularization of the final layers is crucial to SAM’s performance, especially in noisy environments.
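A sketch of what such a regularized-SGD objective could look like in PyTorch is below. The `model.features`/`model.classifier` split and the coefficient `lam` are assumptions made for illustration; the paper’s exact penalty weights and implementation may differ:

```python
import torch
import torch.nn.functional as F

def regularized_sgd_loss(model, x, y, lam=1e-3):
    """Cross-entropy plus l2 penalties on the final-layer weights and
    the last hidden activations, mimicking SAM's Jacobian term.

    Assumes the model exposes a feature extractor (`model.features`)
    followed by a final linear layer (`model.classifier`).
    """
    h = model.features(x)             # last hidden-layer activations
    logits = model.classifier(h)      # final linear layer
    ce = F.cross_entropy(logits, y)
    reg = (model.classifier.weight.pow(2).sum()      # last-layer weight norm
           + h.pow(2).sum(dim=1).mean())             # mean squared activation norm
    return ce + lam * reg
```

Because this objective adds only two cheap penalty terms to a single forward pass, it avoids the extra forward-backward pass per example that makes 1-SAM expensive.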
In conclusion, this work offers a robust-learning perspective on SAM’s effectiveness, demonstrating its ability to prioritize learning clean examples before fitting noisy ones in the presence of label noise. In linear models, SAM explicitly up-weights gradients from low-loss points, akin to existing label-noise-robustness methods. In nonlinear settings, SAM’s regularization of intermediate activations and final-layer weights improves label noise robustness, similar to methods that regulate the norm of the logits. Despite these similarities, SAM remains underexplored in the label noise literature. Nonetheless, simulating aspects of SAM’s regularization of the network Jacobian can preserve much of its performance, suggesting that label-noise-robustness methods inspired by SAM’s principles could be developed without the additional runtime cost of 1-SAM.