
    Exploring Sharpness-Aware Minimization (SAM): Insights into Label Noise Robustness and Generalization

    May 9, 2024

Recently, there has been growing interest in improving deep networks’ generalization by regulating the sharpness of the loss landscape. Sharpness-Aware Minimization (SAM) has gained popularity for its strong performance on various benchmarks, particularly under random label noise, where it outperforms SGD by significant margins and improves substantially on existing robustness techniques. SAM’s effectiveness persists even under under-parameterization, with gains that potentially grow with larger datasets. Understanding SAM’s behavior, especially in the early phases of learning, is therefore crucial to optimizing its performance.

While SAM’s underlying mechanisms remain elusive, several studies have attempted to shed light on the significance of per-example regularization in 1-SAM (SAM computed with a separate perturbation for each example). Some researchers demonstrated that in sparse regression, 1-SAM exhibits a bias toward sparser weights than naive SAM. Prior studies also differentiate the two by highlighting differences in how each regularizes “flatness,” and recent research links naive SAM to generalization, underscoring the importance of understanding SAM’s behavior beyond convergence.

Carnegie Mellon University researchers present a study investigating, at a mechanistic level, why 1-SAM is more robust to label noise than SGD. By decomposing each example’s gradient into a logit-scale term and a network-Jacobian term, the research identifies the key mechanisms behind improved early-stopping test accuracy. In linear models, SAM’s explicit up-weighting of low-loss points proves beneficial, especially in the presence of mislabeled examples. Empirical findings suggest that in deep networks, SAM’s label noise robustness originates primarily from its Jacobian term, indicating a fundamentally different mechanism from the logit-scale term. Moreover, analyzing Jacobian-only SAM reveals a decomposition into SGD with ℓ2 regularization, offering insight into its performance improvement. These findings underscore the importance of the optimization trajectory, rather than sharpness properties at convergence, in achieving SAM’s label noise robustness.
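The gradient decomposition referenced above is easiest to see in a linear model with a sigmoid output, where the per-example gradient factors exactly into a scalar logit-scale term, (p − y), and the network Jacobian, which for a linear model is simply the input x. A hedged sketch of that factorization (toy values, not from the paper):

```python
import math

# Per-example gradient decomposition for logistic regression:
# grad = (p - y) * x, where (p - y) is the "logit scale" term and
# x plays the role of the network Jacobian for a linear model.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_example_grad(w, x, y):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    logit_scale = p - y          # small in magnitude when the example has low loss
    jacobian = x                 # d(logit)/dw for a linear model
    return [logit_scale * xi for xi in jacobian], logit_scale

grad, scale = per_example_grad([0.5, -0.25], [1.0, 2.0], 1.0)
```

In a deep network the Jacobian term carries the intermediate activations and downstream weights, which is why regularizing it has effects that a linear analysis of the logit scale alone cannot capture.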

Through experimental investigations on toy Gaussian data with label noise, SAM demonstrates significantly higher early-stopping test accuracy than SGD. Analysis of SAM’s update process makes it evident that its adversarial weight perturbation prioritizes up-weighting the gradient signal from low-loss points, thereby maintaining high contributions from clean examples in the early training epochs. This preference for clean data leads to higher test accuracy before the model overfits to noise. The study further sheds light on the role of SAM’s logit scale, showing how it effectively up-weights gradients from low-loss points and consequently improves overall performance. This preference for low-loss points is demonstrated through mathematical proofs and empirical observations, highlighting how 1-SAM’s behavior differs from naive SAM updates.
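The up-weighting effect can be illustrated numerically for logistic regression, where 1-SAM’s per-example ascent step shrinks that example’s margin by exactly ρ‖x‖, multiplying its logit-scale gradient term by a factor that is larger for low-loss (well-fit) examples than for high-loss ones. The margins and `rho` below are illustrative toy values, not figures from the paper:

```python
import math

# For logistic loss, the per-example gradient magnitude is proportional to
# sigmoid(-margin). 1-SAM's ascent step reduces this example's margin by
# rho * ||x||, so the effective per-example re-weighting factor is the ratio
# of the perturbed gradient scale to the unperturbed one.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sam_reweight_factor(margin, rho=0.5, x_norm=1.0):
    base = sigmoid(-margin)
    perturbed = sigmoid(-(margin - rho * x_norm))
    return perturbed / base

low_loss = sam_reweight_factor(margin=4.0)    # clean, well-fit example
high_loss = sam_reweight_factor(margin=-1.0)  # mislabeled / high-loss example
```

Because the re-weighting factor grows with the margin, clean examples keep a disproportionately large share of the gradient signal early in training, matching the behavior described above.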

The researchers then simplify SAM’s regularization to ℓ2 regularization on the last-layer weights and the last hidden layer’s intermediate activations, applied during deep network training with SGD. This regularization objective is evaluated on CIFAR10 with a ResNet18 architecture; due to instability issues with batch normalization under 1-SAM, they replace it with layer normalization. Comparing the performance of SGD, 1-SAM, L-SAM, J-SAM, and regularized SGD, they find that while regularized SGD does not match SAM’s test accuracy, the gap narrows significantly, from 17% to 9%, under label noise. In noise-free scenarios, however, regularized SGD improves only marginally, while SAM maintains an 8% advantage over SGD. This suggests that, while it does not fully explain SAM’s generalization benefits, similar regularization of the final layers is crucial to SAM’s performance, especially in noisy environments.
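The regularized-SGD objective described above amounts to the standard training loss plus ℓ2 penalties on the final-layer weights and the last hidden activations. A minimal sketch, where `lam` is an illustrative coefficient rather than a value from the paper:

```python
# Regularized-SGD objective: base loss plus l2 penalties on the
# final-layer weights and the last hidden layer's activations.
# lam is a hypothetical coefficient chosen for illustration.

def regularized_loss(base_loss, last_layer_w, hidden_acts, lam=1e-3):
    l2 = lambda v: sum(x * x for x in v)
    return base_loss + lam * (l2(last_layer_w) + l2(hidden_acts))

loss = regularized_loss(0.7, [0.5, -0.5], [1.0, 2.0], lam=0.01)
```

Unlike 1-SAM, this objective needs only a single gradient evaluation per step, which is what makes the 17%-to-9% gap reduction notable despite the simpler mechanism.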

In conclusion, this work provides a robust perspective on the effectiveness of SAM by demonstrating its ability to prioritize learning clean examples before fitting noisy ones, particularly in the presence of label noise. In linear models, SAM explicitly up-weights gradients from low-loss points, akin to existing label noise robustness methods. In nonlinear settings, SAM’s regularization of intermediate activations and final-layer weights improves label noise robustness, similar to methods that regulate the norm of the logits. Despite these similarities, SAM remains underexplored in the label noise domain. Nonetheless, simulating aspects of SAM’s regularization of the network Jacobian can preserve its performance, suggesting potential for developing label-noise robustness methods inspired by SAM’s principles, without the additional runtime costs of 1-SAM.

Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Exploring Sharpness-Aware Minimization (SAM): Insights into Label Noise Robustness and Generalization appeared first on MarkTechPost.

