Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
- URL: http://arxiv.org/abs/2509.18001v2
- Date: Mon, 27 Oct 2025 14:49:07 GMT
- Title: Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
- Authors: Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le
- Abstract summary: We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in distributed training, yet a rigorous theoretical account has remained lacking. Motivated by our theoretical insights, we introduce Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable.
- Score: 31.637051623223346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in distributed training, yet a rigorous theoretical account has remained lacking. To provide a theoretical explanation for m-sharpness, we leverage an extended Stochastic Differential Equation (SDE) framework and analyze the structure of stochastic gradient noise (SGN) to characterize the dynamics of various SAM variants, including n-SAM and m-SAM. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by our theoretical insights, we introduce Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate the effectiveness of our theoretical analysis and proposed method.
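The SAM and m-SAM updates discussed in the abstract can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not the paper's implementation: the loss (linear regression), data, and hyperparameters (`lr`, `rho`, `m`) are illustrative assumptions. It shows the two-step SAM update (ascend by `rho` along the normalized batch gradient, then descend using the gradient at the perturbed point) and the m-SAM variant, which computes a separate perturbation per micro-batch of size `m` and averages the resulting gradients.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the mean-squared-error loss 0.5 * mean((Xw - y)^2)."""
    return X.T @ (X @ w - y) / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update: perturb by rho along the normalized batch gradient,
    then take a descent step using the gradient at the perturbed point."""
    g = loss_grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation
    return w - lr * loss_grad(w + eps, X, y)

def m_sam_step(w, X, y, lr=0.1, rho=0.05, m=2):
    """One m-SAM update: compute a separate perturbation for each
    micro-batch of size m, then average the perturbed gradients."""
    grads = []
    for i in range(0, len(y), m):
        Xb, yb = X[i:i + m], y[i:i + m]
        g = loss_grad(w, Xb, yb)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        grads.append(loss_grad(w + eps, Xb, yb))
    return w - lr * np.mean(grads, axis=0)

# Tiny synthetic regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w0 = np.zeros(3)

# With m equal to the full batch size, m-SAM reduces to ordinary SAM.
print(np.allclose(m_sam_step(w0, X, y, m=len(y)), sam_step(w0, X, y)))
```

Per the abstract, shrinking `m` is what introduces additional per-micro-batch gradient noise into the perturbations, which the paper argues acts as a variance-based sharpness regularizer.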
Related papers
- Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-Aware Minimization (SAM) seeks minima with a flat loss landscape to improve generalization performance in machine learning tasks, including fine-tuning. We propose SL-SAM, an approach that alleviates this bottleneck by introducing sparsity at the layer level.
arXiv Detail & Related papers (2026-02-10T04:05:43Z) - LightSAM: Parameter-Agnostic Sharpness-Aware Minimization [92.17866492331524]
Sharpness-Aware Minimization (SAM) enhances the generalization ability of machine learning models by exploring flat regions of the loss landscape through weight perturbations. SAM introduces an additional hyperparameter, the perturbation radius, to which its performance is sensitive. In this paper, we propose LightSAM, an algorithm that sets the perturbation radius and learning rate of SAM adaptively.
arXiv Detail & Related papers (2025-05-30T09:28:38Z) - Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification [113.6840565194525]
Real-world datasets often follow a long-tailed distribution, making generalization to tail classes difficult. Recent methods resort to long-tailed variants of Sharpness-Aware Minimization (SAM) to improve generalization by flattening the loss landscape. We introduce Focal-SAM, which assigns different penalties class-wise, achieving fine-grained control without extra backpropagation.
arXiv Detail & Related papers (2025-05-03T03:01:28Z) - Sharpness-Aware Minimization: General Analysis and Improved Rates [10.11126899274029]
Sharpness-Aware Minimization (SAM) has emerged as a powerful method for improving generalization in machine learning models. We provide an analysis of SAM and its unnormalized variant (USAM) under a single, unified update rule. We present improved convergence rates under relaxed, more natural assumptions.
arXiv Detail & Related papers (2025-03-04T03:04:06Z) - Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry [2.854482269849925]
Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. We propose Monge SAM (M-SAM), a reparameterization-invariant version of SAM. We demonstrate its robustness both theoretically and empirically on a multi-modal representation alignment task.
arXiv Detail & Related papers (2025-02-12T14:40:19Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., noise in the current minibatch gradient.
By decomposing the adversarial perturbation into full-gradient and noise components, we discover that relying solely on the full-gradient component degrades generalization, while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Critical Influence of Overparameterization on Sharpness-aware Minimization [12.321517302762558]
Sharpness-Aware Minimization (SAM) has attracted considerable attention for its effectiveness in improving generalization in deep neural network training. This work presents empirical and theoretical findings that reveal the critical influence of overparameterization on SAM's effectiveness.
arXiv Detail & Related papers (2023-11-29T11:19:50Z) - Systematic Investigation of Sparse Perturbed Sharpness-Aware
Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change in loss when a perturbation is added.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z) - AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning
Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-Aware Minimization (SAM) has been extensively explored for its ability to improve generalization when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAMsGrad.
arXiv Detail & Related papers (2023-03-01T15:12:42Z) - Stability Analysis of Sharpness-Aware Minimization [5.024497308975435]
Sharpness-aware minimization (SAM) is a recently proposed training method that seeks flat minima in deep learning.
In this paper, we demonstrate that SAM dynamics can have convergence instability that occurs near a saddle point.
arXiv Detail & Related papers (2023-01-16T08:42:40Z) - How Does Sharpness-Aware Minimization Minimize Sharpness? [29.90109733192208]
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks.
This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism.
arXiv Detail & Related papers (2022-11-10T17:56:38Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces SAM's overhead from 100% extra computation to 40% relative to base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of noise in its success is still unclear.
We show that heavy tails commonly arise in the parameters due to multiplicative noise in the updates.
A detailed analysis of key factors, including step size and data, is conducted, with similar heavy-tailed behavior observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.