Towards Understanding Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2206.06232v1
- Date: Mon, 13 Jun 2022 15:07:32 GMT
- Title: Towards Understanding Sharpness-Aware Minimization
- Authors: Maksym Andriushchenko, Nicolas Flammarion
- Abstract summary: We argue that the existing justifications for the success of Sharpness-Aware Minimization (SAM), based on a PAC-Bayes generalization bound and convergence to flat minima, are incomplete.
We theoretically analyze its implicit bias for diagonal linear networks.
We show empirically that fine-tuning a standard model with SAM can lead to significant generalization improvements.
- Score: 27.666483899332643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations which significantly improves generalization in
various settings. We argue that the existing justifications for the success of
SAM which are based on a PAC-Bayes generalization bound and the idea of
convergence to flat minima are incomplete. Moreover, there are no explanations
for the success of using $m$-sharpness in SAM which has been shown as essential
for generalization. To better understand this aspect of SAM, we theoretically
analyze its implicit bias for diagonal linear networks. We prove that SAM
always chooses a solution that enjoys better generalization properties than
standard gradient descent for a certain class of problems, and this effect is
amplified by using $m$-sharpness. We further study the properties of the
implicit bias on non-linear networks empirically, where we show that
fine-tuning a standard model with SAM can lead to significant generalization
improvements. Finally, we provide convergence results of SAM for non-convex
objectives when used with stochastic gradients. We illustrate these results
empirically for deep networks and discuss their relation to the generalization
behavior of SAM. The code of our experiments is available at
https://github.com/tml-epfl/understanding-sam.
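To make the abstract concrete, the sketch below illustrates the kind of update SAM performs and where m-sharpness enters: each micro-batch of size m computes its own worst-case weight perturbation, and the resulting gradients are averaged before the base optimizer step. This is a minimal sketch assuming a PyTorch setup; the helper name `sam_step`, the toy model, and the values of `rho` and `m` are illustrative assumptions, not the paper's exact configuration (see the linked repository for the authors' code).

```python
import torch
import torch.nn as nn

def sam_step(model, loss_fn, xb, yb, base_opt, rho=0.05, m=16):
    """One SAM update with m-sharpness: each micro-batch of size m computes its
    own worst-case perturbation, and the resulting SAM gradients are averaged."""
    params = [p for p in model.parameters() if p.requires_grad]
    accum = [torch.zeros_like(p) for p in params]
    n_chunks = 0
    for xs, ys in zip(xb.split(m), yb.split(m)):
        # 1) gradient at the current weights on this micro-batch
        base_opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        grads = [p.grad.detach().clone() for p in params]
        # 2) ascend to the approximate worst case: w + rho * g / ||g||
        scale = rho / (torch.sqrt(sum((g ** 2).sum() for g in grads)).item() + 1e-12)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=scale)
        # 3) gradient at the perturbed weights, then undo the perturbation
        base_opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        with torch.no_grad():
            for p, g, a in zip(params, grads, accum):
                a.add_(p.grad.detach())
                p.sub_(g, alpha=scale)
        n_chunks += 1
    # 4) apply the averaged SAM gradient with the base optimizer
    for p, a in zip(params, accum):
        p.grad = a / n_chunks
    base_opt.step()

# Toy usage (assumed setup): a 64-example batch split into micro-batches of 16.
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 10), torch.randn(64, 1)
sam_step(model, nn.MSELoss(), x, y, opt)
```

Smaller m means each perturbation is tailored to fewer examples, which is the m-sharpness effect the paper analyzes; setting m equal to the full batch size recovers the single-perturbation SAM step.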
Related papers
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing Max-Sharpness (MaxS), the gap between the maximum loss over the neighborhood surrounding the current weight and the training loss.
In this paper, we propose to also utilize the difference between the training loss and the minimum loss over that neighborhood, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we obtain a better flatness indicator (FI) that points toward a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM.
arXiv Detail & Related papers (2024-09-20T03:01:13Z) - Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics [10.304082706818562]
We show that the perturbations in sharpness-aware minimization (SAM) perform forgetting: they discard undesirable model biases, exposing learning signals that generalize better.
Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
arXiv Detail & Related papers (2024-06-10T18:02:48Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and a batch-specific noise component, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Why Does Sharpness-Aware Minimization Generalize Better Than SGD? [102.40907275290891]
We show why Sharpness-Aware Minimization (SAM) generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks.
Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features.
arXiv Detail & Related papers (2023-10-11T07:51:10Z) - ImbSAM: A Closer Look at Sharpness-Aware Minimization in
Class-Imbalanced Recognition [62.20538402226608]
We show that Sharpness-Aware Minimization (SAM) fails to address generalization issues in the class-imbalanced setting.
We propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM) to overcome this bottleneck.
Our ImbSAM demonstrates remarkable performance improvements for tail classes and anomaly detection.
arXiv Detail & Related papers (2023-08-15T14:46:32Z) - Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters (see the sketch after this list).
arXiv Detail & Related papers (2023-06-07T08:05:46Z) - Sharpness-Aware Minimization Revisited: Weighted Sharpness as a
Regularization Term [4.719514928428503]
We propose a more general method, called WSAM, by incorporating sharpness as a regularization term.
We prove its generalization bound through the combination of PAC and Bayes-PAC techniques.
The results demonstrate that WSAM achieves improved generalization, or is at least highly competitive, compared to the vanilla optimizer, SAM, and its variants.
arXiv Detail & Related papers (2023-05-25T08:00:34Z) - On Statistical Properties of Sharpness-Aware Minimization: Provable
Guarantees [5.91402820967386]
We present a new theoretical explanation of why Sharpness-Aware Minimization (SAM) generalizes well.
SAM is particularly well-suited for both sharp and non-sharp problems.
Our findings are validated using numerical experiments on deep neural networks.
arXiv Detail & Related papers (2023-02-23T07:52:31Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation over base optimizers from the 100% required by SAM to around 40%.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
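For the "Normalization Layers Are All That Sharpness-Aware Minimization Needs" entry above, here is a minimal sketch of perturbing only the affine normalization parameters in SAM's adversarial step, assuming a PyTorch model. The helper names (`norm_affine_params`, `norm_only_sam_step`), the layer selection, and the `rho` value are illustrative assumptions, not that paper's reference implementation.

```python
import torch
import torch.nn as nn

def norm_affine_params(model):
    """Collect only the affine weight/bias of normalization layers."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm, nn.GroupNorm)
    return [p for mod in model.modules() if isinstance(mod, norm_types)
            for p in mod.parameters() if p.requires_grad]

def norm_only_sam_step(model, loss_fn, x, y, opt, rho=0.05):
    sub = norm_affine_params(model)  # often well under 1% of all parameters
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in sub]
    scale = rho / (torch.sqrt(sum((g ** 2).sum() for g in grads)).item() + 1e-12)
    with torch.no_grad():  # ascend only along the normalization-affine directions
        for p, g in zip(sub, grads):
            p.add_(g, alpha=scale)
    opt.zero_grad()
    loss_fn(model(x), y).backward()  # SAM gradient for *all* parameters
    with torch.no_grad():  # undo the perturbation before the descent step
        for p, g in zip(sub, grads):
            p.sub_(g, alpha=scale)
    opt.step()

# Toy usage (assumed setup): only the LayerNorm weight/bias get perturbed.
model = nn.Sequential(nn.Linear(10, 32), nn.LayerNorm(32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
norm_only_sam_step(model, nn.MSELoss(), torch.randn(64, 10), torch.randn(64, 1), opt)
```

The only change from a plain SAM step is that the ascent perturbation is restricted to the normalization parameters; the descent step still updates every parameter with the gradient taken at the perturbed point.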