Sharpness-Aware Minimization Revisited: Weighted Sharpness as a
Regularization Term
- URL: http://arxiv.org/abs/2305.15817v2
- Date: Fri, 9 Jun 2023 07:58:13 GMT
- Title: Sharpness-Aware Minimization Revisited: Weighted Sharpness as a
Regularization Term
- Authors: Yun Yue, Jiadi Jiang, Zhiling Ye, Ning Gao, Yongchao Liu, Ke Zhang
- Abstract summary: We propose a more general method, called WSAM, by incorporating sharpness as a regularization term.
We prove its generalization bound through the combination of PAC and Bayes-PAC techniques.
The results demonstrate that WSAM achieves improved generalization, or is at least highly competitive, compared to the vanilla optimizer, SAM, and its variants.
- Score: 4.719514928428503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization of Deep Neural Networks (DNNs) is known to be closely related to the flatness of minima, leading to the development of Sharpness-Aware Minimization (SAM) for seeking flatter minima and better generalization. In
this paper, we revisit the loss of SAM and propose a more general method,
called WSAM, by incorporating sharpness as a regularization term. We prove its
generalization bound through the combination of PAC and Bayes-PAC techniques,
and evaluate its performance on various public datasets. The results
demonstrate that WSAM achieves improved generalization, or is at least highly
competitive, compared to the vanilla optimizer, SAM and its variants. The code
is available at
https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.
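For intuition, here is a minimal PyTorch-style sketch of the idea described in the abstract: the SAM sharpness term L(w + eps) - L(w) is added to the vanilla loss as a weighted regularization term. The weight gamma, the default hyperparameters, and the exact update rule are illustrative assumptions of this sketch, not the paper's definitive algorithm; the reference implementation is in the repository linked above.

```python
import torch

def wsam_like_step(model, loss_fn, x, y, base_optimizer, rho=0.05, gamma=0.9):
    """One step minimizing L(w) + gamma * [L(w + eps) - L(w)] (sketch only).

    eps is the usual SAM ascent direction of radius rho; with gamma = 1 the
    update reduces to a SAM-style step, with gamma = 0 to the vanilla optimizer.
    """
    model.zero_grad()

    # Gradient of the vanilla loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Move to w + eps (the SAM perturbation) and compute the gradient there.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)
    model.zero_grad()
    loss_fn(model(x), y).backward()
    perturbed_grads = [p.grad.detach().clone() for p in params]

    # Restore w, then step on grad L(w) + gamma * [grad L(w + eps) - grad L(w)],
    # i.e. the vanilla gradient plus a weighted sharpness-regularization gradient.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho * g / grad_norm)
    for p, g0, g1 in zip(params, grads, perturbed_grads):
        p.grad = g0 + gamma * (g1 - g0)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

In practice this logic is normally wrapped in an optimizer class around a base optimizer; see the linked repository for the actual WSAM implementation.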
Related papers
- SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost.
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
arXiv Detail & Related papers (2024-10-14T16:21:23Z)
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing the Max-Sharpness (MaxS) of the loss landscape.
In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weights, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we obtain a better flatness indicator (FI) that points toward flatter regions during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM.
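Written out, the two quantities above are roughly as follows (the notation and the additive way of merging them are assumptions of this sketch, not taken from the paper):

```latex
\begin{aligned}
\mathrm{MaxS}(w) &= \max_{\|\delta\|\le\rho} L(w+\delta) \;-\; L(w),\\
\mathrm{MinS}(w) &= L(w) \;-\; \min_{\|\delta\|\le\rho} L(w+\delta),\\
\mathrm{FI}(w)   &\approx \mathrm{MaxS}(w) + \mathrm{MinS}(w)
  \;=\; \max_{\|\delta\|\le\rho} L(w+\delta) \;-\; \min_{\|\delta\|\le\rho} L(w+\delta),
\end{aligned}
```

so the merged indicator measures the total loss variation over the rho-ball around w, rather than only the one-sided increase that SAM penalizes.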
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
- Improving SAM Requires Rethinking its Optimization Formulation [57.601718870423454]
Sharpness-Aware Minimization (SAM) is originally formulated as a zero-sum game where the weights of a network and a bounded perturbation try to minimize/maximize, respectively, the same differentiable loss.
We argue that SAM should instead be reformulated using the 0-1 loss. As a continuous relaxation, we follow the simple conventional approach where the minimizing (maximizing) player uses an upper-bound (lower-bound) surrogate of the 0-1 loss. This leads to a novel formulation of SAM as a bilevel optimization problem, dubbed BiSAM.
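Schematically, the bilevel formulation described above can be written as follows (the symbols and the concrete surrogates are illustrative assumptions, not the paper's exact notation):

```latex
\min_{w}\; \ell_{\mathrm{up}}\bigl(w + \epsilon^{*}(w)\bigr)
\qquad \text{s.t.} \qquad
\epsilon^{*}(w) \in \arg\max_{\|\epsilon\|\le\rho} \ell_{\mathrm{low}}\bigl(w + \epsilon\bigr),
```

where l_up is an upper-bound surrogate of the 0-1 loss (e.g., cross-entropy) and l_low is a lower-bound surrogate, so the inner maximizer and the outer minimizer no longer share the same objective as in the original zero-sum formulation.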
arXiv Detail & Related papers (2024-07-17T20:22:33Z)
- Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics [10.304082706818562]
We show that the perturbations in sharpness-aware minimization (SAM) perform a form of forgetting, discarding undesirable model biases to expose learning signals that generalize better.
Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.
arXiv Detail & Related papers (2024-06-10T18:02:48Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, which is computed from the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and batch-specific gradient noise, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
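A minimal sketch of that decomposition, assuming the full gradient is approximated by an exponential moving average of minibatch gradients (the EMA and all names here are illustrative, not the paper's exact procedure):

```python
import torch

def noise_only_perturbation(params, ema_grads, rho=0.05, beta=0.9):
    """Build a SAM-style perturbation from batch-specific gradient noise only.

    `ema_grads` is a running estimate of the full gradient (one tensor per
    parameter); the perturbation direction is the current minibatch gradient
    minus that estimate, i.e. the batch-specific noise component.
    """
    noise = []
    for p, m in zip(params, ema_grads):
        g = p.grad.detach()
        m.mul_(beta).add_(g, alpha=1.0 - beta)  # update full-gradient estimate
        noise.append(g - m)                     # keep only the noise component
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise)) + 1e-12
    return [rho * n / norm for n in noise]
```

Here `ema_grads` would be initialized to zero tensors shaped like the parameters; the returned perturbations are added to the weights, gradients are recomputed, and the weights are then restored, as in a standard SAM step.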
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
- ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition [62.20538402226608]
We show that Sharpness-Aware Minimization (SAM) fails to address generalization issues under class-imbalanced settings.
We propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM) to overcome this bottleneck.
Our ImbSAM demonstrates remarkable performance improvements for tail classes and anomaly detection.
arXiv Detail & Related papers (2023-08-15T14:46:32Z)
- Sharpness-Aware Minimization Alone can Improve Adversarial Robustness [7.9810915020234035]
We explore Sharpness-Aware Minimization (SAM) in the context of adversarial robustness.
We find that using only SAM can achieve superior adversarial robustness without sacrificing clean accuracy compared to standard training.
We show that SAM and adversarial training (AT) differ in terms of perturbation strength, leading to different accuracy and robustness trade-offs.
arXiv Detail & Related papers (2023-05-09T12:39:21Z)
- Towards Understanding Sharpness-Aware Minimization [27.666483899332643]
We argue that the existing justifications for the success of Sharpness-Aware Minimization (SAM), which are based on a PAC-Bayes generalization bound, are incomplete.
We theoretically analyze its implicit bias for diagonal linear networks.
We show empirically that fine-tuning a standard model with SAM can lead to significant generalization improvements on non-linear networks.
arXiv Detail & Related papers (2022-06-13T15:07:32Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves efficiency over SAM, reducing the extra computation from 100% to 40% relative to the base optimizer, while preserving accuracy.
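The two strategies can be sketched roughly as below; the random coordinate mask and the use of per-sample loss as the selection score are simplifications for illustration, not the paper's exact criteria.

```python
import torch

def stochastic_weight_perturbation(params, rho=0.05, keep_prob=0.5):
    """Stochastic Weight Perturbation (sketch): perturb only a random subset of weights."""
    grads = [p.grad.detach() for p in params]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    perturbations = []
    for p, g in zip(params, grads):
        mask = (torch.rand_like(g) < keep_prob).to(g.dtype)  # random coordinate subset
        perturbations.append(rho * mask * g / norm)
    return perturbations

def sharpness_sensitive_subset(per_sample_losses, frac=0.5):
    """Sharpness-Sensitive Data Selection (sketch): keep the samples assumed most
    relevant to sharpness, here approximated by the highest per-sample loss."""
    k = max(1, int(frac * per_sample_losses.numel()))
    return torch.topk(per_sample_losses, k).indices
```

Both tricks are aimed at cutting the cost of SAM's extra gradient computation: the perturbation touches fewer coordinates, and the second forward/backward pass runs on fewer samples.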
arXiv Detail & Related papers (2021-10-07T02:20:37Z)