Enhancing Sharpness-Aware Optimization Through Variance Suppression
- URL: http://arxiv.org/abs/2309.15639v3
- Date: Fri, 22 Dec 2023 11:15:44 GMT
- Title: Enhancing Sharpness-Aware Optimization Through Variance Suppression
- Authors: Bingcong Li, Georgios B. Giannakis
- Abstract summary: This work embraces the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability.
It seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood.
Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization.
- Score: 48.908966673827734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sharpness-aware minimization (SAM) has well documented merits in enhancing
generalization of deep neural networks, even without sizable data augmentation.
Embracing the geometry of the loss function, where neighborhoods of 'flat
minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing
the maximum loss caused by an adversary perturbing parameters within the
neighborhood. Although critical to account for sharpness of the loss function,
such an 'over-friendly adversary' can curtail the outmost level of
generalization. The novel approach of this contribution fosters stabilization
of adversaries through variance suppression (VaSSO) to avoid such friendliness.
VaSSO's provable stability safeguards its numerical improvement over SAM in
model-agnostic tasks, including image classification and machine translation.
In addition, experiments confirm that VaSSO endows SAM with robustness against
high levels of label noise.
Related papers
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
sharpness-aware generalization (SAM) has attracted much attention because of its surprising effectiveness in improving performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step maintains the same as that of the ascent step.
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z) - Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
arXiv Detail & Related papers (2023-06-07T08:05:46Z) - Sharpness-Aware Gradient Matching for Domain Generalization [84.14789746460197]
The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains.
The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape.
We present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM)
Our proposed SAGM method consistently outperforms the state-of-the-art methods on five DG benchmarks.
arXiv Detail & Related papers (2023-03-18T07:25:12Z) - Sharpness-Aware Training for Free [163.1248341911413]
SharpnessAware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training Free (SAF) mitigates the sharp landscape at almost zero computational cost over the base.
SAF ensures the convergence to a flat minimum with improved capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (M) which boosts SAM s efficiency at no cost to its generalization performance.
M includes two novel and efficient training strategies-StochasticWeight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis bases.
arXiv Detail & Related papers (2021-10-07T02:20:37Z) - ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning
of Deep Neural Networks [2.8292841621378844]
We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results in various benchmark datasets show that ASAM contributes to significant improvement of model generalization performance.
arXiv Detail & Related papers (2021-02-23T10:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.