Stability Analysis of Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2301.06308v1
- Date: Mon, 16 Jan 2023 08:42:40 GMT
- Title: Stability Analysis of Sharpness-Aware Minimization
- Authors: Hoki Kim, Jinseong Park, Yujin Choi, and Jaewook Lee
- Abstract summary: Sharpness-aware minimization (SAM) is a recently proposed training method that seeks to find flat minima in deep learning.
In this paper, we demonstrate that SAM dynamics can have convergence instability that occurs near a saddle point.
- Score: 5.024497308975435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sharpness-aware minimization (SAM) is a recently proposed training method
that seeks to find flat minima in deep learning, resulting in state-of-the-art
performance across various domains. Instead of minimizing the loss of the
current weights, SAM minimizes the worst-case loss in its neighborhood in the
parameter space. In this paper, we demonstrate that SAM dynamics can have
convergence instability that occurs near a saddle point. Utilizing the
qualitative theory of dynamical systems, we explain how SAM becomes stuck in
the saddle point and then theoretically prove that the saddle point can become
an attractor under SAM dynamics. Additionally, we show that this convergence
instability can also occur in stochastic dynamical systems by establishing the
diffusion of SAM. We prove that SAM diffusion is worse than that of vanilla
gradient descent in terms of saddle point escape. Further, we demonstrate that
often overlooked training tricks, momentum and batch-size, are important to
mitigate the convergence instability and achieve high generalization
performance. Our theoretical and empirical results are thoroughly verified
through experiments on several well-known optimization problems and benchmark
tasks.
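The SAM update described in the abstract can be read as a two-step procedure: an ascent step to the approximate worst-case point in an L2 ball of radius rho around the current weights, then a descent step using the gradient evaluated there. A minimal NumPy sketch (function name and hyperparameter values are illustrative, not taken from the paper):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: ascend to the (approximate) worst-case point
    within an L2 ball of radius rho, then descend using the gradient
    evaluated at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order ascent direction
    g_adv = grad_fn(w + eps)                     # gradient at the perturbed weights
    return w - lr * g_adv

# Toy example: f(w) = w^2, so grad f(w) = 2w.
w = sam_step(np.array([1.0]), lambda w: 2 * w)
# eps = 0.05, surrogate gradient = 2 * 1.05 = 2.1, so w becomes 1 - 0.1 * 2.1 = 0.79
```

Because the descent direction is the gradient at the perturbed point rather than at the current weights, the dynamics near a saddle point differ from vanilla gradient descent, which is the instability the paper analyzes.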
Related papers
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Stabilizing Sharpness-aware Minimization Through A Simple
Renormalization Strategy [12.927965934262847]
Training neural networks with sharpness-aware minimization (SAM) can be highly unstable.
We propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient matches that of the exact gradient.
We show how StableSAM extends the stable regime of learning rates and when, with only a minor modification, it can consistently outperform SAM.
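The renormalization above amounts to rescaling the surrogate (perturbed) gradient so its norm matches the exact gradient's norm before the descent step. A minimal sketch, assuming the standard SAM perturbation (names are illustrative):

```python
import numpy as np

def stable_sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """SAM step with the surrogate gradient renormalized to the
    exact gradient's norm, in the spirit of StableSAM."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_fn(w + eps)  # surrogate gradient at the perturbed point
    # Renormalize: keep the surrogate direction, but the exact gradient's norm.
    g_adv = g_adv * np.linalg.norm(g) / (np.linalg.norm(g_adv) + 1e-12)
    return w - lr * g_adv

# Toy example: f(w) = w^2. The surrogate gradient 2.1 is rescaled to norm 2,
# so the effective step size matches that of vanilla gradient descent.
w = stable_sam_step(np.array([1.0]), lambda w: 2 * w)
```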
arXiv Detail & Related papers (2024-01-14T10:53:36Z) - Critical Influence of Overparameterization on Sharpness-aware Minimization [12.321517302762558]
We show that the sharpness-aware minimization (SAM) strategy is affected by overparameterization.
We prove multiple theoretical benefits of overparameterization for SAM: (i) minima with more uniform Hessian moments compared to SGD, (ii) much faster convergence at a linear rate, and (iii) lower test error for two-layer networks.
arXiv Detail & Related papers (2023-11-29T11:19:50Z) - Enhancing Sharpness-Aware Optimization Through Variance Suppression [48.908966673827734]
This work embraces the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability.
It seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood.
Although it is critical to account for the sharpness of the loss function, such an 'over-friendly adversary' can curtail the utmost level of generalization.
arXiv Detail & Related papers (2023-09-27T13:18:23Z) - Systematic Investigation of Sparse Perturbed Sharpness-Aware
Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change of loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
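A sparse perturbation via a binary mask can be sketched as below; the mask restricts the ascent step to a subset of coordinates, and how the mask is chosen is a key design question in the paper that is omitted here (the sketch and its names are illustrative):

```python
import numpy as np

def ssam_step(w, grad_fn, mask, lr=0.1, rho=0.05):
    """Sparse-SAM-style step: only the coordinates selected by the
    binary mask receive the adversarial perturbation."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12) * mask  # sparse perturbation
    g_adv = grad_fn(w + eps)
    return w - lr * g_adv

# With an all-zero mask the step reduces to vanilla gradient descent;
# with an all-one mask it recovers dense SAM.
grad = lambda w: 2 * w  # f(w) = w^2
w_gd = ssam_step(np.array([1.0]), grad, np.array([0.0]))   # -> [0.8]
w_sam = ssam_step(np.array([1.0]), grad, np.array([1.0]))  # -> [0.79]
```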
arXiv Detail & Related papers (2023-06-30T09:33:41Z) - AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning
Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored as it can improve generalization when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM could achieve superior performance compared with SGD, AMSGrad, and SAM optimizers.
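Combining the SAM perturbation with an Adam-style adaptive learning rate and momentum can be sketched as below. This is a hypothetical illustration of the idea, not the paper's exact update rule (bias correction, in particular, is omitted):

```python
import numpy as np

def adasam_step(w, grad_fn, state, lr=0.01, rho=0.05,
                beta1=0.9, beta2=0.999, eps_num=1e-8):
    """Sketch of a SAM perturbation followed by an Adam-style update."""
    g = grad_fn(w)
    pert = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_fn(w + pert)                                  # SAM surrogate gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_adv      # momentum (1st moment)
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_adv**2   # 2nd moment
    return w - lr * state["m"] / (np.sqrt(state["v"]) + eps_num)

# Toy example: f(w) = w^2, starting from w = 1 with fresh optimizer state.
state = {"m": np.zeros(1), "v": np.zeros(1)}
w = adasam_step(np.array([1.0]), lambda w: 2 * w, state)
```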
arXiv Detail & Related papers (2023-03-01T15:12:42Z) - On Statistical Properties of Sharpness-Aware Minimization: Provable
Guarantees [5.91402820967386]
We present a new theoretical explanation of why Sharpness-Aware Minimization (SAM) generalizes well.
SAM is particularly well-suited for both sharp and non-sharp problems.
Our findings are validated using numerical experiments on deep neural networks.
arXiv Detail & Related papers (2023-02-23T07:52:31Z) - SAM operates far from home: eigenvalue regularization as a dynamical
phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian.
We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.
Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z) - An SDE for Modeling SAM: Theory and Insights [7.1967126772249586]
We study SAM (Sharpness-Aware Minimization), which has recently attracted much interest due to its improved performance over classical variants of gradient descent.
Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, for both the full-batch and mini-batch settings.
arXiv Detail & Related papers (2023-01-19T17:54:50Z) - Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation
Approach [132.37966970098645]
One popular solution is Sharpness-Aware Minimization (SAM), which minimizes the change of loss when a perturbation is added to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation by a binary mask.
In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.