The Crucial Role of Normalization in Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2305.15287v2
- Date: Mon, 23 Oct 2023 16:12:38 GMT
- Title: The Crucial Role of Normalization in Sharpness-Aware Minimization
- Authors: Yan Dai, Kwangjun Ahn, Suvrit Sra
- Abstract summary: Sharpness-Aware Minimization (SAM) is a gradient-based optimizer that greatly improves the prediction performance of deep neural networks.
We argue that two properties of normalization make SAM robust against the choice of hyper-parameters, supporting its practicality.
- Score: 44.00155917998616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based
optimizer (Foret et al., ICLR 2021) that greatly improves the prediction
performance of deep neural networks. Consequently, there has been a surge of
interest in explaining its empirical success. We focus, in particular, on
understanding the role played by normalization, a key component of the SAM
updates. We theoretically and empirically study the effect of normalization in
SAM for both convex and non-convex functions, revealing two key roles played by
normalization: i) it helps in stabilizing the algorithm; and ii) it enables the
algorithm to drift along a continuum (manifold) of minima -- a property
identified by recent theoretical works that is the key to better performance.
We further argue that these two properties of normalization make SAM robust
against the choice of hyper-parameters, supporting the practicality of SAM. Our
conclusions are backed by various experiments.
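To make the role of normalization concrete, the following is a minimal Python sketch of one SAM step with and without normalization of the ascent direction, assuming a full-batch gradient oracle `grad_fn`; the names (`sam_step`, `rho`, `lr`) are illustrative, not the paper's code. On the toy convex quadratic below, the normalized step converges to a small neighborhood of the minimum for this step size, while the un-normalized perturbation grows with the gradient and can destabilize the iterates, illustrating the stabilization role described in the abstract.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.01, rho=0.1, normalize=True):
    """One SAM update: take an ascent step from w, then descend using the
    gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    if normalize:
        # Normalized ascent: the perturbation always has length rho,
        # independent of the gradient magnitude (the form SAM actually uses).
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
    else:
        # Un-normalized variant for comparison: the perturbation scales with
        # the gradient norm, so it can blow up far from a minimum.
        eps = rho * g
    return w - lr * grad_fn(w + eps)

# Toy convex example: f(w) = 0.5 * ||A w||^2 with gradient A^T A w.
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A.T @ A @ w
w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad_fn, normalize=True)  # settles in a small neighborhood of 0
```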
Related papers
- $\boldsymbol{\mu}\mathbf{P^2}$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling [49.25546155981064]
We study the infinite-width limit of neural networks trained with Sharpness Aware Minimization (SAM)
Our findings reveal that the dynamics of standard SAM effectively reduce to applying SAM solely in the last layer in wide neural networks.
In contrast, we identify a stable parameterization with layerwise perturbation scaling, which we call $\textit{Maximal Update and Perturbation Parameterization}$ ($\mu$P$^2$), that ensures all layers are both feature learning and effectively perturbed in the limit.
arXiv Detail & Related papers (2024-10-31T16:32:04Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into full-gradient and batch-noise components, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
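As a rough illustration of that decomposition (not the authors' released code), the sketch below forms the ascent direction from only the batch-noise component, assuming the full gradient is approximated by a running average `g_full_ema`; all names are hypothetical.

```python
import numpy as np

def noise_only_ascent(g_batch, g_full_ema, rho=0.05):
    """Form the SAM ascent direction from the batch-specific noise component:
    the current minibatch gradient minus an estimate of the full gradient."""
    noise = g_batch - g_full_ema
    return rho * noise / (np.linalg.norm(noise) + 1e-12)

# The full-gradient estimate can be maintained across steps, e.g.:
#   g_full_ema = 0.9 * g_full_ema + 0.1 * g_batch
```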
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
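As a hedged sketch of what restricting the perturbation to normalization layers could look like in PyTorch (the helper below is illustrative, not the authors' code), the ascent step would use only the scale and shift parameters of normalization layers:

```python
import torch.nn as nn

def norm_affine_params(model: nn.Module):
    """Yield only the affine (scale/shift) parameters of normalization layers."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm, nn.GroupNorm)
    for module in model.modules():
        if isinstance(module, norm_types):
            for p in (module.weight, module.bias):
                if p is not None and p.requires_grad:
                    yield p

# A SAM variant in this spirit would perturb only these parameters in its
# ascent step, rather than everything returned by model.parameters().
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.ReLU(), nn.Linear(8, 2))
perturb_params = list(norm_affine_params(model))  # LayerNorm weight and bias only
```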
arXiv Detail & Related papers (2023-06-07T08:05:46Z) - AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning
Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored, as it can improve generalization when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM can achieve superior performance compared with SGD, AMSGrad, and SAM.
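A minimal sketch of feeding a SAM-style perturbed gradient into an Adam-like adaptive learning rate with momentum, in the spirit of (but not identical to) the AdaSAM algorithm; the update rule and every name below are illustrative assumptions.

```python
import numpy as np

def adaptive_sam_step(w, grad_fn, state, lr=1e-3, rho=0.05,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """Feed the SAM-perturbed gradient into momentum and adaptive-scale buffers."""
    g = grad_fn(w)
    g_sam = grad_fn(w + rho * g / (np.linalg.norm(g) + 1e-12))  # perturbed gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_sam        # first moment (momentum)
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_sam ** 2   # second moment (adaptive lr)
    return w - lr * state["m"] / (np.sqrt(state["v"]) + eps)

state = {"m": np.zeros(2), "v": np.zeros(2)}  # per-parameter buffers
```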
arXiv Detail & Related papers (2023-03-01T15:12:42Z) - On Statistical Properties of Sharpness-Aware Minimization: Provable
Guarantees [5.91402820967386]
We present a new theoretical explanation of why Sharpness-Aware Minimization (SAM) generalizes well.
SAM is particularly well-suited for both sharp and non-sharp problems.
Our findings are validated using numerical experiments on deep neural networks.
arXiv Detail & Related papers (2023-02-23T07:52:31Z) - mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [20.560184120992094]
The Sharpness-Aware Minimization (SAM) technique modifies the underlying loss function to steer gradient descent methods toward flatter minima.
We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM.
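Going by the micro-batch-averaged reading in the title, a minimal sketch of an mSAM-style gradient is given below; the `grad_fn(w, micro_batch)` interface and all names are assumptions, not the paper's code.

```python
import numpy as np

def msam_gradient(w, micro_batches, grad_fn, rho=0.05):
    """mSAM-style gradient: split the minibatch into micro-batches, give each
    its own normalized ascent direction, and average the descent gradients."""
    descent_grads = []
    for mb in micro_batches:
        g = grad_fn(w, mb)                           # micro-batch gradient
        eps = rho * g / (np.linalg.norm(g) + 1e-12)  # per-micro-batch ascent step
        descent_grads.append(grad_fn(w + eps, mb))   # gradient at the perturbed point
    return np.mean(descent_grads, axis=0)
```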
arXiv Detail & Related papers (2023-02-19T23:27:12Z) - SAM operates far from home: eigenvalue regularization as a dynamical
phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian.
We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.
Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z) - Improved Deep Neural Network Generalization Using m-Sharpness-Aware
Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z) - How Does Sharpness-Aware Minimization Minimize Sharpness? [29.90109733192208]
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks.
This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism.
arXiv Detail & Related papers (2022-11-10T17:56:38Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation SAM requires from 100% to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)