Sharpness-Aware Minimization for Efficiently Improving Generalization
- URL: http://arxiv.org/abs/2010.01412v3
- Date: Thu, 29 Apr 2021 16:44:25 GMT
- Title: Sharpness-Aware Minimization for Efficiently Improving Generalization
- Authors: Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur
- Abstract summary: We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
- Score: 36.87818971067698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's heavily overparameterized models, the value of the training loss
provides few guarantees on model generalization ability. Indeed, optimizing
only the training loss value, as is commonly done, can easily lead to
suboptimal model quality. Motivated by prior work connecting the geometry of
the loss landscape and generalization, we introduce a novel, effective
procedure for instead simultaneously minimizing loss value and loss sharpness.
In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks
parameters that lie in neighborhoods having uniformly low loss; this
formulation results in a min-max optimization problem on which gradient descent
can be performed efficiently. We present empirical results showing that SAM
improves model generalization across a variety of benchmark datasets (e.g.,
CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel
state-of-the-art performance for several. Additionally, we find that SAM
natively provides robustness to label noise on par with that provided by
state-of-the-art procedures that specifically target learning with noisy
labels. We open source our code at
https://github.com/google-research/sam.
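The abstract above describes a min-max objective of the form min_w max_{||eps||_2 <= rho} L(w + eps). A minimal PyTorch-style sketch of the resulting two-step update is given below: the inner maximization is approximated by a single normalized gradient-ascent step to w + eps, and the outer minimization then applies the gradient taken at that perturbed point. This is an illustration under those assumptions, not the authors' released google-research/sam code; names such as sam_step, base_optimizer, loss_fn, and rho are placeholders.

import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # Gradient of the loss at the current weights w.
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    params = [p for p in model.parameters() if p.grad is not None]

    # Ascent step: eps = rho * g / ||g|| approximately solves the inner max.
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    eps_list = []
    with torch.no_grad():
        for p in params:
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)                      # move to w + eps
            eps_list.append(eps)
    model.zero_grad()

    # Gradient of the perturbed loss L(w + eps).
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation and take the base optimizer step at w,
    # using the gradient computed at w + eps.
    with torch.no_grad():
        for p, eps in zip(params, eps_list):
            p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()

Each call uses two forward-backward passes per batch; that extra cost is what the ESAM entry further below aims to reduce.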
Related papers
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by choosing appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- CR-SAM: Curvature Regularized Sharpness-Aware Minimization [8.248964912483912]
Sharpness-Aware Minimization (SAM) aims to enhance the generalizability by minimizing worst-case loss using one-step gradient ascent as an approximation.
In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both the training and test sets.
In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM).
(A generic sketch of one way to estimate the Hessian trace appears below.)
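The CR-SAM summary above uses the trace of the loss Hessian as a curvature measure. Its exact normalization is not given here; the sketch below only illustrates the standard Hutchinson estimator, tr(H) ≈ E[v^T H v] for Rademacher vectors v, via double backpropagation in PyTorch. The function name and sample count are illustrative, not the CR-SAM authors' code.

import torch

def hessian_trace_estimate(loss, params, n_samples=8):
    # Hutchinson's estimator: tr(H) = E[v^T H v] for random v with
    # E[v v^T] = I (here, Rademacher +/-1 entries).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product via double backprop: grad of (g . v) is H v.
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, retain_graph=True)
        estimate += sum(torch.sum(hv * v).item() for hv, v in zip(hvs, vs))
    return estimate / n_samples

The loss passed in must come from a differentiable graph (e.g. loss = loss_fn(model(x), y)) so the second backward pass is possible.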
arXiv Detail & Related papers (2023-12-21T03:46:29Z)
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel Zero-Shot Sharpness-Aware Quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and a vanilla Sharpness-Aware Minimization (SAM) based method, we conclude that the trade-off performance correlates with both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves efficiency over SAM, reducing the extra computation required from 100% to 40% relative to base optimizers.
(A rough sketch of these two strategies appears below.)
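The ESAM summary above names its two strategies without detail; the following is a rough, heavily hedged sketch of how they are commonly described: Stochastic Weight Perturbation perturbs only a random subset of parameters in SAM's ascent step, and Sharpness-Sensitive Data Selection computes the descent gradient only on the examples whose loss rises most at the perturbed weights. The fractions beta and gamma, the per-example loss_fn (reduction='none'), and all names are illustrative assumptions, not the paper's interface.

import torch

def esam_like_step(model, loss_fn, x, y, base_optimizer,
                   rho=0.05, beta=0.5, gamma=0.5):
    # Per-example losses at the current weights w (loss_fn uses reduction='none').
    per_example = loss_fn(model(x), y)
    per_example.mean().backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))

    # Stochastic Weight Perturbation: perturb roughly a beta fraction of weights.
    eps_list = []
    with torch.no_grad():
        for p in params:
            mask = (torch.rand_like(p) < beta).float()
            eps = rho * mask * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            eps_list.append(eps)
    model.zero_grad()

    # Sharpness-Sensitive Data Selection: keep the gamma fraction of examples
    # whose loss increased most at the perturbed weights w + eps.
    with torch.no_grad():
        rise = loss_fn(model(x), y) - per_example.detach()
        k = max(1, int(gamma * x.shape[0]))
        selected = torch.topk(rise, k).indices

    # Descent gradient at w + eps on the selected examples only.
    loss_fn(model(x[selected]), y[selected]).mean().backward()
    with torch.no_grad():
        for p, eps in zip(params, eps_list):
            p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()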
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)