Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks
- URL: http://arxiv.org/abs/2110.03141v1
- Date: Thu, 7 Oct 2021 02:20:37 GMT
- Title: Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks
- Authors: Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen,
Rick Siow Mong Goh, Vincent Y. F. Tan
- Abstract summary: This paper proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers.
- Score: 146.2011175973769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparametrized Deep Neural Networks (DNNs) often achieve astounding
performances, but may potentially result in severe generalization error.
Recently, the relation between the sharpness of the loss landscape and the
generalization error has been established by Foret et al. (2020), in which the
Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the
generalization. Unfortunately, SAM's computational cost is roughly double that
of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus
proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's
efficiency at no cost to its generalization performance. ESAM includes two
novel and efficient training strategies: Stochastic Weight Perturbation and
Sharpness-Sensitive Data Selection. In the former, the sharpness measure is
approximated by perturbing a stochastically chosen set of weights in each
iteration; in the latter, the SAM loss is optimized using only a judiciously
selected subset of data that is sensitive to the sharpness. We provide
theoretical explanations as to why these strategies perform well. We also show,
via extensive experiments on the CIFAR and ImageNet datasets, that ESAM
enhances the efficiency over SAM from requiring 100% extra computations to 40%
vis-a-vis base optimizers, while test accuracies are preserved or even
improved.
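To make the two strategies concrete, here is a minimal, hedged PyTorch-style sketch of one ESAM-like training step, assuming a standard classification setup. The hyperparameter names (rho, swp_ratio, sds_ratio) and the use of the pre-perturbation per-sample loss as the data-selection score are illustrative simplifications, not the authors' implementation.

```python
# Hedged sketch of one ESAM-like training step (not the authors' code).
# Stochastic Weight Perturbation: perturb only a random subset of weights.
# Sharpness-Sensitive Data Selection: compute the perturbed loss only on the
# samples with the highest loss (used here as a simple proxy for sensitivity).
import torch
import torch.nn.functional as F

def esam_step(model, optimizer, images, labels,
              rho=0.05, swp_ratio=0.5, sds_ratio=0.5):
    # First forward/backward pass: gradients at the current weights.
    logits = model(images)
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    per_sample_loss.mean().backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)

    # Stochastic Weight Perturbation: ascend along the gradient, but only on a
    # randomly chosen subset of the weights.
    eps = []
    with torch.no_grad():
        for p in params:
            mask = (torch.rand_like(p) < swp_ratio).float()
            e = rho * mask * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()

    # Sharpness-Sensitive Data Selection: recompute the loss at the perturbed
    # weights using only the most loss-sensitive samples.
    k = max(1, int(sds_ratio * images.size(0)))
    idx = per_sample_loss.detach().topk(k).indices
    loss = F.cross_entropy(model(images[idx]), labels[idx])
    loss.backward()

    # Undo the perturbation, then take the usual descent step with the
    # sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In this sketch, setting swp_ratio = sds_ratio = 1 recovers a plain SAM step; shrinking either ratio trades sharpness-estimation fidelity for the reduction in extra computation reported above.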
Related papers
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM)
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into full gradient and stochastic gradient noise components, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Systematic Investigation of Sparse Perturbed Sharpness-Aware
Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change of loss when adding a perturbation to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z) - Improved Deep Neural Network Generalization Using m-Sharpness-Aware
Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM, a variant of SAM that averages updates computed over disjoint micro-batches of each minibatch, can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z) - Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation
Approach [132.37966970098645]
One of the popular solutions is Sharpness-Aware Minimization (SAM), which minimizes the change of training loss when adding a perturbation to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation via a binary mask; a minimal sketch of this masking idea follows the list below.
In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent.
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs)
arXiv Detail & Related papers (2022-03-05T11:53:37Z)
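The two SSAM entries above share the core idea of perturbing only a masked subset of weights. Below is a minimal, hedged sketch of that masking idea under assumed simplifications: the mask here keeps the weights with the largest gradient magnitudes, whereas the papers derive their masks from Fisher information or dynamic sparse training, which is not reproduced here.

```python
# Hedged sketch of a binary-mask sparse perturbation in the spirit of SSAM.
# Assumption: the mask keeps the top (1 - sparsity) fraction of weights by
# gradient magnitude; the papers' actual mask rules are not reproduced here.
import torch

def sparse_sam_perturbation(params, rho=0.05, sparsity=0.5):
    """Return per-parameter ascent perturbations that touch only a subset of weights."""
    ps = [p for p in params if p.grad is not None]
    grads = [p.grad for p in ps]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    eps = []
    for p, g in zip(ps, grads):
        k = max(1, int((1.0 - sparsity) * g.numel()))              # weights to perturb
        cutoff = g.abs().flatten().kthvalue(g.numel() - k + 1).values
        mask = (g.abs() >= cutoff).to(g.dtype)                      # binary mask over weights
        eps.append(rho * mask * g / (grad_norm + 1e-12))            # masked SAM ascent direction
    return eps
```

In use, these perturbations would be added to the weights, the gradient recomputed, the perturbations subtracted again, and the base-optimizer step taken, as in the ESAM-style sketch earlier in this page.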
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.