K-SAM: Sharpness-Aware Minimization at the Speed of SGD
- URL: http://arxiv.org/abs/2210.12864v1
- Date: Sun, 23 Oct 2022 21:49:58 GMT
- Title: K-SAM: Sharpness-Aware Minimization at the Speed of SGD
- Authors: Renkun Ni, Ping-yeh Chiang, Jonas Geiping, Micah Goldblum, Andrew
Gordon Wilson, Tom Goldstein
- Abstract summary: Sharpness-Aware Minimization (SAM) has emerged as a robust technique for improving the accuracy of deep neural networks.
SAM incurs a high computational cost in practice, requiring up to twice as much computation as vanilla SGD.
We propose to compute gradients in both stages of SAM on only the top-k samples with highest loss.
- Score: 83.78737278889837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharpness-Aware Minimization (SAM) has recently emerged as a robust technique
for improving the accuracy of deep neural networks. However, SAM incurs a high
computational cost in practice, requiring up to twice as much computation as
vanilla SGD. The computational challenge posed by SAM arises because each
iteration requires both ascent and descent steps and thus double the gradient
computations. To address this challenge, we propose to compute gradients in
both stages of SAM on only the top-k samples with highest loss. K-SAM is simple
and extremely easy to implement while providing significant generalization
boosts over vanilla SGD at little to no additional cost.
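To make the procedure concrete, here is a minimal PyTorch-style sketch of one K-SAM training step, assuming a cross-entropy classification loss, the standard SAM perturbation w + rho * g/||g||, and that the same top-k subset is reused for both gradient computations; the function name and hyperparameter values are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def ksam_step(model, optimizer, x, y, rho=0.05, k=64):
    # 1) Cheap, gradient-free forward pass to rank the minibatch by loss.
    with torch.no_grad():
        per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
    idx = per_sample_loss.topk(min(k, x.size(0))).indices
    x_k, y_k = x[idx], y[idx]

    # 2) Ascent step: gradient on the top-k subset only.
    optimizer.zero_grad()
    F.cross_entropy(model(x_k), y_k).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Perturb the weights toward higher loss: w <- w + rho * g / ||g||.
    eps = [rho * g / norm for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) Descent step: gradient at the perturbed weights, again on top-k only.
    optimizer.zero_grad()
    F.cross_entropy(model(x_k), y_k).backward()

    # Undo the perturbation and update with the descent gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()
```

The ranking pass runs without gradients and is comparatively cheap, while both backward passes touch only the k selected samples, which is where the savings relative to standard SAM come from.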
Related papers
- SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-Aware Minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost.
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
arXiv Detail & Related papers (2024-10-14T16:21:23Z)
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization, termed AUSAM.
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and a gradient-noise component, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
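As a hedged sketch of the decomposition described above, the snippet below forms the SAM perturbation from the batch-specific gradient noise only, i.e. the minibatch gradient minus an estimate of the full gradient. Using an exponential moving average (`grad_ema`, `beta`) as that estimate is an assumption of this sketch, not necessarily the paper's estimator.

```python
import torch

def friendly_perturbation(params, grads, grad_ema, rho=0.05, beta=0.9):
    # Update the running estimate of the full gradient (an EMA here) and keep
    # only the batch-specific noise: minibatch gradient minus that estimate.
    noise = []
    for g, m in zip(grads, grad_ema):
        m.mul_(beta).add_(g, alpha=1 - beta)
        noise.append(g - m)
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise)) + 1e-12

    # Ascend along the noise direction only; return eps so it can be undone.
    eps = [rho * n / norm for n in noise]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    return eps
```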
- Momentum-SAM: Sharpness Aware Minimization without Computational Overhead [0.6577148087211809]
We propose Momentum-SAM, which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands.
We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
arXiv Detail & Related papers (2024-01-22T15:19:18Z)
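A minimal sketch of the Momentum-SAM idea summarized above: the weights are perturbed along the normalized accumulated momentum vector, so only one backward pass per step is needed. The externally maintained `momentum_buf` and the value of `rho` are illustrative assumptions, not the authors' implementation.

```python
import torch

def msam_step(model, optimizer, loss_fn, x, y, momentum_buf, rho=0.3):
    # Perturb along the normalized accumulated momentum (no extra backward pass).
    norm = torch.sqrt(sum((v ** 2).sum() for v in momentum_buf)) + 1e-12
    eps = [rho * v / norm for v in momentum_buf]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # Single gradient computation, taken at the perturbed point.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then take the usual momentum-SGD update.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()
```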
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-Aware Minimization (SAM) has attracted much attention because of its surprising effectiveness in improving performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step remains the same as that of the ascent step.
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
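The renormalization described above can be sketched in a few lines of PyTorch: after the descent-step backward pass, the gradients are rescaled so that their overall norm equals the ascent-step gradient norm. This is a sketch of the idea, not the authors' code.

```python
import torch

def rescale_descent_gradients(params, ascent_norm):
    # Rescale the descent-step gradients so their overall norm matches the
    # ascent-step gradient norm computed earlier in the SAM update.
    params = [p for p in params if p.grad is not None]
    descent_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)) + 1e-12
    scale = ascent_norm / descent_norm
    for p in params:
        p.grad.mul_(scale)
```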
- Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach [132.37966970098645]
One of the popular solutions is Sharpness-Aware Minimization (SAM), which minimizes the change of the loss when a perturbation is added to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which perturbs only a sparse subset of the weights.
In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z)
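As a rough illustration of a sparsified perturbation, the snippet below applies the SAM ascent step only to a masked subset of the weights. The random binary mask and the `sparsity` value are purely illustrative assumptions; the paper chooses its mask more carefully.

```python
import torch

def sparse_perturbation(params, grads, rho=0.05, sparsity=0.5):
    # Perturb only a masked subset of the weights (random mask for illustration).
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    eps = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            mask = (torch.rand_like(g) >= sparsity).float()
            e = rho * (g * mask) / norm
            p.add_(e)
            eps.append(e)
    return eps  # the caller subtracts these after the descent backward pass
```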
- Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent.
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
arXiv Detail & Related papers (2022-03-05T11:53:37Z)
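A hedged sketch of the periodic inner-ascent idea summarized above: the ascent gradient is recomputed only every `period` steps and the cached perturbation is reused in between. What LookSAM actually reuses between refreshes is more refined than this simple cache; the `state` dictionary and `period` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def looksam_style_step(model, optimizer, x, y, step, state, rho=0.05, period=5):
    if step % period == 0:
        # Periodically refresh the ascent direction with an extra backward pass.
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        state["eps"] = [rho * g / norm for g in grads]

    # Reuse the cached perturbation on all other steps.
    eps = state["eps"]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # Descent gradient at the (cached) perturbed point.
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()
```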
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the overhead over SAM from requiring 100% extra computation to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
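A rough sketch combining the two strategies named above, under illustrative assumptions: only a random subset of the weights is perturbed (standing in for Stochastic Weight Perturbation), and the descent backward pass uses only the samples with the highest loss at the perturbed weights (standing in for Sharpness-Sensitive Data Selection); `p_keep` and `data_frac` are not the paper's settings.

```python
import torch
import torch.nn.functional as F

def esam_style_step(model, optimizer, x, y, rho=0.05, p_keep=0.5, data_frac=0.5):
    # Ascent gradient on the full batch.
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Perturb only a random subset of the weights (illustrative mask).
    eps = []
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            mask = (torch.rand_like(g) < p_keep).float()
            e = rho * (g * mask) / norm
            p.add_(e)
            eps.append(e)

    # Keep only the samples with the highest loss at the perturbed weights.
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
    idx = losses.topk(max(1, int(data_frac * x.size(0)))).indices

    # Descent gradient on the selected samples, then undo the perturbation.
    optimizer.zero_grad()
    F.cross_entropy(model(x[idx]), y[idx]).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()
```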
This list is automatically generated from the titles and abstracts of the papers on this site.