Towards Efficient and Scalable Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2203.02714v1
- Date: Sat, 5 Mar 2022 11:53:37 GMT
- Title: Towards Efficient and Scalable Sharpness-Aware Minimization
- Authors: Yong Liu, Siqi Mai, Xiangning Chen, Cho-Jui Hsieh, Yang You
- Abstract summary: We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent.
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
- Score: 81.22779501753695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of
the loss landscape and generalization, has demonstrated significant performance
boosts on training large-scale models such as vision transformers. However, the
update rule of SAM requires two sequential (non-parallelizable) gradient
computations at each step, which can double the computational overhead. In this
paper, we propose a novel algorithm, LookSAM, that only periodically calculates
the inner gradient ascent, to significantly reduce the additional training cost
of SAM. The empirical results illustrate that LookSAM achieves similar accuracy
gains to SAM while being tremendously faster - it enjoys comparable
computational complexity with first-order optimizers such as SGD or Adam. To
further evaluate the performance and scalability of LookSAM, we incorporate a
layer-wise modification and perform experiments in the large-batch training
scenario, which is more prone to converge to sharp local minima. We are the
first to successfully scale up the batch size when training Vision Transformers
(ViTs). With a 64k batch size, we are able to train ViTs from scratch in
minutes while maintaining competitive performance.
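To make the update rule concrete, below is a minimal NumPy sketch contrasting a plain SAM step (two sequential gradient evaluations per update) with the periodic scheme the abstract describes, where the inner gradient ascent is refreshed only every k steps and the last perturbation is reused in between. The toy quadratic loss, the reuse-the-raw-perturbation rule, and the hyperparameters (rho, k, lr) are illustrative assumptions rather than the paper's exact algorithm, which reuses a decomposed component of the SAM update instead of the raw perturbation; treat this purely as an illustration of the periodic-reuse idea.
```python
import numpy as np

# Toy quadratic loss 0.5 * w^T A w standing in for a real training objective
# (illustrative assumption, not from the paper).
A = np.diag([10.0, 1.0, 0.1])

def grad(w):
    return A @ w

def sam_step(w, rho=0.05, lr=0.1):
    """Plain SAM: two sequential gradient evaluations per step (the 2x overhead)."""
    g = grad(w)                                   # first gradient, at w
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # inner gradient ascent direction
    return w - lr * grad(w + eps)                 # second gradient, at the perturbed point

def looksam_like(w, steps=100, k=5, rho=0.05, lr=0.1):
    """Periodic variant: refresh the inner ascent only every k steps and reuse
    the last perturbation in between, so most steps cost a single gradient."""
    eps = np.zeros_like(w)
    for t in range(steps):
        if t % k == 0:                            # periodic inner gradient ascent
            g = grad(w)
            eps = rho * g / (np.linalg.norm(g) + 1e-12)
        w = w - lr * grad(w + eps)                # one gradient on reuse steps
    return w

print(sam_step(np.array([1.0, 1.0, 1.0])))
print(looksam_like(np.array([1.0, 1.0, 1.0])))
```
With k = 5 in this sketch, four out of every five steps cost a single gradient evaluation, which is where the SGD-like per-step complexity claimed in the abstract comes from.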
Related papers
- SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost.
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
arXiv Detail & Related papers (2024-10-14T16:21:23Z)
- Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models [42.59948316941217]
Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate sharp local minima from the training trajectory and mitigate the associated generalization degradation.
We propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models.
arXiv Detail & Related papers (2024-06-19T01:03:23Z)
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-aware minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step remains the same as that of the ascent step.
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
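The renormalization described for Stable SAM is simple enough to state in a few lines. In the sketch below, the perturbed (descent-step) gradient is rescaled so that its norm matches the clean (ascent-step) gradient's norm before the update is taken; the toy quadratic loss and step sizes are illustrative assumptions, not the paper's setup.
```python
import numpy as np

A = np.diag([10.0, 1.0, 0.1])   # toy quadratic loss 0.5 * w^T A w (illustrative)

def grad(w):
    return A @ w

def stable_sam_step(w, rho=0.05, lr=0.1):
    g_ascent = grad(w)                                        # ascent-step gradient
    eps = rho * g_ascent / (np.linalg.norm(g_ascent) + 1e-12)
    g_descent = grad(w + eps)                                 # descent-step gradient
    # Renormalize so the descent gradient keeps the ascent gradient's norm.
    g_descent = g_descent * (np.linalg.norm(g_ascent)
                             / (np.linalg.norm(g_descent) + 1e-12))
    return w - lr * g_descent

print(stable_sam_step(np.array([1.0, 1.0, 1.0])))
```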
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change of loss when adding a perturbation to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
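The binary-mask idea behind the Sparse SAM entries (this one and the sparsified-perturbation paper further down) can be sketched as follows: only a masked subset of weights receives the SAM perturbation. The uniform random mask used here is a placeholder assumption; the papers derive the mask from training statistics, so consult them for the actual mask construction.
```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.1, 10.0, size=8))   # toy quadratic loss (illustrative)

def grad(w):
    return A @ w

def sparse_sam_step(w, rho=0.05, lr=0.1, sparsity=0.5):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Binary mask restricting the perturbation to a subset of the weights.
    # A random mask is a placeholder; the papers build the mask from training
    # statistics rather than sampling it uniformly.
    mask = (rng.random(w.shape) > sparsity).astype(w.dtype)
    return w - lr * grad(w + mask * eps)

print(sparse_sam_step(np.ones(8)))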
- Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z)
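The summary above does not spell out what mSAM is; in the m-sharpness literature it typically denotes applying the SAM perturbation independently to disjoint shards of each mini-batch and averaging the resulting gradients. The sketch below follows that assumed reading on a toy least-squares problem; shard count and step sizes are illustrative.
```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)   # toy regression mini-batch

def shard_grad(w, Xs, ys):
    # Gradient of the mean squared error on one shard of the mini-batch.
    return Xs.T @ (Xs @ w - ys) / len(ys)

def msam_step(w, m=4, rho=0.05, lr=0.1):
    """Assumed mSAM reading: run the SAM perturbation independently on m
    disjoint shards of the mini-batch and average the perturbed gradients."""
    g_avg = np.zeros_like(w)
    for idx in np.array_split(np.arange(len(y)), m):
        g = shard_grad(w, X[idx], y[idx])
        eps = rho * g / (np.linalg.norm(g) + 1e-12)       # shard-local ascent
        g_avg += shard_grad(w + eps, X[idx], y[idx])      # shard-local descent gradient
    return w - lr * g_avg / m

print(msam_step(np.zeros(4)))
```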
- K-SAM: Sharpness-Aware Minimization at the Speed of SGD [83.78737278889837]
Sharpness-Aware Minimization (SAM) has emerged as a robust technique for improving the accuracy of deep neural networks.
SAM incurs a high computational cost in practice, requiring up to twice as much computation as vanilla SGD.
We propose to compute gradients in both stages of SAM on only the top-k samples with highest loss.
arXiv Detail & Related papers (2022-10-23T21:49:58Z)
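K-SAM's subset trick as summarized above (evaluate both SAM gradients only on the k samples with the highest loss) can be sketched directly; the toy least-squares loss and the value of k are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 4)), rng.normal(size=64)    # toy regression mini-batch

def per_sample_loss(w):
    return 0.5 * (X @ w - y) ** 2

def grad_on(w, idx):
    # Gradient of the mean squared error restricted to the selected samples.
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

def ksam_step(w, k=8, rho=0.05, lr=0.1):
    """Both SAM gradients are evaluated only on the k highest-loss samples."""
    idx = np.argsort(per_sample_loss(w))[-k:]            # top-k samples by loss
    g = grad_on(w, idx)                                  # ascent gradient on the subset
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_on(w + eps, idx)                # descent gradient on the subset

print(ksam_step(np.zeros(4)))
```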
- Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach [132.37966970098645]
One of the popular solutions is Sharpness-Aware Minimization (SAM), which minimizes the change of training loss when adding a perturbation to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation by a binary mask.
In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM, cutting the extra computation from 100% to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
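Finally, a hedged reading of ESAM's two strategies, sketched on the same kind of toy problem: Stochastic Weight Perturbation perturbs only a randomly chosen subset of weights, and Sharpness-Sensitive Data Selection computes the descent gradient only on the samples whose loss increased most under the perturbation. The selection rules and hyperparameters here are assumptions for illustration; the paper's exact procedures may differ.
```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 4)), rng.normal(size=64)     # toy regression mini-batch

def grad_on(w, idx):
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

def esam_like_step(w, rho=0.05, lr=0.1, p=0.5, keep=32):
    g = grad_on(w, np.arange(len(y)))
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Stochastic Weight Perturbation: perturb only a random subset of weights
    # (random selection is an assumption about the rule used in the paper).
    eps = eps * (rng.random(w.shape) < p)
    # Sharpness-Sensitive Data Selection: keep the samples whose loss rose most
    # under the perturbation for the descent gradient (again an assumed rule).
    rise = 0.5 * (X @ (w + eps) - y) ** 2 - 0.5 * (X @ w - y) ** 2
    idx = np.argsort(rise)[-keep:]
    return w - lr * grad_on(w + eps, idx)

print(esam_like_step(np.zeros(4)))
```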