Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning
- URL: http://arxiv.org/abs/2503.11147v1
- Date: Fri, 14 Mar 2025 07:34:39 GMT
- Title: Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning
- Authors: Junhyuk Jo, Jihyun Lim, Sunwoo Lee
- Abstract summary: Sharpness-Aware Minimization (SAM) is an optimization method that improves the generalization performance of machine learning models. Despite its superior generalization, SAM has not been actively used in real-world applications due to its expensive computational cost. We propose a novel asynchronous-parallel SAM which achieves nearly the same gradient norm penalizing effect as the original SAM while breaking the data dependency between the model perturbation and the model update.
- Score: 5.77502465665279
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharpness-Aware Minimization (SAM) is an optimization method that improves the generalization performance of machine learning models. Despite its superior generalization, SAM has not been actively used in real-world applications due to its expensive computational cost. In this work, we propose a novel asynchronous-parallel SAM which achieves nearly the same gradient norm penalizing effect as the original SAM while breaking the data dependency between the model perturbation and the model update. The proposed asynchronous SAM can even entirely hide the model perturbation time by adjusting the batch size for the model perturbation in a system-aware manner. Thus, the proposed method can fully utilize heterogeneous system resources such as CPUs and GPUs. Our extensive experiments demonstrate the practical benefits of the proposed asynchronous approach. For example, the asynchronous SAM achieves Vision Transformer fine-tuning accuracy on CIFAR-100 comparable to the original SAM, while its training time is almost the same as SGD's.
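To make the idea above concrete, the following is a minimal sketch of a SAM-style training step and of an asynchronous variant in which the perturbation reuses a gradient produced concurrently with the previous update. It is an illustration only: the toy model, the helper names (grad, sam_step, async_sam_step), and the way the "background" gradient is simulated are assumptions, not the authors' released implementation.

```python
import numpy as np

def grad(w, batch):
    """Stochastic gradient of a toy least-squares loss (illustration only)."""
    X, y = batch
    return 2.0 * X.T @ (X @ w - y) / len(y)

def sam_step(w, batch, rho=0.05, lr=0.1):
    """Baseline SAM: two *sequential* gradient computations per step."""
    g = grad(w, batch)                           # gradient for the perturbation
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step of radius rho
    return w - lr * grad(w + eps, batch)         # gradient for the update

def async_sam_step(w, batch, stale_g, rho=0.05, lr=0.1):
    """Asynchronous variant (sketch): the perturbation reuses a gradient that
    was computed concurrently with the previous update, so the update no
    longer waits for a fresh perturbation gradient."""
    eps = rho * stale_g / (np.linalg.norm(stale_g) + 1e-12)
    w_new = w - lr * grad(w + eps, batch)
    # Stand-in for the background worker (e.g., a CPU, possibly using a
    # smaller perturbation batch) that prepares the next perturbation gradient.
    next_stale_g = grad(w_new, batch)
    return w_new, next_stale_g

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 10)), rng.normal(size=256)
w = np.zeros(10)
stale_g = grad(w, (X, y))
for _ in range(100):
    w, stale_g = async_sam_step(w, (X, y), stale_g)
```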
Related papers
- Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks [4.877624278656814]
Sharpness-Aware Minimization (SAM) has proven highly effective in improving model generalization in machine learning tasks.
We propose the SAM with Adaptive Regularization (SAMAR), which introduces a flexible sharpness ratio rule to update the regularization parameter dynamically.
arXiv Detail & Related papers (2024-12-22T04:40:02Z) - SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-Aware Minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost.
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
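For reference, the two sequential gradients mentioned above are the two steps of the standard first-order SAM update (this is the well-known baseline rule, not SAMPa's parallelized variant):

```latex
\min_{w}\;\max_{\|\epsilon\|_2 \le \rho} L(w+\epsilon),
\qquad
\hat{\epsilon}_t = \rho\,\frac{\nabla L(w_t)}{\|\nabla L(w_t)\|_2},
\qquad
w_{t+1} = w_t - \eta\,\nabla L\!\left(w_t + \hat{\epsilon}_t\right).
```

The second gradient depends on the first through $\hat{\epsilon}_t$; breaking or parallelizing exactly this dependency is what SAMPa and the asynchronous SAM above target.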
arXiv Detail & Related papers (2024-10-14T16:21:23Z) - Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM)
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z) - SlimSAM: 0.1% Data Makes Segment Anything Slim [52.96232442322824]
We introduce SlimSAM, a novel data-efficient SAM compression method.
SlimSAM achieves superior performance with far less training data.
The code is available at http://github.com/czg1225/SlimSAM.
arXiv Detail & Related papers (2023-12-08T12:48:53Z) - Systematic Investigation of Sparse Perturbed Sharpness-Aware
Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change of the loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
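A masked perturbation of this kind can be sketched in a few lines. The top-k-by-magnitude mask rule and the helper name below are illustrative assumptions, not necessarily the selection rule used by SSAM.

```python
import numpy as np

def masked_sam_perturbation(g, rho=0.05, sparsity=0.5):
    """Sketch of a sparse (binary-masked) SAM perturbation.
    g: flattened gradient; sparsity: fraction of coordinates zeroed out."""
    k = max(1, int(len(g) * (1.0 - sparsity)))   # number of coordinates to keep
    mask = np.zeros_like(g)
    mask[np.argsort(np.abs(g))[-k:]] = 1.0       # binary mask (illustrative rule)
    g_masked = mask * g
    return rho * g_masked / (np.linalg.norm(g_masked) + 1e-12)
```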
arXiv Detail & Related papers (2023-06-30T09:33:41Z) - An Adaptive Policy to Employ Sharpness-Aware Minimization [5.5347134457499845]
Sharpness-aware minimization (SAM) searches for flat minima by min-max optimization.
Recent state-of-the-art methods reduce the fraction of SAM updates.
Two efficient algorithms, AE-SAM and AE-LookSAM, are proposed.
arXiv Detail & Related papers (2023-04-28T06:23:32Z) - Improved Deep Neural Network Generalization Using m-Sharpness-Aware
Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z) - Improving Sharpness-Aware Minimization with Fisher Mask for Better
Generalization on Language Models [93.85178920914721]
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization.
We propose a novel optimization procedure, namely FSAM, which introduces a Fisher mask to improve the efficiency and performance of SAM.
We show that FSAM consistently outperforms the vanilla SAM by 0.67 to 1.98 average score among four different pretrained models.
arXiv Detail & Related papers (2022-10-11T14:53:58Z) - Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation
Approach [132.37966970098645]
One of the popular solutions is Sharpness-Aware Minimization (SAM), which minimizes the change of the loss when a perturbation is added to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation via a binary mask.
In addition, we theoretically prove that SSAM converges at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z) - Randomized Sharpness-Aware Training for Boosting Computational
Efficiency in Deep Learning [13.937644559223548]
We propose a simple yet efficient training scheme, called Randomized Sharpness-Aware Training (RST). Optimizers in RST perform a Bernoulli trial at each iteration to choose randomly between a base algorithm (SGD) and a sharpness-aware algorithm (SAM).
We show that G-RST can outperform SAM in most cases while saving 50% of the extra cost.
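The Bernoulli switching rule translates almost directly into code. The sketch below takes the gradient function as an argument (e.g., the grad() defined in the earlier sketch) and assumes a fixed switching probability; in the paper the probability follows a scheduling function, so treat p_sam and the helper names as assumptions.

```python
import numpy as np

def rst_step(w, batch, grad, p_sam=0.5, rho=0.05, lr=0.1, rng=None):
    """Randomized sharpness-aware training step (sketch): with probability
    p_sam take a SAM step, otherwise a plain SGD step."""
    rng = np.random.default_rng() if rng is None else rng
    g = grad(w, batch)
    if rng.random() < p_sam:                          # Bernoulli trial
        eps = rho * g / (np.linalg.norm(g) + 1e-12)   # SAM branch: extra gradient
        g = grad(w + eps, batch)
    return w - lr * g
```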
arXiv Detail & Related papers (2022-03-18T13:57:17Z) - Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent.
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
arXiv Detail & Related papers (2022-03-05T11:53:37Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves the efficiency over SAM, reducing the extra computation required (vis-a-vis base optimizers) from 100% to 40%.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)