Adaptively Sampling-Reusing-Mixing Decomposed Gradients to Speed Up Sharpness Aware Minimization
- URL: http://arxiv.org/abs/2510.03763v1
- Date: Sat, 04 Oct 2025 10:08:14 GMT
- Title: Adaptively Sampling-Reusing-Mixing Decomposed Gradients to Speed Up Sharpness Aware Minimization
- Authors: Jiaxin Deng, Junbiao Pang
- Abstract summary: Sharpness-Aware Minimization (SAM) improves model generalization but doubles the computational cost. We propose Adaptively Sampling-Reusing-Mixing decomposed gradients to significantly accelerate SAM (ARSAM). ARSAM achieves state-of-the-art accuracies comparable to SAM across diverse network architectures.
- Score: 7.33659299625996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharpness-Aware Minimization (SAM) improves model generalization but doubles the computational cost of Stochastic Gradient Descent (SGD) by requiring two gradient computations per optimization step. To mitigate this, we propose Adaptively Sampling-Reusing-Mixing decomposed gradients to significantly accelerate SAM (ARSAM). Concretely, we first show that SAM's gradient can be decomposed into the SGD gradient and the Projection of the Second-order gradient onto the First-order gradient (PSF). Furthermore, we observe that the SGD gradient and the PSF evolve dynamically during training, with the PSF playing a growing role in reaching flat minima. ARSAM therefore reuses the PSF across steps and refreshes it in a timely manner, so that the mixed gradient still maintains the model's generalization ability. Extensive experiments show that ARSAM achieves state-of-the-art accuracies comparable to SAM across diverse network architectures. On CIFAR-10/100, ARSAM matches SAM while providing a speedup of about 40%. Moreover, ARSAM accelerates optimization on various challenging tasks (e.g., human pose estimation and model quantization) without sacrificing performance, demonstrating its broad practicality. The code is publicly accessible at: https://github.com/ajiaaa/ARSAM.
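To make the decomposition concrete, below is a minimal PyTorch sketch of a SAM step that separates the two components named in the abstract. It assumes `loss_fn` recomputes the minibatch loss on each call, takes the PSF to be the residual between the SAM gradient and the SGD gradient (the paper defines it as a projection onto the first-order gradient, a refinement omitted here), and uses an illustrative radius `rho`; it is a reading of the abstract, not the authors' implementation.

```python
import torch

def sam_gradient_decomposition(params, loss_fn, rho=0.05):
    # First-order (SGD) gradient at the current weights.
    g_sgd = torch.autograd.grad(loss_fn(), params)
    scale = rho / (torch.sqrt(sum((g * g).sum() for g in g_sgd)).item() + 1e-12)

    # SAM's ascent step: perturb the weights along the normalized gradient.
    with torch.no_grad():
        for p, g in zip(params, g_sgd):
            p.add_(g, alpha=scale)

    # SAM's gradient, evaluated at the perturbed point.
    g_sam = torch.autograd.grad(loss_fn(), params)

    # Undo the perturbation before the descent step.
    with torch.no_grad():
        for p, g in zip(params, g_sgd):
            p.sub_(g, alpha=scale)

    # Residual second-order term. ARSAM's idea is to reuse this
    # component for several steps instead of recomputing the second
    # forward/backward pass at every iteration.
    psf = [gs - g0 for gs, g0 in zip(g_sam, g_sgd)]
    return g_sgd, psf
```

Under this reading, a cheap step applies `g_sgd + psf` with a cached `psf`, while a full step recomputes both; ARSAM's adaptive schedule for when to refresh the PSF and how to mix the parts is exactly what this sketch does not capture.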
Related papers
- Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-Aware Minimization (SAM) seeks minima with a flat loss landscape to improve generalization performance in machine learning tasks, including fine-tuning.
We propose an approach, SL-SAM, to break this bottleneck by introducing sparsity at the layer level.
arXiv Detail & Related papers (2026-02-10T04:05:43Z)
- SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-Aware Minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost.
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations (see the sketch after this entry).
arXiv Detail & Related papers (2024-10-14T16:21:23Z)
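The sketch below shows one way to read the parallelization idea, assuming (our guess, not necessarily SAMPa's exact scheme) that perturbing with the previous step's gradient removes the sequential dependency between the two backward passes; `prev_g`, `opt`, and `rho` are illustrative names, and the first step would be bootstrapped with an ordinary SAM update, not shown.

```python
import torch

def sampa_style_step(params, loss_fn, prev_g, opt, rho=0.05):
    scale = rho / (torch.sqrt(sum((g * g).sum() for g in prev_g)).item() + 1e-12)

    # Pass 1: gradient at the unperturbed weights. Because the
    # perturbation below uses the *previous* gradient, this pass does
    # not depend on pass 2; with a copy of the weights the two passes
    # could run concurrently on separate devices or streams.
    g_clean = torch.autograd.grad(loss_fn(), params)

    # Pass 2: gradient at weights perturbed by the stale direction.
    with torch.no_grad():
        for p, g in zip(params, prev_g):
            p.add_(g, alpha=scale)
    g_pert = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, g in zip(params, prev_g):
            p.sub_(g, alpha=scale)

    # Descend with the perturbed gradient; keep g_clean as the next
    # step's perturbation direction.
    for p, g in zip(params, g_pert):
        p.grad = g
    opt.step()
    return g_clean
```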
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM).
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and a batch-specific noise component, we find that relying solely on the full gradient degrades generalization, while excluding it improves performance (a sketch follows this entry).
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
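As an illustration of the noise-only perturbation this summary hints at, the sketch below estimates the "full" gradient with an exponential moving average and perturbs along the minibatch gradient minus that estimate. The EMA estimator, `beta`, and `rho` are our assumptions for illustration, not necessarily the paper's construction.

```python
import torch

def noise_only_perturbation(g_batch, g_ema, beta=0.9, rho=0.05):
    # Running estimate standing in for the full (expected) gradient.
    g_ema = [beta * e + (1 - beta) * g for e, g in zip(g_ema, g_batch)]
    # Batch-specific noise: minibatch gradient minus the estimate.
    noise = [g - e for g, e in zip(g_batch, g_ema)]
    norm = torch.sqrt(sum((n * n).sum() for n in noise)).item() + 1e-12
    # Ascent direction scaled to radius rho, as in standard SAM.
    eps = [n * (rho / norm) for n in noise]
    return eps, g_ema
```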
- Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness-aware Minimization [19.469113881229646]
Sharpness-aware Minimization (SAM) has been proposed recently to improve model generalization ability.
SAM calculates the gradient twice in each optimization step, thereby doubling the computation costs.
We propose a simple yet efficient sampling method to significantly accelerate SAM.
arXiv Detail & Related papers (2024-02-24T05:48:05Z)
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-Aware Minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step stays the same as that of the ascent step (sketched after this entry).
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
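A minimal sketch of the renormalization as we read it from this summary: rescale the descent-step gradient so its norm matches that of the ascent-step gradient. Function and variable names are illustrative, not the paper's API.

```python
import torch

def stabilize_descent_gradient(g_ascent, g_descent):
    n_a = torch.sqrt(sum((g * g).sum() for g in g_ascent)).item()
    n_d = torch.sqrt(sum((g * g).sum() for g in g_descent)).item() + 1e-12
    # Match the descent gradient's norm to the ascent gradient's norm.
    return [g * (n_a / n_d) for g in g_descent]
```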
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-Aware Minimization (SAM) has been extensively explored as it can generalize better for training deep neural networks.
Integrating SAM with adaptive learning rates and momentum acceleration, dubbed AdaSAM, has already been explored (a sketch follows this entry).
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAM.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
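A hedged sketch of the combination: run SAM's usual ascent, then hand the perturbed gradient to an adaptive, momentum-based optimizer (Adam here, purely for illustration; the paper's exact update rule may differ).

```python
import torch

def adasam_style_step(params, loss_fn, opt, rho=0.05):
    g = torch.autograd.grad(loss_fn(), params)
    scale = rho / (torch.sqrt(sum((gi * gi).sum() for gi in g)).item() + 1e-12)
    with torch.no_grad():
        for p, gi in zip(params, g):
            p.add_(gi, alpha=scale)              # ascent to the worst case
    g_adv = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, gi in zip(params, g):
            p.sub_(gi, alpha=scale)              # restore the weights
    # Adaptive, momentum-based descent with the perturbed gradient.
    for p, gi in zip(params, g_adv):
        p.grad = gi
    opt.step()                                   # e.g. opt = torch.optim.Adam(params)
```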
- K-SAM: Sharpness-Aware Minimization at the Speed of SGD [83.78737278889837]
Sharpness-Aware Minimization (SAM) has emerged as a robust technique for improving the accuracy of deep neural networks.
SAM incurs a high computational cost in practice, requiring up to twice as much computation as vanilla SGD.
We propose to compute gradients in both stages of SAM on only the top-k samples with the highest loss (see the sketch after this entry).
arXiv Detail & Related papers (2022-10-23T21:49:58Z)
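A small sketch of the top-k selection as described: rank the batch by per-sample loss and keep only the k hardest examples for both of SAM's gradient passes. The classification loss and the function name are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def topk_hardest(model, x, y, k):
    # Per-sample losses; no graph is needed for the selection itself.
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
        idx = torch.topk(losses, k).indices
    # Feed this subset to both SAM gradient computations.
    return x[idx], y[idx]
```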
- Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm, LookSAM, that only periodically computes the inner gradient ascent (sketched after this entry).
LookSAM achieves accuracy gains similar to SAM's while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
arXiv Detail & Related papers (2022-03-05T11:53:37Z)
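The periodic reuse can be sketched as follows, under the simplifying assumption that the whole SAM-minus-SGD correction is cached and re-added between refreshes (LookSAM actually reuses a projected component, which this sketch glosses over); `state`, `k`, and `rho` are illustrative.

```python
import torch

def looksam_style_step(step, params, loss_fn, opt, state, k=5, rho=0.05):
    g = torch.autograd.grad(loss_fn(), params)
    if step % k == 0:
        # Full SAM update: extra ascent pass, and cache the correction.
        scale = rho / (torch.sqrt(sum((gi * gi).sum() for gi in g)).item() + 1e-12)
        with torch.no_grad():
            for p, gi in zip(params, g):
                p.add_(gi, alpha=scale)
        g_adv = torch.autograd.grad(loss_fn(), params)
        with torch.no_grad():
            for p, gi in zip(params, g):
                p.sub_(gi, alpha=scale)
        state["corr"] = [ga - gi for ga, gi in zip(g_adv, g)]
        g_use = g_adv
    else:
        # Cheap step: SGD gradient plus the cached correction.
        g_use = [gi + c for gi, c in zip(g, state["corr"])]
    for p, gu in zip(params, g_use):
        p.grad = gu
    opt.step()
```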
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection (the latter sketched after this entry).
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces SAM's overhead from 100% extra computation to 40% vis-à-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
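Of ESAM's two strategies, Sharpness-Sensitive Data Selection lends itself to a short sketch: after the ascent step, keep for the second pass only the samples whose loss rose the most under the perturbation. `loss_before`, the `keep` ratio, and the classification loss are our illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sharpness_sensitive_subset(model, x, y, loss_before, keep=0.5):
    # Call with the model at the *perturbed* weights; loss_before holds
    # per-sample losses computed at the unperturbed weights.
    with torch.no_grad():
        loss_after = F.cross_entropy(model(x), y, reduction="none")
        rise = loss_after - loss_before
        n = max(1, int(keep * x.size(0)))
        idx = torch.topk(rise, n).indices
    # Use only these samples for the descent-step gradient.
    return x[idx], y[idx]
```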