Momentum-SAM: Sharpness Aware Minimization without Computational
Overhead
- URL: http://arxiv.org/abs/2401.12033v1
- Date: Mon, 22 Jan 2024 15:19:18 GMT
- Title: Momentum-SAM: Sharpness Aware Minimization without Computational
Overhead
- Authors: Marlon Becker, Frederick Altrock, Benjamin Risse
- Abstract summary: We propose Momentum-SAM, which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands.
We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
- Score: 0.6577148087211809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recently proposed optimization algorithm for deep neural networks
Sharpness Aware Minimization (SAM) suggests perturbing parameters before
gradient calculation by a gradient ascent step to guide the optimization into
parameter space regions of flat loss. While significant generalization
improvements and thus reduction of overfitting could be demonstrated, the
computational costs are doubled due to the additionally needed gradient
calculation, making SAM unfeasible in case of limited computational
capacities. Motivated by Nesterov Accelerated Gradient (NAG), we propose
Momentum-SAM (MSAM), which perturbs parameters in the direction of the
accumulated momentum vector to achieve low sharpness without significant
computational overhead or memory demands over SGD or Adam. We evaluate MSAM in
detail and reveal insights on separable mechanisms of NAG, SAM and MSAM
regarding training optimization and generalization. Code is available at
https://github.com/MarlonBecker/MSAM.
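To make the mechanism concrete, below is a minimal sketch contrasting one optimization step of SAM with one step of MSAM as described in this abstract. It is an illustration under stated assumptions, not the authors' implementation: `grad_fn` is a hypothetical callable returning the minibatch gradient, the hyperparameter names (`lr`, `rho`, `mu`) are illustrative, and the sign convention of the MSAM perturbation is an assumption; see the linked repository for the exact formulation.

```python
# Minimal sketch (not the authors' code) contrasting one SAM update with one
# MSAM update as described in the abstract above. grad_fn(w) stands for any
# callable returning the minibatch gradient dL/dw; hyperparameters and the
# sign of the MSAM perturbation are illustrative assumptions.
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: two gradient evaluations per minibatch."""
    g = grad_fn(w)                                # 1st pass: gradient at w
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # gradient-ascent perturbation
    return w - lr * grad_fn(w + eps)              # 2nd pass: gradient at w + eps


def msam_step(w, v, grad_fn, lr=0.1, rho=0.05, mu=0.9):
    """One MSAM step: a single gradient evaluation per minibatch.

    The perturbation reuses the accumulated momentum buffer v instead of an
    extra ascent pass, so the per-step cost stays close to momentum SGD.
    """
    eps = -rho * v / (np.linalg.norm(v) + 1e-12)  # perturbation from the momentum buffer (sign is an assumption)
    g = grad_fn(w + eps)                          # single pass at the perturbed point
    v = mu * v + g                                # update the momentum buffer
    return w - lr * v, v


if __name__ == "__main__":
    # Toy usage on the quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w.
    grad_fn = lambda w: w
    w_sam = np.array([1.0, -2.0])
    w_msam, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(10):
        w_sam = sam_step(w_sam, grad_fn)
        w_msam, v = msam_step(w_msam, v, grad_fn)
    print(w_sam, w_msam)
```

Note that `msam_step` still performs exactly one gradient evaluation per minibatch; the only extra work relative to momentum SGD is the perturb/un-perturb bookkeeping, which matches the abstract's claim of no significant computational or memory overhead.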
Related papers
- Reweighting Local Minima with Tilted SAM [24.689230137012174]
Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape.
In this work, we propose Tilted SAM (TSAM), which effectively assigns higher priority to local solutions that are flatter and that incur lower losses.
arXiv Detail & Related papers (2024-10-30T02:49:48Z) - Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling (AUSAM) to accelerate Sharpness-Aware Minimization.
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Systematic Investigation of Sparse Perturbed Sharpness-Aware
Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z) - AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning
Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-Aware Minimization (SAM) has been extensively explored, as it can improve generalization when training deep neural networks.
Integrating SAM with adaptive learning rates and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAM.
arXiv Detail & Related papers (2023-03-01T15:12:42Z) - K-SAM: Sharpness-Aware Minimization at the Speed of SGD [83.78737278889837]
Sharpness-Aware Minimization (SAM) has emerged as a robust technique for improving the accuracy of deep neural networks.
SAM incurs a high computational cost in practice, requiring up to twice as much computation as vanilla SGD.
We propose to compute gradients in both stages of SAM on only the top-k samples with the highest loss (a rough sketch of this selection step appears after this list).
arXiv Detail & Related papers (2022-10-23T21:49:58Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates sharp landscapes at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation required by SAM from 100% to 40% relative to base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
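As referenced in the K-SAM entry above, the top-k selection it describes can be sketched as follows. This is a rough, assumption-laden illustration rather than the K-SAM authors' implementation: `per_sample_loss_fn` and `grad_fn` are hypothetical callables, the data are assumed to be NumPy arrays, and `k`, `lr`, and `rho` are illustrative.

```python
# Rough sketch (not the K-SAM authors' code) of computing both SAM gradient
# passes on only the k highest-loss samples of a minibatch, as summarised in
# the K-SAM entry above. per_sample_loss_fn and grad_fn are hypothetical
# callables; xb and yb are assumed to be NumPy arrays.
import numpy as np


def ksam_step(w, xb, yb, per_sample_loss_fn, grad_fn, lr=0.1, rho=0.05, k=16):
    losses = per_sample_loss_fn(w, xb, yb)        # per-sample loss over the minibatch
    top = np.argsort(losses)[-k:]                 # indices of the k hardest samples
    x_k, y_k = xb[top], yb[top]
    g = grad_fn(w, x_k, y_k)                      # ascent gradient on the subset only
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_fn(w + eps, x_k, y_k)    # descent gradient, also on the subset
```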