Related papers: An SDE for Modeling SAM: Theory and Insights

An SDE for Modeling SAM: Theory and Insights

URL: http://arxiv.org/abs/2301.08203v3
Date: Sun, 4 Jun 2023 19:54:21 GMT
Title: An SDE for Modeling SAM: Theory and Insights
Authors: Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi
Abstract summary: We study the SAM (Sharpness-Aware Minimization) which has recently attracted a lot of interest due to its increased performance over classical variants of descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two gradient of its variants, both for the full-batch and mini-batch settings.
Score: 7.1967126772249586
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones~--~by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.

Related papers

Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-aware computation (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning.<n>We propose an approach SL-SAM to break this bottleneck by introducing the sparse technique to layers.
arXiv Detail & Related papers (2026-02-10T04:05:43Z)
LightSAM: Parameter-Agnostic Sharpness-Aware Minimization [92.17866492331524]
Sharpness-Aware Minimization (SAM) enhances the ability of the machine learning model by exploring the flat minima landscape through weight perturbations.<n>SAM introduces an additional hyper- parameter, the perturbation radius, which causes the sensitivity of SAM to it.<n>In this paper, we propose the algorithm LightSAM which sets the perturbation radius and learning rate of SAM adaptively.
arXiv Detail & Related papers (2025-05-30T09:28:38Z)
Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry [2.854482269849925]
Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. We propose Monge SAM (M-SAM), a reparametrization invariant version of SAM. We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
arXiv Detail & Related papers (2025-02-12T14:40:19Z)
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training [47.25594539120258]
We find that Sharpness-Aware Minimization (SAM) efficiently selects flatter minima late in training. Even a few epochs of SAM applied at the end of training yield nearly the same generalization and solution sharpness as full SAM training. We conjecture that the optimization method chosen in the late phase is more crucial in shaping the final solution's properties.
arXiv Detail & Related papers (2024-10-14T10:56:42Z)
Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS) In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS) By merging MaxS and MinS, we created a better FI that indicates a flatter direction during the optimization. Specially, we combine this FI with SAM into the proposed Bilateral SAM (BSAM) which finds a more flatter minimum than that of SAM.
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness. We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient. By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non- unstructured loss landscapes. SharpnessAware Minimization (SAM) is a popular solution that smooths the loss by minimizing the change of landscape when adding a perturbation. In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves perturbation by a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness aware (SAM) has been extensively explored as it can generalize better for training deep neural networks. Integrating SAM with adaptive learning perturbation and momentum acceleration, dubbed AdaSAM, has already been explored. We conduct several experiments on several NLP tasks, which show that AdaSAM could achieve superior performance compared with SGD, AMS, and SAMsGrad.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [20.560184120992094]
Sharpness-Aware Minimization technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM.
arXiv Detail & Related papers (2023-02-19T23:27:12Z)
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian. We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z)
Stability Analysis of Sharpness-Aware Minimization [5.024497308975435]
Sharpness-aware (SAM) is a recently proposed training method that seeks to find flat minima in deep learning. In this paper, we demonstrate that SAM dynamics can have convergence instability that occurs near a saddle point.
arXiv Detail & Related papers (2023-01-16T08:42:40Z)
Towards Understanding Sharpness-Aware Minimization [27.666483899332643]
We argue that the existing justifications for the success of Sharpness-Aware Minimization (SAM) are based on a PACBayes generalization. We theoretically analyze its implicit bias for diagonal linear networks. We show that fine-tuning a standard model with SAM can be shown significant improvements on the properties of non-sharp networks.
arXiv Detail & Related papers (2022-06-13T15:07:32Z)
Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent. LookSAM achieves similar accuracy gains to SAM while being tremendously faster. We are the first to successfully scale up the batch size when training Vision Transformers (ViTs)
arXiv Detail & Related papers (2022-03-05T11:53:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.