Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry
- URL: http://arxiv.org/abs/2502.08448v1
- Date: Wed, 12 Feb 2025 14:40:19 GMT
- Title: Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry
- Authors: Albert Kjøller Jacobsen, Georgios Arvanitidis
- Abstract summary: Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation.
We propose Monge SAM (M-SAM), a reparametrization-invariant version of SAM.
We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
- Abstract: Recent studies on deep neural networks show that flat minima of the loss landscape correlate with improved generalization. Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. The perturbation depends on the Euclidean metric, making SAM non-invariant under reparametrizations, which blurs the relationship between sharpness and generalization. We propose Monge SAM (M-SAM), a reparametrization-invariant version of SAM that considers a Riemannian metric in the parameter space induced naturally by the loss surface. Compared to previous approaches, M-SAM works under any modeling choice and relies only on mild assumptions, while being as computationally efficient as SAM. We theoretically argue that M-SAM interpolates between SAM and gradient descent (GD), which increases robustness to hyperparameter selection and reduces attraction to suboptimal equilibria like saddle points. We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
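In practice the two methods differ only in how the adversarial perturbation is scaled. The NumPy sketch below illustrates this; the closed-form M-SAM scaling is reconstructed from the abstract's description of the loss-induced (Monge) metric $G = I + \nabla L \nabla L^\top$ via the Sherman-Morrison identity, so read it as an illustration of the idea rather than the paper's exact update.

```python
import numpy as np

def sam_perturbation(grad, rho):
    """Standard SAM: ascend along the Euclidean-normalized gradient."""
    return rho * grad / (np.linalg.norm(grad) + 1e-12)

def monge_sam_perturbation(grad, rho):
    """Sketch of an M-SAM-style perturbation under G = I + grad grad^T.

    Sherman-Morrison gives G^{-1} grad = grad / (1 + ||grad||^2), so the
    ascent direction shrinks as the gradient grows: near critical points
    this matches SAM, while for large gradients the perturbation vanishes
    and the update degenerates to plain gradient descent (GD).
    """
    g2 = float(grad @ grad)
    return rho * grad / ((np.sqrt(g2) + 1e-12) * np.sqrt(1.0 + g2))

# One illustrative update on a toy quadratic loss L(w) = ||w||^2.
loss_grad = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
eps = monge_sam_perturbation(loss_grad(w), rho=0.05)  # adversarial step
w = w - 0.1 * loss_grad(w + eps)                      # gradient at perturbed point
```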
Related papers
- Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel Self-Perception Tuning (SPT) method for anomaly segmentation.
The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z)
- $\mu$P$^2$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling [49.25546155981064]
We study the infinite-width limit of neural networks trained with Sharpness Aware Minimization (SAM).
Our findings reveal that the dynamics of standard SAM effectively reduce to applying SAM solely in the last layer in wide neural networks.
In contrast, we identify a stable parameterization with layerwise perturbation scaling, which we call $\textit{Maximal Update and Perturbation Parameterization}$ ($\mu$P$^2$), that ensures all layers are both feature learning and effectively perturbed in the limit (a hypothetical layerwise-scaling sketch follows below).
arXiv Detail & Related papers (2024-10-31T16:32:04Z)
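As a rough illustration of the layerwise idea, the sketch below gives each layer its own width-dependent perturbation radius; the $1/\sqrt{\text{fan-in}}$ factor is a hypothetical stand-in, not the $\mu$P$^2$ scaling derived in the paper.

```python
import numpy as np

def layerwise_perturbation(grads, rho):
    """Per-layer SAM perturbations with width-dependent radii (a sketch).

    grads: dict mapping layer name -> gradient array of shape (fan_out, fan_in).
    The rho / sqrt(fan_in) factor is a hypothetical choice meant to keep
    every layer effectively perturbed as width grows.
    """
    eps = {}
    for name, g in grads.items():
        scale = rho / np.sqrt(g.shape[-1])               # hypothetical per-layer factor
        eps[name] = scale * g / (np.linalg.norm(g) + 1e-12)
    return eps

grads = {"layer1": np.random.randn(256, 128), "layer2": np.random.randn(10, 256)}
eps = layerwise_perturbation(grads, rho=0.05)
```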
- Reweighting Local Minima with Tilted SAM [24.689230137012174]
Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape.
In this work, we propose Tilted SAM (TSAM), which effectively assigns higher priority to local solutions that are flatter and that incur larger losses (a sketch of the tilting follows below).
arXiv Detail & Related papers (2024-10-30T02:49:48Z)
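The tilted objective behaves like a soft maximum over perturbations. The Monte Carlo sketch below uses the standard exponential-tilting form; the sampling scheme and estimator are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tsam_objective(loss_at, w, rho, t, n_samples=8, seed=0):
    """Tilted aggregation of losses over random perturbations (a sketch).

    (1/t) * log E[exp(t * L(w + eps))] upweights perturbations that incur
    larger losses: t -> 0 recovers the plain average, while large t
    approaches the worst case (SAM-like behavior).
    """
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_samples):
        eps = rng.normal(size=w.shape)
        eps = rho * eps / np.linalg.norm(eps)            # sample on the rho-sphere
        losses.append(loss_at(w + eps))
    a = t * np.asarray(losses)
    m = a.max()                                          # stable log-mean-exp
    return (m + np.log(np.mean(np.exp(a - m)))) / t

quadratic = lambda w: float(w @ w)
print(tsam_objective(quadratic, np.ones(4), rho=0.1, t=5.0))
```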
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS).
In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we create a better flatness indicator (FI) that points toward a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM (a finite-difference sketch follows below).
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
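Read this way, the combined indicator approximates the loss range across the neighborhood. Below is a small finite-difference sketch; both one-step estimates are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def bilateral_flatness_indicator(loss_at, grad_at, w, rho):
    """Sketch of a bilateral flatness indicator.

    MaxS estimates the loss increase toward the neighborhood maximum
    (one ascent step); MinS estimates the drop toward the neighborhood
    minimum (one descent step). Their sum approximates the loss range
    max - min over the rho-ball, so smaller values mean flatter regions.
    """
    g = grad_at(w)
    u = g / (np.linalg.norm(g) + 1e-12)
    max_s = loss_at(w + rho * u) - loss_at(w)   # Max-Sharpness estimate
    min_s = loss_at(w) - loss_at(w - rho * u)   # Min-Sharpness estimate
    return max_s + min_s

quadratic = lambda w: float(w @ w)
grad = lambda w: 2.0 * w
print(bilateral_flatness_indicator(quadratic, grad, np.array([1.0, 2.0]), rho=0.1))
```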
- SAM-PARSER: Fine-tuning SAM Efficiently by Parameter Space Reconstruction [53.871596866809725]
Segment Anything Model (SAM) has received remarkable attention as it offers a powerful and versatile solution for object segmentation in images.
We propose fine-tuning SAM efficiently by parameter space reconstruction (SAM-PARSER).
We obtain the bases by matrix decomposition, and fine-tune the coefficients to reconstruct a parameter space tailored to the new scenario as an optimal linear combination of the bases (an SVD-based sketch follows below).
arXiv Detail & Related papers (2023-08-28T14:17:16Z)
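The core idea is to restrict fine-tuning to the coefficients of a fixed basis of the pretrained parameter space. The sketch below uses an SVD as the matrix decomposition and trains only the singular-value coefficients; the factorization choice is an assumption, not necessarily the paper's.

```python
import numpy as np

# A random matrix stands in for one pretrained weight matrix of SAM.
W = np.random.randn(64, 32)
U, s, Vt = np.linalg.svd(W, full_matrices=False)   # bases via matrix decomposition

coeffs = s.copy()                                  # the only trainable parameters

def reconstructed_weight(coeffs):
    """Optimal linear combination of the fixed bases U and Vt."""
    return (U * coeffs) @ Vt                       # equals U @ diag(coeffs) @ Vt

# Fine-tuning would update `coeffs` by gradient descent on the new task,
# leaving U and Vt frozen; at coeffs == s the original weights are recovered.
assert np.allclose(reconstructed_weight(coeffs), W)
```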
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask (a masked-perturbation sketch follows below).
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
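A sketch of the masked perturbation, assuming a flattened parameter vector; keeping the largest-magnitude gradient coordinates is one plausible mask rule, and SSAM's actual mask construction may differ.

```python
import numpy as np

def sparse_sam_perturbation(grad, rho, sparsity=0.5):
    """SAM perturbation restricted by a binary mask (a sketch).

    Only a fraction (1 - sparsity) of coordinates, chosen here by
    gradient magnitude, receive a perturbation; the rest stay at zero.
    """
    k = max(1, int(grad.size * (1.0 - sparsity)))   # number of perturbed coords
    mask = np.zeros_like(grad)
    mask[np.argsort(np.abs(grad))[-k:]] = 1.0       # top-k coordinates by |g|
    g = grad * mask
    return rho * g / (np.linalg.norm(g) + 1e-12)

print(sparse_sam_perturbation(np.array([0.1, -2.0, 0.5, 3.0]), rho=0.05))
```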
- Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters (a parameter-selection sketch follows below).
arXiv Detail & Related papers (2023-06-07T08:05:46Z)
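The recipe amounts to restricting SAM's adversarial step to the affine parameters of normalization layers. A minimal PyTorch sketch of selecting that parameter subset, on a hypothetical model:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.LayerNorm(256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Collect only the affine normalization parameters (weight and bias of
# LayerNorm/BatchNorm); only these would be perturbed in SAM's inner step.
norm_params = [p for m in model.modules()
               if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d))
               for p in m.parameters()]

total = sum(p.numel() for p in model.parameters())
perturbed = sum(p.numel() for p in norm_params)
print(f"perturbing {perturbed} of {total} parameters ({100 * perturbed / total:.2f}%)")
```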
- mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [20.560184120992094]
The Sharpness-Aware Minimization (SAM) technique modifies the underlying loss function to steer gradient descent methods toward flatter minima.
We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM (a micro-batch sketch follows below).
arXiv Detail & Related papers (2023-02-19T23:27:12Z)
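Concretely, mSAM computes a separate SAM ascent step on each micro-batch and averages the gradients taken at the perturbed points. A NumPy sketch under that reading; the paper's exact procedure may differ.

```python
import numpy as np

def msam_step(loss_grad, w, idx, rho, lr, n_micro=4):
    """One mSAM-style update (a sketch).

    Each micro-batch gets its own SAM ascent step, and the gradients
    evaluated at the per-micro-batch perturbed points are averaged.
    """
    update = np.zeros_like(w)
    for mb in np.array_split(idx, n_micro):
        g = loss_grad(w, mb)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)   # micro-batch perturbation
        update += loss_grad(w + eps, mb)              # gradient at perturbed point
    return w - lr * update / n_micro

# Toy least-squares example (hypothetical data).
X, y = np.random.randn(32, 3), np.random.randn(32)
grad = lambda w, mb: 2 * X[mb].T @ (X[mb] @ w - y[mb]) / len(mb)
w = msam_step(grad, np.zeros(3), np.arange(32), rho=0.05, lr=0.1)
```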
- Stability Analysis of Sharpness-Aware Minimization [5.024497308975435]
Sharpness-aware minimization (SAM) is a recently proposed training method that seeks to find flat minima in deep learning.
In this paper, we demonstrate that SAM dynamics can have convergence instability that occurs near a saddle point.
arXiv Detail & Related papers (2023-01-16T08:42:40Z)
- Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z)
- Rethinking Sharpness-Aware Minimization as Variational Inference [1.749935196721634]
Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima.
We establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network parameters.
arXiv Detail & Related papers (2022-10-19T10:35:54Z)