Related papers: Reweighting Local Mimina with Tilted SAM

URL: http://arxiv.org/abs/2410.22656v1
Date: Wed, 30 Oct 2024 02:49:48 GMT
Title: Reweighting Local Mimina with Tilted SAM
Authors: Tian Li, Tianyi Zhou, Jeffrey A. Bilmes
Abstract summary: Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape. In this work, we propose Tilted SAM (TSAM), which effectively assigns higher priority to local solutions that are flatter and that incur larger losses.
Score: 24.689230137012174
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that are flatter and that incur larger losses. TSAM is parameterized by a tilt hyperparameter t and reduces to SAM as t approaches infinity. We prove that (1) the TSAM objective is smoother than SAM and thus easier to optimize; and (2) TSAM explicitly favors flatter minima as t increases. This is desirable as flatter minima could have better generalization properties for certain tasks. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks.
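
As a rough illustration of the exponential tilting idea in the abstract, the sketch below assumes the tilted objective takes the standard log-mean-exp form (1/t) log E[exp(t L(θ+ε))] over perturbations ε in a neighborhood of the parameters θ. That form, the function name tilted_loss, and the sample loss values are assumptions made for illustration and are not taken from the paper. The sketch shows the aggregate moving from the average neighborhood loss (t near 0) toward the worst-case loss that SAM targets (large t).

```python
import math


def tilted_loss(losses, t):
    """Exponentially tilted aggregate (1/t) * log(mean(exp(t * l))).

    Computed with the log-sum-exp shift for numerical stability.
    At t == 0 the tilted aggregate is defined as the plain mean.
    """
    n = len(losses)
    if t == 0:
        return sum(losses) / n
    shift = max(t * l for l in losses)
    total = sum(math.exp(t * l - shift) for l in losses)
    return (shift + math.log(total / n)) / t


# Hypothetical losses evaluated at a few perturbed parameter points.
neighborhood_losses = [0.2, 0.5, 0.9, 1.4]

for t in (0, 1, 10, 100, 1000):
    print(f"t={t:>4}: tilted loss = {tilted_loss(neighborhood_losses, t):.4f}")
```

With a small t every point in the neighborhood contributes about equally, while a large t puts almost all of the weight on the highest-loss point, which matches the abstract's statement that TSAM reduces to SAM as t approaches infinity.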

Related papers

Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-aware minimization (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning. We propose an approach, SL-SAM, to break this bottleneck by introducing the sparse technique to layers.
arXiv Detail & Related papers (2026-02-10T04:05:43Z)
LightSAM: Parameter-Agnostic Sharpness-Aware Minimization [92.17866492331524]
Sharpness-Aware Minimization (SAM) enhances the ability of the machine learning model by exploring the flat minima landscape through weight perturbations. SAM introduces an additional hyperparameter, the perturbation radius, to which SAM is sensitive. In this paper, we propose the algorithm LightSAM, which sets the perturbation radius and learning rate of SAM adaptively.
arXiv Detail & Related papers (2025-05-30T09:28:38Z)
Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry [2.854482269849925]
Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. We propose Monge SAM (M-SAM), a reparametrization invariant version of SAM. We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
arXiv Detail & Related papers (2025-02-12T14:40:19Z)
Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS). By merging MaxS and MinS, we created a better flatness indicator (FI) that indicates a flatter direction during the optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds a flatter minimum than that of SAM.
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness. We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient. By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead [0.6577148087211809]
We propose Momentum-SAM, which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
arXiv Detail & Related papers (2024-01-22T15:19:18Z)
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss by minimizing the change of landscape when adding a perturbation. In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves perturbation by a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [20.560184120992094]
The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function so that it steers gradient descent methods toward flatter minima. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM.
arXiv Detail & Related papers (2023-02-19T23:27:12Z)
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian. We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z)
Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima. Recent work suggests that mSAM can outperform SAM in terms of test accuracy. This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z)
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models [93.85178920914721]
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor generalization. We propose a novel optimization procedure, namely FSAM, which introduces a Fisher mask to improve the efficiency and performance of SAM. We show that FSAM consistently outperforms the vanilla SAM by 0.67 to 1.98 average score among four different pretrained models.
arXiv Detail & Related papers (2022-10-11T14:53:58Z)
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach [132.37966970098645]
One of the popular solutions is Sharpness-Aware Minimization (SAM), which minimizes the change in training loss when a perturbation is added to the weights. In this paper, we propose an efficient and effective training scheme coined as Sparse SAM (SSAM), which achieves sparse perturbation by a binary mask. In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z)
Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead. GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
arXiv Detail & Related papers (2022-03-15T16:57:59Z)
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
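
Several entries above describe SAM as updating the parameters with the gradient taken at an adversarially perturbed point, which is also why it needs roughly one extra gradient computation per step. The sketch below shows that two-step update on a toy quadratic loss. The loss function, the names loss, grad and sam_step, and the values of the learning rate lr and perturbation radius rho are all hypothetical choices for illustration, not settings from any of the listed papers.

```python
import math


def loss(w):
    # Toy quadratic with one sharp direction (w[0]) and one flat direction (w[1]).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.1):
    """One SAM-style update: ascend to a perturbed point, then descend from w."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # No ascent direction at a stationary point; leave the weights unchanged.
        return list(w)
    # Step 1: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: apply the gradient from the perturbed point to the original weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w)
print("final weights:", w, "final loss:", loss(w))
```

The second call to grad inside sam_step is the extra cost that the efficiency-oriented variants in this list (sparse perturbations, momentum-based perturbations, data selection) aim to reduce.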

This list is automatically generated from the titles and abstracts of the papers in this site.