Related papers: 1st-Order Magic: Analysis of Sharpness-Aware Minimization

1st-Order Magic: Analysis of Sharpness-Aware Minimization

URL: http://arxiv.org/abs/2411.01714v1
Date: Sun, 03 Nov 2024 23:50:34 GMT
Title: 1st-Order Magic: Analysis of Sharpness-Aware Minimization
Authors: Nalin Tiwary, Siddarth Aananth,
Abstract summary: Sharpness-Aware Minimization (SAM) is an optimization technique designed to improve generalization by favoring flatter loss minima. We find that more precise approximations of the proposed SAM objective degrade generalization performance. This highlights a gap in our understanding of SAM's effectiveness and calls for further investigation into the role of approximations in optimization.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sharpness-Aware Minimization (SAM) is an optimization technique designed to improve generalization by favoring flatter loss minima. To achieve this, SAM optimizes a modified objective that penalizes sharpness, using computationally efficient approximations. Interestingly, we find that more precise approximations of the proposed SAM objective degrade generalization performance, suggesting that the generalization benefits of SAM are rooted in these approximations rather than in the original intended mechanism. This highlights a gap in our understanding of SAM's effectiveness and calls for further investigation into the role of approximations in optimization.

Related papers

LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM [13.180761892449736]
We study robust parameter-efficient fine-tuning (PEFT) techniques for large-language models (LLMs) We present a new highly computationally efficient framework called AdaZo-SAM, combining Adam and Sharpness-Aware Minimization (SAM) We also design a low-rank gradient optimization method named LORENZA, which is a memory-efficient version of AdaZo-SAM.
arXiv Detail & Related papers (2025-02-26T21:30:34Z)
GCSAM: Gradient Centralized Sharpness Aware Minimization [45.05109291721135]
Sharpness-Aware Minimization (SAM) has emerged as an effective optimization technique for reducing the sharpness of the loss landscape. We propose Gradient-Sharpness-Aware Minimization (GCSAM), which incorporates Gradient Centralization (GC) to stabilize and accelerate convergence. GCSAM consistently outperforms SAM and the Adam in terms of generalization and computational efficiency.
arXiv Detail & Related papers (2025-01-20T16:42:31Z)
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm [39.656014609027494]
sharpness-aware minimization (SAM) has emerged as a powerful tool to improve generalizability of deep neural network based learning. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only unifying convergence analysis, but also valuable insights. A novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates.
arXiv Detail & Related papers (2025-01-11T18:05:33Z)
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM) AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency. As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions. We prove that these measures are textitly expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyper and determinants.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness. We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient. By decomposing the adversarial gradient noise components, we discover that relying solely on the full gradient degrades generalization while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead [0.6577148087211809]
We propose Momentum-SAM, which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
arXiv Detail & Related papers (2024-01-22T15:19:18Z)
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
sharpness-aware generalization (SAM) has attracted much attention because of its surprising effectiveness in improving performance. We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step maintains the same as that of the ascent step. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non- unstructured loss landscapes. SharpnessAware Minimization (SAM) is a popular solution that smooths the loss by minimizing the change of landscape when adding a perturbation. In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves perturbation by a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima. We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
arXiv Detail & Related papers (2023-06-07T08:05:46Z)
Sharpness-Aware Training for Free [163.1248341911413]
SharpnessAware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. Sharpness-Aware Training Free (SAF) mitigates the sharp landscape at almost zero computational cost over the base. SAF ensures the convergence to a flat minimum with improved capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (M) which boosts SAM s efficiency at no cost to its generalization performance. M includes two novel and efficient training strategies-StochasticWeight Perturbation and Sharpness-Sensitive Data Selection. We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis bases.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.