Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2501.12666v1
- Date: Wed, 22 Jan 2025 06:03:16 GMT
- Title: Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
- Authors: Haocheng Luo, Tuan Truong, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le
- Abstract summary: Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks.
We analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness.
We introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue.
- Score: 37.515131384121204
- Abstract: Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness, and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM's effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, limiting SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
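The abstract does not spell out the update rule, so the following is a minimal PyTorch sketch of what an Eigen-SAM-style step could look like: the top Hessian eigenvector is estimated with power iteration on Hessian-vector products, and SAM's ascent perturbation is partially aligned with it. The helper names, the mixing coefficient `alpha`, and the plain-SGD update are illustrative assumptions on our part; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of an Eigen-SAM-style step, reconstructed from the abstract only.
import torch


def top_hessian_eigenvector(grads, params, iters=5):
    """Estimate the leading Hessian eigenvector by power iteration on
    Hessian-vector products (grads must be built with create_graph=True)."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)  # Hessian-vector product
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]
    return v


def eigen_sam_step(model, loss_fn, x, y, rho=0.05, alpha=0.5, lr=0.1):
    # `alpha` (how strongly to mix in the eigenvector) is an illustrative assumption.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = top_hessian_eigenvector(grads, params)
    # Resolve the eigenvector's sign so it points along the ascent direction.
    s = torch.sign(sum((g * vi).sum() for g, vi in zip(grads, v)))
    g_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    # Perturbation: SAM's normalized gradient, partially aligned with the eigenvector.
    eps = [rho * ((1 - alpha) * g / g_norm + alpha * s * vi)
           for g, vi in zip(grads, v)]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                      # ascend to the perturbed point
    p_grads = torch.autograd.grad(loss_fn(model(x), y), params)
    with torch.no_grad():
        for p, e, g in zip(params, eps, p_grads):
            p.sub_(e)                      # undo the perturbation
            p.sub_(lr * g)                 # plain SGD update with the perturbed gradient
    return loss.item()
```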
Related papers
- Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems [26.377807940655305]
This work introduces a concept termed balancedness, defined as the difference between the squared norms of two variables.
We develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems (a toy sketch of balancedness follows below).
arXiv Detail & Related papers (2024-10-18T18:19:18Z)
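As a toy illustration of the quantity defined above, under the assumption (ours, for exposition) of a two-factor, scale-invariant parameterization:

```python
# Hedged illustration of "balancedness" for a scale-invariant two-factor model.
# The parameterization w = x * y and the squared penalty below are illustrative
# assumptions, not the BAR paper's exact formulation.
import torch

def balancedness(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Difference between the squared norms of the two variables: rescaling
    # (x, y) -> (c * x, y / c) leaves w = x * y unchanged but shifts this value.
    return x.pow(2).sum() - y.pow(2).sum()

# One plausible explicit regularizer would penalize imbalance, e.g.
# total_loss = task_loss + lam * balancedness(x, y) ** 2
```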
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate choices of hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, which SAM computes from the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and batch-specific noise, we find that relying solely on the full-gradient component degrades generalization, while excluding it improves performance (a sketch of this decomposition follows below).
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
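Read literally, the decomposition above suggests forming the ascent perturbation from the batch-specific part of the gradient only. The sketch below does that under the assumption (ours, for illustration) that the full-batch gradient is approximated by an exponential moving average of minibatch gradients; the constant `beta` and the function name are not from the paper.

```python
# Hedged sketch: build the SAM perturbation from the batch-specific gradient
# component only; the EMA estimate of the full gradient is an assumption.
import torch

@torch.no_grad()
def noise_only_perturbation(minibatch_grads, ema_grads, rho=0.05, beta=0.9):
    # Update the running estimate of the "full" gradient.
    for m, g in zip(ema_grads, minibatch_grads):
        m.mul_(beta).add_(g, alpha=1 - beta)
    # Batch-specific noise = minibatch gradient minus the full-gradient estimate.
    noise = [g - m for g, m in zip(minibatch_grads, ema_grads)]
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise)) + 1e-12
    # Normalized perturbation of radius rho, as in SAM's ascent step.
    return [rho * n / norm for n in noise]
```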
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-Aware Minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step matches that of the ascent step (a sketch follows below).
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
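A minimal sketch of such a renormalization, reconstructed from the summary only: the gradient at the perturbed point keeps its direction but is rescaled so its norm equals that of the ascent-step gradient. The flat-list interface and helper name are our illustrative choices.

```python
# Hedged sketch of a Stable-SAM-style renormalization of the descent gradient.
import torch

def renormalize_descent_grad(ascent_grads, descent_grads, eps=1e-12):
    ascent_norm = torch.sqrt(sum((g ** 2).sum() for g in ascent_grads))
    descent_norm = torch.sqrt(sum((g ** 2).sum() for g in descent_grads)) + eps
    scale = ascent_norm / descent_norm
    # Descent step keeps its direction but inherits the ascent step's gradient norm.
    return [g * scale for g in descent_grads]
```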
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask (a sketch follows below).
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
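A hedged sketch of a masked perturbation in that spirit: a binary mask keeps only a fraction of the perturbation's entries. Selecting entries by gradient magnitude is an illustrative assumption here, not necessarily the mask rule used in the paper.

```python
# Hedged sketch: sparsify SAM's perturbation with a binary mask so only a
# fraction of the weights are perturbed.
import torch

@torch.no_grad()
def sparse_perturbation(grads, rho=0.05, keep_ratio=0.5):
    flat = torch.cat([g.flatten() for g in grads])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.abs().topk(k).values.min()        # k-th largest |gradient|
    masks = [(g.abs() >= threshold).float() for g in grads]
    masked = [g * m for g, m in zip(grads, masks)]
    norm = torch.sqrt(sum((g ** 2).sum() for g in masked)) + 1e-12
    # Normalized, sparse ascent direction of radius rho.
    return [rho * g / norm for g in masked]
```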
- The Crucial Role of Normalization in Sharpness-Aware Minimization [44.00155917998616]
Sharpness-Aware Minimization (SAM) is a gradient-based optimizer that greatly improves prediction performance.
We argue that two properties of normalization make SAM robust against the choice of hyperparameters.
arXiv Detail & Related papers (2023-05-24T16:09:41Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-Aware Minimization (SAM) has been extensively explored because it improves generalization when training deep neural networks.
Integrating SAM with adaptive learning rates and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAMsGrad optimizers.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- SAM operates far from home: eigenvalue regularization as a dynamical phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian.
We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.
Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z)
- How Does Sharpness-Aware Minimization Minimize Sharpness? [29.90109733192208]
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks.
This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism.
arXiv Detail & Related papers (2022-11-10T17:56:38Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection (a sketch of the former follows below).
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces SAM's overhead from 100% extra computation to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
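Since the ESAM entry above names two concrete strategies, here is a hedged sketch of the first, Stochastic Weight Perturbation, read as "perturb only a random subset of the weights in the ascent step"; the Bernoulli keep rate and the interface are our illustrative assumptions.

```python
# Hedged sketch of Stochastic Weight Perturbation: only a random subset of
# parameters receives the SAM perturbation, cutting the cost of the ascent step.
import torch

@torch.no_grad()
def stochastic_weight_perturbation(grads, rho=0.05, p_keep=0.5):
    # Randomly select which entries participate in this step's perturbation.
    masks = [torch.bernoulli(torch.full_like(g, p_keep)) for g in grads]
    selected = [g * m for g, m in zip(grads, masks)]
    norm = torch.sqrt(sum((g ** 2).sum() for g in selected)) + 1e-12
    return [rho * g / norm for g in selected]
```

Sharpness-Sensitive Data Selection would, on our reading of the name, analogously evaluate the perturbed loss only on the subset of examples most affected by the perturbation; it is omitted here for brevity.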