X-SAM: Boosting Sharpness-Aware Minimization with Dominant-Eigenvector Gradient Correction
- URL: http://arxiv.org/abs/2601.10251v1
- Date: Thu, 15 Jan 2026 10:19:08 GMT
- Title: X-SAM: Boosting Sharpness-Aware Minimization with Dominant-Eigenvector Gradient Correction
- Authors: Hongru Duan, Yongle Chen, Lei Guan
- Abstract summary: We investigate Sharpness-Aware Minimization (SAM) from a spectral and geometric perspective. We propose an explicit eigenvector-aligned SAM (X-SAM) which corrects the gradient via decomposition along the top eigenvector. We prove X-SAM's convergence and superior generalization, with extensive experimental evaluations confirming both theoretical and practical advantages.
- Score: 7.8091155908891965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness-Aware Minimization (SAM) aims to improve generalization by minimizing a worst-case perturbed loss over a small neighborhood of model parameters. However, during training, its optimization behavior does not always align with theoretical expectations, since both sharp and flat regions may yield a small perturbed loss. In such cases, the gradient may still point toward sharp regions, failing to achieve the intended effect of SAM. To address this issue, we investigate SAM from a spectral and geometric perspective: specifically, we utilize the angle between the gradient and the leading eigenvector of the Hessian as a measure of sharpness. Our analysis illustrates that when this angle is less than or equal to ninety degrees, the effect of SAM's sharpness regularization can be weakened. Furthermore, we propose an explicit eigenvector-aligned SAM (X-SAM), which corrects the gradient via orthogonal decomposition along the top eigenvector, enabling more direct and efficient regularization of the Hessian's maximum eigenvalue. We prove X-SAM's convergence and superior generalization, with extensive experimental evaluations confirming both theoretical and practical advantages.
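Below is a minimal, self-contained PyTorch sketch of the kind of correction the abstract describes: estimate the Hessian's top eigenvector with Hessian-vector-product power iteration, split the gradient into components along and orthogonal to it, and reweight the aligned component. The function names, the power-iteration estimator, and the reweighting rule (the `alpha` factor) are illustrative assumptions rather than the authors' implementation, and SAM's ascent step is omitted for brevity.
```python
import torch

def top_hessian_eigvec(loss_fn, w, n_power_iters=10):
    """Estimate the Hessian's top eigenvector at w via power iteration
    with Hessian-vector products (no explicit Hessian is formed)."""
    v = torch.randn_like(w)
    v /= v.norm()
    for _ in range(n_power_iters):
        w_ = w.detach().requires_grad_(True)
        g, = torch.autograd.grad(loss_fn(w_), w_, create_graph=True)
        hv, = torch.autograd.grad(g @ v, w_)          # Hessian-vector product
        v = hv / (hv.norm() + 1e-12)
    return v

def xsam_corrected_grad(loss_fn, w, alpha=1.0):
    """Orthogonally decompose the gradient along the top eigenvector and
    reweight the aligned component (an assumed form of the correction)."""
    w_ = w.detach().requires_grad_(True)
    g, = torch.autograd.grad(loss_fn(w_), w_)
    v = top_hessian_eigvec(loss_fn, w)
    g_par = (g @ v) * v            # component along the sharpest direction
    g_perp = g - g_par             # component orthogonal to it
    return g_perp + alpha * g_par  # alpha controls how strongly the eigenvalue is regularized

# Toy usage on an anisotropic quadratic: sharp along the first coordinate.
H = torch.diag(torch.tensor([50.0, 1.0]))
loss = lambda w: 0.5 * w @ H @ w
w = torch.tensor([1.0, 1.0])
print(xsam_corrected_grad(loss, w, alpha=2.0))
```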
Related papers
- Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization [37.515131384121204]
Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. We analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness. We introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue.
arXiv Detail & Related papers (2025-01-22T06:03:16Z) - Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing Max-Sharpness (MaxS).
In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we create a better flatness indicator (FI) that points toward a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM.
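A rough sketch of the bilateral idea on a single parameter tensor, assuming the Max-Sharpness side uses an ascent perturbation and the Min-Sharpness side a descent perturbation of the same radius; the mixing weight `lam` and the combination rule are illustrative assumptions, not the paper's exact flatness indicator.
```python
import torch

def bsam_grad(loss_fn, w, rho=0.05, lam=0.5):
    """Bilateral sketch: combine gradients taken after an ascent perturbation
    (Max-Sharpness side) and a descent perturbation (Min-Sharpness side)."""
    w_ = w.detach().requires_grad_(True)
    g, = torch.autograd.grad(loss_fn(w_), w_)
    e = rho * g / (g.norm() + 1e-12)

    w_up = (w.detach() + e).requires_grad_(True)     # worst-case neighbour
    g_up, = torch.autograd.grad(loss_fn(w_up), w_up)

    w_dn = (w.detach() - e).requires_grad_(True)     # best-case neighbour
    g_dn, = torch.autograd.grad(loss_fn(w_dn), w_dn)

    return lam * g_up + (1.0 - lam) * g_dn           # illustrative weighting

# Toy check on a quadratic bowl.
loss = lambda w: (w ** 2).sum()
w = torch.tensor([1.0, -2.0])
print(bsam_grad(loss, w))
```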
arXiv Detail & Related papers (2024-09-20T03:01:13Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into its full-gradient and batch-specific noise components, we discover that relying solely on the full-gradient component degrades generalization, while excluding it leads to improved performance.
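A hedged sketch of one way to act on that observation: estimate the full gradient with an exponential moving average and perturb only along the batch-specific residual. The EMA stand-in for the full gradient and the parameter names are assumptions of this sketch, not the paper's algorithm.
```python
import torch

def friendly_perturbation(g_batch, g_ema, rho=0.05, momentum=0.9):
    """Build an ascent direction from the batch-specific noise component:
    subtract an EMA estimate of the full gradient from the minibatch gradient."""
    g_ema = momentum * g_ema + (1.0 - momentum) * g_batch   # running full-gradient estimate
    noise = g_batch - g_ema                                  # batch-specific component
    eps = rho * noise / (noise.norm() + 1e-12)               # perturbation for the ascent step
    return eps, g_ema

# Toy usage with two successive minibatch gradients.
g_ema = torch.zeros(3)
for g_batch in (torch.tensor([1.0, 0.5, -0.2]), torch.tensor([0.8, 0.7, -0.1])):
    eps, g_ema = friendly_perturbation(g_batch, g_ema)
    print(eps)
```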
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - Critical Influence of Overparameterization on Sharpness-aware Minimization [12.321517302762558]
Sharpness-Aware Minimization (SAM) has attracted considerable attention for its effectiveness in improving generalization in deep neural network training. This work presents both empirical and theoretical findings that reveal the critical influence of overparameterization on SAM's effectiveness.
arXiv Detail & Related papers (2023-11-29T11:19:50Z) - Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
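A short sketch of what "perturbing only the affine normalization parameters" can look like in practice; the joint-norm scaling and the set of normalization layer types are illustrative choices, not the paper's exact recipe.
```python
import torch
import torch.nn as nn

def norm_affine_params(model):
    """Collect only the affine parameters of normalization layers
    (BatchNorm/LayerNorm/GroupNorm weight and bias)."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm, nn.GroupNorm)
    return [p for m in model.modules() if isinstance(m, norm_types)
            for p in m.parameters() if p.requires_grad]

def sam_ascent_on_norm_layers(model, loss, rho=0.05):
    """SAM's ascent step restricted to normalization-layer parameters."""
    params = norm_affine_params(model)
    grads = torch.autograd.grad(loss, params)
    scale = rho / (torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(scale * g)          # perturb only the norm-layer affine weights

# Toy usage: a small net with one BatchNorm layer.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2))
x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(model(x), y)
sam_ascent_on_norm_layers(model, loss)
```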
arXiv Detail & Related papers (2023-06-07T08:05:46Z) - Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization [33.50116027503244]
We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error.
We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions.
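A simplified sketch of a first-order-flatness penalty: approximate the largest gradient norm in a small neighborhood by the gradient norm at an ascent-perturbed point and add it to the loss. This is a loose reading of GAM, with `rho` and `lam` as illustrative hyperparameters, not the paper's exact estimator.
```python
import torch

def gam_style_grad(loss_fn, w, rho=0.05, lam=0.1):
    """Gradient of loss + lam * (gradient norm at an ascent-perturbed point),
    a crude stand-in for the neighborhood-max gradient norm."""
    w_ = w.detach().requires_grad_(True)
    g, = torch.autograd.grad(loss_fn(w_), w_)
    e = rho * g / (g.norm() + 1e-12)                  # SAM-style ascent offset (no graph attached)
    g_adv, = torch.autograd.grad(loss_fn(w_ + e), w_, create_graph=True)
    total = loss_fn(w_) + lam * g_adv.norm()          # loss + first-order flatness penalty
    grad_total, = torch.autograd.grad(total, w_)
    return grad_total

# Toy check on an ill-conditioned quadratic (sharp along the first axis).
H = torch.diag(torch.tensor([20.0, 1.0]))
loss = lambda w: 0.5 * w @ H @ w
print(gam_style_grad(loss, torch.tensor([1.0, 1.0])))
```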
arXiv Detail & Related papers (2023-03-03T16:58:53Z) - SAM operates far from home: eigenvalue regularization as a dynamical phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian.
We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.
Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z) - The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima [41.961056785108845]
We consider Sharpness-Aware Minimization, a gradient-based optimization method for deep networks.
We show that when SAM is applied with a convex quadratic objective, it converges to a cycle that oscillates from one side of the minimum to the other along the direction with the largest curvature.
In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian.
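The cycle is easy to reproduce on a toy quadratic. In the minimal simulation below (illustrative step size and radius, not the paper's settings), the late iterates flip sign on the sharp axis while the flat coordinate decays toward zero.
```python
import torch

# Tiny simulation of SAM on a convex quadratic: iterates settle into a bounce
# across the minimum along the direction of largest curvature.
H = torch.diag(torch.tensor([10.0, 1.0]))       # sharp axis first
loss = lambda w: 0.5 * w @ H @ w
lr, rho = 0.05, 0.1
w = torch.tensor([1.0, 1.0])

for step in range(200):
    w_ = w.detach().requires_grad_(True)
    g, = torch.autograd.grad(loss(w_), w_)
    w_adv = (w_ + rho * g / (g.norm() + 1e-12)).detach().requires_grad_(True)
    g_adv, = torch.autograd.grad(loss(w_adv), w_adv)
    w = w - lr * g_adv                           # SAM update

    if step >= 196:                              # late iterates: sign flips on the sharp axis
        print(step, w.tolist())
```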
arXiv Detail & Related papers (2022-10-04T10:34:37Z) - Towards Understanding Sharpness-Aware Minimization [27.666483899332643]
We argue that the existing justifications for the success of Sharpness-Aware Minimization (SAM) are based on a PAC-Bayes generalization bound.
We theoretically analyze its implicit bias for diagonal linear networks.
We also show that fine-tuning a standard model with SAM can yield significant improvements.
arXiv Detail & Related papers (2022-06-13T15:07:32Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead.
GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
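A compact sketch of the two steps as summarized above: SAM's ascent yields the perturbed-loss gradient, and the surrogate gap is reduced by subtracting the component of the clean gradient orthogonal to it. The decomposition rule and the `alpha` weight are assumptions of this sketch, not necessarily the paper's exact update.
```python
import torch

def gsam_grad(loss_fn, w, rho=0.05, alpha=0.1):
    """Step 1: SAM ascent for the perturbed-loss gradient.
    Step 2: also descend the surrogate gap via the orthogonal component
    of the clean gradient (assumed decomposition rule)."""
    w_ = w.detach().requires_grad_(True)
    g, = torch.autograd.grad(loss_fn(w_), w_)                  # clean gradient
    w_adv = (w_ + rho * g / (g.norm() + 1e-12)).detach().requires_grad_(True)
    g_p, = torch.autograd.grad(loss_fn(w_adv), w_adv)          # perturbed-loss gradient (step 1)
    g_par = (g @ g_p) / (g_p @ g_p + 1e-12) * g_p              # projection of g onto g_p
    g_orth = g - g_par                                         # component acting only on the gap
    return g_p - alpha * g_orth                                # step 2: also reduce the gap

# Toy check on an anisotropic quadratic.
H = torch.diag(torch.tensor([10.0, 1.0]))
loss = lambda w: 0.5 * w @ H @ w
print(gsam_grad(loss, torch.tensor([1.0, 1.0])))
```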
arXiv Detail & Related papers (2022-03-15T16:57:59Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers.
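A hedged sketch of the two strategies named above, on a toy model: perturb only a random subset of weights in the ascent step, then compute the descent gradient only on the samples whose loss increased most under the perturbation. The fractions, the selection rule, and the plain SGD update are illustrative choices, not the paper's implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def esam_style_step(model, x, y, rho=0.05, weight_frac=0.5, data_frac=0.5, lr=0.1):
    """One sketched step: Stochastic Weight Perturbation (random parameter mask
    in the ascent) and Sharpness-Sensitive Data Selection (descent gradient
    only on the samples whose loss rose most under the perturbation)."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    grads = torch.autograd.grad(per_sample.mean(), params)
    scale = rho / (torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12)

    masks = [torch.bernoulli(torch.full_like(p, weight_frac)) for p in params]
    with torch.no_grad():                                  # perturb a random subset of weights
        for p, g, m in zip(params, grads, masks):
            p.add_(scale * g * m)

    with torch.no_grad():
        rise = F.cross_entropy(model(x), y, reduction="none") - per_sample
    k = max(1, int(data_frac * x.size(0)))
    idx = rise.topk(k).indices                             # samples whose loss rose most
    sel_loss = F.cross_entropy(model(x[idx]), y[idx])
    sel_grads = torch.autograd.grad(sel_loss, params)

    with torch.no_grad():                                  # undo perturbation, then descend
        for p, g, m in zip(params, grads, masks):
            p.sub_(scale * g * m)
        for p, g in zip(params, sel_grads):
            p.sub_(lr * g)

# Toy usage.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
x, y = torch.randn(32, 4), torch.randint(0, 2, (32,))
esam_style_step(model, x, y)
```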
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.