Related papers: Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models

URL: http://arxiv.org/abs/2508.10435v1
Date: Thu, 14 Aug 2025 08:17:34 GMT
Title: Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Authors: Tianxiao Cao, Kyohei Atarashi, Hisashi Kashima
Abstract summary: We analyze the norm dynamics of Sharpness-Aware Minimization (SAM) in general tensorized models. We show that SAM's implicit control of Norm Deviation is governed by the covariance between core norms and their gradient magnitudes. Motivated by these findings, we propose a simple yet effective method, Deviation-Aware Scaling (DAS), which explicitly mimics this regularization behavior by scaling core norms in a data-adaptive manner.
Score: 21.52081811249999
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sharpness-Aware Minimization (SAM) has been proven to be an effective optimization technique for improving generalization in overparameterized models. While prior works have explored the implicit regularization of SAM in simple two-core scale-invariant settings, its behavior in more general tensorized or scale-invariant models remains underexplored. In this work, we leverage scale-invariance to analyze the norm dynamics of SAM in general tensorized models. We introduce the notion of Norm Deviation as a global measure of core norm imbalance, and derive its evolution under SAM using gradient flow analysis. We show that SAM's implicit control of Norm Deviation is governed by the covariance between core norms and their gradient magnitudes. Motivated by these findings, we propose a simple yet effective method, Deviation-Aware Scaling (DAS), which explicitly mimics this regularization behavior by scaling core norms in a data-adaptive manner. Our experiments across tensor completion, noisy training, model compression, and parameter-efficient fine-tuning confirm that DAS achieves competitive or improved performance over SAM, while offering reduced computational overhead.
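Illustration (not from the paper): the abstract refers to SAM in the simple two-core scale-invariant setting. The sketch below shows the standard SAM update (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point) on a toy model whose prediction is the product of two scalar cores. The toy loss, the hyperparameter values, and the imbalance measure |a^2 - b^2| are assumptions chosen for illustration; they are not the paper's Norm Deviation definition or its DAS method.

```python
import numpy as np

def loss_and_grad(w, target=1.0):
    """Toy two-core model (assumed for illustration): the prediction is a * b.

    The loss is unchanged under (a, b) -> (c * a, b / c), i.e. it is scale-invariant.
    """
    a, b = w
    residual = a * b - target
    loss = 0.5 * residual ** 2
    grad = np.array([residual * b, residual * a])
    return loss, grad

def sam_step(w, lr=0.05, rho=0.1, eps=1e-12):
    """One SAM update with perturbation radius rho."""
    _, g = loss_and_grad(w)
    # Ascent step: move a distance rho along the normalized gradient.
    perturbation = rho * g / (np.linalg.norm(g) + eps)
    # Descent step: apply the gradient evaluated at the perturbed point.
    _, g_adv = loss_and_grad(w + perturbation)
    return w - lr * g_adv

# Start from deliberately imbalanced cores and track a simple imbalance measure.
w = np.array([2.0, 0.25])
for _ in range(200):
    w = sam_step(w)
print("cores:", w, "imbalance |a^2 - b^2|:", abs(w[0] ** 2 - w[1] ** 2))
```

Setting rho=0 reduces sam_step to plain gradient descent, which gives a baseline for comparing how the imbalance evolves.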

Related papers

Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-aware minimization (SAM) seeks minima with a flat loss landscape to improve generalization performance in machine learning tasks, including fine-tuning. We propose an approach, SL-SAM, to break this bottleneck by introducing the sparse technique to layers.
arXiv Detail & Related papers (2026-02-10T04:05:43Z)
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise [31.637051623223346]
We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in training, yet a rigorous theoretical account has remained lacking. Motivated by our theoretical insights, we introduce Reweighted SAM (RWSAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable.
arXiv Detail & Related papers (2025-09-22T16:40:42Z)
LightSAM: Parameter-Agnostic Sharpness-Aware Minimization [92.17866492331524]
Sharpness-Aware Minimization (SAM) enhances the ability of the machine learning model by exploring the flat minima landscape through weight perturbations. SAM introduces an additional hyperparameter, the perturbation radius, to which it is sensitive. In this paper, we propose the algorithm LightSAM, which sets the perturbation radius and learning rate of SAM adaptively.
arXiv Detail & Related papers (2025-05-30T09:28:38Z)
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning [7.6677237955415]
Sharpness-aware minimization (SAM) is known to improve the generalization performance of neural networks. SAM is not widely used in real-world applications yet due to its expensive model perturbation cost. We propose a lightweight layer-wise gradient norm penalizing method that tackles the expensive computational cost of SAM while maintaining its superior generalization performance.
arXiv Detail & Related papers (2025-03-18T12:30:57Z)
Sharpness-Aware Minimization: General Analysis and Improved Rates [10.11126899274029]
Sharpness-Aware Minimization (SAM) has emerged as a powerful method for improving generalization in machine learning models. We provide an analysis of SAM and its unnormalized variant rule (USAM) under one update. We present results of the new size under a relaxed, more natural assumption.
arXiv Detail & Related papers (2025-03-04T03:04:06Z)
Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion [41.50816120270017]
In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model. We present a framework that reduces training variance and provides a provably lower-variance gradient estimator. We also present a practical implementation of this estimator incorporating the loss and sampling procedure through a method we call Orbit Diffusion.
arXiv Detail & Related papers (2025-02-14T03:26:57Z)
Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry [2.854482269849925]
Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. We propose Monge SAM (M-SAM), a reparametrization-invariant version of SAM. We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
arXiv Detail & Related papers (2025-02-12T14:40:19Z)
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks [4.877624278656814]
Sharpness-Aware Minimization (SAM) has proven highly effective in improving model generalization in machine learning tasks. We propose the SAM with Adaptive Regularization (SAMAR), which introduces a flexible sharpness ratio rule to update the regularization parameter dynamically.
arXiv Detail & Related papers (2024-12-22T04:40:02Z)
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel Self-Perception Tuning (SPT) method for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z)
Critical Influence of Overparameterization on Sharpness-aware Minimization [12.321517302762558]
Sharpness-Aware Minimization (SAM) has attracted considerable attention for its effectiveness in improving generalization in deep neural network training. This work presents both empirical and theoretical findings that reveal the critical influence of overparameterization on SAM's effectiveness.
arXiv Detail & Related papers (2023-11-29T11:19:50Z)
Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima. We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
arXiv Detail & Related papers (2023-06-07T08:05:46Z)
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [54.629850694790036]
Spectral-normalized identity prior (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z)
Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling. We design a novel structured tensor low-rank norm tailored to MVSC. We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.