Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2501.12666v1
- Date: Wed, 22 Jan 2025 06:03:16 GMT
- Title: Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
- Authors: Haocheng Luo, Tuan Truong, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le
- Abstract summary: Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks.
We analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness.
We introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue.
- Score: 37.515131384121204
- Abstract: Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness, and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM's effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, limiting SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
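The abstract does not spell out the update rule, so the following is a minimal PyTorch sketch of what an Eigen-SAM-style step could look like: the top Hessian eigenvector is estimated with power iteration on Hessian-vector products, and SAM's ascent perturbation is partially aligned with it. The helper names, the mixing coefficient `alpha`, and the plain-SGD update are illustrative assumptions on our part; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of an Eigen-SAM-style step, reconstructed from the abstract only.
import torch


def top_hessian_eigenvector(grads, params, iters=5):
    """Estimate the leading Hessian eigenvector by power iteration on
    Hessian-vector products (grads must be built with create_graph=True)."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)  # Hessian-vector product
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]
    return v


def eigen_sam_step(model, loss_fn, x, y, rho=0.05, alpha=0.5, lr=0.1):
    # `alpha` (how strongly to mix in the eigenvector) is an illustrative assumption.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = top_hessian_eigenvector(grads, params)
    # Resolve the eigenvector's sign so it points along the ascent direction.
    s = torch.sign(sum((g * vi).sum() for g, vi in zip(grads, v)))
    g_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    # Perturbation: SAM's normalized gradient, partially aligned with the eigenvector.
    eps = [rho * ((1 - alpha) * g / g_norm + alpha * s * vi)
           for g, vi in zip(grads, v)]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                      # ascend to the perturbed point
    p_grads = torch.autograd.grad(loss_fn(model(x), y), params)
    with torch.no_grad():
        for p, e, g in zip(params, eps, p_grads):
            p.sub_(e)                      # undo the perturbation
            p.sub_(lr * g)                 # plain SGD update with the perturbed gradient
    return loss.item()
```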
Related papers
- Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems [26.377807940655305]
This work introduces a concept termed balancedness, defined as the difference between the squared norms of two variables.
We develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems (a toy sketch of balancedness follows below).
arXiv Detail & Related papers (2024-10-18T18:19:18Z)
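As a toy illustration of the quantity defined above, under the assumption (ours, for exposition) of a two-factor, scale-invariant parameterization:

```python
# Hedged illustration of "balancedness" for a scale-invariant two-factor model.
# The parameterization w = x * y and the squared penalty below are illustrative
# assumptions, not the BAR paper's exact formulation.
import torch

def balancedness(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Difference between the squared norms of the two variables: rescaling
    # (x, y) -> (c * x, y / c) leaves w = x * y unchanged but shifts this value.
    return x.pow(2).sum() - y.pow(2).sum()

# One plausible explicit regularizer would penalize imbalance, e.g.
# total_loss = task_loss + lam * balancedness(x, y) ** 2
```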
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate choices of hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, which SAM computes from the current minibatch gradient.
By decomposing the adversarial perturbation into a full-gradient component and batch-specific noise, we find that relying solely on the full-gradient component degrades generalization, while excluding it improves performance (a sketch of this decomposition follows below).
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
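Read literally, the decomposition above suggests forming the ascent perturbation from the batch-specific part of the gradient only. The sketch below does that under the assumption (ours, for illustration) that the full-batch gradient is approximated by an exponential moving average of minibatch gradients; the constant `beta` and the function name are not from the paper.

```python
# Hedged sketch: build the SAM perturbation from the batch-specific gradient
# component only; the EMA estimate of the full gradient is an assumption.
import torch

@torch.no_grad()
def noise_only_perturbation(minibatch_grads, ema_grads, rho=0.05, beta=0.9):
    # Update the running estimate of the "full" gradient.
    for m, g in zip(ema_grads, minibatch_grads):
        m.mul_(beta).add_(g, alpha=1 - beta)
    # Batch-specific noise = minibatch gradient minus the full-gradient estimate.
    noise = [g - m for g, m in zip(minibatch_grads, ema_grads)]
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise)) + 1e-12
    # Normalized perturbation of radius rho, as in SAM's ascent step.
    return [rho * n / norm for n in noise]
```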
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-Aware Minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step matches that of the ascent step (a sketch follows below).
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost.
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
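A minimal sketch of such a renormalization, reconstructed from the summary only: the gradient at the perturbed point keeps its direction but is rescaled so its norm equals that of the ascent-step gradient. The flat-list interface and helper name are our illustrative choices.

```python
# Hedged sketch of a Stable-SAM-style renormalization of the descent gradient.
import torch

def renormalize_descent_grad(ascent_grads, descent_grads, eps=1e-12):
    ascent_norm = torch.sqrt(sum((g ** 2).sum() for g in ascent_grads))
    descent_norm = torch.sqrt(sum((g ** 2).sum() for g in descent_grads)) + eps
    scale = ascent_norm / descent_norm
    # Descent step keeps its direction but inherits the ascent step's gradient norm.
    return [g * scale for g in descent_grads]
```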
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask (a sketch follows below).
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
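A hedged sketch of a masked perturbation in that spirit: a binary mask keeps only a fraction of the perturbation's entries. Selecting entries by gradient magnitude is an illustrative assumption here, not necessarily the mask rule used in the paper.

```python
# Hedged sketch: sparsify SAM's perturbation with a binary mask so only a
# fraction of the weights are perturbed.
import torch

@torch.no_grad()
def sparse_perturbation(grads, rho=0.05, keep_ratio=0.5):
    flat = torch.cat([g.flatten() for g in grads])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.abs().topk(k).values.min()        # k-th largest |gradient|
    masks = [(g.abs() >= threshold).float() for g in grads]
    masked = [g * m for g, m in zip(grads, masks)]
    norm = torch.sqrt(sum((g ** 2).sum() for g in masked)) + 1e-12
    # Normalized, sparse ascent direction of radius rho.
    return [rho * g / norm for g in masked]
```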
- The Crucial Role of Normalization in Sharpness-Aware Minimization [44.00155917998616]
Sharpness-Aware Minimization (SAM) is a gradient-based optimizer that greatly improves prediction performance.
We argue that two properties of normalization make SAM robust against the choice of hyperparameters.
arXiv Detail & Related papers (2023-05-24T16:09:41Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-Aware Minimization (SAM) has been extensively explored because it improves generalization when training deep neural networks.
Integrating SAM with adaptive learning rates and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAMsGrad optimizers.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- SAM operates far from home: eigenvalue regularization as a dynamical phenomenon [15.332235979022036]
The Sharpness Aware Minimization (SAM) algorithm has been shown to control large eigenvalues of the loss Hessian.
We show that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.
Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters.
arXiv Detail & Related papers (2023-02-17T04:51:20Z)
- How Does Sharpness-Aware Minimization Minimize Sharpness? [29.90109733192208]
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks.
This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism.
arXiv Detail & Related papers (2022-11-10T17:56:38Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection (a sketch of the former follows below).
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces SAM's overhead from 100% extra computation to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
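Since the ESAM entry above names two concrete strategies, here is a hedged sketch of the first, Stochastic Weight Perturbation, read as "perturb only a random subset of the weights in the ascent step"; the Bernoulli keep rate and the interface are our illustrative assumptions.

```python
# Hedged sketch of Stochastic Weight Perturbation: only a random subset of
# parameters receives the SAM perturbation, cutting the cost of the ascent step.
import torch

@torch.no_grad()
def stochastic_weight_perturbation(grads, rho=0.05, p_keep=0.5):
    # Randomly select which entries participate in this step's perturbation.
    masks = [torch.bernoulli(torch.full_like(g, p_keep)) for g in grads]
    selected = [g * m for g, m in zip(grads, masks)]
    norm = torch.sqrt(sum((g ** 2).sum() for g in selected)) + 1e-12
    return [rho * g / norm for g in selected]
```

Sharpness-Sensitive Data Selection would, on our reading of the name, analogously evaluate the perturbed loss only on the subset of examples most affected by the perturbation; it is omitted here for brevity.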