ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
- URL: http://arxiv.org/abs/2102.11600v1
- Date: Tue, 23 Feb 2021 10:26:54 GMT
- Title: ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
- Authors: Jungmin Kwon, Jeongseop Kim, Hyunseo Park and In Kwon Choi
- Abstract summary: We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results on various benchmark datasets show that ASAM contributes to a significant improvement in model generalization performance.
- Score: 2.8292841621378844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, learning algorithms motivated by the sharpness of the loss
surface as an effective measure of the generalization gap have shown
state-of-the-art performance. Nevertheless, sharpness defined in a rigid region
with a fixed radius has a drawback: it is sensitive to parameter re-scaling
that leaves the loss unaffected, which weakens the connection between sharpness
and the generalization gap. In this paper, we introduce the concept of adaptive
sharpness, which is scale-invariant, and propose the corresponding
generalization bound. We suggest a novel learning method, adaptive
sharpness-aware minimization (ASAM), utilizing the proposed generalization
bound. Experimental results on various benchmark datasets show that ASAM
contributes to a significant improvement in model generalization performance.
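As a rough illustration of the idea in the abstract, below is a minimal PyTorch sketch of one ASAM-style training step. The element-wise normalization operator (|w| + eta), the hyperparameter names rho and eta, and the two-pass perturb-then-update structure follow the usual formulation of adaptive sharpness; treat them as assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of one ASAM-style step (assumed formulation, not the official code).
import torch

def asam_step(model, loss_fn, x, y, optimizer, rho=0.5, eta=0.01):
    # First pass: gradient of the loss at the current weights w.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Build T_w * grad, where T_w = |w| + eta is the element-wise normalization
    # operator that makes the perturbation invariant to parameter re-scalings
    # that leave the loss unchanged (the "adaptive" part of adaptive sharpness).
    eps, sq_norm = {}, None
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        scaled = (p.abs() + eta) * p.grad
        eps[name] = scaled
        sq_norm = scaled.pow(2).sum() if sq_norm is None else sq_norm + scaled.pow(2).sum()
    grad_norm = sq_norm.sqrt() + 1e-12

    # Ascent step inside the adaptive neighborhood:
    # epsilon = rho * T_w^2 * grad / ||T_w * grad||, then w <- w + epsilon.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in eps:
                eps[name] = rho * (p.abs() + eta) * eps[name] / grad_norm
                p.add_(eps[name])

    # Second pass: gradient at the perturbed weights w + epsilon.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation and apply the base optimizer update at w.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in eps:
                p.sub_(eps[name])
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Like SAM, this costs two forward-backward passes per update; the sketch omits details such as per-layer options and weight-decay handling.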
Related papers
- Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization [2.8775022881551666]
Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization.
SAM consists of two main steps: the weight perturbation step and the weight updating step.
We propose the Adaptive Adversarial Cross-Entropy (AACE) loss function to replace standard cross-entropy loss for SAM's perturbation.
arXiv Detail & Related papers (2024-06-20T14:00:01Z)
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- CR-SAM: Curvature Regularized Sharpness-Aware Minimization [8.248964912483912]
Sharpness-Aware Minimization (SAM) aims to enhance generalizability by minimizing the worst-case loss, using one-step gradient ascent as an approximation.
In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both training and test sets.
In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM).
arXiv Detail & Related papers (2023-12-21T03:46:29Z)
- Enhancing Sharpness-Aware Optimization Through Variance Suppression [48.908966673827734]
This work embraces the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability.
It seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood.
Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization.
arXiv Detail & Related papers (2023-09-27T13:18:23Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and a vanilla Sharpness-Aware Minimization (SAM) based method, we conclude that the trade-off performance correlates with both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Normalization Layers Are All That Sharpness-Aware Minimization Needs [53.799769473526275]
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima.
We show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
arXiv Detail & Related papers (2023-06-07T08:05:46Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
- Improving Generalization in Federated Learning by Seeking Flat Minima [23.937135834522145]
Models trained in federated settings often suffer from degraded performance and fail to generalize.
In this work, we investigate such behavior through the lens of the geometry of the loss surface and the Hessian eigenspectrum.
Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) can substantially improve generalization.
arXiv Detail & Related papers (2022-03-22T16:01:04Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation required by SAM from 100% to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)