GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- URL: http://arxiv.org/abs/2210.06895v1
- Date: Thu, 13 Oct 2022 10:44:10 GMT
- Title: GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- Authors: Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun
- Abstract summary: The Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization ability in vision tasks.
However, SAM is difficult to apply to some natural language tasks, especially to models with drastic gradient changes, such as RNNs.
We propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better.
- Score: 22.53923556656022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown
state-of-the-art generalization abilities in vision tasks. It demonstrates that
flat minima tend to imply better generalization abilities. However, it is
difficult to apply SAM to some natural language tasks, especially to models
with drastic gradient changes, such as RNNs. In this work, we analyze the
relation between the flatness of the local minimum and its generalization
ability from a novel and straightforward theoretical perspective. We propose
that the shift of the training and test distributions can be equivalently seen
as a virtual parameter corruption or perturbation, which can explain why flat
minima that are robust against parameter corruptions or perturbations have
better generalization performances. On this basis, we propose a
Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM)
algorithm to help learning algorithms find flat minima that generalize better.
Results in various language benchmarks validate the effectiveness of the
proposed GA-SAM algorithm on natural language tasks.
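For concreteness, the sketch below shows the standard two-step SAM update in PyTorch-style code, together with a hypothetical gradient-strength-dependent perturbation radius in the spirit of GA-SAM. The function name `sam_step` and the specific scaling rule for `rho_eff` are illustrative assumptions, not the formula from the paper.

```python
# A minimal PyTorch-style sketch of the two-step SAM update, with a hypothetical
# gradient-strength-dependent radius in the spirit of GA-SAM. The scaling rule
# for `rho_eff` below is an illustrative assumption, not the paper's formula.
import torch


def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # Step 1: gradients at the current weights.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    # Hypothetical gradient-strength adaptation: shrink the perturbation radius
    # when gradients are strong (e.g. RNNs with drastic gradient changes), so
    # the ascent step does not overshoot.
    rho_eff = rho / (1.0 + grad_norm)

    # Climb to the (approximate) worst-case point inside the adapted ball.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho_eff * g / (grad_norm + 1e-12))

    # Step 2: gradient at the perturbed weights, undo the perturbation, and let
    # the base optimizer update the original weights with that gradient.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho_eff * g / (grad_norm + 1e-12))
    base_optimizer.step()
    model.zero_grad()
```

Called once per minibatch in place of the usual `loss.backward(); optimizer.step()` pair, this costs roughly two forward-backward passes, matching SAM's known overhead.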
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms standard SAM at lower computational cost on few-shot node classification (FSNC) tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z)
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing Max-Sharpness (MaxS), the gap between the maximum loss over a neighborhood of the current weights and the training loss.
In this paper, we propose to also utilize the difference between the training loss and the minimum loss over that neighborhood, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we obtain a better flatness indicator (FI) that points toward a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM does; a rough formalization is sketched below.
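As a rough formalization (the notation here is assumed for illustration, not taken verbatim from the BSAM paper), with loss L, weights w, and perturbation radius rho:

```latex
% Bilateral flatness sketch (notation assumed): MaxS looks up, MinS looks down,
% and together they capture the total loss variation over the neighborhood.
\mathrm{MaxS}(w) = \max_{\|\epsilon\|\le\rho} L(w+\epsilon) - L(w), \qquad
\mathrm{MinS}(w) = L(w) - \min_{\|\epsilon\|\le\rho} L(w+\epsilon)
```

One natural way to merge the two, whether or not it matches the paper's exact definition, is the sum MaxS(w) + MinS(w), i.e. the gap between the largest and smallest loss in the neighborhood.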
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix (for instance, its determinant) to be represented with appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To
Achieve Better Generalization [29.90109733192208]
Existing theory shows that common architectures prefer flatter minimizers of the training loss.
This work critically examines this explanation.
Our results suggest that the relationship between sharpness and generalization subtly depends on the data.
arXiv Detail & Related papers (2023-07-20T16:34:58Z)
- Sharpness-Aware Gradient Matching for Domain Generalization [84.14789746460197]
The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains.
The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape.
We present two conditions to ensure that the model can converge to a flat minimum with a small loss, and propose an algorithm named Sharpness-Aware Gradient Matching (SAGM).
Our proposed SAGM method consistently outperforms the state-of-the-art methods on five DG benchmarks.
arXiv Detail & Related papers (2023-03-18T07:25:12Z)
- Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves
Generalization [33.50116027503244]
We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error.
We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions.
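For orientation, the contrast between the two flatness notions can be sketched as follows (the exact scaling constants are assumptions; see the GAM paper for the precise definitions):

```latex
% Zeroth-order flatness (the quantity SAM controls) vs. first-order flatness
% (the quantity GAM targets); rho is the neighborhood radius.
R^{(0)}_{\rho}(w) = \max_{\|\epsilon\|\le\rho} L(w+\epsilon) - L(w), \qquad
R^{(1)}_{\rho}(w) = \rho \cdot \max_{\|\epsilon\|\le\rho} \|\nabla L(w+\epsilon)\|
```

Controlling the worst-case gradient norm over the neighborhood is what yields the uniformly small curvature across directions described above.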
arXiv Detail & Related papers (2023-03-03T16:58:53Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
- Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead.
GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
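The surrogate gap that gives GSAM its name can be sketched as follows (notation assumed for illustration; GSAM's actual update decomposes gradients to lower this gap, details omitted here):

```latex
% Surrogate gap: how much higher the worst-case (perturbed) loss is than the
% current loss; the perturbation uses the usual first-order SAM ascent direction.
h(w) = \max_{\|\epsilon\|\le\rho} L(w+\epsilon) - L(w)
     \;\approx\; L\!\left(w + \rho \,\frac{\nabla L(w)}{\|\nabla L(w)\|}\right) - L(w)
```

Step 1 lowers the perturbed loss as in SAM; step 2 additionally lowers h(w), which is small only when the neighborhood is flat.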
arXiv Detail & Related papers (2022-03-15T16:57:59Z)
- Questions for Flat-Minima Optimization of Modern Neural Networks [28.12506392321345]
Two classes of methods for finding flat minima stand out: 1. averaging methods (e.g. Stochastic Weight Averaging, SWA) and 2. minimax methods (e.g. Sharpness-Aware Minimization, SAM).
We investigate the loss surfaces from a systematic benchmarking of these approaches across computer vision, natural language processing, and graph learning tasks.
arXiv Detail & Related papers (2022-02-01T18:56:15Z)
- ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning
of Deep Neural Networks [2.8292841621378844]
We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results in various benchmark datasets show that ASAM contributes to significant improvement of model generalization performance.
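A common way to write adaptive sharpness is sketched here under the assumption of an element-wise normalization operator; the ASAM paper defines a more general family of such operators.

```latex
% Adaptive sharpness: the perturbation ball is reshaped by a normalization
% operator T_w so the measure is invariant to weight re-scaling.
\max_{\|T_w^{-1}\epsilon\|\le\rho} L(w+\epsilon) - L(w), \qquad
T_w = \mathrm{diag}(|w_1|,\ldots,|w_d|)
```

Because the ball stretches with the magnitude of each weight, re-scalings of the parameters that leave the network function unchanged no longer change the measured sharpness.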
arXiv Detail & Related papers (2021-02-23T10:26:54Z)