GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- URL: http://arxiv.org/abs/2210.06895v1
- Date: Thu, 13 Oct 2022 10:44:10 GMT
- Title: GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- Authors: Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun
- Abstract summary: The Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks.
However, SAM is difficult to apply to some natural language tasks, especially to models with drastic gradient changes, such as RNNs.
We propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better.
- Score: 22.53923556656022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown
state-of-the-art generalization abilities in vision tasks. It demonstrates that
flat minima tend to imply better generalization abilities. However, it is
difficult to apply SAM to some natural language tasks, especially to models
with drastic gradient changes, such as RNNs. In this work, we analyze the
relation between the flatness of the local minimum and its generalization
ability from a novel and straightforward theoretical perspective. We propose
that the shift of the training and test distributions can be equivalently seen
as a virtual parameter corruption or perturbation, which can explain why flat
minima that are robust against parameter corruptions or perturbations have
better generalization performance. On this basis, we propose a
Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM)
algorithm to help learning algorithms find flat minima that generalize better.
Results in various language benchmarks validate the effectiveness of the
proposed GA-SAM algorithm on natural language tasks.
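For context, a SAM-style update perturbs the weights along the normalized gradient before taking the descent step. The sketch below illustrates this two-step update on a toy objective; the gradient-strength-based scaling of the perturbation radius is a hypothetical stand-in for GA-SAM's adaptive rule, which the abstract does not specify.

```python
import numpy as np

def loss(w):
    # Toy objective standing in for a training loss with several local minima.
    return np.sum(w ** 2) + 0.1 * np.sum(np.sin(5 * w))

def grad(w):
    # Analytic gradient of the toy objective.
    return 2 * w + 0.5 * np.cos(5 * w)

def sam_like_step(w, lr=0.1, rho=0.05, adaptive=True):
    g = grad(w)
    g_norm = np.linalg.norm(g) + 1e-12
    # Hypothetical gradient-strength-based radius: shrink the neighborhood where
    # the gradient is strong. This is an assumption, not GA-SAM's actual rule.
    radius = rho / (1.0 + g_norm) if adaptive else rho
    # Step 1 (SAM): ascend to the approximate worst point in the neighborhood.
    w_adv = w + radius * g / g_norm
    # Step 2 (SAM): descend using the gradient evaluated at the perturbed point.
    return w - lr * grad(w_adv)

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_like_step(w)
print("final weights:", w, "loss:", loss(w))
```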
Related papers
- Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models [42.59948316941217]
Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate sharp local minima from the training trajectory and mitigate generalization degradation.
We propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models.
arXiv Detail & Related papers (2024-06-19T01:03:23Z) - A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To
Achieve Better Generalization [29.90109733192208]
Existing theory shows that common architectures prefer flatter minimizers of the training loss.
This work critically examines this explanation.
Our results suggest that the relationship between sharpness and generalization subtly depends on the data.
arXiv Detail & Related papers (2023-07-20T16:34:58Z) - Sharpness-Aware Gradient Matching for Domain Generalization [84.14789746460197]
The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains.
The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape.
We present two conditions to ensure that the model can converge to a flat minimum with a small loss, and propose an algorithm named Sharpness-Aware Gradient Matching (SAGM).
Our proposed SAGM method consistently outperforms the state-of-the-art methods on five DG benchmarks.
arXiv Detail & Related papers (2023-03-18T07:25:12Z) - Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves
Generalization [33.50116027503244]
We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error.
We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions.
arXiv Detail & Related papers (2023-03-03T16:58:53Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero extra computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead.
GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
arXiv Detail & Related papers (2022-03-15T16:57:59Z) - Questions for Flat-Minima Optimization of Modern Neural Networks [28.12506392321345]
Two methods for finding flat minima stand out: 1. averaging methods (i.e., Stochastic Weight Averaging, SWA) and 2. minimax methods (i.e., Sharpness-Aware Minimization, SAM).
We investigate the loss surfaces from a systematic benchmarking of these approaches across computer vision, natural language processing, and graph learning tasks.
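As a rough illustration of the "averaging" family named above, here is a minimal SWA-style sketch on a toy objective (the learning rate, averaging schedule, and loss are placeholders); the minimax family corresponds to the SAM-style perturbed-gradient step sketched after the abstract.

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss centred at w = 1, standing in for training.
    return 2.0 * (w - 1.0)

# Averaging-style flat-minima search in the spirit of SWA: run plain SGD and
# keep a running average of the weights visited late in training.
w = np.array([5.0, -3.0])
swa_avg, n_avg = np.zeros_like(w), 0
for step in range(200):
    w = w - 0.05 * grad(w)
    if step >= 100:                      # start averaging once training settles
        swa_avg = (swa_avg * n_avg + w) / (n_avg + 1)
        n_avg += 1
print("last iterate:", w)
print("averaged weights:", swa_avg)
```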
arXiv Detail & Related papers (2022-02-01T18:56:15Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves the efficiency over SAM from requiring 100% extra computation to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z) - ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning
of Deep Neural Networks [2.8292841621378844]
We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results in various benchmark datasets show that ASAM contributes to significant improvement of model generalization performance.
arXiv Detail & Related papers (2021-02-23T10:26:54Z)