GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- URL: http://arxiv.org/abs/2210.06895v1
- Date: Thu, 13 Oct 2022 10:44:10 GMT
- Title: GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization
for Improved Generalization
- Authors: Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun
- Abstract summary: The Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks.
However, SAM is difficult to apply to some natural language tasks, especially to models with drastic gradient changes, such as RNNs.
We propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better.
- Score: 22.53923556656022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown
state-of-the-art generalization abilities in vision tasks. It demonstrates that
flat minima tend to imply better generalization abilities. However, it is
difficult to apply SAM to some natural language tasks, especially to models
with drastic gradient changes, such as RNNs. In this work, we analyze the
relation between the flatness of the local minimum and its generalization
ability from a novel and straightforward theoretical perspective. We propose
that the shift of the training and test distributions can be equivalently seen
as a virtual parameter corruption or perturbation, which can explain why flat
minima that are robust against parameter corruptions or perturbations have
better generalization performance. On this basis, we propose a
Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM)
algorithm to help learning algorithms find flat minima that generalize better.
Results in various language benchmarks validate the effectiveness of the
proposed GA-SAM algorithm on natural language tasks.
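For context, a SAM-style update perturbs the weights along the normalized gradient before taking the descent step. The sketch below illustrates this two-step update on a toy objective; the gradient-strength-based scaling of the perturbation radius is a hypothetical stand-in for GA-SAM's adaptive rule, which the abstract does not specify.

```python
import numpy as np

def loss(w):
    # Toy objective standing in for a training loss with several local minima.
    return np.sum(w ** 2) + 0.1 * np.sum(np.sin(5 * w))

def grad(w):
    # Analytic gradient of the toy objective.
    return 2 * w + 0.5 * np.cos(5 * w)

def sam_like_step(w, lr=0.1, rho=0.05, adaptive=True):
    g = grad(w)
    g_norm = np.linalg.norm(g) + 1e-12
    # Hypothetical gradient-strength-based radius: shrink the neighborhood where
    # the gradient is strong. This is an assumption, not GA-SAM's actual rule.
    radius = rho / (1.0 + g_norm) if adaptive else rho
    # Step 1 (SAM): ascend to the approximate worst point in the neighborhood.
    w_adv = w + radius * g / g_norm
    # Step 2 (SAM): descend using the gradient evaluated at the perturbed point.
    return w - lr * grad(w_adv)

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_like_step(w)
print("final weights:", w, "loss:", loss(w))
```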
Related papers
- Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models [42.59948316941217]
Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate sharp local minima from the training trajectory and mitigate generalization degradation.
We propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models.
arXiv Detail & Related papers (2024-06-19T01:03:23Z) - A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z) - Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To
Achieve Better Generalization [29.90109733192208]
Existing theory shows that common architectures prefer flatter minimizers of the training loss.
This work critically examines this explanation.
Our results suggest that the relationship between sharpness and generalization subtly depends on the data.
arXiv Detail & Related papers (2023-07-20T16:34:58Z) - Sharpness-Aware Gradient Matching for Domain Generalization [84.14789746460197]
The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains.
The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape.
We present two conditions to ensure that the model can converge to a flat minimum with a small loss, and propose an algorithm named Sharpness-Aware Gradient Matching (SAGM).
Our proposed SAGM method consistently outperforms the state-of-the-art methods on five DG benchmarks.
arXiv Detail & Related papers (2023-03-18T07:25:12Z) - Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves
Generalization [33.50116027503244]
We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error.
We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions.
arXiv Detail & Related papers (2023-03-03T16:58:53Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero extra computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead.
GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
arXiv Detail & Related papers (2022-03-15T16:57:59Z) - Questions for Flat-Minima Optimization of Modern Neural Networks [28.12506392321345]
Two methods for finding flat minima stand out: 1. averaging methods (i.e., Stochastic Weight Averaging, SWA) and 2. minimax methods (i.e., Sharpness-Aware Minimization, SAM).
We investigate the loss surfaces from a systematic benchmarking of these approaches across computer vision, natural language processing, and graph learning tasks.
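As a rough illustration of the "averaging" family named above, here is a minimal SWA-style sketch on a toy objective (the learning rate, averaging schedule, and loss are placeholders); the minimax family corresponds to the SAM-style perturbed-gradient step sketched after the abstract.

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss centred at w = 1, standing in for training.
    return 2.0 * (w - 1.0)

# Averaging-style flat-minima search in the spirit of SWA: run plain SGD and
# keep a running average of the weights visited late in training.
w = np.array([5.0, -3.0])
swa_avg, n_avg = np.zeros_like(w), 0
for step in range(200):
    w = w - 0.05 * grad(w)
    if step >= 100:                      # start averaging once training settles
        swa_avg = (swa_avg * n_avg + w) / (n_avg + 1)
        n_avg += 1
print("last iterate:", w)
print("averaged weights:", swa_avg)
```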
arXiv Detail & Related papers (2022-02-01T18:56:15Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves the efficiency over SAM from requiring 100% extra computation to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z) - ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning
of Deep Neural Networks [2.8292841621378844]
We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results in various benchmark datasets show that ASAM contributes to significant improvement of model generalization performance.
arXiv Detail & Related papers (2021-02-23T10:26:54Z)