Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
- URL: http://arxiv.org/abs/2406.13137v1
- Date: Wed, 19 Jun 2024 01:03:23 GMT
- Title: Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
- Authors: Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang
- Abstract summary: Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate sharp local minima from the training trajectory and mitigate generalization degradation.
We propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models.
- Score: 42.59948316941217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate the sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires two sequential gradient computations during the optimization of each step: one to obtain the perturbation gradient and the other to obtain the updating gradient. Compared with the base optimizer (e.g., Adam), SAM doubles the time overhead due to the additional perturbation gradient. By dissecting the theory of SAM and observing the training gradient of the molecular graph transformer, we propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models. There are two key factors that contribute to this result: (i) gradient approximation: we use the updating gradient of the previous step to smoothly approximate the perturbation gradient at the intermediate steps (increases efficiency); (ii) loss landscape approximation: we theoretically prove that the loss landscape of GraphSAM is limited to a small range centered on the expected loss of SAM (guarantees generalization performance). Extensive experiments on six datasets with different tasks demonstrate the superiority of GraphSAM, especially in optimizing the model update process. The code is available at: https://github.com/YL-wang/GraphSAM/tree/graphsam
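To make the efficiency argument concrete: standard SAM spends two full gradient passes per step (one to build the weight perturbation, one to compute the actual update), whereas GraphSAM replaces the first pass at intermediate steps with a smoothed reuse of the previous step's updating gradient. Below is a minimal NumPy sketch of that contrast on a toy least-squares problem; the toy loss, the learning rate, the perturbation radius rho, the smoothing factor beta, and the re-anchoring schedule are illustrative assumptions, not the authors' implementation (see the linked repository for the official code).

```python
# Minimal sketch (assumptions noted in comments): contrasts SAM's two gradient
# passes per step with a GraphSAM-style step that reuses a smoothed copy of
# past updating gradients as the perturbation gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))                 # toy design matrix (illustrative)
y = X @ rng.normal(size=16) + 0.1 * rng.normal(size=128)

def loss_grad(w):
    """One full gradient pass: gradient of the mean-squared-error loss at w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def sam_step(w, lr=0.05, rho=0.05):
    """Standard SAM update: two sequential gradient computations."""
    g_pert = loss_grad(w)                                  # pass 1: perturbation gradient
    eps = rho * g_pert / (np.linalg.norm(g_pert) + 1e-12)  # ascend to the worst-case point
    g_update = loss_grad(w + eps)                          # pass 2: updating gradient
    return w - lr * g_update, g_update

def graphsam_like_step(w, g_smooth, lr=0.05, rho=0.05, beta=0.9):
    """GraphSAM-style intermediate step: approximate the perturbation gradient
    with a moving average of past updating gradients (hypothetical smoothing
    rule), so only one fresh gradient pass is needed per step."""
    eps = rho * g_smooth / (np.linalg.norm(g_smooth) + 1e-12)
    g_update = loss_grad(w + eps)                          # the only gradient pass this step
    g_smooth = beta * g_smooth + (1.0 - beta) * g_update   # keep the approximation fresh
    return w - lr * g_update, g_smooth

w = np.zeros(16)
w, g_smooth = sam_step(w)              # anchor step: exact perturbation gradient (2 passes)
for t in range(1, 200):
    if t % 50 == 0:                    # occasional re-anchoring with a full SAM step (assumed schedule)
        w, g_smooth = sam_step(w)
    else:
        w, g_smooth = graphsam_like_step(w, g_smooth)
print("final training loss:", float(np.mean((X @ w - y) ** 2)))
```

The point of the sketch is the per-step cost: sam_step calls loss_grad twice, while graphsam_like_step calls it once, which is where the reduced training overhead described in the abstract comes from; the paper's loss-landscape analysis is what justifies that the approximation does not hurt generalization.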
Related papers
- Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition [39.057791275055074]
We propose a novel method called Random SAM prompt tuning (RSAM-PT) to improve the model generalization.
We employ the deferred re-weight scheme to increase the significance of tail-class samples.
RSAM-PT achieves state-of-the-art performance of 90.3%, 76.5%, and 50.1% on benchmark datasets.
arXiv Detail & Related papers (2024-10-28T13:58:17Z) - Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z) - Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the adversarial perturbation into full-gradient and gradient-noise components, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
arXiv Detail & Related papers (2024-03-19T01:39:33Z) - CR-SAM: Curvature Regularized Sharpness-Aware Minimization [8.248964912483912]
Sharpness-Aware Minimization (SAM) aims to enhance generalizability by minimizing the worst-case loss, using one-step gradient ascent as an approximation.
In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both training and test sets.
In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM).
arXiv Detail & Related papers (2023-12-21T03:46:29Z) - Efficient Generalization Improvement Guided by Random Weight Perturbation [24.027159739234524]
Sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvements.
We resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM.
We achieve very competitive performance on CIFAR and remarkably better performance on ImageNet.
arXiv Detail & Related papers (2022-11-21T14:24:34Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm LookSAM that only periodically calculates the inner gradient ascent.
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
arXiv Detail & Related papers (2022-03-05T11:53:37Z) - Update in Unit Gradient [8.143750358586074]
In machine learning, optimization is performed by using a descent method to find the minimum value of the loss.
We propose a unit vector space in algebra that not only preserves the mathematical intuition of the algebra but also retains the advantages of gradient algorithms.
arXiv Detail & Related papers (2021-10-01T04:00:51Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks, and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.