On Statistical Properties of Sharpness-Aware Minimization: Provable
Guarantees
- URL: http://arxiv.org/abs/2302.11836v3
- Date: Fri, 19 May 2023 06:02:43 GMT
- Title: On Statistical Properties of Sharpness-Aware Minimization: Provable
Guarantees
- Authors: Kayhan Behdin, Rahul Mazumder
- Abstract summary: We present a new theoretical explanation of why Sharpness-Aware Minimization (SAM) generalizes well.
SAM is particularly well-suited for non-convex problems.
Our findings are validated using numerical experiments on deep neural networks.
- Score: 5.91402820967386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness-Aware Minimization (SAM) is a recent optimization framework aiming
to improve deep neural network generalization by obtaining flatter
(i.e., less sharp) solutions. As SAM has been numerically successful, recent
papers have studied theoretical aspects of the framework and have shown that SAM
solutions are indeed flat. However, there has been limited theoretical
exploration of the statistical properties of SAM. In this work, we directly
study the statistical performance of SAM and present a new theoretical
explanation of why SAM generalizes well. To this end, we study two statistical
problems, neural networks with a hidden layer and kernel regression, and prove
that, under certain conditions, SAM has smaller prediction error than Gradient
Descent (GD). Our results concern both convex and non-convex settings, and show
that SAM is particularly well-suited for non-convex problems. Additionally, we
prove that in our setup, SAM solutions are less sharp as well, showing that our
results agree with previous work. Our theoretical findings are
validated using numerical experiments in numerous scenarios, including deep
neural networks.
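For context, the optimization procedure whose statistics the paper studies is the standard SAM update of Foret et al. (2021): take an ascent step of radius rho along the gradient direction, then descend using the gradient evaluated at the perturbed weights. Below is a minimal PyTorch-style sketch of one such step; the function names and hyperparameter values are illustrative assumptions, not the paper's experimental setup.

```python
import torch

def sam_step(model, loss_fn, x, y, rho=0.05, lr=0.1):
    """One SAM step (Foret et al., 2021): ascend to w + eps, descend from there."""
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradient at the current weights w.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # 2) Worst-case perturbation eps = rho * g / ||g||_2.
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]

    # 3) Gradient at the perturbed weights w + eps.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    loss_perturbed = loss_fn(model(x), y)
    grads_sam = torch.autograd.grad(loss_perturbed, params)

    # 4) Undo the perturbation and apply the SAM gradient at w.
    with torch.no_grad():
        for p, e, g in zip(params, eps, grads_sam):
            p.sub_(e)       # back to w
            p.sub_(lr * g)  # w <- w - lr * grad L(w + eps)
    return loss.item()
```

Setting rho = 0 recovers plain Gradient Descent (GD), the baseline that SAM is compared against in the abstract.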
Related papers
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS) measure, the maximum increase in training loss over the neighborhood surrounding the current weight.
In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we obtain a better flatness indicator (FI) that points toward a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM.
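As a hypothetical illustration (not the authors' exact BSAM algorithm), the bilateral quantity described above can be sketched by approximating both sharpness terms with a single gradient-direction step of radius rho; note the two terms telescope to L(w + eps) - L(w - eps):

```python
import torch

def bilateral_flatness_indicator(model, loss_fn, x, y, rho=0.05):
    """Hypothetical sketch: FI = MaxS + MinS, with
       MaxS ~ L(w + eps) - L(w) and MinS ~ L(w) - L(w - eps),
       both approximated by one gradient step of radius rho."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss_w = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss_w, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    eps = [rho * g / norm for g in grads]

    def shifted_loss(sign):
        # Evaluate the loss at w + sign * eps, then restore w.
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(sign * e)
            loss = loss_fn(model(x), y).item()
            for p, e in zip(params, eps):
                p.sub_(sign * e)
        return loss

    max_s = shifted_loss(+1.0) - loss_w.item()  # ascent: Max-Sharpness
    min_s = loss_w.item() - shifted_loss(-1.0)  # descent: Min-Sharpness
    return max_s + min_s                        # = L(w+eps) - L(w-eps)
```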
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling (AUSAM) to accelerate Sharpness-Aware Minimization.
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, which is computed from the current minibatch gradient.
By decomposing the minibatch gradient into full-gradient and noise components, we discover that relying solely on the full-gradient component degrades generalization, while excluding it leads to improved performance.
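A minimal sketch of this "noise-only" perturbation idea, under the assumption that the full gradient is estimated with an exponential moving average (EMA) of minibatch gradients; the paper's exact estimator and scaling may differ:

```python
import torch

def friendly_perturbation(grads, ema_grads, rho=0.05, momentum=0.9):
    """Hypothetical sketch: perturb along the batch-specific gradient *noise*
    g - E[g], where the full gradient E[g] is estimated by an EMA of
    minibatch gradients (an assumption for illustration)."""
    noise = []
    for g, m in zip(grads, ema_grads):
        m.mul_(momentum).add_((1 - momentum) * g)  # update full-gradient estimate
        noise.append(g - m)                        # keep only the noise component
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise)) + 1e-12
    return [rho * n / norm for n in noise]
```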
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
- Why Does Sharpness-Aware Minimization Generalize Better Than SGD? [102.40907275290891]
We show why Sharpness-Aware Minimization (SAM) generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks.
Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features.
arXiv Detail & Related papers (2023-10-11T07:51:10Z)
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change in loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.
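A minimal sketch of such a binary-masked perturbation, assuming for illustration that the mask keeps the largest-magnitude gradient coordinates; SSAM's actual mask construction may differ:

```python
import torch

def sparse_perturbation(grads, sparsity=0.5, rho=0.05):
    """Hypothetical sketch of an SSAM-style sparse perturbation: a binary
    mask keeps only the top-(1 - sparsity) fraction of coordinates by
    gradient magnitude, leaving most weights unperturbed."""
    flat = torch.cat([g.reshape(-1).abs() for g in grads])
    k = max(1, int((1 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k).values.min()  # magnitude cutoff
    masks = [(g.abs() >= threshold).float() for g in grads]

    norm = torch.sqrt(sum(((g * m) ** 2).sum()
                          for g, m in zip(grads, masks))) + 1e-12
    return [rho * (g * m) / norm for g, m in zip(grads, masks)]
```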
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
- mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [20.560184120992094]
The Sharpness-Aware Minimization (SAM) technique modifies the underlying loss function to steer gradient descent methods toward flatter minima.
We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM.
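A minimal sketch of the micro-batch averaging idea behind mSAM, assuming m disjoint micro-batches per minibatch; names and hyperparameters are illustrative:

```python
import torch

def msam_grads(model, loss_fn, x, y, m=4, rho=0.05):
    """Hypothetical sketch of mSAM: split the minibatch into m micro-batches,
    compute an independent SAM perturbation and gradient on each, average."""
    params = [p for p in model.parameters() if p.requires_grad]
    avg = [torch.zeros_like(p) for p in params]
    for xb, yb in zip(x.chunk(m), y.chunk(m)):
        # Per-micro-batch SAM perturbation.
        grads = torch.autograd.grad(loss_fn(model(xb), yb), params)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        eps = [rho * g / norm for g in grads]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)
        grads_sam = torch.autograd.grad(loss_fn(model(xb), yb), params)
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)
        # Accumulate the average SAM gradient across micro-batches.
        for a, g in zip(avg, grads_sam):
            a.add_(g / m)
    return avg  # feed to any base optimizer
```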
arXiv Detail & Related papers (2023-02-19T23:27:12Z)
- Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM, which averages SAM updates computed over disjoint micro-batches of each minibatch, can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z)
- Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach [132.37966970098645]
A popular solution is Sharpness-Aware Minimization (SAM), which minimizes the change in loss when a perturbation is added to the weights.
In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation via a binary mask.
In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2022-10-11T06:30:10Z)
- Towards Understanding Sharpness-Aware Minimization [27.666483899332643]
We argue that the existing justifications for the success of Sharpness-Aware Minimization (SAM) are based on a PAC-Bayes generalization bound.
We theoretically analyze its implicit bias for diagonal linear networks.
We show that fine-tuning a standard model with SAM can yield significant improvements in the flatness-related properties of the resulting network.
arXiv Detail & Related papers (2022-06-13T15:07:32Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the overhead over SAM from 100% extra computation to 40% vis-a-vis base optimizers.
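Hypothetical sketches of the two strategies, assuming Stochastic Weight Perturbation perturbs a random subset of coordinates and Sharpness-Sensitive Data Selection keeps the examples whose loss rises most under the perturbation; the paper's exact formulations may differ:

```python
import torch

def swp_perturbation(grads, keep_prob=0.5, rho=0.05):
    """Hypothetical sketch of Stochastic Weight Perturbation: perturb only a
    random subset of coordinates, rescaling to keep the expected step size."""
    masks = [(torch.rand_like(g) < keep_prob).float() for g in grads]
    norm = torch.sqrt(sum(((g * m) ** 2).sum()
                          for g, m in zip(grads, masks))) + 1e-12
    return [rho * (g * m) / (norm * keep_prob) for g, m in zip(grads, masks)]

def select_sharpness_sensitive(loss_per_example_w, loss_per_example_weps, frac=0.5):
    """Hypothetical sketch of Sharpness-Sensitive Data Selection: keep the
    fraction of examples whose loss rises most under the perturbation."""
    rise = loss_per_example_weps - loss_per_example_w
    k = max(1, int(frac * rise.numel()))
    return torch.topk(rise, k).indices  # compute the SAM gradient on these only
```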
arXiv Detail & Related papers (2021-10-07T02:20:37Z)