Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2505.23866v1
- Date: Thu, 29 May 2025 09:55:29 GMT
- Title: Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
- Authors: Chengli Tan, Yubo Zhou, Haishan Ye, Guang Dai, Junmin Liu, Zengjie Song, Jiangshe Zhang, Zixiang Zhao, Yunda Hao, Yong Xu
- Abstract summary: Deep neural networks have been increasingly used in safety-critical applications such as medical diagnosis and autonomous driving. Many studies suggest that they are prone to being poorly calibrated and have a propensity for overconfidence, which may have disastrous consequences. We show that the recently proposed sharpness-aware minimization (SAM) counteracts this tendency towards overconfidence. We propose a variant of SAM, coined as CSAM, to ameliorate model calibration.
- Score: 21.747141953620698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been increasingly used in safety-critical applications such as medical diagnosis and autonomous driving. However, many studies suggest that they are prone to being poorly calibrated and have a propensity for overconfidence, which may have disastrous consequences. In this paper, unlike standard training such as stochastic gradient descent, we show that the recently proposed sharpness-aware minimization (SAM) counteracts this tendency towards overconfidence. The theoretical analysis suggests that SAM allows us to learn models that are already well-calibrated by implicitly maximizing the entropy of the predictive distribution. Inspired by this finding, we further propose a variant of SAM, coined as CSAM, to ameliorate model calibration. Extensive experiments on various datasets, including ImageNet-1K, demonstrate the benefits of SAM in reducing calibration error. Meanwhile, CSAM performs even better than SAM and consistently achieves lower calibration error than other approaches.
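Since the paper's claims are stated in terms of calibration error, a minimal sketch of the standard expected calibration error (ECE) metric is given below. The equal-width binning scheme and bin count are generic defaults assumed here, not necessarily the exact evaluation protocol used in the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard ECE with equal-width confidence bins.

    confidences: max predicted probability per sample, shape (N,)
    correct:     whether the top-1 prediction was right, shape (N,)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        acc = correct[in_bin].mean()            # empirical accuracy in this bin
        conf = confidences[in_bin].mean()       # average confidence in this bin
        ece += in_bin.mean() * abs(acc - conf)  # weight by the bin's sample share
    return ece

# An overconfident model (high confidence, low accuracy) yields a large ECE.
print(expected_calibration_error([0.99, 0.98, 0.97, 0.60], [True, False, False, True]))
```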
Related papers
- LightSAM: Parameter-Agnostic Sharpness-Aware Minimization [92.17866492331524]
Sharpness-Aware Minimization (SAM) improves machine learning models by seeking out flat regions of the loss landscape through weight perturbations. However, SAM introduces an additional hyperparameter, the perturbation radius, to which its performance is sensitive. In this paper, we propose LightSAM, an algorithm that sets the perturbation radius and learning rate of SAM adaptively.
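For reference, a minimal sketch of a plain SAM update appears below, showing where the perturbation radius `rho` and learning rate `lr` enter; it is not LightSAM's adaptive rule, which the abstract does not detail, and `grad_fn` is a hypothetical gradient callable.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One vanilla SAM update: ascend to an approximate worst-case neighbour,
    then descend using the gradient taken there. `rho` and `lr` are the two
    hyperparameters that LightSAM is said to set adaptively."""
    g = grad_fn(w)                               # gradient at the current weights
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step of radius rho
    g_adv = grad_fn(w + eps)                     # gradient at the perturbed weights
    return w - lr * g_adv                        # descent step with the perturbed gradient

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(10):
    w = sam_step(w, grad_fn=lambda v: v)
```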
arXiv Detail & Related papers (2025-05-30T09:28:38Z)
- Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification [113.6840565194525]
Real-world datasets often follow a long-tailed distribution, making generalization to tail classes difficult. Recent methods have resorted to long-tailed variants of Sharpness-Aware Minimization (SAM) to improve generalization by flattening the loss landscape. We introduce Focal-SAM, which assigns different penalties to class-wise sharpness, achieving fine-grained control without extra backpropagation.
arXiv Detail & Related papers (2025-05-03T03:01:28Z)
- Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm [39.656014609027494]
Sharpness-aware minimization (SAM) has emerged as a powerful tool for improving the generalizability of deep neural network based learning. This contribution leverages preconditioning to unify SAM variants, providing both a unifying convergence analysis and valuable insights. A novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates.
arXiv Detail & Related papers (2025-01-11T18:05:33Z)
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS) term.
In this paper, we propose to also utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weights, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we create a better flatness indicator (FI) that points towards a flatter direction during optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds flatter minima than SAM.
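One way to read the MaxS/MinS combination described above is sketched below; the exact estimator BSAM uses may differ, and `loss_fn`/`grad_fn` are hypothetical callables.

```python
import numpy as np

def bilateral_flatness_indicator(w, loss_fn, grad_fn, rho=0.05):
    """Illustrative MaxS + MinS indicator in the spirit of BSAM (not its exact estimator).

    MaxS: how much the loss rises along an ascent step of radius rho.
    MinS: how far the current loss sits above the loss after a descent step
          of the same radius (a crude stand-in for the neighbourhood minimum).
    """
    g = grad_fn(w)
    step = rho * g / (np.linalg.norm(g) + 1e-12)
    base = loss_fn(w)
    max_s = loss_fn(w + step) - base    # sharpness on the ascent side
    min_s = base - loss_fn(w - step)    # sharpness on the descent side
    return max_s + min_s                # smaller value -> flatter neighbourhood
```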
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, which is computed from the current minibatch gradient.
By decomposing this perturbation into a full-gradient component and a gradient-noise component, we discover that relying solely on the full gradient degrades generalization, while excluding it leads to improved performance.
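A hedged sketch of the decomposition described above follows; the full-batch gradient would in practice have to be estimated (for instance by a running average of minibatch gradients), and the function name is illustrative rather than taken from the paper.

```python
import numpy as np

def noise_only_perturbation(g_batch, g_full_estimate, rho=0.05):
    """Perturb along the batch-specific gradient noise only, i.e. the minibatch
    gradient minus an estimate of the full-batch gradient (assumed to be tracked
    separately, e.g. as a running average of recent minibatch gradients)."""
    noise = g_batch - g_full_estimate
    return rho * noise / (np.linalg.norm(noise) + 1e-12)
```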
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-aware minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step remains the same as that of the ascent step.
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost.
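A minimal sketch of the renormalization described above, applied to a vanilla SAM step, is shown below; the precise placement of the rescaling in SSAM may differ, and `grad_fn` is a hypothetical callable.

```python
import numpy as np

def stable_sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """SAM update with the renormalization described for Stable SAM:
    rescale the descent-step gradient to the norm of the ascent-step gradient."""
    g = grad_fn(w)                                   # ascent-step gradient
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_fn(w + eps)                         # descent-step gradient
    g_adv = g_adv * np.linalg.norm(g) / (np.linalg.norm(g_adv) + 1e-12)
    return w - lr * g_adv
```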
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
- Critical Influence of Overparameterization on Sharpness-aware Minimization [12.321517302762558]
We show that sharpness-aware minimization (SAM) is critically affected by overparameterization. This effect is particularly pronounced in practical settings involving label noise and sparsity. We also provide insights into how overparameterization helps SAM achieve minima with more uniform Hessian moments compared to SGD.
arXiv Detail & Related papers (2023-11-29T11:19:50Z)
- Sharpness-Aware Minimization Alone can Improve Adversarial Robustness [7.9810915020234035]
We explore Sharpness-Aware Minimization (SAM) in the context of adversarial robustness.
We find that using SAM alone can achieve superior adversarial robustness without sacrificing clean accuracy, compared to standard training.
We show that SAM and adversarial training (AT) differ in terms of perturbation strength, leading to different accuracy and robustness trade-offs.
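To make the contrast concrete, the sketch below juxtaposes the two kinds of perturbation: SAM perturbs the weights inside an L2 ball of radius rho, whereas adversarial training perturbs the inputs. The FGSM-style step and its constants are common but illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

# SAM perturbs the *weights* within an L2 ball of radius rho ...
def weight_perturbation(grad_w, rho=0.05):
    return rho * grad_w / (np.linalg.norm(grad_w) + 1e-12)

# ... whereas adversarial training perturbs the *inputs*, here with a single
# FGSM-style L-infinity step of size eps.
def input_perturbation(grad_x, eps=8 / 255):
    return eps * np.sign(grad_x)
```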
arXiv Detail & Related papers (2023-05-09T12:39:21Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored, as it helps deep neural networks generalize better during training.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with SGD, AMSGrad, and SAM.
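A hedged sketch of how a SAM-style perturbed gradient can be fed into an Adam-like update with momentum and an adaptive learning rate is shown below; it only illustrates the combination of ideas named in the title and is not the published AdaSAM algorithm. `grad_fn` and the `state` dictionary are assumptions.

```python
import numpy as np

def adasam_like_step(w, grad_fn, state, lr=1e-3, rho=0.05,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Feed SAM's perturbed gradient into an Adam-style update with momentum
    (first moment) and an adaptive learning rate (second moment)."""
    g = grad_fn(w)
    g_adv = grad_fn(w + rho * g / (np.linalg.norm(g) + 1e-12))   # SAM perturbed gradient
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_adv        # momentum
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_adv ** 2   # adaptive scaling
    m_hat = state["m"] / (1 - beta1 ** state["t"])               # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Initialize with: state = {"m": np.zeros_like(w), "v": np.zeros_like(w), "t": 0}
```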
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees [5.91402820967386]
We present a new theoretical explanation of why Sharpness-Aware Minimization (SAM) generalizes well.
SAM is particularly well-suited for both sharp and non-sharp problems.
Our findings are validated using numerical experiments on deep neural networks.
arXiv Detail & Related papers (2023-02-23T07:52:31Z)
- How Does Sharpness-Aware Minimization Minimize Sharpness? [29.90109733192208]
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks.
This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism.
arXiv Detail & Related papers (2022-11-10T17:56:38Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM improves efficiency over SAM, reducing the extra computation from 100% to 40% vis-a-vis base optimizers.
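The sketch below gives one plausible reading of the two strategies named above; the actual ESAM rules differ in detail (for instance, ESAM selects perturbation-sensitive examples rather than a random subset), and `grad_on` is a hypothetical helper.

```python
import numpy as np

def esam_like_step(w, grad_on, batch_size, lr=0.1, rho=0.05,
                   weight_frac=0.5, data_frac=0.5, rng=None):
    """One update combining the two ideas named in the ESAM abstract.

    Stochastic Weight Perturbation:     perturb only a random subset of weights.
    Sharpness-Sensitive Data Selection: run the extra forward/backward pass on a
        subset of the minibatch (random here; ESAM picks sensitive examples).
    grad_on(w, idx) is a hypothetical helper returning the mean gradient over
    the examples indexed by idx.
    """
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(batch_size, int(data_frac * batch_size), replace=False)
    mask = rng.random(w.shape) < weight_frac        # which weights get perturbed
    g = grad_on(w, idx) * mask                      # masked ascent gradient
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_on(w + eps, idx)                   # gradient at the perturbed point
    return w - lr * g_adv
```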
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.