Flat Posterior Does Matter For Bayesian Model Averaging
- URL: http://arxiv.org/abs/2406.15664v3
- Date: Mon, 21 Oct 2024 10:22:17 GMT
- Title: Flat Posterior Does Matter For Bayesian Model Averaging
- Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song
- Abstract summary: In this work, we empirically demonstrate that BNNs often struggle to capture flatness.
We propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel optimizer that seeks flat posteriors by calculating divergence in the parameter space.
We validate the efficacy of SA-BMA in enhancing generalization performance under few-shot classification and distribution shift.
- Score: 15.371686185626162
- License:
- Abstract: Bayesian neural networks (BNNs) approximate the posterior distribution of model parameters and use the posterior for prediction via Bayesian Model Averaging (BMA). The quality of the posterior approximation is critical for achieving accurate and robust predictions. Flatness in the loss landscape is known to be strongly associated with generalization performance, so it must be taken into account when improving the quality of the posterior approximation. In this work, we empirically demonstrate that BNNs often struggle to capture flatness. Moreover, we provide both experimental and theoretical evidence that BMA can be ineffective without ensuring flatness. To address this, we propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel optimizer that seeks flat posteriors by calculating divergence in the parameter space. SA-BMA aligns with the intrinsic nature of BNNs and is a generalization of existing sharpness-aware optimizers for DNNs. In addition, we suggest a Bayesian transfer learning scheme to efficiently leverage pre-trained DNNs. We validate the efficacy of SA-BMA in enhancing generalization performance under few-shot classification and distribution shift by ensuring a flat posterior.
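The abstract describes SA-BMA as a divergence-based, sharpness-aware optimizer acting on the posterior of a BNN. As a rough, hypothetical illustration only (not the authors' implementation), the sketch below applies a standard SAM-style two-step update to the variational parameters of a mean-field Bayesian layer. The Euclidean perturbation ball and the names `MeanFieldLinear` and `sharpness_aware_step` are assumptions standing in for the parameter-space divergence and actual code described in the paper; the point of the sketch is only the structure of the update: ascend to a worst-case posterior within a small neighborhood, then descend using the gradient evaluated there.

```python
# Minimal sketch (assumptions labeled above): SAM-style flat-posterior training
# for a mean-field variational Bayesian layer. Not the SA-BMA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanFieldLinear(nn.Module):
    """Bayesian linear layer with a factorized Gaussian posterior over weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        sigma = F.softplus(self.rho)            # positive standard deviation
        eps = torch.randn_like(sigma)
        weight = self.mu + sigma * eps          # reparameterization trick
        return F.linear(x, weight, self.bias)

    def kl_to_standard_normal(self):
        # KL(N(mu, sigma^2) || N(0, 1)), summed over weights
        sigma = F.softplus(self.rho)
        return 0.5 * (sigma**2 + self.mu**2 - 1.0 - 2.0 * torch.log(sigma)).sum()


def sharpness_aware_step(model, loss_fn, x, y, optimizer, radius=0.05, kl_weight=1e-3):
    """One SAM-style update on the variational parameters (illustration only)."""
    # 1) Gradient of the negative ELBO at the current posterior parameters.
    loss = loss_fn(model(x), y) + kl_weight * model.kl_to_standard_normal()
    loss.backward()

    # 2) Ascend to the worst-case point within a small (here: Euclidean) neighborhood.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        perturbs = []
        for p in model.parameters():
            if p.grad is None:
                continue
            e = radius * p.grad / (norm + 1e-12)
            p.add_(e)
            perturbs.append((p, e))

    # 3) Gradient at the perturbed posterior drives the actual update.
    optimizer.zero_grad()
    loss_pert = loss_fn(model(x), y) + kl_weight * model.kl_to_standard_normal()
    loss_pert.backward()

    # 4) Undo the perturbation, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in perturbs:
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss_pert.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = MeanFieldLinear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    for _ in range(5):
        print(sharpness_aware_step(model, F.cross_entropy, x, y, opt))
```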
Related papers
- Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling [7.084307990641011]
We introduce a novel approach that combines diffusion techniques and implicit priors for variational learning of Bayesian last layer weights.
By delivering an explicit and computationally efficient variational lower bound, our method aims to augment the expressive abilities of BLL models.
arXiv Detail & Related papers (2024-08-07T12:59:58Z) - Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z) - Flat Seeking Bayesian Neural Networks [32.61417343756841]
We develop the theory, Bayesian setting, and variational inference approach for the sharpness-aware posterior.
Specifically, both the models sampled from our sharpness-aware posterior and the optimal approximate posterior estimating it have better flatness.
We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks.
arXiv Detail & Related papers (2023-02-06T11:40:44Z) - Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z) - Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks [27.11052209129402]
We experimentally show that the key to good MC-approximated predictive distributions is the quality of the approximate posterior itself.
We show that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2022-05-20T09:24:39Z) - What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z) - Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Bayesian Neural Networks With Maximum Mean Discrepancy Regularization [13.97417198693205]
We show that our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks.
We also provide a new formulation for estimating the uncertainty of a given prediction, showing that it is more robust against adversarial attacks.
arXiv Detail & Related papers (2020-03-02T14:54:48Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)