Related papers: Flat Posterior Does Matter For Bayesian Model Averaging

Flat Posterior Does Matter For Bayesian Model Averaging

URL: http://arxiv.org/abs/2406.15664v3
Date: Mon, 21 Oct 2024 10:22:17 GMT
Title: Flat Posterior Does Matter For Bayesian Model Averaging
Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song,
Abstract summary: In this work, we empirically demonstrate that BNNs often struggle to capture the flatness. We propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel that seeks flat posteriors by calculating neural divergence. We validate the efficacy of SA-BMA in enhancing generalization performance in few-shot classification and distribution shift.
Score: 15.371686185626162
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bayesian neural network (BNN) approximates the posterior distribution of model parameters and utilizes the posterior for prediction via Bayesian Model Averaging (BMA). The quality of the posterior approximation is critical for achieving accurate and robust predictions. It is known that flatness in the loss landscape is strongly associated with generalization performance, and it necessitates consideration to improve the quality of the posterior approximation. In this work, we empirically demonstrate that BNNs often struggle to capture the flatness. Moreover, we provide both experimental and theoretical evidence showing that BMA can be ineffective without ensuring flatness. To address this, we propose Sharpness-Aware Bayesian Model Averaging (SA-BMA), a novel optimizer that seeks flat posteriors by calculating divergence in the parameter space. SA-BMA aligns with the intrinsic nature of BNN and the generalized version of existing sharpness-aware optimizers for DNN. In addition, we suggest a Bayesian Transfer Learning scheme to efficiently leverage pre-trained DNN. We validate the efficacy of SA-BMA in enhancing generalization performance in few-shot classification and distribution shift by ensuring flat posterior.

Related papers

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling [7.084307990641011]
We introduce a novel approach that combines diffusion techniques and implicit priors for variational learning of Bayesian last layer weights. By delivering an explicit and computationally efficient variational lower bound, our method aims to augment the expressive abilities of BLL models.
arXiv Detail & Related papers (2024-08-07T12:59:58Z)
Scalable Bayesian Learning with posteriors [0.856335408411906]
We introduce posteriors, an easily PyTorch library hosting general-purpose implementations of Bayesian learning. We demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.
arXiv Detail & Related papers (2024-05-31T18:00:12Z)
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
Entropy-MCMC: Sampling from Flat Basins with Ease [10.764160559530849]
We introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks.
arXiv Detail & Related papers (2023-10-09T04:40:20Z)
Improving Transferability of Adversarial Examples via Bayesian Attacks [84.90830931076901]
We introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters. Our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively.
arXiv Detail & Related papers (2023-07-21T03:43:07Z)
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
Flat Seeking Bayesian Neural Networks [32.61417343756841]
We develop theories, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness. We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks.
arXiv Detail & Related papers (2023-02-06T11:40:44Z)
Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables. We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption. We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z)
Adversarial Bayesian Simulation [0.9137554315375922]
We bridge approximate Bayesian computation (ABC) with deep neural implicit samplers based on adversarial networks (GANs) and adversarial variational Bayes. We develop a Bayesian GAN that directly targets the posterior by solving an adversarial optimization problem. We show that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators.
arXiv Detail & Related papers (2022-08-25T14:18:39Z)
Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation. We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation. We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks [27.11052209129402]
We experimentally show that the key to good MC-approximated predictive distributions is the quality of the approximate posterior itself. We show that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2022-05-20T09:24:39Z)
Posterior temperature optimized Bayesian models for inverse problems in medical imaging [59.82184400837329]
We present an unsupervised Bayesian approach to inverse problems in medical imaging using mean-field variational inference with a fully tempered posterior. We show that an optimized posterior temperature leads to improved accuracy and uncertainty estimation. Our source code is publicly available at calibrated.com/Cardio-AI/mfvi-dip-mia.
arXiv Detail & Related papers (2022-02-02T12:16:33Z)
What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard and deep ensembles. We also show that deep distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference. Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
Bayesian Neural Networks With Maximum Mean Discrepancy Regularization [13.97417198693205]
We show that our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks. We also provide a new formulation for estimating the uncertainty on a given prediction, showing it performs in a more robust fashion against adversarial attacks.
arXiv Detail & Related papers (2020-03-02T14:54:48Z)
Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization. We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.