What Are Bayesian Neural Network Posteriors Really Like?
- URL: http://arxiv.org/abs/2104.14421v1
- Date: Thu, 29 Apr 2021 15:38:46 GMT
- Title: What Are Bayesian Neural Network Posteriors Really Like?
- Authors: Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson
- Abstract summary: We show that Bayesian neural networks inferred with full-batch Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
- Score: 63.950151520585024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely
high-dimensional and non-convex. For computational reasons, researchers
approximate this posterior using inexpensive mini-batch methods such as
mean-field variational inference or stochastic-gradient Markov chain Monte
Carlo (SGMCMC). To investigate foundational questions in Bayesian deep
learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern
architectures. We show that (1) BNNs can achieve significant performance gains
over standard training and deep ensembles; (2) a single long HMC chain can
provide a comparable representation of the posterior to multiple shorter
chains; (3) in contrast to recent studies, we find posterior tempering is not
needed for near-optimal performance, with little evidence for a "cold
posterior" effect, which we show is largely an artifact of data augmentation;
(4) BMA performance is robust to the choice of prior scale, and relatively
similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5)
Bayesian neural networks show surprisingly poor generalization under domain
shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods
can provide good generalization, they provide distinct predictive distributions
from HMC. Notably, deep ensemble predictive distributions are similarly close
to HMC as standard SGLD, and closer than standard variational inference.
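For context, the sketch below shows one iteration of the full-batch Hamiltonian Monte Carlo procedure the paper relies on: resample momenta, integrate Hamiltonian dynamics with a leapfrog scheme, and apply a Metropolis accept/reject correction. This is a minimal NumPy illustration rather than the authors' implementation; `log_posterior` and `grad_log_posterior` are placeholders for a BNN's full-dataset log-likelihood plus log-prior and its gradient, evaluated on a flattened parameter vector.

```python
import numpy as np

def hmc_step(theta, log_posterior, grad_log_posterior,
             step_size=1e-3, n_leapfrog=50, rng=None):
    """One full-batch HMC update for a flattened parameter vector theta.

    log_posterior(theta) returns the unnormalized log posterior (full-dataset
    log-likelihood plus log-prior); grad_log_posterior(theta) its gradient.
    Both are placeholders for a concrete BNN model.
    """
    rng = np.random.default_rng() if rng is None else rng
    momentum = rng.standard_normal(theta.shape)                    # momenta ~ N(0, I)
    current_h = -log_posterior(theta) + 0.5 * momentum @ momentum  # potential + kinetic energy

    # Leapfrog integration of Hamiltonian dynamics.
    q, p = theta.copy(), momentum.copy()
    p += 0.5 * step_size * grad_log_posterior(q)                   # half step for momentum
    for _ in range(n_leapfrog - 1):
        q += step_size * p                                         # full step for position
        p += step_size * grad_log_posterior(q)                     # full step for momentum
    q += step_size * p
    p += 0.5 * step_size * grad_log_posterior(q)                   # final half step

    # Metropolis correction keeps the exact posterior invariant.
    proposed_h = -log_posterior(q) + 0.5 * p @ p
    if np.log(rng.uniform()) < current_h - proposed_h:
        return q, True                                             # accept proposal
    return theta, False                                            # reject, keep current state
```

Running many such steps after burn-in yields posterior samples; the Bayesian model average (BMA) referenced in point (4) is then obtained by averaging the network's predictive distribution over those samples.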
Related papers
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling [48.94555574632823]
Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow.
We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN.
We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
arXiv Detail & Related papers (2022-06-15T17:11:08Z)
- Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks [27.11052209129402]
We experimentally show that the key to good Monte Carlo-approximated predictive distributions is the quality of the approximate posterior itself (see the predictive-averaging sketch after this list).
We show that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2022-05-20T09:24:39Z)
- Structured Stochastic Gradient MCMC [20.68905354115655]
We propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form.
We obtain better predictive likelihoods and larger effective sample sizes than full SGMCMC.
arXiv Detail & Related papers (2021-07-19T17:18:10Z)
- Dangers of Bayesian Model Averaging under Covariate Shift [45.20204749251884]
We show how a Bayesian model average can in fact be problematic under covariate shift.
We additionally show why the same issue does not affect many approximate inference procedures.
arXiv Detail & Related papers (2021-06-22T16:19:52Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Scaling Hamiltonian Monte Carlo Inference for Bayesian Neural Networks with Symmetric Splitting [6.684193501969829]
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo approach that exhibits favourable exploration properties in high-dimensional models such as neural networks.
We introduce a new symmetric integration scheme for split HMC that does not rely on stochastic gradients.
Our approach demonstrates HMC as a feasible option when considering inference schemes for large-scale machine learning problems.
arXiv Detail & Related papers (2020-10-14T01:58:34Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
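As referenced in the "Posterior Refinement" entry above, Monte Carlo-approximated predictive distributions are formed by averaging the network's class probabilities over posterior samples; the main abstract's point (6) then compares such predictive distributions across inference methods. The sketch below illustrates both steps under the assumption that per-sample class-probability arrays are already available; the agreement and total-variation metrics shown are common ways to compare a cheap approximation against an HMC reference, not necessarily the exact quantities reported in the paper.

```python
import numpy as np

def bma_predictive(prob_samples):
    """Average per-sample class probabilities into one predictive distribution.

    prob_samples: array of shape (n_samples, n_inputs, n_classes), where
    prob_samples[s, i] is the network's softmax output at posterior sample s
    on input i (placeholder for whatever sampler produced the samples).
    """
    return prob_samples.mean(axis=0)                        # shape (n_inputs, n_classes)

def agreement(p, q):
    """Fraction of inputs on which two predictive distributions pick the same class."""
    return np.mean(p.argmax(axis=-1) == q.argmax(axis=-1))

def total_variation(p, q):
    """Mean total-variation distance between two predictive distributions."""
    return np.mean(0.5 * np.abs(p - q).sum(axis=-1))

# Hypothetical usage: compare an SGLD approximation against an HMC reference.
# hmc_probs and sgld_probs would be (n_samples, n_inputs, n_classes) arrays.
# p_hmc, p_sgld = bma_predictive(hmc_probs), bma_predictive(sgld_probs)
# print(agreement(p_hmc, p_sgld), total_variation(p_hmc, p_sgld))
```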
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.