What Are Bayesian Neural Network Posteriors Really Like?
- URL: http://arxiv.org/abs/2104.14421v1
- Date: Thu, 29 Apr 2021 15:38:46 GMT
- Title: What Are Bayesian Neural Network Posteriors Really Like?
- Authors: Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson
- Abstract summary: We show that Bayesian neural networks inferred with full-batch Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
- Score: 63.950151520585024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely
high-dimensional and non-convex. For computational reasons, researchers
approximate this posterior using inexpensive mini-batch methods such as
mean-field variational inference or stochastic-gradient Markov chain Monte
Carlo (SGMCMC). To investigate foundational questions in Bayesian deep
learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern
architectures. We show that (1) BNNs can achieve significant performance gains
over standard training and deep ensembles; (2) a single long HMC chain can
provide a comparable representation of the posterior to multiple shorter
chains; (3) in contrast to recent studies, we find posterior tempering is not
needed for near-optimal performance, with little evidence for a "cold
posterior" effect, which we show is largely an artifact of data augmentation;
(4) BMA performance is robust to the choice of prior scale, and relatively
similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5)
Bayesian neural networks show surprisingly poor generalization under domain
shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods
can provide good generalization, they provide distinct predictive distributions
from HMC. Notably, deep ensemble predictive distributions are similarly close
to HMC as standard SGLD, and closer than standard variational inference.
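For context, the sketch below shows one iteration of the full-batch Hamiltonian Monte Carlo procedure the paper relies on: resample momenta, integrate Hamiltonian dynamics with a leapfrog scheme, and apply a Metropolis accept/reject correction. This is a minimal NumPy illustration rather than the authors' implementation; `log_posterior` and `grad_log_posterior` are placeholders for a BNN's full-dataset log-likelihood plus log-prior and its gradient, evaluated on a flattened parameter vector.

```python
import numpy as np

def hmc_step(theta, log_posterior, grad_log_posterior,
             step_size=1e-3, n_leapfrog=50, rng=None):
    """One full-batch HMC update for a flattened parameter vector theta.

    log_posterior(theta) returns the unnormalized log posterior (full-dataset
    log-likelihood plus log-prior); grad_log_posterior(theta) its gradient.
    Both are placeholders for a concrete BNN model.
    """
    rng = np.random.default_rng() if rng is None else rng
    momentum = rng.standard_normal(theta.shape)                    # momenta ~ N(0, I)
    current_h = -log_posterior(theta) + 0.5 * momentum @ momentum  # potential + kinetic energy

    # Leapfrog integration of Hamiltonian dynamics.
    q, p = theta.copy(), momentum.copy()
    p += 0.5 * step_size * grad_log_posterior(q)                   # half step for momentum
    for _ in range(n_leapfrog - 1):
        q += step_size * p                                         # full step for position
        p += step_size * grad_log_posterior(q)                     # full step for momentum
    q += step_size * p
    p += 0.5 * step_size * grad_log_posterior(q)                   # final half step

    # Metropolis correction keeps the exact posterior invariant.
    proposed_h = -log_posterior(q) + 0.5 * p @ p
    if np.log(rng.uniform()) < current_h - proposed_h:
        return q, True                                             # accept proposal
    return theta, False                                            # reject, keep current state
```

Running many such steps after burn-in yields posterior samples; the Bayesian model average (BMA) referenced in point (4) is then obtained by averaging the network's predictive distribution over those samples.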
Related papers
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling [48.94555574632823]
Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow.
We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN.
We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
arXiv Detail & Related papers (2022-06-15T17:11:08Z)
- Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks [27.11052209129402]
We experimentally show that the key to good Monte Carlo-approximated predictive distributions is the quality of the approximate posterior itself (see the predictive-averaging sketch after this list).
We show that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2022-05-20T09:24:39Z)
- Structured Stochastic Gradient MCMC [20.68905354115655]
We propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form.
We obtain better predictive likelihoods and larger effective sample sizes than full SGMCMC.
arXiv Detail & Related papers (2021-07-19T17:18:10Z)
- Dangers of Bayesian Model Averaging under Covariate Shift [45.20204749251884]
We show how a Bayesian model average can in fact be problematic under covariate shift.
We additionally show why the same issue does not affect many approximate inference procedures.
arXiv Detail & Related papers (2021-06-22T16:19:52Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Scaling Hamiltonian Monte Carlo Inference for Bayesian Neural Networks with Symmetric Splitting [6.684193501969829]
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo approach that exhibits favourable exploration properties in high-dimensional models such as neural networks.
We introduce a new symmetric integration scheme for split HMC that does not rely on stochastic gradients.
Our approach demonstrates HMC as a feasible option when considering inference schemes for large-scale machine learning problems.
arXiv Detail & Related papers (2020-10-14T01:58:34Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
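As referenced in the "Posterior Refinement" entry above, Monte Carlo-approximated predictive distributions are formed by averaging the network's class probabilities over posterior samples; the main abstract's point (6) then compares such predictive distributions across inference methods. The sketch below illustrates both steps under the assumption that per-sample class-probability arrays are already available; the agreement and total-variation metrics shown are common ways to compare a cheap approximation against an HMC reference, not necessarily the exact quantities reported in the paper.

```python
import numpy as np

def bma_predictive(prob_samples):
    """Average per-sample class probabilities into one predictive distribution.

    prob_samples: array of shape (n_samples, n_inputs, n_classes), where
    prob_samples[s, i] is the network's softmax output at posterior sample s
    on input i (placeholder for whatever sampler produced the samples).
    """
    return prob_samples.mean(axis=0)                        # shape (n_inputs, n_classes)

def agreement(p, q):
    """Fraction of inputs on which two predictive distributions pick the same class."""
    return np.mean(p.argmax(axis=-1) == q.argmax(axis=-1))

def total_variation(p, q):
    """Mean total-variation distance between two predictive distributions."""
    return np.mean(0.5 * np.abs(p - q).sum(axis=-1))

# Hypothetical usage: compare an SGLD approximation against an HMC reference.
# hmc_probs and sgld_probs would be (n_samples, n_inputs, n_classes) arrays.
# p_hmc, p_sgld = bma_predictive(hmc_probs), bma_predictive(sgld_probs)
# print(agreement(p_hmc, p_sgld), total_variation(p_hmc, p_sgld))
```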
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.