How Good is the Bayes Posterior in Deep Neural Networks Really?
- URL: http://arxiv.org/abs/2002.02405v2
- Date: Thu, 2 Jul 2020 22:18:12 GMT
- Title: How Good is the Bayes Posterior in Deep Neural Networks Really?
- Authors: Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski,
Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton,
Sebastian Nowozin
- Abstract summary: We cast doubt on the current understanding of Bayes posteriors in popular deep neural networks.
We demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions.
We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments.
- Score: 46.66866466260469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During the past five years the Bayesian deep learning community has developed
increasingly accurate and efficient approximate inference procedures that allow
for Bayesian inference in deep neural networks. However, despite this
algorithmic progress and the promise of improved uncertainty quantification and
sample efficiency there are---as of early 2020---no publicized deployments of
Bayesian neural networks in industrial practice. In this work we cast doubt on
the current understanding of Bayes posteriors in popular deep neural networks:
we demonstrate through careful MCMC sampling that the posterior predictive
induced by the Bayes posterior yields systematically worse predictions compared
to simpler methods including point estimates obtained from SGD. Furthermore, we
demonstrate that predictive performance is improved significantly through the
use of a "cold posterior" that overcounts evidence. Such cold posteriors
sharply deviate from the Bayesian paradigm but are commonly used as a heuristic
in Bayesian deep learning papers. We put forward several hypotheses that could
explain cold posteriors and evaluate the hypotheses through experiments. Our
work questions the goal of accurate posterior approximations in Bayesian deep
learning: If the true Bayes posterior is poor, what is the use of more accurate
approximations? Instead, we argue that it is timely to focus on understanding
the origin of the improved performance of cold posteriors.
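To make the "cold posterior" concrete: tempering replaces the Bayes posterior p(θ|D) ∝ exp(-U(θ)) with a distribution proportional to exp(-U(θ)/T), where U(θ) is the negative log likelihood plus the negative log prior; T = 1 recovers the Bayes posterior and T < 1 overcounts the evidence. The following is a minimal sketch of a tempered SGLD sampler on a toy one-parameter model, not the paper's networks or SG-MCMC machinery; the temperature, step size, and model are illustrative assumptions.

```python
# Minimal sketch of a "cold posterior": tempered SGLD on a toy 1-parameter
# linear model. The model, constants, and plain SGLD are illustrative
# stand-ins for the paper's deep networks and SG-MCMC samplers.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise, observation noise sigma = 0.1.
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
N, sigma, prior_std = len(y), 0.1, 1.0

def grad_U(theta, xb, yb):
    """Gradient of U(theta) = -sum_i log p(y_i|x_i,theta) - log p(theta),
    with the minibatch likelihood term rescaled to the full dataset."""
    resid = yb - xb[:, 0] * theta
    grad_nll = -(N / len(yb)) * np.sum(resid * xb[:, 0]) / sigma**2
    grad_nlp = theta / prior_std**2
    return grad_nll + grad_nlp

step = 1e-3
T = 0.2          # T = 1: Bayes posterior; T < 1: "cold" posterior
theta, samples = 0.0, []
for it in range(5000):
    idx = rng.choice(N, size=32, replace=False)
    # Tempered SGLD update: injected noise is scaled by T, so the chain
    # targets exp(-U(theta)/T), i.e. the evidence is overcounted when T < 1.
    theta -= step * grad_U(theta, X[idx], y[idx])
    theta += np.sqrt(2.0 * step * T) * rng.normal()
    if it > 1000:
        samples.append(theta)

print(f"T={T}: posterior mean ~ {np.mean(samples):.3f}, std ~ {np.std(samples):.4f}")
```

Setting T = 1 in the noise term recovers a sampler for the untempered Bayes posterior, so sweeping T isolates the effect the abstract describes.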
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
The Hessian-free Laplace (HFL) approximation uses the curvature of both the log posterior and the network prediction to estimate the variance of the prediction.
We show that, under standard assumptions of LA in Bayesian deep learning, HFL targets the same variance as LA, and can be efficiently amortized in a pre-trained network.
arXiv Detail & Related papers (2024-03-15T20:47:39Z)
- Towards Improved Variational Inference for Deep Bayesian Models [7.841254447222393]
In this thesis, we explore the use of variational inference (VI) as an approximation.
VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood.
We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
arXiv Detail & Related papers (2024-01-23T00:40:20Z)
- Neural Importance Sampling for Rapid and Reliable Gravitational-Wave Inference [59.040209568168436]
We first generate a rapid proposal for the Bayesian posterior using neural networks, and then attach importance weights based on the underlying likelihood and prior.
This provides (1) a corrected posterior free from network inaccuracies, (2) a performance diagnostic (the sample efficiency) for assessing the proposal and identifying failure cases, and (3) an unbiased estimate of the Bayesian evidence. A minimal sketch of this reweighting step appears after this list.
We carry out a large study analyzing 42 binary black hole mergers observed by LIGO and Virgo with the SEOBNRv4PHM and IMRPhenomHMXP waveform models.
arXiv Detail & Related papers (2022-10-11T18:00:02Z)
- Posterior temperature optimized Bayesian models for inverse problems in medical imaging [59.82184400837329]
We present an unsupervised Bayesian approach to inverse problems in medical imaging using mean-field variational inference with a fully tempered posterior.
We show that an optimized posterior temperature leads to improved accuracy and uncertainty estimation.
Our source code is publicly available at github.com/Cardio-AI/mfvi-dip-mia.
arXiv Detail & Related papers (2022-02-02T12:16:33Z)
- A statistical theory of cold posteriors in deep neural networks [32.45282187405337]
We show that BNNs for image classification use the wrong likelihood.
In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated.
arXiv Detail & Related papers (2020-08-13T13:46:58Z)
- Statistical Foundation of Variational Bayes Neural Networks [0.456877715768796]
Variational Bayes (VB) provides a useful alternative to circumvent the computational cost and time complexity associated with the generation of samples from the true posterior.
This paper establishes the fundamental result of posterior consistency for the mean-field variational posterior (VP) for a feed-forward artificial neural network model.
arXiv Detail & Related papers (2020-06-29T03:04:18Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is to be "a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
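The Neural Importance Sampling entry above describes a two-stage recipe: draw parameters from a learned proposal, then reweight each draw by the ratio of likelihood times prior to the proposal density. The sketch below shows only that reweighting arithmetic on toy Gaussian stand-ins; the proposal, likelihood, prior, and all constants are assumptions for illustration, not the paper's trained networks or gravitational-wave models.

```python
# Minimal sketch of importance reweighting of a learned proposal.
# All densities are toy Gaussians standing in for the paper's neural
# proposal and gravitational-wave likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=20)   # toy observations

def log_prior(theta):
    return norm.logpdf(theta, loc=0.0, scale=5.0)

def log_likelihood(theta, data):
    return norm.logpdf(data, loc=theta, scale=1.0).sum()

# (1) "Rapid proposal": pretend a network produced this slightly biased Gaussian.
q_mean, q_std = 2.7, 0.4
theta = rng.normal(q_mean, q_std, size=5000)
log_q = norm.logpdf(theta, q_mean, q_std)

# (2) Importance weights based on the underlying likelihood and prior.
log_w = np.array([log_likelihood(t, data) + log_prior(t) for t in theta]) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# (3) Corrected posterior mean, sample-efficiency diagnostic, evidence estimate.
posterior_mean = np.sum(w * theta)
sample_efficiency = 1.0 / (len(w) * np.sum(w**2))        # ESS / n
log_evidence = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()

print(f"mean={posterior_mean:.3f}, efficiency={sample_efficiency:.1%}, log Z ~ {log_evidence:.2f}")
```

The normalized weights correct the proposal's inaccuracies, their effective sample size per draw gives the efficiency diagnostic, and the average unnormalized weight estimates the Bayesian evidence.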