A statistical theory of cold posteriors in deep neural networks
- URL: http://arxiv.org/abs/2008.05912v2
- Date: Tue, 27 Apr 2021 14:33:30 GMT
- Title: A statistical theory of cold posteriors in deep neural networks
- Authors: Laurence Aitchison
- Abstract summary: We show that BNNs for image classification use the wrong likelihood.
In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated.
- Score: 32.45282187405337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To get Bayesian neural networks to perform comparably to standard neural
networks it is usually necessary to artificially reduce uncertainty using a
"tempered" or "cold" posterior. This is extremely concerning: if the prior is
accurate, Bayes inference/decision theory is optimal, and any artificial
changes to the posterior should harm performance. While this suggests that the
prior may be at fault, here we argue that in fact, BNNs for image
classification use the wrong likelihood. In particular, standard image
benchmark datasets such as CIFAR-10 are carefully curated. We develop a
generative model describing curation which gives a principled Bayesian account
of cold posteriors, because the likelihood under this new generative model
closely matches the tempered likelihoods used in past work.
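  For concreteness, the objects discussed in the abstract can be written down directly: a tempered likelihood scales only the likelihood term by 1/T, a cold posterior scales the whole log joint by 1/T, and the paper's curation model (a data point is kept only when all S annotators independently agree on its label) contributes roughly S * log p(y|x, theta) per point, matching a tempered likelihood with 1/T = S. The sketch below is a minimal illustration under those assumptions, not code from the paper; the function names are hypothetical.

```python
import numpy as np

def tempered_log_joint(log_lik, log_prior, T):
    """Tempered likelihood: log p(D|theta) / T + log p(theta).

    T < 1 sharpens the likelihood; T = 1 recovers the standard Bayes posterior.
    """
    return log_lik / T + log_prior

def cold_log_joint(log_lik, log_prior, T):
    """Cold posterior: the entire log joint (likelihood + prior) is scaled by 1/T."""
    return (log_lik + log_prior) / T

def curated_log_lik(log_probs, labels, S):
    """Consensus-curation likelihood (assumed form): a point survives curation
    only if all S annotators independently assign the same label, so under
    conditional independence each point contributes S * log p(y|x, theta).

    log_probs: (N, C) array of per-class log-probabilities; labels: (N,) ints.
    """
    per_point = log_probs[np.arange(len(labels)), labels]
    return S * per_point.sum()
```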
Related papers
- Can a Confident Prior Replace a Cold Posterior? [20.018444020989712]
We introduce a "DirClip" prior that is practical to sample and nearly matches the performance of a cold posterior.
Second, we introduce a "confidence prior" that directly approximates a cold likelihood in the limit of decreasing temperature but cannot be easily sampled.
arXiv Detail & Related papers (2024-03-02T17:28:55Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Can pruning improve certified robustness of neural networks? [106.03070538582222]
We show that neural network pruning can improve the empirical robustness of deep neural networks (NNs).
Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted by up to 8.2% under standard training.
We additionally observe the existence of certified lottery tickets that can match both standard and certified robust accuracies of the original dense models.
arXiv Detail & Related papers (2022-06-15T05:48:51Z)
- Bayesian Neural Network Priors Revisited [29.949163519715952]
We study summary statistics of neural network weights in different networks trained using SGD.
We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations.
arXiv Detail & Related papers (2021-02-12T15:18:06Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Cold Posteriors and Aleatoric Uncertainty [32.341379426923105]
Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set.
We argue that commonly used priors can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets.
arXiv Detail & Related papers (2020-07-31T18:37:31Z)
- Neural Networks with Recurrent Generative Feedback [61.90658210112138]
We instantiate this recurrent generative feedback design on convolutional neural networks (CNNs), yielding CNN-F.
In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
arXiv Detail & Related papers (2020-07-17T19:32:48Z)
- Bayesian Neural Network via Stochastic Gradient Descent [0.0]
We show how gradient estimation techniques can be applied to Bayesian neural networks.
Our approach considerably outperforms previous state-of-the-art approaches for regression with Bayesian neural networks.
arXiv Detail & Related papers (2020-06-04T18:33:59Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
- How Good is the Bayes Posterior in Deep Neural Networks Really? [46.66866466260469]
We cast doubt on the current understanding of Bayes posteriors in popular deep neural networks.
We demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions.
We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments.
arXiv Detail & Related papers (2020-02-06T17:38:48Z)