Exact posterior distributions of wide Bayesian neural networks
- URL: http://arxiv.org/abs/2006.10541v2
- Date: Thu, 26 Nov 2020 10:36:55 GMT
- Title: Exact posterior distributions of wide Bayesian neural networks
- Authors: Jiri Hron and Yasaman Bahri and Roman Novak and Jeffrey Pennington and
Jascha Sohl-Dickstein
- Abstract summary: We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
- Score: 51.20413322972014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that the prior over functions induced by a deep
Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width
of all layers becomes large. However, many BNN applications are concerned with
the BNN function space posterior. While some empirical evidence of the
posterior convergence was provided in the original works of Neal (1996) and
Matthews et al. (2018), it is limited to small datasets or architectures due to
the notorious difficulty of obtaining and verifying exactness of BNN posterior
approximations. We provide the missing theoretical proof that the exact BNN
posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite
BNN on a small dataset via rejection sampling.
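As a rough illustration of the empirical-validation step, the sketch below shows the generic rejection-sampling recipe for a regression task with a Gaussian likelihood: draw weights from the BNN prior and accept each draw with probability equal to its likelihood divided by the likelihood's maximum. The toy data, one-hidden-layer ReLU network, noise scale and width below are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data (placeholder); a small dataset keeps the acceptance rate workable.
    X = np.array([[-1.0], [0.0], [1.0]])
    y = np.array([0.5, -0.2, 0.3])
    noise = 1.0    # observation noise scale of the Gaussian likelihood (assumed)
    width = 512    # hidden-layer width of the finite BNN (assumed)

    def sample_prior_weights():
        # NNGP-style parameterisation: Gaussian weights scaled by 1/sqrt(fan_in).
        W1 = rng.normal(size=(X.shape[1], width)) / np.sqrt(X.shape[1])
        b1 = rng.normal(size=width)
        W2 = rng.normal(size=(width, 1)) / np.sqrt(width)
        b2 = rng.normal(size=1)
        return W1, b1, W2, b2

    def forward(params, x):
        W1, b1, W2, b2 = params
        return (np.maximum(x @ W1 + b1, 0.0) @ W2 + b2).ravel()  # one ReLU hidden layer

    def log_likelihood(params):
        resid = y - forward(params, X)
        return -0.5 * np.sum(resid ** 2) / noise ** 2  # Gaussian log-likelihood, up to a constant

    # Rejection sampling with the prior as proposal: the log-likelihood above is
    # at most 0, so accepting with probability exp(log_likelihood) yields exact
    # draws from the finite-BNN posterior.
    posterior_samples = []
    while len(posterior_samples) < 100:
        params = sample_prior_weights()
        if np.log(rng.uniform()) < log_likelihood(params):
            posterior_samples.append(params)

Because the log-likelihood is non-positive in this normalisation, accepting with probability exp(log-likelihood) is valid rejection sampling, so the accepted draws are exact samples from the finite-width posterior; keeping the dataset small is what keeps the acceptance rate manageable.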
Related papers
- Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights [3.5865188519566003]
We present a new approximation theory for non-sparse Deep Neural Networks (DNNs) with bounded parameters.
We show that BNNs with non-sparse general priors can achieve near-minimax optimal posterior concentration rates to the true model.
arXiv Detail & Related papers (2024-03-21T08:31:36Z)
- Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling [48.94555574632823]
Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow.
We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster as the BNN gets wider.
We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
arXiv Detail & Related papers (2022-06-15T17:11:08Z)
- Wide Mean-Field Bayesian Neural Networks Ignore the Data [29.050507540280922]
We show that mean-field variational inference entirely fails to model the data when the network width is large.
We show that the optimal approximate posterior need not tend to the prior if the activation function is not odd.
arXiv Detail & Related papers (2022-02-23T18:21:50Z)
- A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that deliberately incorporates a degree of bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence [65.24701908364383]
A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data.
But far away from the training data, Bayesian ReLU networks (BNNs) can still underestimate uncertainty and thus be overconfident.
We show that the proposed infinite-feature extension can be applied post-hoc to any pre-trained ReLU BNN at a low cost.
arXiv Detail & Related papers (2020-10-06T13:32:18Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Predicting the outputs of finite deep neural networks trained with noisy gradients [1.1470070927586014]
A recent line of work has studied wide deep neural networks (DNNs) by approximating them as Gaussian processes (GPs).
Here we consider a DNN training protocol involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian process.
An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width.
arXiv Detail & Related papers (2020-04-02T18:00:01Z)