Wide Bayesian neural networks have a simple weight posterior: theory and
accelerated sampling
- URL: http://arxiv.org/abs/2206.07673v1
- Date: Wed, 15 Jun 2022 17:11:08 GMT
- Title: Wide Bayesian neural networks have a simple weight posterior: theory and
accelerated sampling
- Authors: Jiri Hron and Roman Novak and Jeffrey Pennington and Jascha
Sohl-Dickstein
- Abstract summary: Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow.
We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN.
We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
- Score: 48.94555574632823
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce repriorisation, a data-dependent reparameterisation which
transforms a Bayesian neural network (BNN) posterior to a distribution whose KL
divergence to the BNN prior vanishes as layer widths grow. The repriorisation
map acts directly on parameters, and its analytic simplicity complements the
known neural network Gaussian process (NNGP) behaviour of wide BNNs in function
space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo
(MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This
contrasts with the typically poor performance of MCMC in high dimensions. We
observe up to 50x higher effective sample size relative to no reparametrisation
for both fully-connected and residual networks. Improvements are achieved at
all widths, with the margin between reparametrised and standard BNNs growing
with layer width.
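As a rough illustration of the effective sample size (ESS) metric quoted in the abstract, the sketch below estimates ESS for a one-dimensional chain from its empirical autocorrelation. It is a generic textbook-style estimator and not the paper's evaluation code. The function names, the truncation rule (stop at the first non-positive autocorrelation), and the AR(1) toy chain are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation(chain):
    """Empirical autocorrelation of a 1-D chain at lags 0..n-1."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocovariance at non-negative lags (biased estimator, divides by n).
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    return acov / acov[0]


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation (an illustrative heuristic)."""
    rho = autocorrelation(chain)
    n = len(rho)
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau


rng = np.random.default_rng(0)
n = 5000

# Independent draws: ESS should be close to the chain length.
iid_chain = rng.normal(size=n)

# Strongly correlated AR(1) chain, a stand-in for a slowly mixing sampler.
noise = rng.normal(size=n)
ar_chain = np.zeros(n)
for t in range(1, n):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + noise[t]

ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
print(f"ESS (independent draws): {ess_iid:.0f}")
print(f"ESS (AR(1), phi=0.9):    {ess_ar:.0f}")
```

A sampler that mixes faster produces less autocorrelated draws and therefore a larger ESS for the same number of iterations, which is the sense in which the abstract compares reparametrised and standard BNNs.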
Related papers
- Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with weights drawn from independent Gaussian distributions can be tuned to criticality.
These networks still exhibit fluctuations that grow linearly with the depth of the network.
We show analytically that rectangular networks with tanh activations and weights from the ensemble of orthogonal matrices have corresponding preactivation fluctuations that are independent of depth, to leading order in inverse width.
arXiv Detail & Related papers (2023-10-11T18:00:02Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Masked Bayesian Neural Networks : Computation and Optimality [1.3649494534428745]
We propose a novel sparse Bayesian neural network (BNN) which searches for a good deep neural network with an appropriate complexity.
We employ the masking variables at each node which can turn off some nodes according to the posterior distribution to yield a nodewise sparse DNN.
By analyzing several benchmark datasets, we illustrate that the proposed BNN performs well compared to other existing methods.
arXiv Detail & Related papers (2022-06-02T02:59:55Z) - What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that BNNs sampled with Hamiltonian Monte Carlo (HMC) can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z) - Bayesian Neural Network Priors Revisited [29.949163519715952]
We study summary statistics of neural network weights in different networks trained using SGD.
We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations.
arXiv Detail & Related papers (2021-02-12T15:18:06Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose widths are quadratic in the sample size and linear in their depth, in a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient [1.3794617022004712]
Large batch size training of deep neural networks (DNNs) suffers from a well-known 'generalization gap', which markedly degrades generalization performance.
Here, we combine theory with experiments to explore the evolution of the basic structural properties, including gradient, parameter update step length, and loss update step length of NNs under varying batch sizes.
arXiv Detail & Related papers (2020-12-16T08:43:15Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
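The last entry above mentions generating exact samples from a finite BNN posterior via rejection sampling. A minimal sketch of one way this can work is given below, assuming the prior is used as the proposal and the likelihood is Gaussian, so that the acceptance probability is the likelihood divided by its maximum value. The one-hidden-layer tanh architecture, the toy dataset, `noise_std`, and all function names are hypothetical choices for illustration and are not taken from that paper.

```python
import numpy as np


def sample_prior(rng, d_in, width):
    """Draw weights of a one-hidden-layer network from a Gaussian prior
    with 1/fan-in variance scaling (an assumed parametrisation)."""
    w1 = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
    w2 = rng.normal(size=(width, 1)) / np.sqrt(width)
    return w1, w2


def forward(params, x):
    """Network output for inputs x of shape (n, d_in)."""
    w1, w2 = params
    return np.tanh(x @ w1) @ w2


def rejection_sample_posterior(rng, x, y, noise_std, n_samples,
                               width, max_tries=200_000):
    """Exact posterior samples: propose from the prior and accept with
    probability exp(-||y - f(x)||^2 / (2 * noise_std^2)), which is the
    Gaussian likelihood normalised by its maximum."""
    accepted, tries = [], 0
    while len(accepted) < n_samples and tries < max_tries:
        tries += 1
        params = sample_prior(rng, x.shape[1], width)
        resid = y - forward(params, x)
        accept_prob = np.exp(-0.5 * np.sum(resid ** 2) / noise_std ** 2)
        if rng.uniform() < accept_prob:
            accepted.append(params)
    return accepted, tries


rng = np.random.default_rng(0)
x_train = np.array([[-1.0], [0.0], [1.0]])   # toy inputs (hypothetical)
y_train = np.array([[0.3], [-0.2], [0.1]])   # toy targets (hypothetical)

samples, n_tries = rejection_sample_posterior(
    rng, x_train, y_train, noise_std=1.0, n_samples=20, width=16)
print(f"accepted {len(samples)} samples after {n_tries} proposals")
```

The acceptance rate of such a scheme decays quickly as the dataset grows or the observation noise shrinks, which is consistent with the entry's restriction to a small dataset.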