Related papers: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

URL: http://arxiv.org/abs/2206.07673v1
Date: Wed, 15 Jun 2022 17:11:08 GMT
Title: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
Authors: Jiri Hron and Roman Novak and Jeffrey Pennington and Jascha Sohl-Dickstein
Abstract summary: Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
Score: 48.94555574632823
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks. Improvements are achieved at all widths, with the margin between reparametrised and standard BNNs growing with layer width.

Related papers

Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics [0.8749675983608172]
We propose a scalable kinetic Langevin dynamics algorithm for sampling parameter spaces of big data and AI applications. We show that the resulting Symmetric Minibatch Splitting-UBU (SMS-UBU) integrator has bias $O(h2 d1/2)$ in dimension $d>0$ with stepsize $h>0$. We apply the algorithm to explore local modes of the posterior distribution of Bayesian neural networks (BNNs) and evaluate the calibration performance of the posterior predictive probabilities for neural networks with convolutional neural network architectures.
arXiv Detail & Related papers (2024-10-14T13:47:02Z)
Function-Space MCMC for Bayesian Wide Neural Networks [9.899763598214124]
We investigate the use of the preconditioned Crank-Nicolson algorithm and its Langevin version to sample from the reparametrised posterior distribution of the weights. We prove that the acceptance probabilities of the proposed methods approach 1 as the width of the network increases.
arXiv Detail & Related papers (2024-08-26T14:54:13Z)
Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with numerically weights from independent Gaussian distributions can be tuned to criticality. These networks still exhibit fluctuations that grow linearly with the depth of the network. We show analytically that rectangular networks with tanh activations and weights from the ensemble of matrices have corresponding preactivation fluctuations.
arXiv Detail & Related papers (2023-10-11T18:00:02Z)
Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights. We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard and deep ensembles. We also show that deep distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time. We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both. Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient [1.3794617022004712]
Large batch size training in deep neural networks (DNNs) possesses a well-known 'generalization gap' that remarkably induces generalization performance degradation. Here, we combine theory with experiments to explore the evolution of the basic structural properties, including gradient, parameter update step length, and loss update step length of NNs under varying batch sizes.
arXiv Detail & Related papers (2020-12-16T08:43:15Z)
Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
kernel methods outperform fully-connected finite-width networks. Centered and ensembled finite networks have reduced posterior variance. Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior. For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.