Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight
Posterior Approximations
- URL: http://arxiv.org/abs/2002.03704v4
- Date: Wed, 10 Mar 2021 09:19:13 GMT
- Title: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight
Posterior Approximations
- Authors: Sebastian Farquhar, Lewis Smith, Yarin Gal
- Abstract summary: We prove that deep mean-field variational weight posteriors can induce similar distributions in function-space to those induced by shallower networks with complex weight posteriors.
Our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
- Score: 40.384018112884874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We challenge the longstanding assumption that the mean-field approximation
for variational inference in Bayesian neural networks is severely restrictive,
and show this is not the case in deep networks. We prove several results
indicating that deep mean-field variational weight posteriors can induce
similar distributions in function-space to those induced by shallower networks
with complex weight posteriors. We validate our theoretical contributions
empirically, both through examination of the weight posterior using Hamiltonian
Monte Carlo in small models and by comparing diagonal- to structured-covariance
in large settings. Since complex variational posteriors are often expensive and
cumbersome to implement, our results suggest that using mean-field variational
inference in a deeper model is both a practical and theoretically justified
alternative to structured approximations.
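For concreteness, here is a minimal sketch (not the authors' code; layer sizes, initial values, and the standard-normal prior are illustrative assumptions) of what a mean-field weight posterior looks like in practice: every weight gets an independent Gaussian with its own mean and standard deviation, sampled with the reparameterisation trick, and several such layers are stacked to form the deeper mean-field model the abstract advocates in place of a shallower network with a structured-covariance posterior.

# Mean-field (diagonal-Gaussian) variational linear layer -- illustrative sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # One mean and one log-std per weight: a fully factorized Gaussian posterior.
        self.w_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w_logstd = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_logstd = nn.Parameter(torch.full((out_features,), -5.0))

    def forward(self, x):
        # Reparameterization: w = mu + std * eps, with eps ~ N(0, I).
        w = self.w_mu + self.w_logstd.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_logstd.exp() * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)

    def kl_to_standard_normal(self):
        # Closed-form KL(q(w) || N(0, I)) for a diagonal Gaussian, summed over weights.
        def kl(mu, logstd):
            return (0.5 * (mu ** 2 + (2 * logstd).exp() - 1) - logstd).sum()
        return kl(self.w_mu, self.w_logstd) + kl(self.b_mu, self.b_logstd)

# A deeper mean-field model: stack several such layers (sizes are arbitrary here).
net = nn.Sequential(MeanFieldLinear(784, 256), nn.ReLU(),
                    MeanFieldLinear(256, 256), nn.ReLU(),
                    MeanFieldLinear(256, 10))

A structured approximation would replace the per-weight standard deviations with a full or low-rank covariance factor per layer; that extra cost and implementation burden is exactly what the paper argues a deeper mean-field model can avoid.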
Related papers
- Function-Space MCMC for Bayesian Wide Neural Networks [9.899763598214124]
We investigate the use of the preconditioned Crank-Nicolson algorithm and its Langevin version to sample from the reparametrised posterior distribution of the weights.
We prove that the acceptance probabilities of the proposed methods approach 1 as the width of the network increases.
arXiv Detail & Related papers (2024-08-26T14:54:13Z) - Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of these results, showing that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z) - Improving Bayesian Inference in Deep Neural Networks with Variational
Structured Dropout [19.16094166903702]
We introduce a new variational structured approximation inspired by the interpretation of Dropout training as approximate inference in Bayesian networks.
We then propose a novel method called Variational Structured Dropout (VSD) to overcome the inflexibility of this factorized structure.
We conduct experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art methods on both predictive accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-02-16T02:33:43Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z) - The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks [46.677567663908185]
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
arXiv Detail & Related papers (2020-02-07T07:33:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.