Wide Mean-Field Bayesian Neural Networks Ignore the Data
- URL: http://arxiv.org/abs/2202.11670v1
- Date: Wed, 23 Feb 2022 18:21:50 GMT
- Title: Wide Mean-Field Bayesian Neural Networks Ignore the Data
- Authors: Beau Coker, Wessel P. Bruinsma, David R. Burt, Weiwei Pan, Finale
Doshi-Velez
- Abstract summary: We show that mean-field variational inference entirely fails to model the data when the network width is large.
We show that the optimal approximate posterior need not tend to the prior if the activation function is not odd.
- Score: 29.050507540280922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian neural networks (BNNs) combine the expressive power of deep learning
with the advantages of Bayesian formalism. In recent years, the analysis of
wide, deep BNNs has provided theoretical insight into their priors and
posteriors. However, we have no analogous insight into their posteriors under
approximate inference. In this work, we show that mean-field variational
inference entirely fails to model the data when the network width is large and
the activation function is odd. Specifically, for fully-connected BNNs with odd
activation functions and a homoscedastic Gaussian likelihood, we show that the
optimal mean-field variational posterior predictive (i.e., function space)
distribution converges to the prior predictive distribution as the width tends
to infinity. We generalize aspects of this result to other likelihoods. Our
theoretical results are suggestive of underfitting behavior previously
observed in BNNs. While our convergence bounds are non-asymptotic and
constants in our analysis can be computed, they are currently too loose to be
applicable in standard training regimes. Finally, we show that the optimal
approximate posterior need not tend to the prior if the activation function is
not odd, showing that our statements cannot be generalized arbitrarily.
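To make the main claim concrete, the following is a minimal, hypothetical sketch (not code from the paper): mean-field variational inference for a one-hidden-layer BNN with a tanh (odd) activation, standard-normal priors, the usual 1/sqrt(width) output scaling, and a homoscedastic Gaussian likelihood, fitted at several widths. All hyperparameters (dataset, widths, optimizer settings, noise level) are illustrative assumptions. If the theory's regime applies, the learned predictive mean at the training inputs should move toward the prior predictive mean (zero) as the width grows; since the paper's non-asymptotic bounds are loose in standard training regimes, the finite-width trend may be gradual.
```python
# Hypothetical illustration of the paper's claim, not the authors' code.
import torch

torch.manual_seed(0)

# Tiny 1D regression dataset with a homoscedastic Gaussian likelihood.
x = torch.linspace(-2.0, 2.0, 8).unsqueeze(-1)        # (N, 1)
y = torch.sin(x) + 0.05 * torch.randn_like(x)          # (N, 1)
noise_var = 0.05 ** 2


def fit_mfvi(width, steps=2000, lr=1e-2, n_mc=8):
    """Fit a factorized-Gaussian variational posterior over all weights of
    f(x) = (1/sqrt(width)) * v^T tanh(W x + b), with standard-normal priors."""
    d_in = x.shape[1]
    shapes = {"W": (width, d_in), "b": (width,), "v": (width,)}
    mus = {k: torch.zeros(s, requires_grad=True) for k, s in shapes.items()}
    rhos = {k: torch.full(s, -3.0, requires_grad=True) for k, s in shapes.items()}
    opt = torch.optim.Adam(list(mus.values()) + list(rhos.values()), lr=lr)

    def sample(k):
        # Reparameterized draw from q; softplus(rho) is the standard deviation.
        std = torch.nn.functional.softplus(rhos[k])
        return mus[k] + std * torch.randn_like(std)

    def forward(xb, W, b, v):
        return (torch.tanh(xb @ W.T + b) @ v).unsqueeze(-1) / width ** 0.5

    def kl():
        # KL(q || N(0, I)) for a factorized Gaussian q, summed over all weights.
        total = 0.0
        for k in shapes:
            std = torch.nn.functional.softplus(rhos[k])
            total = total + 0.5 * (std ** 2 + mus[k] ** 2 - 1.0 - 2.0 * std.log()).sum()
        return total

    for _ in range(steps):
        opt.zero_grad()
        # Monte Carlo estimate of the expected log-likelihood (constants dropped).
        ell = 0.0
        for _ in range(n_mc):
            pred = forward(x, sample("W"), sample("b"), sample("v"))
            ell = ell + (-0.5 * (y - pred) ** 2 / noise_var).sum() / n_mc
        loss = kl() - ell   # negative ELBO
        loss.backward()
        opt.step()

    # Monte Carlo estimate of the variational predictive mean at the training inputs.
    with torch.no_grad():
        preds = torch.stack([forward(x, sample("W"), sample("b"), sample("v"))
                             for _ in range(200)])
    return preds.mean(0)


for width in (10, 100, 1000):
    pred_mean = fit_mfvi(width)
    # Per the paper's theory, this should shrink toward the prior mean (zero)
    # as the width grows; at these finite widths the trend may only be partial.
    print(f"width={width:5d}  mean |predictive mean| at train inputs: "
          f"{pred_mean.abs().mean().item():.3f}")
```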
Related papers
- Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights [3.5865188519566003]
We present a new approximation theory for non-sparse Deep Neural Networks (DNNs) with bounded parameters.
We show that BNNs with non-sparse general priors can achieve near-minimax optimal posterior concentration rates to the true model.
arXiv Detail & Related papers (2024-03-21T08:31:36Z) - Bayesian Neural Networks with Domain Knowledge Priors [52.80929437592308]
We propose a framework for integrating general forms of domain knowledge into a BNN prior.
We show that BNNs using our proposed domain knowledge priors outperform those with standard priors.
arXiv Detail & Related papers (2024-02-20T22:34:53Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data [30.955325548635425]
We prove that as the number of hidden units in a single-layer Bayesian neural network tends to infinity, the function-space posterior mean under mean-field variational inference actually converges to zero.
This is in contrast to the true posterior, which converges to a Gaussian process.
arXiv Detail & Related papers (2021-06-13T17:36:38Z) - An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their
Asymptotic Overconfidence [65.24701908364383]
A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data.
But far away from the training data, Bayesian ReLU networks (BNNs) can still underestimate uncertainty and thus be overconfident.
We show that the proposed infinite-feature extension can be applied post hoc to any pre-trained ReLU BNN at a low cost.
arXiv Detail & Related papers (2020-10-06T13:32:18Z) - Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Characteristics of Monte Carlo Dropout in Wide Neural Networks [16.639005039546745]
Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs).
We study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they, too, converge to Gaussian processes for fixed sets of weights and biases.
We investigate how (strongly) correlated pre-activations can induce non-Gaussian behavior in NNs with strongly correlated weights.
arXiv Detail & Related papers (2020-07-10T15:14:43Z) - Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
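The rejection-sampling idea mentioned in the last entry above can be sketched generically (a hypothetical illustration, not the authors' code): draw BNN weights from the prior and accept each draw with probability proportional to the Gaussian likelihood of the data. Because the Gaussian density is maximized at zero residuals, exp(-SSE / (2 * noise variance)) is a valid bound-normalized acceptance probability, so accepted draws are exact posterior samples. As the entry notes, this is only practical on small datasets, since the acceptance rate collapses as the data or the precision grows. All sizes and values below are illustrative assumptions.
```python
# Hypothetical sketch of generic rejection sampling from a BNN posterior.
import numpy as np

rng = np.random.default_rng(0)

# Tiny dataset and a small one-hidden-layer network; all sizes are illustrative.
X = np.linspace(-1.0, 1.0, 3).reshape(-1, 1)
y = 0.5 * np.sin(2.0 * X).ravel()
noise_var, width = 0.5 ** 2, 20


def forward(theta, X):
    W, b, v = theta
    return np.tanh(X @ W.T + b) @ v / np.sqrt(width)


def sample_prior():
    return (rng.standard_normal((width, X.shape[1])),   # W
            rng.standard_normal(width),                 # b
            rng.standard_normal(width))                 # v


posterior_samples, tries = [], 0
while len(posterior_samples) < 10 and tries < 200_000:
    tries += 1
    theta = sample_prior()
    sse = np.sum((y - forward(theta, X)) ** 2)
    # Accept with probability p(y | theta) divided by an upper bound on the
    # Gaussian likelihood, which simplifies to exp(-SSE / (2 * noise_var)).
    if rng.uniform() < np.exp(-sse / (2.0 * noise_var)):
        posterior_samples.append(theta)

print(f"collected {len(posterior_samples)} exact posterior samples "
      f"from {tries} prior draws")
```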