Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data
- URL: http://arxiv.org/abs/2106.07052v1
- Date: Sun, 13 Jun 2021 17:36:38 GMT
- Title: Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data
- Authors: Beau Coker, Weiwei Pan, Finale Doshi-Velez
- Abstract summary: We prove that as the number of hidden units in a single-layer Bayesian neural network tends to infinity, the function-space posterior mean under mean-field variational inference actually converges to zero.
This is in contrast to the true posterior, which converges to a Gaussian process.
- Score: 30.955325548635425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational inference enables approximate posterior inference of the highly
over-parameterized neural networks that are popular in modern machine learning.
Unfortunately, such posteriors are known to exhibit various pathological
behaviors. We prove that as the number of hidden units in a single-layer
Bayesian neural network tends to infinity, the function-space posterior mean
under mean-field variational inference actually converges to zero, completely
ignoring the data. This is in contrast to the true posterior, which converges
to a Gaussian process. Our work provides insight into the over-regularization
of the KL divergence in variational inference.
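As a concrete illustration of the setting, below is a minimal sketch (not the authors' code) of mean-field variational inference for a single-hidden-layer Bayesian neural network in PyTorch. The toy data, tanh activation, 1/sqrt(width) output-weight prior scaling, optimizer, and step count are all illustrative assumptions; the sketch only makes visible how the per-weight KL terms accumulate with width while the data term stays fixed, the trade-off behind the over-regularization discussed above.

```python
import torch

torch.manual_seed(0)

# Toy 1-D regression data (illustrative only).
x = torch.linspace(-2.0, 2.0, 32).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

width = 1000                       # hidden units; the paper studies width -> infinity
prior_std_w1 = 1.0                 # prior std of input-to-hidden weights and biases
prior_std_w2 = 1.0 / width**0.5    # common 1/sqrt(width) scaling of output weights
noise_std = 0.1

def make_var_params(*shape):
    """Mean-field (fully factorized Gaussian) variational parameters."""
    mu = torch.zeros(*shape, requires_grad=True)
    log_std = torch.full(shape, -3.0, requires_grad=True)
    return mu, log_std

w1_mu, w1_ls = make_var_params(1, width)
b1_mu, b1_ls = make_var_params(width)
w2_mu, w2_ls = make_var_params(width, 1)
opt = torch.optim.Adam([w1_mu, w1_ls, b1_mu, b1_ls, w2_mu, w2_ls], lr=1e-2)

def kl_gauss(mu, log_std, prior_std):
    # KL( N(mu, std^2) || N(0, prior_std^2) ), summed over all weights.
    var = torch.exp(2 * log_std)
    return 0.5 * torch.sum(var / prior_std**2 + mu**2 / prior_std**2 - 1.0
                           - 2 * log_std + 2 * torch.log(torch.tensor(prior_std)))

def sample_f(inputs):
    # One reparameterized sample of the network function.
    w1 = w1_mu + torch.exp(w1_ls) * torch.randn_like(w1_mu)
    b1 = b1_mu + torch.exp(b1_ls) * torch.randn_like(b1_mu)
    w2 = w2_mu + torch.exp(w2_ls) * torch.randn_like(w2_mu)
    return torch.tanh(inputs @ w1 + b1) @ w2

for step in range(2000):
    opt.zero_grad()
    nll = 0.5 * torch.sum((y - sample_f(x)) ** 2) / noise_std**2   # data term (fixed size)
    kl = (kl_gauss(w1_mu, w1_ls, prior_std_w1)
          + kl_gauss(b1_mu, b1_ls, prior_std_w1)
          + kl_gauss(w2_mu, w2_ls, prior_std_w2))                  # sums over all weights
    loss = nll + kl              # negative ELBO, up to additive constants
    loss.backward()
    opt.step()

# Monte Carlo estimate of the variational predictive mean. The paper proves that
# as width -> infinity this mean converges to zero, i.e. it ignores the data.
with torch.no_grad():
    post_mean = torch.stack([sample_f(x) for _ in range(200)]).mean(0)
    print(post_mean.abs().mean().item())
```

Re-running the sketch with a larger `width` (all other settings unchanged) should show the KL penalty increasingly dominating the fixed-size data term and the predictive mean shrinking toward zero, in line with the infinite-width result.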
Related papers
- On the detrimental effect of invariances in the likelihood for
variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z) - Asymptotic Properties for Bayesian Neural Network in Besov Space [1.90365714903665]
We show that a Bayesian neural network with a spike-and-slab prior achieves posterior consistency with a nearly minimax convergence rate when the true regression function lies in a Besov space.
We propose a practical neural network with guaranteed properties.
arXiv Detail & Related papers (2022-06-01T05:47:06Z) - Wide Mean-Field Bayesian Neural Networks Ignore the Data [29.050507540280922]
We show that mean-field variational inference entirely fails to model the data when the network width is large.
We show that the optimal approximate posterior need not tend to the prior if the activation function is not odd.
arXiv Detail & Related papers (2022-02-23T18:21:50Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z) - Probabilistic Numeric Convolutional Neural Networks [80.42120128330411]
Continuous input signals like images and time series that are irregularly sampled or have missing values are challenging for existing deep learning methods.
We propose Probabilistic Numeric Convolutional Neural Networks, which represent features as Gaussian processes (GPs).
We then define a convolutional layer as the evolution of a PDE defined on this GP, followed by a nonlinearity.
In experiments we show that our approach yields a $3\times$ reduction of error from the previous state of the art on the SuperPixel-MNIST dataset and competitive performance on the medical time series dataset PhysioNet2012.
arXiv Detail & Related papers (2020-10-21T10:08:21Z) - An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their
Asymptotic Overconfidence [65.24701908364383]
A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data.
But far away from them, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be overconfident.
We show that the proposed infinite-feature extension can be applied post-hoc to any pre-trained ReLU BNN at a low cost.
arXiv Detail & Related papers (2020-10-06T13:32:18Z) - Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling (a minimal illustrative sketch appears after this list).
arXiv Detail & Related papers (2020-06-18T13:57:04Z) - Bayesian Neural Network via Stochastic Gradient Descent [0.0]
We show how gradient estimation techniques can be applied to Bayesian neural networks.
Our approach considerably outperforms previous state-of-the-art approaches for regression with Bayesian neural networks.
arXiv Detail & Related papers (2020-06-04T18:33:59Z)
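The rejection-sampling validation mentioned in the "Exact posterior distributions of wide Bayesian neural networks" entry above can be sketched as follows (a minimal illustration, not that paper's code): because a Gaussian likelihood is bounded above by $(2\pi\sigma^2)^{-n/2}$, a draw of the weights from the prior can be accepted with probability $\exp(-\lVert y - f(x;\theta)\rVert^2 / (2\sigma^2))$, which yields exact posterior samples. The dataset, network width, prior scaling, and proposal count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny dataset and modest width keep the acceptance rate workable.
x = np.array([[-1.0], [1.0]])            # inputs, shape (n, 1)
y = np.array([0.5, -0.5])                # targets, shape (n,)
noise_std = 0.5
width = 50
n_prop = 50_000                          # number of prior proposals

# Prior draws for every proposal at once: hidden weights/biases ~ N(0, 1),
# output weights ~ N(0, 1/width) so the prior has a sensible wide limit.
w1 = rng.standard_normal((n_prop, 1, width))
b1 = rng.standard_normal((n_prop, width))
w2 = rng.standard_normal((n_prop, width, 1)) / np.sqrt(width)

hidden = np.tanh(x[None] @ w1 + b1[:, None, :])   # (n_prop, n, width)
f = (hidden @ w2)[..., 0]                          # (n_prop, n)

# The Gaussian likelihood p(y | theta) is bounded by (2*pi*sigma^2)^(-n/2), so
# accepting with probability p(y | theta) / bound gives exact posterior samples.
accept_prob = np.exp(-0.5 * np.sum((y - f) ** 2, axis=1) / noise_std**2)
accepted = rng.uniform(size=n_prop) < accept_prob

posterior_f = f[accepted]
print(f"accepted {accepted.sum()} of {n_prop} prior proposals")
print("posterior mean of f at x = -1 and x = +1:", posterior_f.mean(axis=0))
```

The accepted function values are exact draws from the finite-width BNN posterior at the observed inputs, so their empirical mean can be compared directly against the GP-limit posterior mean or against the collapsing mean-field predictive mean discussed in the main abstract.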