Variational Inference of overparameterized Bayesian Neural Networks: a
theoretical and empirical study
- URL: http://arxiv.org/abs/2207.03859v1
- Date: Fri, 8 Jul 2022 12:31:08 GMT
- Title: Variational Inference of overparameterized Bayesian Neural Networks: a
theoretical and empirical study
- Authors: Tom Huix, Szymon Majewski, Alain Durmus, Eric Moulines, Anna Korba
- Abstract summary: This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN).
We point out a critical issue in the mean-field VI training.
This problem arises from the decomposition of the lower bound on the evidence (ELBO) into two terms.
- Score: 27.86555142135798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the Variational Inference (VI) used for training Bayesian
Neural Networks (BNN) in the overparameterized regime, i.e., when the number of
neurons tends to infinity. More specifically, we consider overparameterized
two-layer BNN and point out a critical issue in the mean-field VI training.
This problem arises from the decomposition of the lower bound on the evidence
(ELBO) into two terms: one corresponding to the likelihood function of the
model and the second to the Kullback-Leibler (KL) divergence between the prior
distribution and the variational posterior. In particular, we show both
theoretically and empirically that there is a trade-off between these two terms
in the overparameterized regime only when the KL is appropriately re-scaled
with respect to the ratio between the number of observations and neurons.
We also illustrate our theoretical results with numerical experiments that
highlight the critical choice of this ratio.
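To make the trade-off concrete, the sketch below is a minimal illustration (our own assumption, not the authors' code) of mean-field VI for a two-layer BNN in PyTorch. The negative ELBO is written as a data-fit term plus beta times the KL between the variational posterior and the prior, following the decomposition ELBO(q) = E_q[log p(y | w)] - KL(q || prior), and beta is set to the ratio of observations to neurons (beta = n / m). The synthetic data, the 1/m output scaling, and the exact choice of beta are illustrative assumptions; comparing beta = n / m against beta = 1 while the width m grows is one way to probe the trade-off described in the abstract.

    # Minimal sketch (illustrative assumptions, not the authors' code):
    # mean-field VI for a two-layer BNN where the KL term of the negative
    # ELBO is weighted by beta = n / m (observations per neuron).
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    n, d, m = 256, 1, 1024                        # observations, input dim, neurons
    x = torch.linspace(-2.0, 2.0, n).unsqueeze(1) # synthetic 1D regression data
    y = torch.sin(3.0 * x) + 0.1 * torch.randn(n, 1)

    # Mean-field Gaussian variational parameters (std = softplus(rho)).
    mu_w1, rho_w1 = torch.randn(d, m) * 0.1, torch.full((d, m), -3.0)
    mu_w2, rho_w2 = torch.randn(m, 1) * 0.1, torch.full((m, 1), -3.0)
    params = [mu_w1, rho_w1, mu_w2, rho_w2]
    for p in params:
        p.requires_grad_(True)

    def kl_to_standard_normal(mu, rho):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over all coordinates.
        sigma = F.softplus(rho)
        return 0.5 * torch.sum(sigma**2 + mu**2 - 1.0 - 2.0 * torch.log(sigma))

    def neg_elbo(beta, noise_std=0.1, n_mc=4):
        # Monte Carlo estimate of the data-fit term plus beta * KL.
        nll = 0.0
        for _ in range(n_mc):
            w1 = mu_w1 + F.softplus(rho_w1) * torch.randn_like(mu_w1)
            w2 = mu_w2 + F.softplus(rho_w2) * torch.randn_like(mu_w2)
            pred = torch.relu(x @ w1) @ w2 / m    # 1/m output scaling (assumption)
            nll = nll + 0.5 * torch.sum((y - pred) ** 2) / noise_std**2
        kl = kl_to_standard_normal(mu_w1, rho_w1) + kl_to_standard_normal(mu_w2, rho_w2)
        return nll / n_mc + beta * kl

    beta = n / m                                  # the observations-to-neurons ratio
    opt = torch.optim.Adam(params, lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = neg_elbo(beta)
        loss.backward()
        opt.step()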
Related papers
- Evidential Physics-Informed Neural Networks [0.0]
We present a novel class of Physics-Informed Neural Networks that is formulated based on the principles of Evidential Deep Learning.
We show how to apply our model to inverse problems involving 1D and 2D nonlinear differential equations.
arXiv Detail & Related papers (2025-01-27T10:01:10Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show exactly when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference [0.0]
We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks.
We prove a law of large numbers for three different training schemes.
An important result is that all methods converge to the same mean-field limit.
arXiv Detail & Related papers (2023-07-10T07:50:09Z)
- Variational Bayesian Neural Networks via Resolution of Singularities [1.2183405753834562]
We advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs).
We lay to rest some of the confusion surrounding discrepancies between downstream predictive performance, measured via, e.g., the test log predictive density, and the variational objective.
We use the SLT-corrected form for singular posterior distributions to inform the design of the variational family itself.
arXiv Detail & Related papers (2023-02-13T00:32:49Z)
- Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent describes how model behaviour changes depending on whether the model lies in the under- or over-parameterized regime.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game in which both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Statistical Foundation of Variational Bayes Neural Networks [0.456877715768796]
Variational Bayes (VB) provides a useful alternative to circumvent the computational cost and time complexity associated with the generation of samples from the true posterior.
This paper establishes the fundamental result of posterior consistency for the mean-field variational posterior (VP) for a feed-forward artificial neural network model.
arXiv Detail & Related papers (2020-06-29T03:04:18Z)