Law of Large Numbers for Bayesian two-layer Neural Network trained with
Variational Inference
- URL: http://arxiv.org/abs/2307.04779v1
- Date: Mon, 10 Jul 2023 07:50:09 GMT
- Title: Law of Large Numbers for Bayesian two-layer Neural Network trained with
Variational Inference
- Authors: Arnaud Descours (LMBP), Tom Huix (X), Arnaud Guillin (LMBP), Manon
Michel (LMBP), Éric Moulines (X), Boris Nectoux (LMBP)
- Abstract summary: We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks.
We prove a law of large numbers for three different training schemes.
An important result is that all methods converge to the same mean-field limit.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a rigorous analysis of training by variational inference (VI) of
Bayesian neural networks in the two-layer and infinite-width case. We consider
a regression problem with a regularized evidence lower bound (ELBO) which is
decomposed into the expected log-likelihood of the data and the
Kullback-Leibler (KL) divergence between the a priori distribution and the
variational posterior. With an appropriate weighting of the KL, we prove a law
of large numbers for three different training schemes: (i) the idealized case
with exact estimation of a multiple Gaussian integral from the
reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling,
commonly known as Bayes by Backprop, and (iii) a new and computationally
cheaper algorithm which we introduce as Minimal VI. An important result is that
all methods converge to the same mean-field limit. Finally, we illustrate our
results numerically and discuss the need for the derivation of a central limit
theorem.
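To make the minibatch scheme (ii) concrete, below is a minimal, illustrative sketch (not the authors' code) of a Bayes-by-Backprop-style one-sample Monte Carlo estimate of a KL-weighted ELBO for a two-layer network, using the reparametrization trick. The tanh activation, the 1/N mean-field scaling, the Gaussian likelihood, and the KL weight eta are assumptions made only for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer(x, W):
    """Mean-field-scaled two-layer network: f(x) = (1/N) * sum_i tanh(W_i . x)."""
    return np.tanh(W @ x).sum() / W.shape[0]

def kl_diag_gaussian_vs_std_normal(m, log_s):
    """Closed-form KL( N(m, diag(exp(2*log_s))) || N(0, I) )."""
    s2 = np.exp(2.0 * log_s)
    return 0.5 * np.sum(s2 + m ** 2 - 1.0 - 2.0 * log_s)

def elbo_estimate(xb, yb, m, log_s, eta, noise_var=1.0):
    """One-sample Monte Carlo estimate of the KL-weighted ELBO on a minibatch.

    The variational posterior is a mean-field Gaussian over the first-layer
    weights, parametrized by (m, log_s); a weight sample is drawn with the
    reparametrization trick W = m + exp(log_s) * eps, eps ~ N(0, I).
    """
    eps = rng.standard_normal(m.shape)
    W = m + np.exp(log_s) * eps                             # reparametrization trick
    preds = np.array([two_layer(x, W) for x in xb])
    loglik = -0.5 * np.sum((yb - preds) ** 2) / noise_var   # Gaussian log-likelihood, up to a constant
    return loglik - eta * kl_diag_gaussian_vs_std_normal(m, log_s)

# Toy usage: N hidden units, d-dimensional inputs, a minibatch of 8 points.
N, d = 50, 3
m = 0.1 * rng.standard_normal((N, d))
log_s = np.full((N, d), -2.0)
xb = rng.standard_normal((8, d))
yb = rng.standard_normal(8)
print(elbo_estimate(xb, yb, m, log_s, eta=1.0 / N))
```
In the idealized scheme (i), the single reparametrized sample would be replaced by an exact evaluation of the corresponding Gaussian integral, while Minimal VI (iii) is the computationally cheaper alternative introduced in the paper.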
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants.
We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z)
- Central Limit Theorem for Bayesian Neural Network trained with Variational Inference [0.32985979395737786]
We derive Central Limit Theorems (CLT) for Bayesian two-layer neural networks in the infinite-width limit, trained by variational inference on a regression task.
By deriving the CLT, this work shows that the idealized and Bayes-by-Backprop schemes have similar fluctuation behavior, which differs from that of the Minimal VI scheme.
arXiv Detail & Related papers (2024-06-10T11:05:48Z)
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z)
- Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Neural Control Variates [71.42768823631918]
We show that a set of neural networks can address the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for a calibrated uncertainty on a ReLU network is "to be a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
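As an aside on the last entry above, here is a minimal sketch (assuming a classification setting and hypothetical per-network outputs, not taken from the paper) of how a deep ensemble approximates Bayesian marginalization by averaging the members' predictive distributions.
```python
import numpy as np

# Class-probability outputs of M = 3 independently trained networks on a single
# input; the numbers are hypothetical stand-ins for p(y | x, theta_m).
member_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.55, 0.35, 0.10],
    [0.65, 0.25, 0.10],
])

# Bayesian model average: p(y | x) ~= (1/M) * sum_m p(y | x, theta_m),
# i.e. the ensemble prediction as a simple approximation of the posterior predictive.
bma = member_probs.mean(axis=0)
print(bma, bma.sum())  # still a valid probability vector (sums to 1)
```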
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.