The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks
- URL: http://arxiv.org/abs/2002.02655v2
- Date: Sun, 5 Jul 2020 19:05:09 GMT
- Title: The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks
- Authors: Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua
V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton,
Sebastian Nowozin
- Abstract summary: Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
- Score: 46.677567663908185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Bayesian Inference is a popular methodology for approximating
posterior distributions over Bayesian neural network weights. Recent work
developing this class of methods has explored ever richer parameterizations of
the approximate posterior in the hope of improving performance. In contrast,
here we share a curious experimental finding that suggests instead restricting
the variational distribution to a more compact parameterization. For a variety
of deep Bayesian neural networks trained using Gaussian mean-field variational
inference, we find that the posterior standard deviations consistently exhibit
strong low-rank structure after convergence. This means that by decomposing
these variational parameters into a low-rank factorization, we can make our
variational approximation more compact without decreasing the models'
performance. Furthermore, we find that such factorized parameterizations
improve the signal-to-noise ratio of stochastic gradient estimates of the
variational lower bound, resulting in faster convergence.
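As a rough illustration of the abstract's low-rank idea (a minimal sketch, not the authors' reference implementation), the snippet below ties the per-weight posterior standard deviations of a single layer to a rank-k factorization, so only (m + n) * k standard-deviation parameters are stored instead of m * n. The shapes, variable names (u, v), and initialization scale are illustrative assumptions.

```python
# Sketch: rank-k "tied" parameterization of the std devs in a Gaussian
# mean-field posterior over an m x n weight matrix.
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 256, 128, 2           # layer shape and tying rank (assumed values)

mean = rng.normal(size=(m, n))  # variational means: one free parameter per weight

# Instead of an m x n matrix of free std devs, keep two small non-negative
# factors whose product reconstructs them: sigma[i, j] = sum_l u[i, l] * v[j, l].
u = np.abs(rng.normal(size=(m, k))) * 0.01
v = np.abs(rng.normal(size=(n, k))) * 0.01
sigma = u @ v.T                 # (m, n) std devs from only (m + n) * k parameters

# Reparameterized weight sample, as used for stochastic estimates of the ELBO.
eps = rng.normal(size=(m, n))
w_sample = mean + sigma * eps

print(f"free std-dev parameters: {(m + n) * k} instead of {m * n}")
```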
Related papers
- Reparameterization invariance in approximate Bayesian inference [32.88960624085645]
We develop a new geometric view of reparameterizations from which we explain the success of linearization.
We demonstrate that these reparameterization invariance properties can be extended to the original neural network predictive.
arXiv Detail & Related papers (2024-06-05T14:49:15Z) - Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z) - Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance [52.093434664236014]
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without problem-specific retraining.
Inspired by this finding, we propose to improve recent methods by using more principled covariance determined by maximum likelihood estimation.
arXiv Detail & Related papers (2024-02-03T13:35:39Z) - On the detrimental effect of invariances in the likelihood for
variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Variational Laplace for Bayesian neural networks [25.055754094939527]
Variational Laplace exploits a local approximation of the likelihood to estimate the ELBO without the need for sampling the neural-network weights.
We show that early-stopping can be avoided by increasing the learning rate for the variance parameters.
arXiv Detail & Related papers (2021-02-27T14:06:29Z) - Approximation Based Variance Reduction for Reparameterization Gradients [38.73307745906571]
Flexible variational distributions improve variational inference but are harder to optimize.
We present a control variate that is applicable for any reparameterizable distribution with known mean and covariance matrix.
It leads to large improvements in gradient variance and optimization convergence for inference with non-factorized variational distributions.
arXiv Detail & Related papers (2020-07-29T06:55:11Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.