Towards Improved Variational Inference for Deep Bayesian Models
- URL: http://arxiv.org/abs/2401.12418v1
- Date: Tue, 23 Jan 2024 00:40:20 GMT
- Title: Towards Improved Variational Inference for Deep Bayesian Models
- Authors: Sebastian W. Ober
- Abstract summary: In this thesis, we explore the use of variational inference (VI) as an approximation.
VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood.
We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
- Score: 7.841254447222393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has revolutionized the last decade, being at the forefront of
extraordinary advances in a wide range of tasks including computer vision,
natural language processing, and reinforcement learning, to name but a few.
However, it is well-known that deep models trained via maximum likelihood
estimation tend to be overconfident and give poorly-calibrated predictions.
Bayesian deep learning attempts to address this by placing priors on the model
parameters, which are then combined with a likelihood to perform posterior
inference. Unfortunately, for deep models, the true posterior is intractable,
forcing the user to resort to approximations. In this thesis, we explore the
use of variational inference (VI) as an approximation, as it is unique in
simultaneously approximating the posterior and providing a lower bound to the
marginal likelihood. If tight enough, this lower bound can be used to optimize
hyperparameters and to facilitate model selection. However, this capacity has
rarely been used to its full extent for Bayesian neural networks, likely
because the approximate posteriors typically used in practice can lack the
flexibility to effectively bound the marginal likelihood. We therefore explore
three aspects of Bayesian learning for deep models: 1) we ask whether it is
necessary to perform inference over as many parameters as possible, or whether
it is reasonable to treat many of them as optimizable hyperparameters; 2) we
propose a variational posterior that provides a unified view of inference in
Bayesian neural networks and deep Gaussian processes; 3) we demonstrate how VI
can be improved in certain deep Gaussian process models by analytically
removing symmetries from the posterior, and performing inference on Gram
matrices instead of features. We hope that our contributions will provide a
stepping stone to fully realize the promises of VI in the future.
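The lower bound referred to throughout the abstract is the standard evidence lower bound (ELBO). In generic notation (not the thesis's own symbols), with data $\mathcal{D}$, weights $w$, hyperparameters $\theta$, and variational posterior $q(w)$:

```latex
\log p(\mathcal{D} \mid \theta)
  = \underbrace{\mathbb{E}_{q(w)}\bigl[\log p(\mathcal{D} \mid w)\bigr]
      - \mathrm{KL}\bigl(q(w)\,\|\,p(w \mid \theta)\bigr)}_{\mathrm{ELBO}}
  \;+\; \mathrm{KL}\bigl(q(w)\,\|\,p(w \mid \mathcal{D}, \theta)\bigr)
```

Since the final KL term is nonnegative, the ELBO lower-bounds the log marginal likelihood, and the gap is exactly the KL from q to the true posterior. This is why the abstract emphasizes flexible approximate posteriors: only a tight bound is a reliable surrogate for hyperparameter optimization and model selection.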
Related papers
- Scalable Bayesian Learning with posteriors [0.856335408411906]
We introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations of Bayesian learning methods.
We demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.
arXiv Detail & Related papers (2024-05-31T18:00:12Z) - Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z) - Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature [13.36200518068162]
We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation from existing target density evaluations.
Within this framework, we introduce Variational Sparse Bayesian Quadrature (VSBQ), a method for post-process approximate inference for models with black-box and potentially noisy likelihoods.
We validate our method on challenging synthetic scenarios and real-world applications from computational neuroscience.
arXiv Detail & Related papers (2023-03-09T13:58:35Z) - Flat Seeking Bayesian Neural Networks [32.61417343756841]
We develop the theory, Bayesian formulation, and variational inference approach for a sharpness-aware posterior.
Specifically, models sampled from this sharpness-aware posterior, as well as the optimal approximate posterior estimating it, exhibit better flatness.
We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks.
arXiv Detail & Related papers (2023-02-06T11:40:44Z) - Bayesian Neural Network Inference via Implicit Models and the Posterior Predictive Distribution [0.8122270502556371]
We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks.
The approach is more scalable to large data than Markov Chain Monte Carlo.
We see this being useful in applications such as surrogate and physics-based models.
arXiv Detail & Related papers (2022-09-06T02:43:19Z) - Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [0.8490242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Challenges and Opportunities in High-dimensional Variational Inference [65.53746326245059]
We show why intuitions about approximate families and divergences for low-dimensional posteriors fail for higher-dimensional posteriors.
For high-dimensional posteriors we recommend using the exclusive KL divergence, which is the most stable and easiest to optimize.
In low to moderate dimensions, heavy-tailed variational families and mass-covering divergences can increase the chance that the approximation is subsequently improved by importance sampling.
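The exclusive (reverse) KL recommended here is the divergence standard VI minimizes: it is an expectation under the tractable approximation q, so it can be estimated by Monte Carlo using samples from q alone. A minimal sketch with hypothetical 1-D Gaussians (an illustration, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D example: approximate posterior q = N(mu_q, s_q^2),
# target posterior p = N(0, 1).
mu_q, s_q = 0.5, 0.8

def log_q(z):
    return -0.5 * np.log(2 * np.pi * s_q**2) - (z - mu_q) ** 2 / (2 * s_q**2)

def log_p(z):
    return -0.5 * np.log(2 * np.pi) - z**2 / 2

# Exclusive KL(q || p): expectation under q, so we only ever need to
# sample from the tractable approximation -- the quantity VI minimizes.
z = rng.normal(mu_q, s_q, size=100_000)
kl_mc = np.mean(log_q(z) - log_p(z))

# Closed form for two Gaussians, as a sanity check:
kl_exact = np.log(1 / s_q) + (s_q**2 + mu_q**2 - 1) / 2
```

Because samples come from q rather than p, the exclusive KL tends to be mode-seeking, which is part of why it stays stable in high dimensions where mass-covering alternatives struggle.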
arXiv Detail & Related papers (2021-03-01T15:53:34Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
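The deep-ensembles-as-marginalization view in the last entry amounts to averaging the members' predictive distributions rather than selecting a single model, with the spread across members serving as a crude epistemic-uncertainty signal. A toy sketch with hypothetical linear "members" standing in for independently trained networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble of M toy linear regressors; each (weight, bias)
# pair stands in for one independently trained network.
M = 5
weights = rng.normal(1.0, 0.1, size=M)  # one slope per ensemble member
biases = rng.normal(0.0, 0.1, size=M)   # one intercept per member

def member_predict(m, x):
    return weights[m] * x + biases[m]

# Approximate Bayesian marginalization: average the members' predictions
# instead of committing to any single model.
x = 2.0
preds = np.array([member_predict(m, x) for m in range(M)])
ens_mean = preds.mean()              # ensemble predictive mean
epistemic_std = preds.std(ddof=1)    # disagreement = model uncertainty
```

Marginalizing within basins of attraction, as the paper proposes, refines this by additionally averaging over local posterior mass around each member rather than treating each as a point estimate.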
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.