Bayesian Online Natural Gradient (BONG)
- URL: http://arxiv.org/abs/2405.19681v1
- Date: Thu, 30 May 2024 04:27:36 GMT
- Title: Bayesian Online Natural Gradient (BONG)
- Authors: Matt Jones, Peter Chang, Kevin Murphy
- Abstract summary: We propose a novel approach to sequential Bayesian inference based on variational Bayes.
In the online setting, we do not need to add the KL term to regularize to the prior.
We prove this method recovers exact Bayesian inference if the model is conjugate.
- Score: 9.800443064368467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach to sequential Bayesian inference based on variational Bayes. The key insight is that, in the online setting, we do not need to add the KL term to regularize to the prior (which comes from the posterior at the previous timestep); instead we can optimize just the expected log-likelihood, performing a single step of natural gradient descent starting at the prior predictive. We prove this method recovers exact Bayesian inference if the model is conjugate, and empirically outperforms other online VB methods in the non-conjugate setting, such as online learning for neural networks, especially when controlling for computational costs.
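To make the update concrete, here is a minimal sketch of one BONG step for a scalar conjugate Gaussian model, where the exact-recovery claim can be checked directly. This is an illustration under assumed choices (unit step size, scalar model), not the authors' released code; it uses the standard exponential-family fact that the natural gradient with respect to the natural parameters equals the plain gradient with respect to the mean (dual) parameters.

```python
import jax
import jax.numpy as jnp

def bong_step_gaussian(mu0, s0, y, r):
    """One BONG update for theta ~ N(mu0, s0), y | theta ~ N(theta, r).

    The update is a single natural-gradient step on the expected
    log-likelihood, taken at the prior, with no KL term.
    """
    # Expected log-likelihood as a function of the mean parameters
    # m = (E[theta], E[theta^2]); its gradient here equals the natural
    # gradient in natural-parameter coordinates.
    def ell(m):
        return -0.5 * jnp.log(2 * jnp.pi * r) - (y**2 - 2 * y * m[0] + m[1]) / (2 * r)

    m_prior = jnp.array([mu0, mu0**2 + s0])
    g = jax.grad(ell)(m_prior)

    # Natural parameters of the Gaussian prior: eta = (mu/s, -1/(2s)).
    eta = jnp.array([mu0 / s0, -1.0 / (2 * s0)]) + g  # unit step size (assumed)

    s_new = -1.0 / (2 * eta[1])  # convert back to (mean, variance)
    mu_new = eta[0] * s_new
    return mu_new, s_new

# Sanity check: matches the exact conjugate posterior update.
mu, s = bong_step_gaussian(mu0=0.0, s0=2.0, y=1.5, r=0.5)
s_exact = 1.0 / (1.0 / 2.0 + 1.0 / 0.5)
mu_exact = s_exact * (1.5 / 0.5)
assert jnp.allclose(jnp.array([mu, s]), jnp.array([mu_exact, s_exact]))
```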
Related papers
- Discounted Adaptive Online Learning: Towards Better Regularization [5.5899168074961265]
We study online learning in adversarial nonstationary environments.
We propose an adaptive (i.e., instance-optimal) algorithm that improves on the widely used non-adaptive baseline.
We also consider the (Gibbs and Candes, 2021)-style online conformal prediction problem.
arXiv Detail & Related papers (2024-02-05T04:29:39Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error, we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
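As I read the summary above, the mechanism is a differentiable surrogate for coverage error added to the training loss. The sketch below illustrates the relaxation trick in isolation; the sigmoid relaxation, temperature, and uniformity penalty are my assumptions, not the paper's exact objective.

```python
import jax
import jax.numpy as jnp

def soft_rank_penalty(theta_true, theta_samples, temperature=0.05):
    """Differentiable stand-in for a coverage/calibration error (illustrative).

    The hard rank statistic mean(1[sample < theta_true]) is relaxed with a
    sigmoid so gradients can flow through it. For a calibrated posterior
    the rank is uniform on [0, 1]; penalizing its squared deviation from
    1/2 is a crude one-number summary of that requirement.
    """
    soft_rank = jnp.mean(jax.nn.sigmoid((theta_true - theta_samples) / temperature))
    return (soft_rank - 0.5) ** 2

# Hypothetical usage: add the penalty to the usual objective,
#   loss = nll + lam * soft_rank_penalty(theta_true, posterior_samples)
```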
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
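The key data structure in the method summarized above is the low-rank-plus-diagonal posterior. A hedged sketch of why it is cheap: with a precision (or covariance) of the form diag(d) + W W^T, the Woodbury identity reduces P-dimensional solves to L-dimensional ones, with L << P. The function below illustrates that decomposition only; it is not the paper's full filter.

```python
import jax.numpy as jnp
import numpy as np

def solve_diag_plus_lowrank(d, W, b):
    """Solve (diag(d) + W @ W.T) x = b via the Woodbury identity.

    Storing only d (shape P) and W (shape P x L) keeps the per-step cost
    O(P * L^2) instead of the O(P^3) of a dense matrix solve.
    """
    Dinv_b = b / d
    Dinv_W = W / d[:, None]
    small = jnp.eye(W.shape[1]) + W.T @ Dinv_W  # small L x L system
    return Dinv_b - Dinv_W @ jnp.linalg.solve(small, W.T @ Dinv_b)

# Quick check against a dense solve.
P, L = 50, 3
rng = np.random.default_rng(0)
d = rng.uniform(0.5, 2.0, P)
W = rng.normal(size=(P, L))
b = rng.normal(size=P)
x = solve_diag_plus_lowrank(jnp.array(d), jnp.array(W), jnp.array(b))
assert np.allclose(x, np.linalg.solve(np.diag(d) + W @ W.T, b), atol=1e-3)
```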
- On Sequential Bayesian Inference for Continual Learning [17.257360928583974]
We revisit sequential Bayesian inference and test whether having access to the true posterior is guaranteed to prevent catastrophic forgetting in neural networks.
We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks.
We propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with state-of-the-art Bayesian continual learning methods.
arXiv Detail & Related papers (2023-01-04T21:33:13Z)
- Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [84.90242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
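The link between training speed and marginal likelihood rests on the chain-rule decomposition log p(D) = sum_t log p(y_t | y_<t): each term is a one-step-ahead predictive score, so a model whose predictions improve quickly accumulates a higher total. A minimal sketch for the conjugate Gaussian model (my choice of example, not the paper's):

```python
import jax.numpy as jnp

def log_marglik_sequential(ys, mu0=0.0, s0=1.0, r=0.25):
    """log p(y_1..y_N) = sum_t log p(y_t | y_<t) for theta ~ N(mu0, s0),
    y_t | theta ~ N(theta, r); each predictive term is N(mu_t, s_t + r)."""
    mu, s, total = mu0, s0, 0.0
    for y in ys:
        pred_var = s + r
        total += -0.5 * (jnp.log(2 * jnp.pi * pred_var) + (y - mu) ** 2 / pred_var)
        # Exact conjugate posterior update after observing y.
        s_new = 1.0 / (1.0 / s + 1.0 / r)
        mu = s_new * (mu / s + y / r)
        s = s_new
    return total

print(log_marglik_sequential(jnp.array([0.3, -0.1, 0.7])))
```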
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
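A hedged sketch of the "GLM predictive" described above: linearize the network around the MAP weights and push the Gaussian weight posterior through the linearization, giving a Gaussian predictive with mean f(x, theta_MAP) and covariance J Sigma J^T. The flat-parameter representation and function names are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def glm_predictive(f, theta_map, Sigma, x):
    """Gaussian predictive from the locally linearized model
    f(x, theta) ~ f(x, theta_map) + J (theta - theta_map),
    with theta ~ N(theta_map, Sigma)."""
    mean = f(x, theta_map)
    J = jax.jacobian(f, argnums=1)(x, theta_map)  # shape (out_dim, n_params)
    cov = J @ Sigma @ J.T
    return mean, cov

# Toy usage with a linear "network" f(x, theta) = theta @ x.
f = lambda x, theta: jnp.atleast_1d(theta @ x)
mean, cov = glm_predictive(f, theta_map=jnp.ones(3), Sigma=0.1 * jnp.eye(3), x=jnp.arange(3.0))
```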
- Variational Auto-Regressive Gaussian Processes for Continual Learning [17.43751039943161]
We develop a principled posterior updating mechanism to solve sequential tasks in continual learning.
By relying on sparse inducing point approximations for scalable posteriors, we propose a novel auto-regressive variational distribution.
Mean predictive entropy estimates show VAR-GPs prevent catastrophic forgetting.
arXiv Detail & Related papers (2020-06-09T19:23:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information (including all content) and is not responsible for any consequences arising from its use.