On Sequential Bayesian Inference for Continual Learning
- URL: http://arxiv.org/abs/2301.01828v2
- Date: Sun, 9 Jul 2023 18:57:45 GMT
- Title: On Sequential Bayesian Inference for Continual Learning
- Authors: Samuel Kessler, Adam Cobb, Tim G. J. Rudner, Stefan Zohren, Stephen J.
Roberts
- Abstract summary: We revisit sequential Bayesian inference and test whether having access to the true posterior is guaranteed to prevent catastrophic forgetting in neural networks.
We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks.
We propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with state-of-the-art Bayesian continual learning methods.
- Score: 17.257360928583974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential Bayesian inference can be used for continual learning to prevent
catastrophic forgetting of past tasks and provide an informative prior when
learning new tasks. We revisit sequential Bayesian inference and test whether
having access to the true posterior is guaranteed to prevent catastrophic
forgetting in Bayesian neural networks. To do this we perform sequential
Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as
a prior for new tasks by fitting a density estimator on Hamiltonian Monte Carlo
samples. We find that this approach fails to prevent catastrophic forgetting,
demonstrating the difficulty of performing sequential Bayesian inference in
neural networks. From there we study simple analytical examples of sequential
Bayesian inference and continual learning, and highlight the issue of model
misspecification, which can lead to sub-optimal continual learning performance despite exact
inference. Furthermore, we discuss how task data imbalances can cause
forgetting. From these limitations, we argue that we need probabilistic models
of the continual learning generative process rather than relying on sequential
Bayesian inference over Bayesian neural network weights. In this vein, we also
propose a simple baseline called Prototypical Bayesian Continual Learning,
which is competitive with state-of-the-art Bayesian continual learning methods
on class incremental continual learning vision benchmarks.
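As a rough illustration of the posterior-propagation step described in the abstract, the sketch below fits a density estimator to posterior samples from one task and uses its log-density as the prior term when scoring parameters on the next task. This is a minimal sketch, not the authors' code: the Gaussian mixture density estimator, the toy two-parameter logistic model, and all variable names are assumptions made for the example.
```python
# Minimal sketch (assumptions, not the paper's code): propagate a posterior as a prior
# by fitting a density estimator to posterior samples from task 1 and using its
# log-density as the prior when scoring parameters on task 2.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for HMC samples of model parameters after task 1 (shape: [n_samples, n_params]).
posterior_samples_task1 = rng.normal(loc=1.0, scale=0.5, size=(2000, 2))

# Fit a density estimator (here: a Gaussian mixture) to the task-1 posterior samples.
propagated_prior = GaussianMixture(n_components=5, random_state=0).fit(posterior_samples_task1)

def log_unnormalised_posterior_task2(theta, X, y):
    """log p(theta | task-2 data) up to a constant: new-task likelihood + propagated prior."""
    log_prior = propagated_prior.score_samples(theta[None, :])[0]
    logits = X @ theta
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))  # Bernoulli log-likelihood
    return log_lik + log_prior

# Toy task-2 data for a two-parameter logistic model (hypothetical).
X2 = rng.normal(size=(100, 2))
y2 = (X2 @ np.array([1.0, -1.0]) > 0).astype(float)
print(log_unnormalised_posterior_task2(np.zeros(2), X2, y2))
```
In the paper this propagated log-density would feed back into an HMC sampler for the next task; the point here is only to show the posterior-to-prior hand-off.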
Related papers
- BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by implicitly estimating the posterior probabilities.
This simple methodology has been proven effective for meticulously balanced academic benchmark datasets.
However, it is not applicable to the long-tailed data distributions in the real world.
This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
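For context on the contrast drawn above, an explicit Bayes classifier combines class-conditional densities with class priors via Bayes' rule; the formulation below is the generic textbook version in assumed notation, not the BAPE estimator itself.
```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')},
\qquad
\hat{y}(x) = \arg\max_{y}\, p(x \mid y)\, p(y)
```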
arXiv Detail & Related papers (2025-06-29T15:12:50Z) - Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z) - A Metalearned Neural Circuit for Nonparametric Bayesian Inference [4.767884267554628]
Most applications of machine learning to classification assume a closed set of balanced classes.
This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution.
We present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network.
arXiv Detail & Related papers (2023-11-24T16:43:17Z) - Calibrating Neural Simulation-Based Inference with Differentiable
Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error, we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
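Schematically (an assumed form for illustration, not the paper's exact objective), this corresponds to augmenting the usual posterior-estimation loss with a differentiable calibration penalty weighted by a coefficient \lambda:
```latex
\mathcal{L}(\phi) =
\underbrace{\mathbb{E}_{(\theta, x)}\!\left[-\log q_\phi(\theta \mid x)\right]}_{\text{posterior estimation}}
+ \lambda\,
\underbrace{\widehat{\mathrm{Cal}}(\phi)}_{\text{differentiable relaxation of the calibration error}}
```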
arXiv Detail & Related papers (2023-10-20T10:20:45Z) - Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Towards Out-of-Distribution Sequential Event Prediction: A Causal
Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, next-event prediction models are trained on sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z) - Challenges and Pitfalls of Bayesian Unlearning [6.931200003384123]
Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model.
Approximate unlearning methods are one class of approaches to this task that avoid the need to retrain the model from scratch on the retained data.
Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data.
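In symbols, "dividing out the likelihood of deleted data" corresponds to the following application of Bayes' rule, assuming the data are conditionally independent given the parameters \theta and writing \mathcal{D}_f for the data to be forgotten (notation assumed for illustration):
```latex
p(\theta \mid \mathcal{D} \setminus \mathcal{D}_f)
\;\propto\;
\frac{p(\theta \mid \mathcal{D})}{p(\mathcal{D}_f \mid \theta)}
```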
arXiv Detail & Related papers (2022-07-07T11:24:50Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning with large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
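One way to write the underlying training idea (assumed notation, a paraphrase of the prior-data fitting setup rather than a quote from the paper): sample whole datasets from a prior over supervised tasks and train the network q_\theta to predict held-out labels given the remaining data in context, which pushes q_\theta towards the posterior predictive distribution.
```latex
\min_\theta \;
\mathbb{E}_{D \sim p(D)}
\sum_{(x, y) \in D_{\text{held-out}}}
-\log q_\theta\!\left(y \mid x, D_{\text{context}}\right)
```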
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - Posterior Meta-Replay for Continual Learning [4.319932092720977]
Continual Learning (CL) algorithms have recently received a lot of attention as they attempt to overcome the need to train with an i.i.d. sample from some unknown target data distribution.
We study principled ways to tackle the CL problem by adopting a Bayesian perspective and focus on continually learning a task-specific posterior distribution.
arXiv Detail & Related papers (2021-03-01T17:08:35Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
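The link rests on the chain-rule decomposition of the log marginal likelihood of a model \mathcal{M} into a sum of one-step-ahead predictive log-probabilities, which is the quantity a "training speed" measure can track (a standard identity, written in assumed notation):
```latex
\log p(\mathcal{D} \mid \mathcal{M})
= \sum_{i=1}^{n} \log p\!\left(d_i \mid d_1, \ldots, d_{i-1}, \mathcal{M}\right)
```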
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
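For context, the Bayesian model average that ensembles are argued to approximate is the predictive integral over the weight posterior, replaced in practice by an average over M ensemble members \theta_1, \ldots, \theta_M (generic notation, assumed for illustration):
```latex
p(y \mid x, \mathcal{D})
= \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
\;\approx\;
\frac{1}{M} \sum_{m=1}^{M} p(y \mid x, \theta_m)
```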
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.