On Sequential Bayesian Inference for Continual Learning
- URL: http://arxiv.org/abs/2301.01828v2
- Date: Sun, 9 Jul 2023 18:57:45 GMT
- Title: On Sequential Bayesian Inference for Continual Learning
- Authors: Samuel Kessler, Adam Cobb, Tim G. J. Rudner, Stefan Zohren, Stephen J. Roberts
- Abstract summary: We revisit sequential Bayesian inference and test whether having access to the true posterior is guaranteed to prevent catastrophic forgetting in neural networks.
We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks.
We propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with state-of-the-art Bayesian continual learning methods.
- Score: 17.257360928583974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential Bayesian inference can be used for continual learning to prevent
catastrophic forgetting of past tasks and provide an informative prior when
learning new tasks. We revisit sequential Bayesian inference and test whether
having access to the true posterior is guaranteed to prevent catastrophic
forgetting in Bayesian neural networks. To do this we perform sequential
Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as
a prior for new tasks by fitting a density estimator on Hamiltonian Monte Carlo
samples. We find that this approach fails to prevent catastrophic forgetting,
demonstrating the difficulty of performing sequential Bayesian inference in
neural networks. From there we study simple analytical examples of sequential
Bayesian inference and continual learning, and highlight the issue of model misspecification
which can lead to sub-optimal continual learning performance despite exact
inference. Furthermore, we discuss how task data imbalances can cause
forgetting. From these limitations, we argue that we need probabilistic models
of the continual learning generative process rather than relying on sequential
Bayesian inference over Bayesian neural network weights. In this vein, we also
propose a simple baseline called Prototypical Bayesian Continual Learning,
which is competitive with state-of-the-art Bayesian continual learning methods
on class incremental continual learning vision benchmarks.
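To make the posterior-propagation step concrete, here is a minimal sketch of recycling posterior samples from one task as the prior for the next by fitting a density estimator to them. It is an illustration under stated assumptions, not the authors' implementation: a random-walk Metropolis sampler stands in for HMC, a scikit-learn Gaussian mixture stands in for the paper's density estimator, and the two logistic-regression tasks are synthetic.

```python
# Sketch: propagate an approximate posterior between tasks by fitting a
# density estimator to posterior samples and reusing it as the next prior.
# Assumptions: random-walk Metropolis stands in for HMC, a Gaussian mixture
# stands in for the paper's density estimator, tasks are toy logistic
# regressions with two weights.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)


def log_lik(theta, X, y):
    """Bernoulli log-likelihood of a logistic regression with weights theta."""
    logits = X @ theta
    return np.sum(y * logits - np.log1p(np.exp(logits)))


def metropolis(log_post, init, n_samples=2000, step=0.1):
    """Random-walk Metropolis sampler (lightweight stand-in for HMC)."""
    theta, lp = init.copy(), log_post(init)
    samples = []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)


def make_task(n=100):
    """Synthetic binary classification task sharing the same true weights."""
    X = rng.standard_normal((n, 2))
    y = (X @ np.array([1.0, -1.0]) > 0).astype(float)
    return X, y


# Task 1: sample the posterior under a broad Gaussian prior.
X1, y1 = make_task()
samples1 = metropolis(lambda th: -0.5 * th @ th + log_lik(th, X1, y1),
                      np.zeros(2))

# Fit a density estimator to the task-1 samples; it becomes the task-2 prior.
prior2 = GaussianMixture(n_components=5, random_state=0).fit(samples1)

# Task 2: the fitted density replaces the original prior.
X2, y2 = make_task()
samples2 = metropolis(lambda th: prior2.score_samples(th[None, :])[0]
                      + log_lik(th, X2, y2),
                      samples1.mean(axis=0))
print("task-2 posterior mean:", samples2.mean(axis=0))
```

Any error in the fitted density is inherited by every later task, which is one way to read the difficulty the abstract points to.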
Related papers
- Bayesian Online Natural Gradient (BONG) [9.800443064368467]
We propose a novel approach to sequential Bayesian inference based on variational Bayes.
In the online setting, we do not need to add a KL term to regularize towards the prior.
We prove this method recovers exact Bayesian inference if the model is conjugate.
arXiv Detail & Related papers (2024-05-30T04:27:36Z)
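For reference alongside the BONG summary above, the sketch below shows exact recursive Bayesian updating in a conjugate Gaussian-mean model, the regime in which online updates without an explicit KL penalty can coincide with exact inference. The model and all hyperparameters are assumptions made for illustration; this is not BONG's update rule.

```python
# Exact recursive Bayesian updating of a Gaussian mean with known noise
# variance: each observation updates the running posterior, which then acts
# as the prior for the next observation. Illustrative conjugate case only;
# not BONG's algorithm, and all hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.5        # known observation noise variance (assumed)
mu, lam = 0.0, 1.0  # prior mean and precision (assumed)

true_mean = 2.0
for y in true_mean + np.sqrt(sigma2) * rng.standard_normal(50):
    lam_new = lam + 1.0 / sigma2            # precisions add
    mu = (lam * mu + y / sigma2) / lam_new  # precision-weighted mean
    lam = lam_new

print(f"online posterior: mean {mu:.3f}, variance {1.0 / lam:.4f}")
```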
- A Metalearned Neural Circuit for Nonparametric Bayesian Inference [4.767884267554628]
Most applications of machine learning to classification assume a closed set of balanced classes.
This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution.
We present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network.
arXiv Detail & Related papers (2023-11-24T16:43:17Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines, allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z)
- Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, the next-event prediction models are trained with sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z)
- Challenges and Pitfalls of Bayesian Unlearning [6.931200003384123]
Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model.
Approximate unlearning methods are one class of approaches to this task which avoid the need to retrain the model from scratch on the retained data.
Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data.
arXiv Detail & Related papers (2022-07-07T11:24:50Z)
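The "dividing out" step in the unlearning summary above is a single application of Bayes' rule, assuming the deleted and retained data are conditionally independent given the parameters:

```latex
% Posterior over the retained data, obtained by dividing the deleted
% likelihood out of the full-data posterior (conditional independence assumed):
\[
  p(\theta \mid D \setminus D_{\mathrm{del}})
    \;\propto\; \frac{p(\theta \mid D)}{p(D_{\mathrm{del}} \mid \theta)}
\]
```

In practice the full-data posterior is only available approximately, and carrying out this division with approximate posteriors is where the challenges the title refers to arise.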
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to activate and select only a sparse set of neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Transformers Can Do Bayesian Inference [28.936428431504165]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
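As a gloss on the PFN summary above (a paraphrase of the idea, not the paper's exact notation): the network q_theta is trained on datasets drawn from a prior over data-generating mechanisms, so that minimising the objective below pushes it towards the posterior predictive distribution.

```latex
% Meta-objective paraphrasing the PFN idea: fit q_theta to the posterior
% predictive by training on datasets D and query pairs (x, y) sampled from a
% prior over data-generating mechanisms.
\[
  \min_{\theta} \;
  \mathbb{E}_{(D,\, x,\, y) \sim p(\mathcal{D})}
  \big[ -\log q_{\theta}(y \mid x, D) \big]
\]
```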
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper explores for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference.
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
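The connection between training speed and marginal likelihood in the entry above rests on the standard chain-rule decomposition below; reading the sum of one-step-ahead predictive terms as something like the area under a training curve is the paper's interpretation, so treat this as a gloss rather than a quotation.

```latex
% Chain-rule decomposition: the marginal likelihood is a sum of one-step-ahead
% predictive log probabilities, which the paper relates to training speed.
\[
  \log p(\mathcal{D}) \;=\; \sum_{i=1}^{n} \log p(d_i \mid d_1, \dots, d_{i-1})
\]
```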
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
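For the deep-ensembles entry above, the usual sense in which an ensemble approximates Bayesian marginalization is the Monte Carlo estimate of the posterior predictive shown below, with ensemble members theta_m standing in for posterior samples; this is a standard identity used to gloss the summary, not text from the paper.

```latex
% Posterior predictive (Bayesian model average) and its M-member ensemble
% approximation, with ensemble members standing in for posterior samples.
\[
  p(y \mid x, \mathcal{D})
    = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
    \;\approx\; \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, \theta_m)
\]
```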
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.