Continual Learning via Sequential Function-Space Variational Inference
- URL: http://arxiv.org/abs/2312.17210v1
- Date: Thu, 28 Dec 2023 18:44:32 GMT
- Title: Continual Learning via Sequential Function-Space Variational Inference
- Authors: Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh,
Yarin Gal
- Abstract summary: We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
- Score: 65.96686740015902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential Bayesian inference over predictive functions is a natural
framework for continual learning from streams of data. However, applying it to
neural networks has proved challenging in practice. Addressing the drawbacks of
existing techniques, we propose an optimization objective derived by
formulating continual learning as sequential function-space variational
inference. In contrast to existing methods that regularize neural network
parameters directly, this objective allows parameters to vary widely during
training, enabling better adaptation to new tasks. Compared to objectives that
directly regularize neural network predictions, the proposed objective allows
for more flexible variational distributions and more effective regularization.
We demonstrate that, across a range of task sequences, neural networks trained
via sequential function-space variational inference achieve better predictive
accuracy than networks trained with related methods while depending less on
maintaining a set of representative points from previous tasks.
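To make the formulation concrete, here is a minimal sketch (in LaTeX) of what a sequential function-space variational objective for task $t$ can look like; the notation ($q_t$, $f$, $\mathbf{X}_C$) is our own shorthand, and the exact construction in the paper (how the distribution over functions is induced from the network and where the KL term is evaluated) may differ from this sketch:

% Schematic sequential function-space variational objective for task t
% (illustrative notation; not taken verbatim from the paper).
\begin{equation*}
  \mathcal{F}_t(q_t)
  = \mathbb{E}_{q_t(f)}\!\big[\log p(\mathbf{y}_t \mid f(\mathbf{X}_t))\big]
  - \mathrm{D}_{\mathrm{KL}}\!\big(q_t(f(\mathbf{X}_C)) \,\big\|\, q_{t-1}(f(\mathbf{X}_C))\big),
\end{equation*}

where $q_t$ is the current variational distribution over predictive functions $f$, $(\mathbf{X}_t, \mathbf{y}_t)$ are the data for task $t$, $q_{t-1}$ is the variational posterior carried over from the previous task (acting as the prior), and $\mathbf{X}_C$ is a set of context points at which the function-space KL divergence is evaluated. Because the regularizer acts on function values rather than on parameters, the parameters themselves remain free to change substantially between tasks.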
Related papers
- High-Fidelity Transfer of Functional Priors for Wide Bayesian Neural Networks by Learning Activations [1.0468715529145969]
We show how trainable activations can accommodate complex function-space priors on BNNs.
We discuss critical learning challenges, including identifiability, loss construction, and symmetries.
Our empirical findings demonstrate that even BNNs with a single wide hidden layer can effectively achieve high-fidelity function-space priors.
arXiv Detail & Related papers (2024-10-21T08:42:10Z)
- Tractable Function-Space Variational Inference in Bayesian Neural Networks [72.97620734290139]
A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters.
We propose a scalable function-space variational inference method that allows incorporating prior information.
We show that the proposed method leads to state-of-the-art uncertainty estimation and predictive performance on a range of prediction tasks.
arXiv Detail & Related papers (2023-12-28T18:33:26Z)
- Function-Space Regularization in Neural Networks: A Probabilistic Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z)
- Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy to make the fine-tuning procedure more effective: we avoid updating the pre-trained part of the network and learn not only the usual classification head but also a set of newly introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift, with accuracy boosts of up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
- Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee [20.294908538266867]
Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks.
In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors.
We develop a set of computationally efficient variational inference procedures via a continuous relaxation of the Bernoulli distribution.
arXiv Detail & Related papers (2020-11-15T03:27:54Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of models learned without supervision to another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction and benefits the target even from less relevant source models.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
arXiv Detail & Related papers (2020-06-30T04:39:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.