Learning to Continually Learn with the Bayesian Principle
- URL: http://arxiv.org/abs/2405.18758v1
- Date: Wed, 29 May 2024 04:53:31 GMT
- Title: Learning to Continually Learn with the Bayesian Principle
- Authors: Soochan Lee, Hyeonseong Jeon, Jaehyeon Son, Gunhee Kim,
- Abstract summary: In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting.
Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting.
- Score: 36.75558255534538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as the batch training, i.e., they are completely immune to catastrophic forgetting. However, they are often overly simple to model complex real-world data. In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting. In our novel meta-continual learning framework, continual learning takes place only in statistical models via ideal sequential Bayesian update rules, while neural networks are meta-learned to bridge the raw data and the statistical models. Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting. This approach not only achieves significantly improved performance but also exhibits excellent scalability. Since our approach is domain-agnostic and model-agnostic, it can be applied to a wide range of problems and easily integrated with existing model architectures.
Related papers
- Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens [14.424050371971354]
We introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens.
We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression.
arXiv Detail & Related papers (2024-10-24T15:44:03Z) - Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the transfer learning level comparing the refined model and the original one training from scratch a neural network.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for
Adaptive Learning [0.0]
A new type of neural network is presented using the rapid neural network paradigm.
It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy.
arXiv Detail & Related papers (2023-09-12T22:34:34Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural
Networks [12.525959293825318]
We introduce Learn, Unlearn, and Relearn (LURE) an online learning paradigm for deep neural networks (DNNs)
LURE interchanges between the unlearning phase, which selectively forgets the undesirable information in the model, and the relearning phase, which emphasizes learning on generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
arXiv Detail & Related papers (2023-03-18T16:45:54Z) - Online Evolutionary Neural Architecture Search for Multivariate
Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z) - Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Model-Based Robust Deep Learning: Generalizing to Natural,
Out-of-Distribution Data [104.69689574851724]
We propose a paradigm shift from perturbation-based adversarial robustness toward model-based robust deep learning.
Our objective is to provide general training algorithms that can be used to train deep neural networks to be robust against natural variation in data.
arXiv Detail & Related papers (2020-05-20T13:46:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.