Mixture-of-Variational-Experts for Continual Learning
- URL: http://arxiv.org/abs/2110.12667v1
- Date: Mon, 25 Oct 2021 06:32:06 GMT
- Title: Mixture-of-Variational-Experts for Continual Learning
- Authors: Heinke Hihn and Daniel A. Braun
- Abstract summary: We propose an optimality principle that facilitates a trade-off between learning and forgetting.
We propose a neural network layer for continual learning, called Mixture-of-Variational-Experts (MoVE).
Our experiments on variants of the MNIST and CIFAR10 datasets demonstrate the competitive performance of MoVE layers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One significant shortcoming of machine learning is the poor ability of models
to solve new problems quicker and without forgetting acquired knowledge. To
better understand this issue, continual learning has emerged to systematically
investigate learning protocols where the model sequentially observes samples
generated by a series of tasks. First, we propose an optimality principle that
facilitates a trade-off between learning and forgetting. We derive this
principle from an information-theoretic formulation of bounded rationality and
show its connections to other continual learning methods. Second, based on this
principle, we propose a neural network layer for continual learning, called
Mixture-of-Variational-Experts (MoVE), that alleviates forgetting while
enabling the beneficial transfer of knowledge to new tasks. Our experiments on
variants of the MNIST and CIFAR10 datasets demonstrate the competitive
performance of MoVE layers when compared to state-of-the-art approaches.
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need of external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open to novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
arXiv Detail & Related papers (2024-05-15T13:22:39Z)
- Online Continual Learning via the Knowledge Invariant and Spread-out Properties [4.109784267309124]
A key challenge in continual learning is catastrophic forgetting.
We propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP).
We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR 100, Split SVHN, Split CUB200 and Split Tiny-Image-Net.
arXiv Detail & Related papers (2023-02-02T04:03:38Z)
- Progressive Learning without Forgetting [8.563323015260709]
We focus on two challenging problems in the paradigm of Continual Learning (CL).
PLwF introduces functions from previous tasks to construct a knowledge space that contains the most reliable knowledge on each task.
Credit assignment controls the tug-of-war dynamics by removing gradient conflict through projection.
In comparison with other CL methods, we report notably better results even without relying on any raw data.
arXiv Detail & Related papers (2022-11-28T10:53:14Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship [54.73817402934303]
We propose Experience Continual Replay (ERR), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
ERR can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z)
- Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in continual learning is catastrophically forgetting previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, named Learning Invariant Representation for Continual Learning (IRCL).
Disentangling the shared invariant representation helps to learn continually a sequence of tasks, while being more robust to forgetting and having better knowledge transfer.
arXiv Detail & Related papers (2021-01-15T15:12:51Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Dynamic Knowledge embedding and tracing [18.717482292051788]
We propose a novel approach to knowledge tracing that combines techniques from matrix factorization with recent progress in recurrent neural networks (RNNs).
The proposed DynEmb framework enables the tracking of student knowledge even without the concept/skill tag information.
arXiv Detail & Related papers (2020-05-18T21:56:42Z)