Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet
Process
- URL: http://arxiv.org/abs/2108.12278v1
- Date: Wed, 25 Aug 2021 21:06:20 GMT
- Title: Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet
Process
- Authors: Fei Ye and Adrian G. Bors
- Abstract summary: Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks.
We perform a theoretical analysis of lifelong learning models by deriving risk bounds based on the discrepancy distance between the probabilistic representations of the generated and target data.
Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model.
- Score: 15.350366047108103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research efforts in lifelong learning propose to grow a mixture of
models to adapt to an increasing number of tasks. The proposed methodology
shows promising results in overcoming catastrophic forgetting. However, the
theory behind these successful models is still not well understood. In this
paper, we perform a theoretical analysis of lifelong learning models by
deriving risk bounds based on the discrepancy distance between the
probabilistic representation of the data generated by the model and that of
the target dataset. Inspired by this theoretical analysis, we
introduce a new lifelong learning approach, namely the Lifelong Infinite
Mixture (LIMix) model, which can automatically expand its network architectures
or choose an appropriate component to adapt its parameters for learning a new
task, while preserving its previously learnt information. We propose to
incorporate knowledge by means of Dirichlet processes, using a gating
mechanism that computes the dependence between the knowledge previously learnt
and stored in each component and a new set of data. In addition, we train a
compact Student model which can accumulate cross-domain representations over
time and make quick inferences. The code is available at
https://github.com/dtuzi123/Lifelong-infinite-mixture-model.
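As a rough illustration of the knowledge-driven gating idea (a simplification, not the released implementation linked above), the sketch below uses a Chinese-Restaurant-Process-style rule: each mixture component stores a diagonal-Gaussian summary of the data it has learnt, and a new task either reuses the best-matching component or expands the mixture. The class name, the Gaussian summaries, and the base-measure variance are illustrative assumptions.

```python
import numpy as np

class KnowledgeDrivenMixture:
    """Sketch of a Dirichlet-process-style mixture for lifelong learning.

    Each component keeps a diagonal-Gaussian summary of the data it has seen;
    a CRP-like rule decides whether a new task reuses an existing component or
    expands the mixture.  This is a simplification, not the LIMix model itself.
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha                     # DP concentration: larger -> expand more readily
        self.dim = dim
        self.means, self.vars, self.counts = [], [], []

    def _avg_log_lik(self, x, mean, var):
        # Average diagonal-Gaussian log-likelihood of the batch x.
        return np.mean(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

    def assign_task(self, x):
        """Return the index of the component chosen for batch x (possibly a new one)."""
        n = sum(self.counts)
        scores = []
        for m, v, c in zip(self.means, self.vars, self.counts):
            # CRP prior c / (n + alpha), combined with data fit in log space.
            scores.append(np.log(c / (n + self.alpha)) + self._avg_log_lik(x, m, v))
        # Score for spawning a new component under a broad base measure.
        base = self._avg_log_lik(x, np.zeros(self.dim), 10.0 * np.ones(self.dim))
        scores.append(np.log(self.alpha / (n + self.alpha)) + base)
        k = int(np.argmax(scores))
        if k == len(self.means):               # expand the mixture with a new component
            self.means.append(x.mean(axis=0))
            self.vars.append(x.var(axis=0) + 1e-6)
            self.counts.append(len(x))
        else:                                  # adapt the selected component's summary
            c = self.counts[k]
            self.means[k] = (c * self.means[k] + len(x) * x.mean(axis=0)) / (c + len(x))
            self.counts[k] = c + len(x)
        return k

rng = np.random.default_rng(0)
mix = KnowledgeDrivenMixture(dim=8)
print(mix.assign_task(rng.normal(0.0, 1.0, size=(256, 8))))   # first task -> component 0
print(mix.assign_task(rng.normal(5.0, 1.0, size=(256, 8))))   # dissimilar task -> new component
```

Increasing the concentration parameter `alpha` makes the rule more willing to add components, mirroring the trade-off between expanding the architecture and reusing an existing component described in the abstract.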
Related papers
- MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.68829963458408]
We present MergeNet, which learns to bridge the gap between the parameter spaces of heterogeneous models.
The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters.
MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
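As a loose, hypothetical illustration of querying a source model's low-rank parameters (the actual MergeNet adapter is more elaborate and is trained jointly with both models), one can factor a source weight matrix and let the target's parameters attend over the resulting low-rank basis:

```python
import numpy as np

def low_rank_basis(w_source, rank):
    """Truncated right singular vectors of a source weight matrix:
    a stand-in for the source model's 'low-rank parameters'."""
    _, _, vt = np.linalg.svd(w_source, full_matrices=False)
    return vt[:rank, :]                              # (rank, in_dim)

def parameter_adapter(w_target, basis, mix=0.1):
    """Hypothetical adapter: each target row queries the source's low-rank
    directions via softmax attention, and the attended direction is mixed
    back into the target parameters."""
    q = w_target @ basis.T                           # (out_t, rank) attention logits
    q = np.exp(q - q.max(axis=1, keepdims=True))
    attn = q / q.sum(axis=1, keepdims=True)          # softmax over low-rank directions
    migrated = attn @ basis                          # knowledge in the target's input space
    return w_target + mix * migrated

rng = np.random.default_rng(0)
w_source = rng.normal(size=(64, 32))                 # larger source layer
w_target = rng.normal(size=(16, 32))                 # smaller target layer, same input width
w_adapted = parameter_adapter(w_target, low_rank_basis(w_source, rank=4))
print(w_adapted.shape)                               # (16, 32)
```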
arXiv Detail & Related papers (2024-04-20T08:34:39Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence demonstrating that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
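The core idea of restricting unlearning updates to directions the retained data does not use can be sketched as follows; the projector construction and the ascent step are simplified assumptions rather than the PGU algorithm itself.

```python
import numpy as np

def orthogonal_projector(retain_grads):
    """Projector onto the complement of the span of retained-data gradients,
    so that unlearning updates minimally disturb retained knowledge."""
    g = np.stack(retain_grads)                       # (m, d): flattened gradients
    q, _ = np.linalg.qr(g.T)                         # orthonormal basis of their span
    return np.eye(g.shape[1]) - q @ q.T              # I - Q Q^T

def unlearning_step(params, forget_grad, projector, lr=0.1):
    """Ascend on the forget-data loss, restricted to directions
    the retained data does not use (one simplified update)."""
    return params + lr * (projector @ forget_grad)

rng = np.random.default_rng(0)
params = rng.normal(size=6)
retain_grads = [rng.normal(size=6) for _ in range(2)]
forget_grad = rng.normal(size=6)

P = orthogonal_projector(retain_grads)
params = unlearning_step(params, forget_grad, P)
# The applied update has (numerically) no component along the retained gradients.
print(np.round([g @ (P @ forget_grad) for g in retain_grads], 8))
```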
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with a pre-trained model (PTM), tuning the target model with a PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- Learning an evolved mixture model for task-free continual learning [11.540150938141034]
We address Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information.
We introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload.
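A minimal sketch of one plausible dropout mechanism, assuming a capped feature buffer and a redundancy-based removal rule (the paper's actual criteria may differ):

```python
import numpy as np

class EpisodicBuffer:
    """Capped memory buffer: when full, drop the stored sample most similar
    to another stored sample (a redundancy-based dropout rule).  This only
    illustrates the general mechanism of avoiding memory overload."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []                              # stored feature vectors

    def add(self, x):
        self.items.append(np.asarray(x, dtype=float))
        if len(self.items) > self.capacity:
            self._drop_most_redundant()

    def _drop_most_redundant(self):
        feats = np.stack(self.items)
        feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8
        sim = feats @ feats.T                        # pairwise cosine similarities
        np.fill_diagonal(sim, -np.inf)
        idx = int(np.argmax(sim.max(axis=1)))        # sample with the closest neighbour
        del self.items[idx]

rng = np.random.default_rng(0)
buf = EpisodicBuffer(capacity=50)
for _ in range(200):                                 # stand-in for a non-stationary stream
    buf.add(rng.normal(size=16))
print(len(buf.items))                                # stays at 50
```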
arXiv Detail & Related papers (2022-07-11T16:01:27Z)
- Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Just as the CNN was inspired by receptive fields in the biological visual system, we propose Model Disassembling and Assembling.
For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task.
arXiv Detail & Related papers (2022-03-25T05:27:28Z)
- Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning [14.642266310020505]
This paper proposes a new sequential model learning architecture to solve partially observable Markov decision problems.
The proposed architecture generates a latent variable in each data block with a length of multiple timesteps and passes the most relevant information to the next block for policy optimization.
Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
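A hypothetical sketch of the block-level data flow, assuming mean-pooled blocks and a randomly initialised shared encoder (the paper's architecture differs):

```python
import numpy as np

def blockwise_summaries(observations, block_len, dim_latent, seed=0):
    """Compress each block of timesteps into one latent and carry it to the
    next block, so only a summary of the past is passed forward."""
    rng = np.random.default_rng(seed)
    d_obs = observations.shape[1]
    w = rng.normal(scale=0.1, size=(d_obs + dim_latent, dim_latent))  # shared encoder
    carried = np.zeros(dim_latent)
    latents = []
    for start in range(0, len(observations), block_len):
        block = observations[start:start + block_len]
        pooled = block.mean(axis=0)                  # summarise the block's timesteps
        carried = np.tanh(np.concatenate([pooled, carried]) @ w)
        latents.append(carried)                      # latent handed on for policy optimisation
    return np.stack(latents)

obs = np.random.default_rng(1).normal(size=(40, 8))  # 40 timesteps, 8-dim observations
print(blockwise_summaries(obs, block_len=10, dim_latent=4).shape)   # (4, 4)
```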
arXiv Detail & Related papers (2021-12-10T05:38:24Z)
- Fully differentiable model discovery [0.0]
We propose an approach that combines neural network based surrogates with Sparse Bayesian Learning.
Our work expands PINNs to various types of neural network architectures, and connects neural network-based surrogates to the rich field of Bayesian parameter inference.
arXiv Detail & Related papers (2021-06-09T08:11:23Z)
- Streaming Graph Neural Networks via Continual Learning [31.810308087441445]
Graph neural networks (GNNs) have achieved strong performance in various applications.
In this paper, we propose a streaming GNN model based on continual learning.
We show that our model can efficiently update model parameters and achieve comparable performance to model retraining.
arXiv Detail & Related papers (2020-09-23T06:52:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.