Multi-Domain Multi-Task Rehearsal for Lifelong Learning
- URL: http://arxiv.org/abs/2012.07236v1
- Date: Mon, 14 Dec 2020 03:36:25 GMT
- Title: Multi-Domain Multi-Task Rehearsal for Lifelong Learning
- Authors: Fan Lyu, Shuai Wang, Wei Feng, Zihan Ye, Fuyuan Hu, Song Wang
- Abstract summary: We propose Multi-Domain Multi-Task (MDMT) rehearsal to train the old tasks and the new task in parallel and on an equal footing, breaking the isolation among tasks.
Specifically, a two-level angular margin loss is proposed to encourage intra-class/task compactness and inter-class/task discrepancy.
In addition, to further address the domain shift of the old tasks, we propose an optional episodic distillation loss on the memory to anchor the knowledge of each old task.
- Score: 16.02037222114105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rehearsal, which reminds the model of old knowledge by storing past samples, is one of the most effective ways in lifelong learning to mitigate catastrophic forgetting, i.e., the biased forgetting of previous knowledge when moving to new tasks. However, in most previous rehearsal-based methods the old tasks suffer from unpredictable domain shift while the new task is being trained. This is because these methods ignore two significant factors. First, the data imbalance between the new task and the old tasks makes the domain of the old tasks prone to shift. Second, the isolation among all tasks drives the domain shift in unpredictable directions. To address this unpredictable domain shift, we propose Multi-Domain Multi-Task (MDMT) rehearsal, which trains the old tasks and the new task in parallel and on an equal footing to break the isolation among tasks. Specifically, a two-level angular margin loss encourages intra-class/task compactness and inter-class/task discrepancy, which keeps the model from domain chaos. In addition, to further counter the domain shift of the old tasks, we propose an optional episodic distillation loss on the memory that anchors the knowledge of each old task. Experiments on benchmark datasets validate that the proposed approach effectively mitigates the unpredictable domain shift.
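The abstract names the two losses but gives no formulas; as one concrete reading, here is a minimal PyTorch sketch of a two-level additive angular margin and the optional episodic distillation on memory. It is not the authors' released code: the prototype layout, the margins m_class and m_task, the scale s, and the L2 form of the distillation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def two_level_angular_margin_loss(feats, prototypes, labels, task_ids,
                                  class_to_task, m_class=0.2, m_task=0.1, s=10.0):
    """Hedged sketch: feats (B, D), prototypes (C, D) with one vector per
    class seen so far, labels (B,), task_ids (B,), class_to_task (C,)."""
    # Cosine similarity between L2-normalised features and class prototypes.
    cos = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).t()  # (B, C)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Level 1 (class): widen the angle to the ground-truth class, forcing
    # intra-class compactness and inter-class discrepancy.
    pos_class = F.one_hot(labels, num_classes=prototypes.size(0)).float()
    # Level 2 (task): widen the angle to every class of the sample's own
    # task, separating whole tasks from one another.
    pos_task = (task_ids.unsqueeze(1) == class_to_task.unsqueeze(0)).float()
    logits = s * torch.cos(theta + m_class * pos_class + m_task * pos_task)
    return F.cross_entropy(logits, labels)

def episodic_distillation_loss(model, mem_x, stored_feats):
    # Optional: anchor each old task by pulling the current features on the
    # memory back toward the features stored when that task was learned.
    return F.mse_loss(model(mem_x), stored_feats)
```

In an MDMT rehearsal step, memory batches from the old tasks and the current batch of the new task would be fed through these losses together, so all tasks are optimised in parallel and on an equal footing.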
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task and then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
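A hedged sketch of the two curriculum levels described above; the greedy order search and the difficulty scores are illustrative stand-ins for whatever Data-CUBE actually uses:

```python
import numpy as np

def order_tasks(interference):
    """Greedy task ordering: interference[i, j] is an assumed precomputed
    risk of placing task j right after task i; we chain low-risk hops."""
    n = interference.shape[0]
    order = [int(np.argmin(interference.sum(1)))]
    remaining = set(range(n)) - {order[0]}
    while remaining:
        nxt = min(remaining, key=lambda j: interference[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def easy_to_difficult_batches(instances, difficulty, batch_size):
    """Sort one task's instances by an assumed difficulty score, then chunk
    them into easy-to-difficult mini-batches."""
    ranked = [x for _, x in sorted(zip(difficulty, instances), key=lambda p: p[0])]
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]
```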
- Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z)
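A minimal sketch of an online clustering-based pool as described above; the running-mean centroid update and per-cluster FIFO buffers are assumptions, not the paper's algorithm:

```python
import torch

class OnlineClusterPool:
    """Incoming samples (or gradients) are assigned to the nearest centroid,
    centroids are updated with a running mean, and a finite pool keeps a
    small buffer per cluster."""
    def __init__(self, num_clusters, dim, per_cluster=10):
        self.centroids = torch.zeros(num_clusters, dim)
        self.counts = torch.zeros(num_clusters)
        self.pool = [[] for _ in range(num_clusters)]
        self.per_cluster = per_cluster

    def add(self, x):
        if self.counts.min() == 0:                 # seed empty clusters first
            k = int(torch.argmin(self.counts))
        else:
            k = int(torch.argmin(torch.cdist(x[None], self.centroids)[0]))
        self.counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        self.pool[k].append(x)
        if len(self.pool[k]) > self.per_cluster:   # keep the pool finite
            self.pool[k].pop(0)
```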
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches, automatically searches over varying task weights, and merges the branches to filter out harmful parameter updates.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
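A hedged sketch of the fork-then-merge loop above; `train_step` and `val_error` are hypothetical user-supplied callables, and the softmax merge rule is an illustrative choice:

```python
import copy
import torch

def fork_train_merge(model, task_weightings, train_step, val_error, steps=100):
    """Fork the model into one branch per candidate task weighting, train
    each, then merge parameters weighted by target-task validation error."""
    branches = [copy.deepcopy(model) for _ in task_weightings]
    for branch, w in zip(branches, task_weightings):
        for _ in range(steps):
            train_step(branch, w)              # one optimisation step with task weights w
    errs = torch.tensor([val_error(b) for b in branches])
    coeffs = torch.softmax(-errs, dim=0)       # lower validation error -> larger share
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.copy_(sum(c * dict(b.named_parameters())[name]
                        for c, b in zip(coeffs, branches)))
    return model
```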
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that keeps the task-specific solutions close to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
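One way to read the constrained formulation above is as a primal-dual scheme that keeps every task-specific solution within a tolerance of a central function; the following sketch is illustrative, not the paper's algorithm:

```python
import torch

def constrained_mtl_loss(task_losses, task_params, central_params, duals, eps=0.1):
    """Minimise each task loss subject to the task parameters staying within
    eps (squared L2) of the shared central function, via Lagrange multipliers."""
    total = 0.0
    for loss, theta_t, lam in zip(task_losses, task_params, duals):
        gap = sum((pt - pc).pow(2).sum() for pt, pc in zip(theta_t, central_params))
        total = total + loss + lam * (gap - eps)
    return total

def dual_step(task_params, central_params, duals, lr=0.01, eps=0.1):
    """Dual ascent: tighten the multiplier of any task drifting past eps."""
    with torch.no_grad():
        for t, (theta_t, lam) in enumerate(zip(task_params, duals)):
            gap = sum((pt - pc).pow(2).sum() for pt, pc in zip(theta_t, central_params))
            duals[t] = torch.clamp(lam + lr * (gap - eps), min=0.0)
```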
- Interval Bound Interpolation for Few-shot Learning with Few Tasks [15.85259386116784]
Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks with a limited amount of labeled data.
We introduce the notion of interval bounds from the provably robust training literature to few-shot learning.
We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds.
arXiv Detail & Related papers (2022-04-07T15:29:27Z)
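A minimal sketch of the interval-bound idea above, assuming bounds are taken in input space with a box of radius eps; the interpolation rule is an illustrative guess:

```python
import torch

def interval_bounds(x, eps):
    """Interval bounds as in provably robust training: every point of the
    task is relaxed to the box [x - eps, x + eps]."""
    return x - eps, x + eps

def interpolated_task(x, eps, alpha):
    """Form an artificial training task by interpolating between a task and
    its own interval bounds (alpha in [0, 1]; mixing two different tasks'
    supports would follow the same pattern)."""
    lower, upper = interval_bounds(x, eps)
    bound = torch.where(torch.rand_like(x) < 0.5, lower, upper)  # pick a face of the box
    return (1 - alpha) * x + alpha * bound
```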
- TRGP: Trust Region Gradient Projection for Continual Learning [39.99577526417276]
Catastrophic forgetting is one of the major challenges in continual learning.
We propose Trust Region Gradient Projection (TRGP) to facilitate forward knowledge transfer.
Our approach achieves significant improvement over related state-of-the-art methods.
arXiv Detail & Related papers (2022-02-07T04:21:54Z)
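A hedged sketch of gradient projection with a trust region, assuming each old task contributes an orthonormal subspace basis; the trust decision itself (which old tasks count as related) is left abstract:

```python
import torch

def project_gradient(grad, old_bases, trusted):
    """Make the new task's gradient orthogonal to the subspaces of unrelated
    old tasks (protecting them), while keeping the components inside the
    'trusted' subspaces of related old tasks to encourage forward transfer.
    old_bases: one (D, k) matrix of orthonormal columns per old task."""
    g = grad.clone()
    for basis, trust in zip(old_bases, trusted):
        proj = basis @ (basis.t() @ g)   # component lying in this task's subspace
        if not trust:
            g = g - proj                 # remove it: do not disturb the old task
    return g
```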
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
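A minimal sketch of a Gumbel-Softmax prior as the title suggests: the prior for one task is a learnable mixture over the other tasks' posterior means (the mixture-of-means form is an assumption):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_prior(task_means, logits, tau=1.0):
    """The prior mean for the current task is a convex combination of the
    other tasks' posterior means; the mixing weights are sampled with
    Gumbel-Softmax so task relatedness stays learnable and differentiable."""
    w = F.gumbel_softmax(logits, tau=tau, hard=False)   # (T,) relatedness weights
    return (w.unsqueeze(1) * task_means).sum(0)         # (D,) mixture prior mean

# Example: prior for a new task built from 3 old tasks' posterior means.
task_means = torch.randn(3, 16)
logits = torch.zeros(3, requires_grad=True)
prior_mean = gumbel_softmax_prior(task_means, logits)
```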
- TAG: Task-based Accumulated Gradients for Lifelong learning [21.779858050277475]
We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
arXiv Detail & Related papers (2021-05-11T16:10:32Z)
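A hedged sketch of a relatedness-adapted learning rate; comparing the current gradient with per-task accumulated gradients via cosine similarity follows the blurb above, while the exponential damping rule and `beta` are illustrative:

```python
import torch
import torch.nn.functional as F

def tag_learning_rate(base_lr, grad, task_grads, beta=5.0):
    """Shrink the step when the current gradient conflicts (negative cosine)
    with some old task's accumulated gradient, damping updates that would
    overwrite unrelated old tasks; returns a value in (0, base_lr]."""
    if not task_grads:
        return base_lr
    sims = torch.stack([F.cosine_similarity(grad, g, dim=0) for g in task_grads])
    return base_lr * torch.exp(beta * (sims.min() - 1)).item()
```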
- Beneficial Perturbation Network for designing general adaptive artificial intelligence systems [14.226973149346886]
We propose a new type of deep neural network with extra, out-of-network, task-dependent biasing units to accommodate dynamic situations.
Our approach is memory-efficient and parameter-efficient, can accommodate many tasks, and achieves state-of-the-art performance across different tasks and domains.
arXiv Detail & Related papers (2020-09-27T01:28:10Z)
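A minimal sketch of out-of-network, task-dependent biasing units as described above (the layer shape and placement are assumptions):

```python
import torch
import torch.nn as nn

class BiasedLayer(nn.Module):
    """A shared linear map plus one out-of-network bias vector per task,
    selected at run time: each task gets its own activation offset while
    the shared weights stay fixed in size, keeping the approach memory-
    and parameter-efficient."""
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)
        self.task_bias = nn.Parameter(torch.zeros(num_tasks, out_dim))

    def forward(self, x, task_id):
        return torch.relu(self.shared(x) + self.task_bias[task_id])
```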
- Learning Task-oriented Disentangled Representations for Unsupervised Domain Adaptation [165.61511788237485]
Unsupervised domain adaptation (UDA) aims to address the domain-shift problem between a labeled source domain and an unlabeled target domain.
We propose a dynamic task-oriented disentangling network (DTDN) to learn disentangled representations in an end-to-end fashion for UDA.
arXiv Detail & Related papers (2020-07-27T01:21:18Z)
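A hedged sketch of a task-oriented disentangling split: a task-relevant branch feeds the classifier (and would be aligned across source and target domains), a task-irrelevant branch absorbs the rest; the architecture shown is illustrative, not DTDN's:

```python
import torch
import torch.nn as nn

class Disentangler(nn.Module):
    """One encoder, two branches: z_rel carries task-relevant information
    (classified, and to be aligned across domains by a separate loss);
    z_irr soaks up task-irrelevant, domain-specific variation."""
    def __init__(self, dim, feat_dim, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, feat_dim), nn.ReLU())
        self.relevant = nn.Linear(feat_dim, feat_dim)
        self.irrelevant = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = self.encoder(x)
        z_rel, z_irr = self.relevant(h), self.irrelevant(h)
        return self.classifier(z_rel), z_rel, z_irr
```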
- Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose a zero-forgetting approach to continual learning in the task-aware regime.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms the current state of the art while reducing memory overhead compared to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
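A minimal sketch of per-task ternary masks at the feature level; the three states and the gradient-gating rule are illustrative, but they show why earlier tasks stay bit-for-bit unchanged: their features are only ever reused or switched off, never retrained.

```python
import torch
import torch.nn as nn

OFF, REUSE, NEW = 0, 1, 2   # hypothetical ternary states per feature, per task

class MaskedFeatures(nn.Module):
    def __init__(self, dim, num_tasks):
        super().__init__()
        self.layer = nn.Linear(dim, dim)
        self.register_buffer("masks", torch.full((num_tasks, dim), OFF, dtype=torch.long))

    def forward(self, x, task_id):
        # A task only reads features marked REUSE or NEW in its own mask;
        # masking activations (not weights) leaves other tasks untouched.
        m = self.masks[task_id]
        return self.layer(x) * (m != OFF).float()

    def zero_frozen_grads(self, task_id):
        # Call after backward(): train only the features grown (NEW) for this
        # task, so weights feeding REUSE features stay frozen -> zero forgetting.
        trainable = (self.masks[task_id] == NEW).float()
        if self.layer.weight.grad is not None:
            self.layer.weight.grad *= trainable.unsqueeze(1)
            self.layer.bias.grad *= trainable
```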
This list is automatically generated from the titles and abstracts of the papers in this site.