Preventing Catastrophic Forgetting in Continual Learning of New Natural
Language Tasks
- URL: http://arxiv.org/abs/2302.11074v1
- Date: Wed, 22 Feb 2023 00:18:25 GMT
- Title: Preventing Catastrophic Forgetting in Continual Learning of New Natural
Language Tasks
- Authors: Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi,
Oleg Rokhlenko
- Abstract summary: Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model.
As systems usually evolve over time, adding a new task to an existing MTL model usually requires retraining the model from scratch on all the tasks.
In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of a model already trained on n tasks into a new one that solves n+1 tasks.
- Score: 17.879087904904935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-Task Learning (MTL) is widely accepted in Natural Language
Processing as a standard technique for learning multiple related tasks in one
model. Training an MTL model requires having the training data for all tasks
available at the same time. As systems usually evolve over time (e.g., to
support new functionalities), adding a new task to an existing MTL model
usually requires retraining the model from scratch on all the tasks, which can
be time-consuming and computationally expensive. Moreover, in some scenarios,
the data used to train the original model may no longer be available, for
example due to storage or privacy concerns. In this paper, we approach the
problem of incrementally expanding MTL models' capability to solve new tasks
over time by distilling the knowledge of a model already trained on n tasks
into a new one that solves n+1 tasks. To avoid catastrophic forgetting, we
propose to exploit unlabeled data drawn from the same distributions as the old
tasks. Our experiments on publicly available benchmarks show that this
technique dramatically benefits the distillation by preserving the already
acquired knowledge (i.e., preventing performance drops of up to 20% on the old
tasks) while achieving good performance on the incrementally added tasks.
Further, we show that our approach is also beneficial in practical settings,
using data from a leading voice assistant.
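To make the recipe above concrete, here is a minimal PyTorch-style sketch of this kind of distillation step, assuming a shared encoder with one classification head per task. The names (MultiTaskModel, distillation_step, temperature, alpha) and the loss weighting are illustrative assumptions, not the authors' exact formulation.
```python
# Illustrative sketch (not the paper's code): distill an n-task teacher into an
# (n+1)-task student. Soft teacher predictions on *unlabeled* inputs drawn from
# the old tasks' distributions preserve old-task knowledge, while the new task
# is learned from its labeled data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    """Shared encoder with one classification head per task (assumed layout)."""
    def __init__(self, encoder, hidden_dim, num_labels_per_task):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, n) for n in num_labels_per_task)

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x))

def distillation_step(student, teacher, optimizer, new_task_batch,
                      unlabeled_old_batches, new_task_id,
                      temperature=2.0, alpha=1.0):
    """One step: supervised loss on the new task + KD loss on the old tasks."""
    optimizer.zero_grad()

    # Supervised cross-entropy on the labeled batch of the newly added task.
    x_new, y_new = new_task_batch
    loss = F.cross_entropy(student(x_new, new_task_id), y_new)

    # Distillation on unlabeled inputs from each old task's distribution:
    # match the student's logits to the frozen teacher's softened predictions.
    for task_id, x_unlabeled in unlabeled_old_batches:
        with torch.no_grad():
            soft_targets = F.softmax(teacher(x_unlabeled, task_id) / temperature, dim=-1)
        log_probs = F.log_softmax(student(x_unlabeled, task_id) / temperature, dim=-1)
        kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
        loss = loss + alpha * kd

    loss.backward()
    optimizer.step()
    return loss.item()
```
In this sketch the student would be constructed with n+1 heads (and could be initialized from the teacher's weights); only the newly added task requires labeled data.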
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task and then divide them into easy-to-difficult mini-batches for training (see the sketch after this list).
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Exploring intra-task relations to improve meta-learning algorithms [1.223779595809275]
We aim to exploit external knowledge of task relations to improve training stability via effective mini-batching of tasks.
We hypothesize that selecting a diverse set of tasks in a mini-batch will lead to a better estimate of the full gradient and hence reduce noise in training.
arXiv Detail & Related papers (2023-12-27T15:33:52Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Lifelong Learning of Few-shot Learners across NLP Tasks [45.273018249235705]
We study the challenge of lifelong learning to few-shot learn over a sequence of diverse NLP tasks.
We propose a continual meta-learning approach which learns to generate adapter weights from a few examples.
We demonstrate that our approach preserves model performance over the training tasks and leads to positive knowledge transfer when future tasks are learned.
arXiv Detail & Related papers (2021-04-18T10:41:56Z)
- Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner.
Our approach can be used in both the zero-shot and non-zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)
- Learning Adaptable Policy via Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks [2.1485350418225244]
We build an adaptable imitation learning model based on the integration of Meta-learning and Adversarial Inverse Reinforcement Learning.
We exploit the adversarial learning and inverse reinforcement learning mechanisms to learn policies and reward functions simultaneously from available training tasks.
arXiv Detail & Related papers (2021-03-23T17:16:38Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data [5.689320790746046]
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks.
However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer.
We propose a novel Transformer architecture consisting of a new conditional attention mechanism and a set of task-conditioned modules.
arXiv Detail & Related papers (2020-09-19T02:04:34Z)
- Continual Learning Using Multi-view Task Conditional Neural Networks [6.27221711890162]
Conventional deep learning models have limited capacity in learning multiple tasks sequentially.
We propose Multi-view Task Conditional Neural Networks (Mv-TCNN) that do not require knowing the recurring tasks in advance.
arXiv Detail & Related papers (2020-05-08T01:03:30Z)
- iTAML: An Incremental Task-Agnostic Meta-learning Approach [123.10294801296926]
Humans can continuously learn new knowledge as their experience grows.
Previous learning in deep neural networks can quickly fade out when they are trained on a new task.
We introduce a novel meta-learning approach that seeks to maintain an equilibrium between all encountered tasks.
arXiv Detail & Related papers (2020-03-25T21:42:48Z)
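As referenced in the Data-CUBE entry above, its instance-level step (rank instances by difficulty, then form easy-to-difficult mini-batches) can be sketched in a few lines; the length-based difficulty function below is an assumed stand-in, not the paper's actual measure.
```python
# Illustrative sketch of easy-to-difficult mini-batching; the difficulty
# function is a placeholder, not Data-CUBE's actual difficulty measure.
def curriculum_minibatches(instances, difficulty, batch_size):
    """Return mini-batches ordered from easiest to hardest instances."""
    ranked = sorted(instances, key=difficulty)  # easy -> difficult
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

# Example: use sentence length as a crude difficulty proxy (assumption).
data = ["short one", "a mid-length training sentence",
        "a noticeably longer and harder training sentence goes here"]
batches = curriculum_minibatches(data, difficulty=len, batch_size=2)
```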
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.