Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
- URL: http://arxiv.org/abs/2310.09436v1
- Date: Fri, 13 Oct 2023 23:00:39 GMT
- Title: Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
- Authors: Zixuan Ke, Bing Liu, Wenhan Xiong, Asli Celikyilmaz, Haoran Li
- Abstract summary: This paper proposes a new CL method that overcomes both CF and limited KT.
It overcomes CF by isolating the knowledge of each task via discovering a subnetwork for it.
A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT.
- Score: 46.96149283885802
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Continual learning (CL) has two main objectives: preventing catastrophic
forgetting (CF) and encouraging knowledge transfer (KT). The existing
literature has mainly focused on overcoming CF. Some work has also been done on KT
when the tasks are similar. To our knowledge, only one method has been proposed
to learn a sequence of mixed tasks; however, existing techniques still suffer from
CF and/or limited KT. This paper proposes a new CL method to achieve both. It
overcomes CF by isolating the knowledge of each task via discovering a
subnetwork for it. A soft-masking mechanism is also proposed to preserve the
previous knowledge and to enable the new task to leverage the past knowledge to
achieve KT. Experiments using classification, generation, information
extraction, and their mixture (i.e., heterogeneous tasks) show that the
proposed method consistently outperforms strong baselines.
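To make the soft-masking idea concrete, below is a minimal, illustrative sketch, assuming a PyTorch-style setup. All names (SoftMaskedLinear, importance, soft_mask_gradients) are my own, not the authors' implementation, and the sub-network discovery step (learning a per-task binary mask) is omitted for brevity. The sketch only shows how accumulated parameter importance can shield knowledge of earlier tasks from being overwritten while the forward pass still reuses all weights, which is what enables knowledge transfer.

```python
# Illustrative sketch only; not the authors' released implementation.
# Assumes PyTorch. Names (importance, soft_mask_gradients, ...) are hypothetical.
import torch
import torch.nn as nn

class SoftMaskedLinear(nn.Module):
    """Linear layer whose gradients are soft-masked by accumulated parameter importance."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        # Importance of each parameter to previously learned tasks, in [0, 1].
        self.register_buffer("importance", torch.zeros(out_features, in_features))

    def forward(self, x):
        # The forward pass uses all weights, so knowledge of old tasks
        # remains available to the new task (knowledge transfer).
        return x @ self.weight.t()

    def soft_mask_gradients(self):
        # Scale gradients: parameters important to old tasks change little
        # (protecting old knowledge); unimportant ones stay free to learn.
        if self.weight.grad is not None:
            self.weight.grad.mul_(1.0 - self.importance)

    def update_importance(self):
        # A simple proxy: normalized gradient magnitude after training a task.
        if self.weight.grad is not None:
            g = self.weight.grad.abs()
            self.importance = torch.maximum(self.importance, g / (g.max() + 1e-12))

# Usage for one training step on the current task:
layer = SoftMaskedLinear(16, 4)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(layer(x), y)
optimizer.zero_grad()
loss.backward()
layer.soft_mask_gradients()   # protect knowledge of previous tasks
optimizer.step()
layer.update_importance()     # record which parameters mattered for this task
```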
Related papers
- Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning.
We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
arXiv Detail & Related papers (2024-10-02T16:37:19Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by one-hot labels hampers effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research introduces a comprehensive and overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z)
- Learning to Prompt Knowledge Transfer for Open-World Continual Learning [13.604171414847531]
Pro-KT is a novel prompt-enhanced knowledge transfer model for Open-world Continual Learning.
Pro-KT includes two key components: (1) a prompt bank to encode and transfer both task-generic and task-specific knowledge, and (2) a task-aware open-set boundary to identify unknowns in the new tasks.
arXiv Detail & Related papers (2023-12-22T11:53:31Z)
- Parameter-Level Soft-Masking for Continual Learning [12.290968171255349]
A novel technique (called SPG) is proposed that soft-masks parameter updating in training based on the importance of each parameter to old tasks.
To our knowledge, this is the first work that soft-masks a model at the parameter-level for continual learning.
arXiv Detail & Related papers (2023-06-26T15:35:27Z)
- Adapting BERT for Continual Learning of a Sequence of Aspect Sentiment Classification Tasks [22.28374603976649]
This paper studies continual learning of a sequence of aspect sentiment classification (ASC) tasks.
A CL system that incrementally learns a sequence of ASC tasks must address two key issues.
A novel capsule network-based model called B-CL is proposed to address these issues.
arXiv Detail & Related papers (2021-12-06T02:46:06Z)
- Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning [22.83874590642864]
Continual learning learns a sequence of tasks with two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT).
Most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus perform poorly at KT.
This paper proposes a novel model called CTR to solve these problems.
arXiv Detail & Related papers (2021-12-05T23:13:13Z)
- Knowledge-Aware Meta-learning for Low-Resource Text Classification [87.89624590579903]
This paper studies a low-resource text classification problem and bridges the gap between meta-training and meta-testing tasks.
We propose KGML to introduce an additional representation for each sentence, learned from the extracted sentence-specific knowledge graph.
arXiv Detail & Related papers (2021-09-10T07:20:43Z)
- Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named zero-shot translation class-incremental method (ZSTCI).
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
- Knowledge Transfer via Dense Cross-Layer Mutual-Distillation [24.24969126783315]
We propose Dense Cross-layer Mutual-distillation (DCM) in which the teacher and student networks are trained collaboratively from scratch.
To boost KT performance, we introduce dense bidirectional KD operations between the layers with appended classifiers.
We test our method on a variety of KT tasks, showing its superiority over related methods.
arXiv Detail & Related papers (2020-08-18T09:25:08Z)
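As a rough illustration of the bidirectional (mutual) distillation term that DCM-style collaborative training relies on, here is a short sketch, again assuming PyTorch; the function name and temperature value are illustrative and not taken from the DCM paper.

```python
# Illustrative sketch only; not the DCM authors' code. Assumes PyTorch.
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_a, logits_b, temperature=2.0):
    """Bidirectional KD: each network learns from the other's softened predictions."""
    t = temperature
    log_p_a = F.log_softmax(logits_a / t, dim=-1)
    log_p_b = F.log_softmax(logits_b / t, dim=-1)
    # KL(B || A) pushes network A toward B's (detached) predictions, and vice versa.
    kd_a = F.kl_div(log_p_a, log_p_b.exp().detach(), reduction="batchmean") * t * t
    kd_b = F.kl_div(log_p_b, log_p_a.exp().detach(), reduction="batchmean") * t * t
    return kd_a + kd_b

# Usage: add this term, per pair of intermediate classifiers in a dense scheme,
# to each network's task loss while both networks are trained from scratch.
logits_a = torch.randn(8, 10)
logits_b = torch.randn(8, 10)
loss_kd = mutual_distillation_loss(logits_a, logits_b)
```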