Parameter-Level Soft-Masking for Continual Learning
- URL: http://arxiv.org/abs/2306.14775v1
- Date: Mon, 26 Jun 2023 15:35:27 GMT
- Title: Parameter-Level Soft-Masking for Continual Learning
- Authors: Tatsuya Konishi, Mori Kurokawa, Chihiro Ono, Zixuan Ke, Gyuhak Kim,
Bing Liu
- Abstract summary: A novel technique (called SPG) is proposed that soft-masks parameter updating in training based on the importance of each parameter to old tasks.
To our knowledge, this is the first work that soft-masks a model at the parameter-level for continual learning.
- Score: 12.290968171255349
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing research on task incremental learning in continual learning has
primarily focused on preventing catastrophic forgetting (CF). Although several
techniques have achieved learning with no CF, they attain it by letting each
task monopolize a sub-network in a shared network, which seriously limits
knowledge transfer (KT) and causes over-consumption of the network capacity,
i.e., as more tasks are learned, the performance deteriorates. The goal of this
paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the
capacity problem. A novel technique (called SPG) is proposed that soft-masks
(partially blocks) parameter updating in training based on the importance of
each parameter to old tasks. Each task still uses the full network, i.e., no
monopoly of any part of the network by any task, which enables maximum KT and
reduction in capacity usage. To our knowledge, this is the first work that
soft-masks a model at the parameter-level for continual learning. Extensive
experiments demonstrate the effectiveness of SPG in achieving all three
objectives. More notably, it attains significant transfer of knowledge not only
among similar tasks (with shared knowledge) but also among dissimilar tasks
(with little shared knowledge) while mitigating CF.
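To make the idea concrete (this is an illustrative sketch, not the authors' implementation), the snippet below shows parameter-level gradient soft-masking in PyTorch: each parameter's gradient is scaled by one minus its accumulated importance to old tasks, so important parameters are partially blocked from updating while every task still uses the full network. The `importance` tensors are placeholders; how SPG actually computes importance from old tasks is not modeled here.

```python
# Minimal sketch of parameter-level gradient soft-masking (illustrative, not the
# authors' SPG code). `importance` holds one tensor per parameter with values in
# [0, 1]; it is initialized to zeros here as a placeholder -- SPG derives these
# scores from the old tasks, which is not shown in this sketch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
importance = {name: torch.zeros_like(p) for name, p in model.named_parameters()}

def soft_mask_gradients(model, importance):
    """Scale each gradient by (1 - importance): parameters important to old
    tasks are partially blocked, unimportant ones update freely."""
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(1.0 - importance[name])

# One training step on a new task (dummy data).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
soft_mask_gradients(model, importance)  # soft-mask before the optimizer step
optimizer.step()
```

The masking step sits between `loss.backward()` and `optimizer.step()`, which is what makes it a soft constraint on updates rather than a hard partition of the network into per-task sub-networks.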
Related papers
- Order parameters and phase transitions of continual learning in deep neural networks [6.349503549199403]
Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge.
CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks.
We present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks.
arXiv Detail & Related papers (2024-07-14T20:22:36Z)
- Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning [7.25130576615102]
Continual Learning (CL) has attracted attention as a method of avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks.
This paper investigates how different sharing decisions affect the Forward Knowledge Transfer (FKT) between tasks.
arXiv Detail & Related papers (2023-11-16T02:06:23Z)
- Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks [46.96149283885802]
This paper proposes a new CL method to overcome CF and/or limited KT.
It overcomes CF by isolating the knowledge of each task via discovering a subnetwork for it.
A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT.
arXiv Detail & Related papers (2023-10-13T23:00:39Z)
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- Factorizing Knowledge in Neural Networks [65.57381498391202]
We propose a novel knowledge-transfer task, Knowledge Factorization (KF).
KF aims to decompose a pretrained source network into several factor networks, each of which handles only a dedicated task and maintains task-specific knowledge factorized from the source network.
We introduce an information-theoretic objective, InfoMax-Bottleneck(IMB), to carry out KF by optimizing the mutual information between the learned representations and input.
arXiv Detail & Related papers (2022-07-04T09:56:49Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Defeating Catastrophic Forgetting via Enhanced Orthogonal Weights Modification [8.091211518374598]
We show that the weight gradient of a new learning task is determined by both the input space of the new task and the weight space of the previously learned tasks.
We propose EOWM, a new efficient and effective continual learning method based on enhanced orthogonal weights modification (OWM).
arXiv Detail & Related papers (2021-11-19T07:40:48Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study cross-level connection paths between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
- Federated Continual Learning with Weighted Inter-client Transfer [79.93004004545736]
We propose a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT).
FedWeIT decomposes the network weights into global federated parameters and sparse task-specific parameters, and each client receives selective knowledge from other clients; a rough sketch of this decomposition follows at the end of the list.
We validate our FedWeIT against existing federated learning and continual learning methods, and our model significantly outperforms them with a large reduction in the communication cost.
arXiv Detail & Related papers (2020-03-06T13:33:48Z)
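As referenced in the FedWeIT entry above, here is a rough, illustrative sketch of the kind of weight decomposition its summary describes: a shared global base, a per-task mask, sparse task-specific parameters, and selectively weighted parameters received from other clients. All names, shapes, and the weighting scheme are assumptions for illustration, not FedWeIT's actual code.

```python
# Rough, illustrative sketch of the weight decomposition summarized above
# (assumed names, shapes, and weighting; not FedWeIT's actual code).
import torch

d_out, d_in = 256, 784
global_base = torch.randn(d_out, d_in)          # global federated parameters
task_mask   = torch.rand(d_out, d_in)           # per-task mask on the base
task_adapt  = torch.randn(d_out, d_in) * 0.01   # sparse task-specific parameters

# Selective knowledge received from other clients, combined with learned weights.
other_params = [torch.randn(d_out, d_in) * 0.01 for _ in range(3)]
selection    = torch.softmax(torch.randn(len(other_params)), dim=0)

layer_weight = global_base * task_mask + task_adapt + sum(
    w * p for w, p in zip(selection, other_params)
)
print(layer_weight.shape)  # torch.Size([256, 784])
```

Presumably only the sparse task-specific components (and base updates) are exchanged in the actual framework, which would account for the reported reduction in communication cost; the sketch does not model the communication itself.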
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.