Exclusive Supermask Subnetwork Training for Continual Learning
- URL: http://arxiv.org/abs/2210.10209v2
- Date: Wed, 5 Jul 2023 16:57:43 GMT
- Title: Exclusive Supermask Subnetwork Training for Continual Learning
- Authors: Prateek Yadav, Mohit Bansal
- Abstract summary: Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding forgetting.
We propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping subnetwork weight training.
We demonstrate that ExSSNeT outperforms strong previous methods on both NLP and Vision domains while preventing forgetting.
- Score: 95.5186263127864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning (CL) methods focus on accumulating knowledge over time
while avoiding catastrophic forgetting. Recently, Wortsman et al. (2020)
proposed a CL method, SupSup, which uses a randomly initialized, fixed base
network (model) and finds a supermask for each new task that selectively keeps
or removes each weight to produce a subnetwork. They prevent forgetting as the
network weights are not being updated. Although there is no forgetting, the
performance of SupSup is sub-optimal because fixed weights restrict its
representational power. Furthermore, there is no accumulation or transfer of
knowledge inside the model when new tasks are learned. Hence, we propose
ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and
non-overlapping subnetwork weight training. This avoids conflicting updates to
the shared weights by subsequent tasks to improve performance while still
preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge
Transfer (KKT) module that utilizes previously acquired knowledge to learn new
tasks better and faster. We demonstrate that ExSSNeT outperforms strong
previous methods on both NLP and Vision domains while preventing forgetting.
Moreover, ExSSNeT is particularly advantageous for sparse masks that activate
2-10% of the model parameters, resulting in an average improvement of 8.3% over
SupSup. Furthermore, ExSSNeT scales to a large number of tasks (100). Our code
is available at https://github.com/prateeky2806/exessnet.
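To make the exclusive-training idea concrete, below is a minimal, illustrative PyTorch sketch (not the authors' released implementation; the names MaskedLinear, exclusive_mask, finish_task, and train_step are hypothetical). It keeps a fixed, randomly initialized weight tensor, applies a per-task binary supermask in the forward pass, and restricts gradient updates to weights that are selected by the current task's mask but have not been trained by any earlier task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Fixed random weights plus a per-task binary supermask (illustrative only)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Randomly initialized base weights; never reinitialized between tasks.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Weights already trained (and therefore frozen) by earlier tasks.
        self.register_buffer("used", torch.zeros(out_features, in_features, dtype=torch.bool))
        self.task_mask = None  # binary supermask for the current task (set externally)

    def forward(self, x):
        # Only the current task's subnetwork participates in the forward pass.
        return F.linear(x, self.weight * self.task_mask)

    def exclusive_mask(self):
        # Weights in this task's subnetwork that no earlier task has trained.
        return self.task_mask.bool() & ~self.used

    def finish_task(self):
        # Claim the exclusively trained weights so later tasks leave them frozen.
        self.used |= self.exclusive_mask()


def train_step(layer, x, y, lr=1e-2):
    """One SGD step that updates only the exclusive (unclaimed) weights."""
    loss = F.cross_entropy(layer(x), y)
    loss.backward()
    with torch.no_grad():
        # Zero out gradients on weights owned by previous tasks or outside the mask.
        layer.weight -= lr * layer.weight.grad * layer.exclusive_mask().float()
        layer.weight.grad = None
    return loss.item()
```

In a full pipeline one would first obtain the task's supermask (e.g., via SupSup-style mask search), assign it to layer.task_mask, run train_step over the task's data, and call finish_task() before moving to the next task; the paper's KNN-based KKT module additionally reuses knowledge from similar previous tasks, a detail not covered by this sketch.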
Related papers
- MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
arXiv Detail & Related papers (2023-11-07T11:37:08Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net)
IF2Net allows a single network to inherently learn unlimited mapping rules without being told task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- Forget-free Continual Learning with Soft-Winning SubNetworks [67.0373924836107]
We investigate two proposed continual learning methods which sequentially learn and select adaptive binary- (WSN) and non-binary Soft-Subnetworks (SoftNet) for each task.
WSN and SoftNet jointly learn the regularized model weights and task-adaptive non-binary masks of subnetworks associated with each task.
In Task Incremental Learning (TIL), binary masks spawned per winning ticket are encoded into one N-bit binary digit mask, then compressed using Huffman coding for a sub-linear increase in network capacity with respect to the number of tasks.
arXiv Detail & Related papers (2023-03-27T07:53:23Z)
- Continual Prune-and-Select: Class-incremental learning with specialized subnetworks [66.4795381419701]
Continual-Prune-and-Select (CP&S) is capable of sequentially learning 10 tasks from ImageNet-1000 while keeping accuracy around 94% with negligible forgetting.
This is a first-of-its-kind result in class-incremental learning.
arXiv Detail & Related papers (2022-08-09T10:49:40Z)
- Incremental Task Learning with Incremental Rank Updates [20.725181015069435]
We propose a new incremental task learning framework based on low-rank factorization.
We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting.
arXiv Detail & Related papers (2022-07-19T05:21:14Z)
- Defeating Catastrophic Forgetting via Enhanced Orthogonal Weights Modification [8.091211518374598]
We show that the weight gradient of a new learning task is determined by both the input space of the new task and the weight space of the previously learned tasks.
We propose a new efficient and effective continual learning method EOWM via enhanced OWM.
arXiv Detail & Related papers (2021-11-19T07:40:48Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, which is known as catastrophic forgetting.
Recent continual learning methods are capable of alleviating the catastrophic forgetting problem on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
- Supermasks in Superposition [70.5780643117055]
We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting.
Our approach uses a randomly initialized, fixed base network and, for each task, finds a subnetwork (supermask) that achieves good performance.
In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks; a sketch of this inference-by-superposition idea follows the list below.
arXiv Detail & Related papers (2020-06-26T03:16:44Z)
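For intuition, here is a rough, assumption-laden sketch of the superposition-based task inference mentioned above (paraphrased from the abstract, not the reference implementation; infer_task is a hypothetical helper and fixed_forward stands in for the frozen base network): all task supermasks are mixed with coefficients alpha, and a single gradient step on the output entropy with respect to alpha indicates which task's mask best explains the input.

```python
import torch
import torch.nn.functional as F


def infer_task(weight, masks, x, fixed_forward):
    """Guess the task by one entropy-gradient step on mask mixing coefficients.

    weight:        fixed random weight tensor shared by all tasks
    masks:         (num_tasks, *weight.shape) stack of 0/1 supermasks
    fixed_forward: callable (masked_weight, x) -> logits of the frozen network
    """
    num_tasks = masks.shape[0]
    alpha = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    # Superimpose all subnetworks, weighted by alpha.
    mixed = (alpha.view(-1, *([1] * weight.dim())) * masks).sum(dim=0)
    logits = fixed_forward(weight * mixed, x)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    entropy.backward()
    # The coefficient whose increase would most reduce entropy marks the task.
    return int(torch.argmin(alpha.grad))
```

SupSup applies this idea across thousands of stored masks and whole networks; the sketch above considers a single weight tensor for brevity.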