Forget-free Continual Learning with Soft-Winning SubNetworks
- URL: http://arxiv.org/abs/2303.14962v1
- Date: Mon, 27 Mar 2023 07:53:23 GMT
- Title: Forget-free Continual Learning with Soft-Winning SubNetworks
- Authors: Haeyong Kang, Jaehong Yoon, Sultan Rizky Madjid, Sung Ju Hwang, Chang D. Yoo
- Abstract summary: We investigate two proposed continual learning methods which sequentially learn and select adaptive binary Winning SubNetworks (WSN) and non-binary Soft-Subnetworks (SoftNet) for each task.
WSN and SoftNet jointly learn the regularized model weights and task-adaptive non-binary masks of subnetworks associated with each task.
In Task Incremental Learning (TIL), the binary masks spawned per winning ticket are encoded into a single N-bit mask and then compressed with Huffman coding, for a sub-linear increase in network capacity with respect to the number of tasks.
- Score: 67.0373924836107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by the Regularized Lottery Ticket Hypothesis (RLTH), which states that
competitive smooth (non-binary) subnetworks exist within a dense network in
continual learning tasks, we investigate two proposed architecture-based
continual learning methods which sequentially learn and select adaptive binary
Winning SubNetworks (WSN) and non-binary Soft-Subnetworks (SoftNet) for each task. WSN and SoftNet
jointly learn the regularized model weights and task-adaptive non-binary masks
of subnetworks associated with each task whilst attempting to select a small
set of weights to be activated (winning ticket) by reusing weights of the prior
subnetworks. Our proposed WSN and SoftNet are inherently immune to catastrophic
forgetting as each selected subnetwork model does not infringe upon other
subnetworks in Task Incremental Learning (TIL). In TIL, the binary masks spawned
per winning ticket are encoded into a single N-bit mask and then compressed with
Huffman coding, yielding a sub-linear increase in network capacity with respect
to the number of tasks (a toy encoding is sketched below). Surprisingly, at inference,
SoftNet, generated by injecting small noise into the background of the acquired WSN
(while holding the WSN foreground fixed), provides excellent forward transfer to
future tasks in TIL. SoftNet is more effective than WSN at regularizing parameters
to tackle overfitting to the few examples available in Few-shot Class Incremental
Learning (FSCIL).
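To make the capacity argument concrete, here is a minimal sketch, not the authors' released code, of how N per-task binary masks can be packed into one N-bit symbol per weight and then Huffman-coded. The function names and the use of NumPy plus the standard-library heapq are illustrative assumptions.

```python
# Sketch only: pack N per-task binary masks into one N-bit integer per weight,
# then build a Huffman code over the resulting symbols. Not the paper's code.
import heapq
from collections import Counter

import numpy as np


def pack_masks(task_masks):
    """task_masks: list of N boolean arrays (one per task), all the same shape.
    Bit t of each output entry is task t's mask bit for that weight."""
    packed = np.zeros(task_masks[0].shape, dtype=np.int64)
    for t, mask in enumerate(task_masks):
        packed |= mask.astype(np.int64) << t
    return packed


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bitstring."""
    freq = Counter(symbols.tolist())
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: [frequency, tie-breaker, {symbol: partial code}]
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, counter, merged])
        counter += 1
    return heap[0][2]


# Usage: three tasks, random 50%-dense masks over a 4x4 weight tensor.
rng = np.random.default_rng(0)
masks = [rng.random((4, 4)) < 0.5 for _ in range(3)]
packed = pack_masks(masks)                   # one 3-bit symbol per weight
code = huffman_code(packed.ravel())
bitstream = "".join(code[s] for s in packed.ravel().tolist())
print(len(bitstream), "bits for", packed.size, "weights across 3 tasks")
```

Intuitively, because WSN reuses weights of prior subnetworks, the per-weight symbol distribution is skewed, so the Huffman code spends well under N bits per weight on average, which is where the sub-linear growth with the number of tasks comes from.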
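The SoftNet construction described in the abstract can be sketched the same way. The following is an illustrative assumption of the noise-injection step; the function name, the PyTorch usage, and the noise scale are mine, not taken from the paper.

```python
# Sketch only: derive a SoftNet-style soft mask from a WSN binary mask by keeping
# the selected foreground weights at 1 and replacing the hard-zero background
# with small random values. The noise scale is an illustrative choice.
import torch


def soft_mask_from_wsn(binary_mask: torch.Tensor, noise_scale: float = 0.01) -> torch.Tensor:
    """binary_mask: 1.0 for weights in the winning subnetwork, 0.0 otherwise."""
    background_noise = noise_scale * torch.rand_like(binary_mask)
    return torch.where(binary_mask.bool(), torch.ones_like(binary_mask), background_noise)


# At inference, the dense weights are modulated by the soft mask, so background
# weights still contribute weakly and can aid forward transfer to future tasks.
weight = torch.randn(8, 8)
wsn_mask = (torch.rand(8, 8) < 0.3).float()   # toy 30%-dense winning ticket
effective_weight = weight * soft_mask_from_wsn(wsn_mask)
```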
Related papers
- Soft-TransFormers for Continual Learning [27.95463327680678]
We propose a novel fully fine-tuned continual learning (CL) method referred to as Soft-TransFormers (Soft-TF).
Soft-TF sequentially learns and selects an optimal soft-network or subnetwork for each task.
In inference, the identified task-adaptive network of Soft-TF masks the parameters of the pre-trained network.
arXiv Detail & Related papers (2024-11-25T03:52:47Z)
- Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A high-performing Winning Subnetwork (WSN), in terms of task performance, is considered for various continual learning tasks.
It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios.
The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
arXiv Detail & Related papers (2023-12-19T09:11:49Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net).
IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- Robust Continual Learning through a Comprehensively Progressive Bayesian Neural Network [1.4695979686066065]
This work proposes a comprehensively progressive Bayesian neural network for robust continual learning of a sequence of tasks.
It starts with the contention that similar tasks should have the same number of total network resources, to ensure fair representation of all tasks.
The weights that are redundant at the end of training each task are also pruned through re-initialization, in order to be efficiently utilized in the subsequent task.
arXiv Detail & Related papers (2022-02-27T14:19:50Z)
- Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z)