HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning
- URL: http://arxiv.org/abs/2310.00113v4
- Date: Fri, 24 May 2024 12:49:30 GMT
- Title: HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning
- Authors: Kamil Książek, Przemysław Spurek
- Abstract summary: We propose a method called HyperMask, which dynamically filters a target network depending on the CL task.
Due to the lottery ticket hypothesis, we can use a single network with weighted subnets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. Many continual learning (CL) strategies are trying to overcome this problem. One of the most effective is the hypernetwork-based approach. The hypernetwork generates the weights of a target model based on the task's identity. The model's main limitation is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To solve such a problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of a whole network. In the paper, we propose a method called HyperMask, which dynamically filters a target network depending on the CL task. The hypernetwork produces semi-binary masks to obtain dedicated target subnetworks. Moreover, due to the lottery ticket hypothesis, we can use a single network with weighted subnets. Depending on the task, the importance of some weights may be dynamically enhanced while others may be weakened. HyperMask achieves competitive results in several CL datasets and, in some scenarios, goes beyond the state-of-the-art scores, both with derived and unknown task identities.
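The masking mechanism described in the abstract can be illustrated with a minimal PyTorch sketch (an assumption-laden illustration, not the authors' code; names such as `MaskHypernetwork` and `MaskedLinear` are invented here): a hypernetwork maps a learned task embedding to a semi-binary mask that rescales the weights of a fixed target layer, so each task uses its own weighted subnetwork.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch (not the authors' implementation): a hypernetwork maps a task
# embedding to a semi-binary mask that rescales the weights of a target layer.
class MaskHypernetwork(nn.Module):
    def __init__(self, embedding_dim: int, target_numel: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, 128),
            nn.ReLU(),
            nn.Linear(128, target_numel),
        )

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps mask values in (0, 1): "semi-binary" rather than hard 0/1.
        return torch.sigmoid(self.net(task_embedding))


class MaskedLinear(nn.Module):
    """Target layer whose weights are rescaled by a task-conditional mask."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Important weights are enhanced (mask near 1), others weakened (near 0).
        return F.linear(x, self.weight * mask.view_as(self.weight), self.bias)


# One learnable embedding per task; the hypernetwork turns it into a mask.
num_tasks, embedding_dim = 5, 32
target = MaskedLinear(784, 10)
hypernet = MaskHypernetwork(embedding_dim, target.weight.numel())
task_embeddings = nn.Embedding(num_tasks, embedding_dim)

x = torch.randn(8, 784)                      # dummy batch
task_id = torch.tensor([2])                  # current CL task identity
mask = hypernet(task_embeddings(task_id))    # task-specific semi-binary mask
logits = target(x, mask)                     # dedicated weighted subnetwork
```
In a full continual-learning setup, a regularisation term on the hypernetwork output for previous task embeddings would typically be added to limit forgetting; that part is omitted in this sketch.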
Related papers
- HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories [62.975803165786324]
We propose a method to train hypernetworks, without the need for any per-sample ground truth.
Our key idea is to learn a Hypernetwork Field and estimate the entire trajectory of network weight training instead of simply its converged state.
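At this level of detail, one plausible (and heavily hedged) instantiation is sketched below; the consistency objective and names such as `TrajectoryHypernetwork` are assumptions of this summary, not the paper's actual formulation. The hypernetwork predicts target weights as a function of a conditioning input and a training-step index, and supervision comes from requiring consecutive predictions to be consistent with a gradient step of the task loss, so no converged per-sample weights are needed as ground truth.
```python
import torch
import torch.nn as nn

# Hedged sketch of a "weight trajectory" hypernetwork: it predicts target-network
# weights as a function of (conditioning vector, training step). The consistency
# objective below is an illustrative assumption, not the paper's code.
class TrajectoryHypernetwork(nn.Module):
    def __init__(self, cond_dim: int, target_numel: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, target_numel),
        )

    def forward(self, cond: torch.Tensor, step: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([cond, step], dim=-1))


def task_loss(weights: torch.Tensor, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Toy target task: linear regression with the predicted weight vector."""
    w = weights.view(1, -1)
    return ((x @ w.t()).squeeze(-1) - y).pow(2).mean()


hyper = TrajectoryHypernetwork(cond_dim=8, target_numel=16)
cond = torch.randn(1, 8)
x, y = torch.randn(32, 16), torch.randn(32)
lr = 0.1

# Consecutive trajectory points are tied to one SGD step of the task loss,
# so no converged per-sample weights are ever required as ground truth.
t = torch.tensor([[0.3]])
w_t = hyper(cond, t)
w_next = hyper(cond, t + 0.1)
grad = torch.autograd.grad(task_loss(w_t, x, y), w_t, create_graph=True)[0]
loss = ((w_next - (w_t - lr * grad)) ** 2).mean()
loss.backward()
```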
arXiv Detail & Related papers (2024-12-22T14:37:10Z) - Magnitude Invariant Parametrizations Improve Hypernetwork Learning [0.0]
Hypernetworks are powerful neural networks that predict the parameters of another neural network.
Training typically converges far more slowly than for non-hypernetwork models.
We identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks.
We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP).
arXiv Detail & Related papers (2023-04-15T22:18:29Z) - Forget-free Continual Learning with Soft-Winning SubNetworks [67.0373924836107]
We investigate two proposed continual learning methods which sequentially learn and select adaptive binary subnetworks (WSN) and non-binary soft subnetworks (SoftNet) for each task.
WSN and SoftNet jointly learn the regularized model weights and task-adaptive non-binary masks of subnetworks associated with each task.
In Task Incremental Learning (TIL), binary masks spawned per winning ticket are encoded into one N-bit binary digit mask, then compressed using Huffman coding for a sub-linear increase in network capacity to the number of tasks.
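The mask-encoding step mentioned for TIL can be illustrated with a small, self-contained sketch (the toy masks and helper below are illustrative assumptions, not code from the paper): the per-task binary masks for each weight are packed into one N-bit integer symbol, and the resulting symbols are Huffman-coded so that the stored mask size grows sub-linearly with the number of tasks.
```python
import heapq
from collections import Counter

import numpy as np

# Toy per-task binary masks over 8 weights for N = 3 tasks (illustrative values).
masks = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],   # task 0
    [1, 1, 0, 1, 0, 1, 1, 0],   # task 1
    [0, 1, 0, 1, 1, 1, 0, 0],   # task 2
])

# Pack the N per-task bits of each weight into a single N-bit integer symbol.
num_tasks = masks.shape[0]
symbols = np.zeros(masks.shape[1], dtype=np.int64)
for t in range(num_tasks):
    symbols |= masks[t].astype(np.int64) << t

def huffman_code(values):
    """Build a Huffman code (symbol -> bitstring) from symbol frequencies."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(values).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

code = huffman_code(symbols.tolist())
encoded = "".join(code[s] for s in symbols.tolist())
print(f"{len(encoded)} bits after Huffman coding vs. {num_tasks * masks.shape[1]} raw bits")
```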
arXiv Detail & Related papers (2023-03-27T07:53:23Z) - Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
This leads to a new paradigm for model compression that reduces the model size.
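A hedged sketch of the masking-over-fixed-random-weights idea (the layer below and its straight-through masking are assumptions for illustration; the paper's exact mechanism may differ): the weights are frozen random values drawn from a small set of unique values, and only a real-valued score tensor is trained, from which a binary mask is derived.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch: a linear layer with frozen random weights drawn from a small
# set of unique values; only real-valued mask scores are trained, and a hard
# binary mask is obtained via a straight-through estimator.
class RandomWeightMaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, unique_values=(-0.1, 0.1)):
        super().__init__()
        values = torch.tensor(unique_values)
        idx = torch.randint(len(values), (out_features, in_features))
        # Frozen weights with limited unique values, never updated by the optimizer.
        self.register_buffer("weight", values[idx])
        # Trainable scores from which the binary mask is derived.
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features))

    def forward(self, x):
        hard_mask = (self.scores > 0).float()
        # Straight-through estimator: the forward pass uses the hard mask,
        # while gradients flow through the continuous scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


layer = RandomWeightMaskedLinear(16, 4)
out = layer(torch.randn(2, 16))   # only `scores` receives gradients
```
Since only the mask, the random seed, and the few unique values need to be stored, the representation can be much smaller than the dense weights, which is the compression angle mentioned above.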
arXiv Detail & Related papers (2022-10-13T03:39:03Z) - Continual Learning with Dependency Preserving Hypernetworks [14.102057320661427]
An effective approach to address continual learning (CL) problems is to use hypernetworks which generate task dependent weights for a target network.
We propose a novel approach that uses a dependency preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency.
In addition, we propose novel regularisation and network growth techniques for the RNN based hypernetwork to further improve the continual learning performance.
arXiv Detail & Related papers (2022-09-16T04:42:21Z) - On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained by SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z) - Hypernetwork Dismantling via Deep Reinforcement Learning [1.4877837830677472]
We formulate the hypernetwork dismantling problem as a node sequence decision problem.
We propose a deep reinforcement learning-based hypernetwork dismantling framework.
Experimental results on five real-world hypernetworks demonstrate the effectiveness of our proposed framework.
arXiv Detail & Related papers (2021-04-29T13:35:29Z)