Ternary Feature Masks: zero-forgetting for task-incremental learning
- URL: http://arxiv.org/abs/2001.08714v2
- Date: Wed, 21 Apr 2021 10:34:21 GMT
- Title: Ternary Feature Masks: zero-forgetting for task-incremental learning
- Authors: Marc Masana, Tinne Tuytelaars, Joost van de Weijer
- Abstract summary: We propose a zero-forgetting approach to continual learning for the task-aware regime.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
- Score: 68.34518408920661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a zero-forgetting approach to continual learning for the
task-aware regime, where the task label is known at inference. By using ternary
masks we can upgrade a model to new tasks, reusing knowledge from previous
tasks while not forgetting anything about them. Using masks prevents both
catastrophic forgetting and backward transfer. We argue -- and show
experimentally -- that avoiding the former largely compensates for the lack of
the latter, which is rarely observed in practice. In contrast to earlier works,
our masks are applied to the features (activations) of each layer instead of
the weights. This considerably reduces the number of mask parameters for each
new task, by more than three orders of magnitude for most networks. The
encoding of the ternary masks into two bits per feature adds very little
overhead to the network, avoiding scalability issues. To allow already learned
features to adapt to the current task without changing the behavior of these
features for previous tasks, we introduce task-specific feature normalization.
Extensive experiments on several fine-grained datasets and ImageNet show that
our method outperforms current state-of-the-art while reducing memory overhead
in comparison to weight-based approaches.
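To make the mechanism concrete, the following is a minimal PyTorch-style sketch, assuming one ternary state per feature (unused / reused from earlier tasks / newly learned) that gates the layer's activations, plus a per-task affine feature normalization; the class and attribute names are illustrative and not taken from the paper.
```python
import torch
import torch.nn as nn


class TernaryFeatureMask(nn.Module):
    """Sketch (not the authors' code): per-task ternary mask over a layer's
    features, plus task-specific feature normalization so that previously
    learned features can adapt to the current task without changing their
    behavior on earlier tasks."""

    def __init__(self, num_features: int, num_tasks: int):
        super().__init__()
        # Assumed ternary state per (task, feature): 0 = unused, 1 = reused
        # (frozen), 2 = newly learned. uint8 here; 2 bits/feature when serialized.
        self.register_buffer(
            "state", torch.zeros(num_tasks, num_features, dtype=torch.uint8)
        )
        # Task-specific affine normalization parameters.
        self.gamma = nn.Parameter(torch.ones(num_tasks, num_features))
        self.beta = nn.Parameter(torch.zeros(num_tasks, num_features))

    def forward(self, feats: torch.Tensor, task_id: int) -> torch.Tensor:
        # feats: (batch, num_features) activations; the task label is known
        # at inference, so it selects which mask and normalization to apply.
        use = (self.state[task_id] > 0).to(feats.dtype)  # 1 = reused or new
        out = feats * self.gamma[task_id] + self.beta[task_id]
        return out * use
```
Because the task label selects every task-specific quantity, a forward pass for an old task sees exactly the mask and normalization it was trained with, which is what yields zero forgetting in this kind of scheme.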
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z)
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning [14.332279447231416]
Self-supervised learning has been actively studied in time series domain recently, especially for masked reconstruction.
Most of these methods follow the "Pre-training + Fine-tuning" paradigm in which a new decoder replaces the pre-trained decoder.
We propose a simple yet effective prompt token tuning (PT-Tuning) paradigm, in which all pre-trained parameters are frozen and only a few trainable prompt tokens are added to extended mask tokens.
arXiv Detail & Related papers (2023-11-07T07:11:27Z)
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed as regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver 18.73% performance improvement compared to the zero-shot CLIP via masking an average of only 2.56% parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z)
- ImpressLearn: Continual Learning via Combined Task Impressions [0.0]
This work proposes a new method to sequentially train a deep neural network on multiple tasks without suffering catastrophic forgetting.
We show that simply learning a linear combination of a small number of task-specific masks on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on new tasks (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-05T02:28:25Z)
- Incremental Task Learning with Incremental Rank Updates [20.725181015069435]
We propose a new incremental task learning framework based on low-rank factorization.
We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting.
arXiv Detail & Related papers (2022-07-19T05:21:14Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, which is known as catastrophic forgetting.
Recent continual learning methods are capable of alleviating the catastrophic forgetting problem on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
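The ImpressLearn entry above reports that learning only a linear combination of a few binary masks over a frozen, randomly initialized backbone suffices for continual learning. As a rough illustration of that idea for a single linear layer, here is a hypothetical sketch in which the mask bank is random and only the per-task combination coefficients are trained; the class name, mask generation, and initialization are our assumptions, not the paper's implementation.
```python
import torch
import torch.nn as nn


class MaskCombinationLinear(nn.Module):
    """Illustrative sketch: a frozen random linear layer whose weights are
    gated by a learned, task-specific linear combination of fixed binary masks."""

    def __init__(self, in_dim: int, out_dim: int, num_masks: int, num_tasks: int):
        super().__init__()
        # Randomly initialized backbone weights, kept frozen.
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        # A small bank of fixed binary masks (random here purely for illustration).
        self.register_buffer(
            "masks", (torch.rand(num_masks, out_dim, in_dim) > 0.5).float()
        )
        # Per-task combination coefficients: the only trainable parameters.
        self.coeffs = nn.Parameter(torch.full((num_tasks, num_masks), 1.0 / num_masks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Combine the mask bank with the requested task's coefficients,
        # then use the combination to gate the frozen backbone weights.
        combined = torch.einsum("m,moi->oi", self.coeffs[task_id], self.masks)
        return x @ (self.weight * combined).t()
```
Since only the coefficient vectors receive gradients, per-task storage in this sketch is just num_masks scalars per layer.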