Ternary Feature Masks: zero-forgetting for task-incremental learning
- URL: http://arxiv.org/abs/2001.08714v2
- Date: Wed, 21 Apr 2021 10:34:21 GMT
- Title: Ternary Feature Masks: zero-forgetting for task-incremental learning
- Authors: Marc Masana, Tinne Tuytelaars, Joost van de Weijer
- Abstract summary: We propose a zero-forgetting approach to continual learning for the task-aware regime.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
- Score: 68.34518408920661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a zero-forgetting approach to continual learning for the
task-aware regime, where the task label is known at inference. By using ternary
masks we can upgrade a model to new tasks, reusing knowledge from previous
tasks while not forgetting anything about them. Using masks prevents both
catastrophic forgetting and backward transfer. We argue -- and show
experimentally -- that avoiding the former largely compensates for the lack of
the latter, which is rarely observed in practice. In contrast to earlier works,
our masks are applied to the features (activations) of each layer instead of
the weights. This considerably reduces the number of mask parameters for each
new task, by more than three orders of magnitude for most networks. The
encoding of the ternary masks into two bits per feature adds very little
overhead to the network, avoiding scalability issues. To allow already learned
features to adapt to the current task without changing the behavior of these
features for previous tasks, we introduce task-specific feature normalization.
Extensive experiments on several fine-grained datasets and ImageNet show that
our method outperforms current state-of-the-art while reducing memory overhead
in comparison to weight-based approaches.
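To make the mechanism concrete, the following is a minimal PyTorch-style sketch, assuming one ternary state per feature (unused / reused from earlier tasks / newly learned) that gates the layer's activations, plus a per-task affine feature normalization; the class and attribute names are illustrative and not taken from the paper.
```python
import torch
import torch.nn as nn


class TernaryFeatureMask(nn.Module):
    """Sketch (not the authors' code): per-task ternary mask over a layer's
    features, plus task-specific feature normalization so that previously
    learned features can adapt to the current task without changing their
    behavior on earlier tasks."""

    def __init__(self, num_features: int, num_tasks: int):
        super().__init__()
        # Assumed ternary state per (task, feature): 0 = unused, 1 = reused
        # (frozen), 2 = newly learned. uint8 here; 2 bits/feature when serialized.
        self.register_buffer(
            "state", torch.zeros(num_tasks, num_features, dtype=torch.uint8)
        )
        # Task-specific affine normalization parameters.
        self.gamma = nn.Parameter(torch.ones(num_tasks, num_features))
        self.beta = nn.Parameter(torch.zeros(num_tasks, num_features))

    def forward(self, feats: torch.Tensor, task_id: int) -> torch.Tensor:
        # feats: (batch, num_features) activations; the task label is known
        # at inference, so it selects which mask and normalization to apply.
        use = (self.state[task_id] > 0).to(feats.dtype)  # 1 = reused or new
        out = feats * self.gamma[task_id] + self.beta[task_id]
        return out * use
```
Because the task label selects every task-specific quantity, a forward pass for an old task sees exactly the mask and normalization it was trained with, which is what yields zero forgetting in this kind of scheme.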
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z)
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning [14.332279447231416]
Self-supervised learning has been actively studied in time series domain recently, especially for masked reconstruction.
Most of these methods follow the "Pre-training + Fine-tuning" paradigm in which a new decoder replaces the pre-trained decoder.
We propose a simple yet effective prompt token tuning (PT-Tuning) paradigm, in which all pre-trained parameters are frozen and only a few trainable prompt tokens are added to extended mask tokens.
arXiv Detail & Related papers (2023-11-07T07:11:27Z)
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed as regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver 18.73% performance improvement compared to the zero-shot CLIP via masking an average of only 2.56% parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z)
- ImpressLearn: Continual Learning via Combined Task Impressions [0.0]
This work proposes a new method to sequentially train a deep neural network on multiple tasks without suffering catastrophic forgetting.
We show that simply learning a linear combination of a small number of task-specific masks on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on new tasks (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-05T02:28:25Z)
- Incremental Task Learning with Incremental Rank Updates [20.725181015069435]
We propose a new incremental task learning framework based on low-rank factorization.
We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting.
arXiv Detail & Related papers (2022-07-19T05:21:14Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, which is known as catastrophic forgetting.
Recent continual learning methods are capable of alleviating the catastrophic forgetting problem on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
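The ImpressLearn entry above reports that learning only a linear combination of a few binary masks over a frozen, randomly initialized backbone suffices for continual learning. As a rough illustration of that idea for a single linear layer, here is a hypothetical sketch in which the mask bank is random and only the per-task combination coefficients are trained; the class name, mask generation, and initialization are our assumptions, not the paper's implementation.
```python
import torch
import torch.nn as nn


class MaskCombinationLinear(nn.Module):
    """Illustrative sketch: a frozen random linear layer whose weights are
    gated by a learned, task-specific linear combination of fixed binary masks."""

    def __init__(self, in_dim: int, out_dim: int, num_masks: int, num_tasks: int):
        super().__init__()
        # Randomly initialized backbone weights, kept frozen.
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        # A small bank of fixed binary masks (random here purely for illustration).
        self.register_buffer(
            "masks", (torch.rand(num_masks, out_dim, in_dim) > 0.5).float()
        )
        # Per-task combination coefficients: the only trainable parameters.
        self.coeffs = nn.Parameter(torch.full((num_tasks, num_masks), 1.0 / num_masks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Combine the mask bank with the requested task's coefficients,
        # then use the combination to gate the frozen backbone weights.
        combined = torch.einsum("m,moi->oi", self.coeffs[task_id], self.masks)
        return x @ (self.weight * combined).t()
```
Since only the coefficient vectors receive gradients, per-task storage in this sketch is just num_masks scalars per layer.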