KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning
- URL: http://arxiv.org/abs/2009.05668v1
- Date: Fri, 11 Sep 2020 21:48:39 GMT
- Title: KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning
- Authors: Li Yang, Zhezhi He, Junshan Zhang, Deliang Fan
- Abstract summary: Deep Neural Networks (DNNs) can forget knowledge of earlier tasks when learning new tasks; this is known as catastrophic forgetting.
Recent continual learning methods can alleviate catastrophic forgetting on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task.
- Score: 49.77278179376902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) can forget knowledge of earlier tasks
when learning new tasks; this is known as \textit{catastrophic forgetting}.
While recent continual learning methods can alleviate catastrophic forgetting
on toy-sized datasets, several issues remain when applying them to real-world
problems. Recently, fast mask-based learning methods (e.g., Piggyback
\cite{mallya2018piggyback}) have been proposed to address these issues by
learning only a binary element-wise mask in a fast manner, while keeping the
backbone model fixed. However, a binary mask has limited modeling capacity for
new tasks. A more recent work \cite{hung2019compacting} proposes a
compress-grow-based method (CPG) that achieves better accuracy on new tasks by
partially training the backbone model, but at an order-of-magnitude higher
training cost, which makes it infeasible to deploy on popular edge/mobile
learning platforms. The primary goal of this work is to simultaneously achieve
fast and high-accuracy multi-task adaption in the continual learning setting.
Thus motivated, we propose a new training method called \textit{Kernel-wise
Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-value soft
mask for each task while using the same backbone model. Such a soft mask can be
viewed as the superposition of a binary mask and a properly scaled real-value
tensor, which offers richer representation capability without requiring
low-level kernel support, meeting the objective of low hardware overhead. We
validate KSM on multiple benchmark datasets against recent state-of-the-art
methods (e.g., Piggyback, PackNet, CPG), showing good improvements in both
accuracy and training cost.
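To make the core idea concrete, below is a minimal sketch, assuming a PyTorch backbone; the module name, threshold, and scaling factor are illustrative assumptions, not the authors' released implementation. Each (output, input) kernel of a frozen convolution gets one learnable score; the applied mask is the superposition of a binarized component (trained with a straight-through estimator) and a scaled real-valued component, so only the small per-kernel scores are trained and stored per task.

```python
# Hedged sketch of a kernel-wise soft mask on a frozen conv layer.
# Names and hyper-parameters (threshold, scale) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelSoftMaskConv2d(nn.Module):
    """Applies a per-kernel hybrid binary/real-value soft mask to a frozen conv."""

    def __init__(self, backbone_conv: nn.Conv2d, threshold: float = 0.5, scale: float = 0.1):
        super().__init__()
        self.conv = backbone_conv
        for p in self.conv.parameters():           # backbone weights stay fixed
            p.requires_grad_(False)
        out_c, in_c = self.conv.weight.shape[:2]   # one score per (out, in) kernel
        # Start just above the threshold so the binary part is all-ones initially (assumption).
        self.score = nn.Parameter(torch.full((out_c, in_c, 1, 1), threshold + 0.01))
        self.threshold = threshold
        self.scale = scale

    def forward(self, x):
        hard = (self.score > self.threshold).float()
        # Straight-through estimator: forward uses the binarized value,
        # backward passes gradients to the real-valued scores.
        hard_ste = hard - self.score.detach() + self.score
        soft_mask = hard_ste + self.scale * self.score   # binary + scaled real-value part
        w = self.conv.weight * soft_mask                 # broadcast over each k x k kernel
        return F.conv2d(x, w, self.conv.bias, self.conv.stride,
                        self.conv.padding, self.conv.dilation, self.conv.groups)


# Usage: only `score` receives gradients; the backbone conv is shared across tasks.
layer = KernelSoftMaskConv2d(nn.Conv2d(64, 128, 3, padding=1))
out = layer(torch.randn(2, 64, 32, 32))
```

Because the mask is resolved per kernel rather than per weight element, the per-task storage and the masking arithmetic stay coarse-grained, which is what keeps the hardware overhead low in this sketch's reading of the abstract.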
Related papers
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is identified as the main performance bottleneck for open-vocabulary panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocab panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z) - Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z) - CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
arXiv Detail & Related papers (2023-08-31T09:13:30Z) - Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z) - ImpressLearn: Continual Learning via Combined Task Impressions [0.0]
This work proposes a new method to sequentially train a deep neural network on multiple tasks without suffering catastrophic forgetting.
We show that simply learning a linear combination of a small number of task-specific masks on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on new tasks (a toy sketch of this mask-combination idea appears after this list).
arXiv Detail & Related papers (2022-10-05T02:28:25Z) - SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
A self-distillated masked autoencoder network, SdAE, is proposed in this paper.
With only 300 epochs pre-training, a vanilla ViT-Base model achieves an 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose an approach to continual learning for the task-aware regime that does not suffer any forgetting.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
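As referenced in the ImpressLearn entry above, here is a toy sketch of the mask-combination idea: a frozen, randomly initialized layer whose weights are modulated by a learned linear combination of a few fixed task-specific binary masks, so only the combination coefficients are trained per task. All names, the mask construction, and the sizes are assumptions for illustration, not the paper's code.

```python
# Hedged sketch: combining fixed binary masks over a frozen random layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskCombinationLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, num_masks: int = 4):
        super().__init__()
        # Randomly initialized, frozen backbone weights.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Fixed random binary "impression" masks (hypothetical construction).
        self.register_buffer("masks", (torch.rand(num_masks, out_features, in_features) > 0.5).float())
        # Per-task trainable combination coefficients.
        self.coeffs = nn.Parameter(torch.ones(num_masks) / num_masks)

    def forward(self, x):
        combined = torch.einsum("k,koi->oi", self.coeffs, self.masks)  # linear combination of masks
        return F.linear(x, self.weight * combined)


# Usage: only `coeffs` has requires_grad=True, so per-task storage is tiny.
layer = MaskCombinationLinear(128, 10, num_masks=4)
logits = layer(torch.randn(32, 128))
```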