Supermasks in Superposition
- URL: http://arxiv.org/abs/2006.14769v3
- Date: Thu, 22 Oct 2020 00:32:49 GMT
- Title: Supermasks in Superposition
- Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi,
Mohammad Rastegari, Jason Yosinski, Ali Farhadi
- Abstract summary: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting.
Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance.
In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks.
- Score: 70.5780643117055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Supermasks in Superposition (SupSup) model, capable of
sequentially learning thousands of tasks without catastrophic forgetting. Our
approach uses a randomly initialized, fixed base network and for each task
finds a subnetwork (supermask) that achieves good performance. If task identity
is given at test time, the correct subnetwork can be retrieved with minimal
memory usage. If not provided, SupSup can infer the task using gradient-based
optimization to find a linear superposition of learned supermasks which
minimizes the output entropy. In practice we find that a single gradient step
is often sufficient to identify the correct mask, even among 2500 tasks. We
also showcase two promising extensions. First, SupSup models can be trained
entirely without task identity information, as they may detect when they are
uncertain about new data and allocate an additional supermask for the new
training distribution. Finally, the entire, growing set of supermasks can be
stored in a constant-sized reservoir by implicitly storing them as attractors
in a fixed-sized Hopfield network.
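To make the task-inference step concrete, here is a minimal sketch, not the authors' released code: it assumes a frozen, randomly initialized linear layer, random binary masks as stand-ins for trained supermasks, and PyTorch autograd. It follows the abstract's description of gradient-based inference: superpose the masks with mixing coefficients, take one gradient of the output entropy with respect to those coefficients, and pick the task whose coefficient the gradient would increase most.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_tasks, d_in, d_out = 5, 64, 10

# Fixed, randomly initialized weights and one binary "supermask" per task
# (random stand-ins here; in SupSup each mask would be trained on its task).
W = torch.randn(d_out, d_in)
masks = [(torch.rand(d_out, d_in) > 0.5).float() for _ in range(num_tasks)]

def infer_task(x: torch.Tensor) -> int:
    """One-shot task inference: a single entropy gradient w.r.t. the mixing coefficients."""
    alphas = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    # Linear superposition of all stored supermasks applied to the frozen weights.
    mixed_mask = sum(a * m for a, m in zip(alphas, masks))
    logits = F.linear(x, W * mixed_mask)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    entropy.backward()
    # The mask whose coefficient the entropy gradient pushes up the hardest
    # (most negative gradient) is taken as the inferred task.
    return int(torch.argmin(alphas.grad))

x = torch.randn(1, d_in)
print("inferred task:", infer_task(x))
```

The Hopfield extension can be sketched in the same hedged spirit: flatten each supermask to a ±1 pattern, accumulate Hebbian outer products into one fixed-size weight matrix, and recover a stored mask by iterating to an attractor. The pattern size, number of masks, and the classical synchronous update below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_masks = 256, 3

# Supermasks flattened to +/-1 patterns (random stand-ins for learned masks).
patterns = rng.choice([-1.0, 1.0], size=(n_masks, n_bits))

# Fixed-size Hopfield weight matrix: sum of outer products, zero diagonal.
H = patterns.T @ patterns / n_bits
np.fill_diagonal(H, 0.0)

def recover(query: np.ndarray, steps: int = 20) -> np.ndarray:
    """Iterate synchronous sign updates until the state settles on an attractor."""
    state = query.copy()
    for _ in range(steps):
        new_state = np.sign(H @ state)
        new_state[new_state == 0] = 1.0
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Corrupt 10% of the bits of mask 0 and check that the attractor recovers it.
noisy = patterns[0].copy()
flip = rng.choice(n_bits, size=n_bits // 10, replace=False)
noisy[flip] *= -1
print("recovered exactly:", np.array_equal(recover(noisy), patterns[0]))
```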
Related papers
- Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z)
- Exclusive Supermask Subnetwork Training for Continual Learning [95.5186263127864]
Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding forgetting.
We propose ExSSNeT (Exclusive Supermask SubNEtwork Training), that performs exclusive and non-overlapping subnetwork weight training.
We demonstrate that ExSSNeT outperforms strong previous methods on both NLP and Vision domains while preventing forgetting.
arXiv Detail & Related papers (2022-10-18T23:27:07Z)
- ImpressLearn: Continual Learning via Combined Task Impressions [0.0]
This work proposes a new method to sequentially train a deep neural network on multiple tasks without suffering catastrophic forgetting.
We show that simply learning a linear combination of a small number of task-specific masks on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on new tasks.
arXiv Detail & Related papers (2022-10-05T02:28:25Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks; this is known as catastrophic forgetting.
Recent continual learning methods can alleviate catastrophic forgetting on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
- Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose an approach to continual learning for the task-aware regime that incurs no forgetting.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.