Training Neural Networks with Fixed Sparse Masks
- URL: http://arxiv.org/abs/2111.09839v1
- Date: Thu, 18 Nov 2021 18:06:01 GMT
- Title: Training Neural Networks with Fixed Sparse Masks
- Authors: Yi-Lin Sung, Varun Nair, and Colin Raffel
- Abstract summary: Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
- Score: 19.58969772430058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During typical gradient-based training of deep neural networks, all of the
model's parameters are updated at each iteration. Recent work has shown that it
is possible to update only a small subset of the model's parameters during
training, which can alleviate storage and communication requirements. In this
paper, we show that it is possible to induce a fixed sparse mask on the model's
parameters that selects a subset to update over many iterations. Our method
constructs the mask out of the $k$ parameters with the largest Fisher
information as a simple approximation as to which parameters are most important
for the task at hand. In experiments on parameter-efficient transfer learning
and distributed training, we show that our approach matches or exceeds the
performance of other methods for training with sparse updates while being more
efficient in terms of memory usage and communication costs. We release our code
publicly to promote further applications of our approach.
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed as regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver 18.73% performance improvement compared to the zero-shot CLIP via masking an average of only 2.56% parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z) - Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z) - Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works have led to an increase in updated parameters and task-specific modules, resulting in more computations, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
arXiv Detail & Related papers (2023-02-26T23:19:11Z) - PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Fine-tuning Image Transformers using Learnable Memory [14.478892724736404]
We propose augmenting Vision Transformer models with learnable memory tokens.
Our approach allows the model to adapt to new tasks, using few parameters.
We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy.
arXiv Detail & Related papers (2022-03-29T05:26:20Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.