Scalable Weight Reparametrization for Efficient Transfer Learning
- URL: http://arxiv.org/abs/2302.13435v1
- Date: Sun, 26 Feb 2023 23:19:11 GMT
- Title: Scalable Weight Reparametrization for Efficient Transfer Learning
- Authors: Byeonggeun Kim, Jun-Tae Lee, Seunghan Yang, Simyung Chang
- Abstract summary: Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works have increased the number of updated parameters and task-specific modules, resulting in more computation, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
- Score: 10.265713480189486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel transfer learning method, called
Scalable Weight Reparametrization (SWR), that is efficient and effective for
multiple downstream tasks. Efficient transfer learning involves utilizing a
pre-trained model trained on a larger dataset and repurposing it for
downstream tasks with the aim of maximizing the reuse of the pre-trained
model. However, previous works have increased the number of updated
parameters and task-specific modules, resulting in more computation,
especially for tiny models. Additionally, there has been no practical
consideration for controlling the number of updated parameters. To address
these issues, we suggest learning a policy network that can decide where to
reparametrize the pre-trained model, while adhering to a given constraint on
the number of updated parameters. The policy network is only used during the
transfer learning process and not afterward. As a result, our approach
attains state-of-the-art performance on a proposed multi-lingual keyword
spotting task and on the standard ImageNet-to-Sketch benchmark, while
requiring zero additional computation and significantly fewer additional
parameters.
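The abstract does not spell out the selection mechanism, so the following is a minimal, hypothetical sketch of the core idea only: frozen pre-trained layers, per-layer additive weight deltas, and a score-based policy that greedily enables deltas while they fit an updated-parameter budget. The names (`ReparamLinear`, `apply_policy`) and the greedy rule are illustrative assumptions, not the paper's algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamLinear(nn.Module):
    """A frozen pre-trained linear layer plus an optional learnable delta."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # pre-trained weights stay frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.enabled = False               # toggled by the policy

    def forward(self, x):
        w = self.base.weight + self.delta if self.enabled else self.base.weight
        return F.linear(x, w, self.base.bias)

def apply_policy(layers, scores, budget):
    """Greedily enable the highest-scoring deltas that still fit the budget."""
    used = 0
    for i in torch.argsort(scores, descending=True).tolist():
        n = layers[i].delta.numel()
        layers[i].enabled = used + n <= budget
        if layers[i].enabled:
            used += n
    return used

# Toy usage: six layers, a budget of roughly three layers' worth of deltas.
layers = [ReparamLinear(nn.Linear(64, 64)) for _ in range(6)]
scores = torch.randn(6)                    # stand-in for policy-network scores
apply_policy(layers, scores, budget=3 * 64 * 64)
```

Because each enabled delta can be folded back into the pre-trained weight matrix after transfer (w' = w + Δw), reparametrization of this kind adds no inference-time computation, which matches the abstract's "zero additional computations" claim; the policy network itself is discarded once transfer is done.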
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
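How the learnable query meets the frozen backbone is not stated above; one speculative arrangement (purely illustrative, not confirmed by the paper) has a learned query token cross-attend over frozen ViT patch features, with only the small head trained.

```python
import torch
import torch.nn as nn

class TaskQueryHead(nn.Module):
    """Hypothetical lightweight head: a learnable query cross-attends to
    frozen backbone features, and only this head is trained per task."""
    def __init__(self, dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):                # feats: (B, N, dim) frozen ViT tokens
        q = self.query.expand(feats.size(0), -1, -1)
        out, _ = self.attn(q, feats, feats)  # query attends over patch tokens
        return self.classifier(out.squeeze(1))

# Usage sketch: feats = frozen_vit_features(images); logits = head(feats)
```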
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- PAC-Net: A Model Pruning Approach to Inductive Transfer Learning [16.153557870191488]
PAC-Net is a simple yet effective approach for transfer learning based on pruning.
PAC-Net consists of three steps: Prune, Allocate, and Calibrate.
Across a varied and extensive set of inductive transfer learning experiments, we show that our method achieves state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2022-06-12T09:45:16Z)
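One plausible concrete reading of the three PAC-Net steps (an assumption on my part, not the authors' released code): prune the source model by weight magnitude, keep the surviving "important" weights fixed to preserve source knowledge, and calibrate only the pruned slots on the target task. Helper names and the 70% keep ratio are illustrative.

```python
import torch
import torch.nn as nn

def magnitude_mask(w: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Prune: mark the top-`keep_ratio` weights by magnitude as important."""
    k = int(keep_ratio * w.numel())
    thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return (w.abs() >= thresh).float()

def pac_step(layer: nn.Linear, mask: torch.Tensor, lr: float = 1e-3):
    """Calibrate: gradient step touching only the pruned (mask == 0) slots,
    so the important source weights stay fixed."""
    with torch.no_grad():
        layer.weight -= lr * layer.weight.grad * (1.0 - mask)

# Usage sketch: mask = magnitude_mask(layer.weight); after loss.backward()
# on the target task, call pac_step(layer, mask) instead of a plain step.
```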
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
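The fine-tuning method this paper champions, (IA)^3, trains only small learned vectors that rescale activations inside an otherwise frozen model. The wrapper below is a simplified sketch of that pattern; `IA3Linear` is an illustrative name, not the paper's code.

```python
import torch
import torch.nn as nn

class IA3Linear(nn.Module):
    """A frozen linear layer whose output is rescaled by a small learned
    vector; only `scale` is trainable, so the per-task state is tiny."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.scale = nn.Parameter(torch.ones(base.out_features))

    def forward(self, x):
        return self.scale * self.base(x)

# Usage sketch: wrap the key/value and feed-forward projections of a frozen
# transformer, then train only the `scale` vectors on the few-shot task.
```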
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Training Neural Networks with Fixed Sparse Masks [19.58969772430058]
Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
arXiv Detail & Related papers (2021-11-18T18:06:01Z)
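This paper selects which parameters to update using (empirical) Fisher information, commonly approximated by averaged squared gradients. The sketch below is a simplified rendering under that assumption; `fisher_mask`, `masked_sgd_step`, and the hyperparameters are illustrative.

```python
import torch

def fisher_mask(param: torch.Tensor, grads: list, k: int) -> torch.Tensor:
    """Approximate Fisher information by averaged squared gradients over a
    few batches, then keep the top-k parameter entries as the fixed mask."""
    fisher = torch.stack([g.pow(2) for g in grads]).mean(dim=0)
    idx = fisher.flatten().topk(k).indices
    mask = torch.zeros(param.numel())
    mask[idx] = 1.0
    return mask.view_as(param)

def masked_sgd_step(param: torch.Tensor, mask: torch.Tensor, lr: float = 1e-2):
    """Update only the masked entries; the rest keep pre-trained values."""
    with torch.no_grad():
        param -= lr * param.grad * mask
```

The mask is computed once and reused over many iterations, which is what makes the communication/storage cost predictable compared with dense fine-tuning.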
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
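Powerpropagation's reparameterisation has a simple closed form: store phi and use the effective weight w = phi * |phi|^(alpha - 1), so gradient magnitudes scale with |phi|^(alpha - 1) and small weights are pushed toward zero. A minimal sketch (alpha and the layer shapes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer with Powerpropagation weights: w = phi * |phi|^(alpha-1).
    For alpha > 1, gradients through w shrink as |phi| -> 0, concentrating
    mass at zero and making the trained model easier to prune safely."""
    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def forward(self, x):
        w = self.phi * self.phi.abs().pow(self.alpha - 1.0)
        return F.linear(x, w, self.bias)
```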
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
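PeterRec's "injected, re-learned neural networks" are small patch modules grafted into a frozen backbone; the residual adapter below is a generic stand-in for that pattern (the module name and sizes are mine, not the paper's).

```python
import torch
import torch.nn as nn

class PatchAdapter(nn.Module):
    """Small residual module injected into a frozen backbone; only these
    bottleneck weights are re-learned during fine-tuning."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)    # start as identity: no initial drift
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))
```

Zero-initializing the up-projection makes each patch an identity map at the start of fine-tuning, so the pre-trained behavior is preserved until the patches learn task-specific corrections.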
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.