Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
- URL: http://arxiv.org/abs/2303.07910v1
- Date: Tue, 14 Mar 2023 13:50:31 GMT
- Title: Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
- Authors: Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou
- Abstract summary: PETL aims at efficiently adapting large models pre-trained on massive data to downstream tasks with limited task-specific data.
This paper proposes a novel two-stage paradigm, where the pre-trained model is first aligned to the target distribution.
The proposed paradigm achieves state-of-the-art average accuracy across 19 downstream tasks.
- Score: 21.747744343882392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-Efficient Transfer Learning (PETL) aims at efficiently adapting
large models pre-trained on massive data to downstream tasks with limited
task-specific data. In view of the practicality of PETL, previous works focus
on tuning a small set of parameters for each downstream task in an end-to-end
manner while rarely considering the task distribution shift issue between the
pre-training task and the downstream task. This paper proposes a novel
two-stage paradigm, where the pre-trained model is first aligned to the target
distribution. Then the task-relevant information is leveraged for effective
adaptation. Specifically, the first stage narrows the task distribution shift
by tuning the scale and shift in the LayerNorm layers. In the second stage, to
efficiently learn the task-relevant information, we propose a Taylor
expansion-based importance score to identify task-relevant channels for the
downstream task and then tune only this small subset of channels, making the
adaptation parameter-efficient. Overall, we present a promising new direction
for PETL, and the proposed paradigm achieves state-of-the-art average accuracy
across 19 downstream tasks.
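To make the two stages concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the function names are illustrative, and the channel score is the standard first-order Taylor criterion |w * dL/dw|, which the paper's exact formulation may refine.

```python
import torch
import torch.nn as nn


def freeze_all_but_layernorm(model: nn.Module) -> None:
    """Stage 1: narrow the task distribution shift by tuning only the
    scale (weight) and shift (bias) of every LayerNorm layer."""
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.LayerNorm) and m.weight is not None:
            m.weight.requires_grad = True
            m.bias.requires_grad = True


def taylor_channel_scores(linear: nn.Linear) -> torch.Tensor:
    """Stage 2: first-order Taylor importance per output channel,
    |w * dL/dw| summed over the input dimension. Assumes loss.backward()
    has already populated .grad on the weight."""
    return (linear.weight * linear.weight.grad).abs().sum(dim=1)


def topk_task_relevant_channels(linear: nn.Linear, k: int) -> torch.Tensor:
    """Indices of the k most task-relevant output channels."""
    return torch.topk(taylor_channel_scores(linear), k).indices
```

In a full pipeline one would run Stage 1 for a few epochs, accumulate the scores over a small calibration set, and then restrict Stage 2 updates to the selected channels by masking their gradients, since PyTorch does not support per-channel requires_grad.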
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models [32.83440439290383]
We introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks.
Inspired by recent advancements in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques.
Our experiments in large-scale language-conditioned manipulation tasks suggest that TAIL with LoRA can achieve the best post-adaptation performance.
arXiv Detail & Related papers (2023-10-09T17:49:50Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves taking a model pre-trained on a larger dataset and repurposing it for downstream tasks.
Previous works have increased the number of updated parameters and task-specific modules, resulting in more computation, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
arXiv Detail & Related papers (2023-02-26T23:19:11Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs).
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map-transformation-based approach outperforms state-of-the-art continual-GAN methods.
arXiv Detail & Related papers (2021-03-06T05:09:37Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework; a simplified sketch of the idea follows this list.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
- Investigating Transferability in Pretrained Language Models [8.83046338075119]
We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance.
This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks.
arXiv Detail & Related papers (2020-04-30T17:23:19Z)
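To complement the diff pruning entry above, here is a minimal, heavily simplified PyTorch sketch of the general idea: the pretrained weights stay frozen and only a task-specific difference ("diff") is learned on top of them, with an L1 penalty standing in for the sparsity objective. The class name and the plain L1 surrogate are illustrative assumptions; the original method uses a differentiable L0 relaxation and a final magnitude-based projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffPrunedLinear(nn.Module):
    """A frozen pretrained linear layer plus a learnable, sparse task diff."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Pretrained weights are kept fixed (assumes the layer has a bias).
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        # Only the task-specific diff is trained.
        self.weight_diff = nn.Parameter(torch.zeros_like(self.weight))
        self.bias_diff = nn.Parameter(torch.zeros_like(self.bias))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight + self.weight_diff, self.bias + self.bias_diff)

    def sparsity_penalty(self) -> torch.Tensor:
        # L1 surrogate for the L0 sparsity objective on the diff.
        return self.weight_diff.abs().sum() + self.bias_diff.abs().sum()
```

During training the penalty is added to the task loss (e.g. loss = task_loss + lam * layer.sparsity_penalty()); after training, only the non-zero diff entries need to be stored per task.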
This list is automatically generated from the titles and abstracts of the papers in this site.