Towards a Unified View of Parameter-Efficient Transfer Learning
- URL: http://arxiv.org/abs/2110.04366v1
- Date: Fri, 8 Oct 2021 20:22:26 GMT
- Title: Towards a Unified View of Parameter-Efficient Transfer Learning
- Authors: Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
- Abstract summary: Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
- Score: 108.94786930869473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning large pre-trained language models on downstream tasks has become
the de-facto learning paradigm in NLP. However, conventional approaches
fine-tune all the parameters of the pre-trained model, which becomes
prohibitive as the model size and the number of tasks grow. Recent work has
proposed a variety of parameter-efficient transfer learning methods that only
fine-tune a small number of (extra) parameters to attain strong performance.
While effective, the critical ingredients for success and the connections among
the various methods are poorly understood. In this paper, we break down the
design of state-of-the-art parameter-efficient transfer learning methods and
present a unified framework that establishes connections between them.
Specifically, we re-frame them as modifications to specific hidden states in
pre-trained models, and define a set of design dimensions along which different
methods vary, such as the function to compute the modification and the position
to apply the modification. Through comprehensive empirical studies across
machine translation, text summarization, language understanding, and text
classification benchmarks, we utilize the unified view to identify important
design choices in previous methods. Furthermore, our unified framework enables
the transfer of design elements across different approaches, and as a result we
are able to instantiate new parameter-efficient fine-tuning methods that tune
fewer parameters than previous methods while being more effective, achieving
comparable results to fine-tuning all parameters on all four tasks.
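To make the unified view above concrete, here is a minimal PyTorch sketch (not the authors' code; the module names, bottleneck size, LoRA rank, and scaling factor are illustrative assumptions) showing how two representative methods, a sequential adapter and LoRA, each compute a small delta that is added to a hidden state and differ mainly in which function computes the modification and where it is applied.

```python
# Minimal sketch of the "modify hidden states" view: h <- h + delta_h.
# Illustrative only; hyperparameters and names are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Sequential adapter: delta_h = f(h @ W_down) @ W_up, added to the sub-layer output h."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Modification computed from (and applied to) the same hidden state h.
        return h + self.up(self.act(self.down(h)))


class LoRALinear(nn.Module):
    """LoRA: delta_h = s * (x @ A @ B), added to the output of a frozen pre-trained projection."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8, scale: float = 2.0):
        super().__init__()
        self.frozen = nn.Linear(d_in, d_out)
        self.frozen.requires_grad_(False)  # pre-trained weight stays fixed
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Modification computed from the sub-layer input x, applied to the projection output.
        return self.frozen(x) + self.scale * (x @ self.A @ self.B)


x = torch.randn(2, 16, 512)
print(Adapter(512)(x).shape)          # torch.Size([2, 16, 512])
print(LoRALinear(512, 512)(x).shape)  # torch.Size([2, 16, 512])
```

Both modules instantiate the same template h <- h + delta_h; in the paper's framing they differ along design dimensions such as the function used to compute the modification and the position where it is applied (e.g., at a feed-forward output versus inside an attention projection).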
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models and identify a subset of parameters that is ineffective.
We propose a novel model fine-tuning method that makes full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL (parameter-efficient transfer learning) and derive a unified framework, U-Tuning.
The U-Tuning framework both encompasses existing methods and derives new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z)
- Differentiable Entailment for Parameter Efficient Few Shot Learning [0.0]
We propose a new technique for parameter-efficient few-shot learning.
We quantify the trade-off between parameter efficiency and performance in the few-shot regime.
The proposed approach is simple, model-agnostic, and can be extended to any task.
arXiv Detail & Related papers (2023-01-31T00:31:11Z)
- Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning [17.110208720745064]
In this study, we explore how the ability to detect out-of-distribution inputs changes as the size of the PLM grows or the transfer method is altered.
We evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks.
arXiv Detail & Related papers (2023-01-27T11:27:40Z)
- Parameter-Efficient Fine-Tuning Design Spaces [63.954953653386106]
We present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings.
We show experimentally that the resulting methods consistently and significantly outperform the investigated parameter-efficient fine-tuning strategies across different backbone models and different tasks in natural language processing.
arXiv Detail & Related papers (2023-01-04T21:00:18Z)
- Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
- Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-tuning [34.66092282348687]
We show that taking the ultimate choice of fine-tuning method into consideration boosts the performance of parameter-efficient fine-tuning.
We prime the pretrained model specifically for parameter-efficient fine-tuning, resulting in gains of up to 1.7 points on cross-lingual NER fine-tuning.
arXiv Detail & Related papers (2022-05-25T02:51:57Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.