Parameter-Efficient Fine-Tuning without Introducing New Latency
- URL: http://arxiv.org/abs/2305.16742v1
- Date: Fri, 26 May 2023 08:44:42 GMT
- Title: Parameter-Efficient Fine-Tuning without Introducing New Latency
- Authors: Baohao Liao, Yan Meng, Christof Monz
- Abstract summary: We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% of the parameters of full fine-tuning.
- Score: 7.631596468553607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) of pre-trained language models has
recently demonstrated remarkable achievements, effectively matching the
performance of full fine-tuning while utilizing significantly fewer trainable
parameters, and consequently addressing the storage and communication
constraints. Nonetheless, various PEFT methods are limited by their inherent
characteristics. In the case of sparse fine-tuning, which involves modifying
only a small subset of the existing parameters, the selection of fine-tuned
parameters is task- and domain-specific, making it unsuitable for federated
learning. On the other hand, PEFT methods that add new parameters typically
introduce additional inference latency. In this paper, we demonstrate the
feasibility of generating a sparse mask in a task-agnostic manner, wherein all
downstream tasks share a common mask. Our approach, which relies solely on the
magnitude information of pre-trained parameters, surpasses existing
methodologies by a significant margin when evaluated on the GLUE benchmark.
Additionally, we introduce a novel adapter technique that directly applies the
adapter to pre-trained parameters instead of the hidden representation, thereby
achieving identical inference speed to that of full fine-tuning. Through
extensive experiments, our proposed method attains a new state-of-the-art
outcome in terms of both performance and storage efficiency, storing only 0.03% of the
parameters of full fine-tuning.
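As a rough illustration of the two ideas described in the abstract, the sketch below builds a task-agnostic sparse mask from pre-trained weight magnitudes and applies a bottleneck adapter to a frozen weight matrix itself, merging the learned delta back into the weights so inference runs exactly as in full fine-tuning. This is a minimal PyTorch sketch under stated assumptions, not the authors' released implementation; all names, the selection criterion, and the hyperparameters are chosen here for illustration.

```python
# Illustrative sketch only: approximates the two ideas in the abstract
# (a magnitude-based, task-agnostic sparse mask; an adapter applied to
# pre-trained parameters rather than hidden states). Names and settings
# are assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


def task_agnostic_mask(weight: torch.Tensor, keep_ratio: float = 0.0003,
                       smallest: bool = True) -> torch.Tensor:
    """Build a 0/1 mask over a pre-trained weight from its magnitudes only.

    Because the mask depends solely on the pre-trained checkpoint, every
    downstream task (or federated client) can share the same mask. Whether
    the smallest- or largest-magnitude entries are selected is an assumption
    of this sketch; the paper defines the exact criterion.
    """
    k = max(1, int(keep_ratio * weight.numel()))
    flat = weight.abs().flatten()
    idx = torch.topk(flat, k, largest=not smallest).indices
    mask = torch.zeros_like(flat)
    mask[idx] = 1.0
    return mask.view_as(weight)


class WeightSpaceAdapterLinear(nn.Module):
    """Linear layer whose bottleneck adapter acts on the frozen weight matrix
    rather than on hidden representations (one plausible reading of the
    abstract; the paper's exact formulation may differ). The adapter output
    is a delta on the weights, so it can be folded into them after training
    and the forward pass at inference is a single matmul."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.register_buffer("weight", base.weight.detach().clone())
        self.register_buffer(
            "bias",
            base.bias.detach().clone() if base.bias is not None else torch.zeros(out_f),
        )
        # Only these adapter factors are trained and stored per task.
        self.down = nn.Parameter(torch.empty(in_f, rank))
        self.up = nn.Parameter(torch.zeros(rank, in_f))  # zero init: delta starts at 0
        nn.init.kaiming_uniform_(self.down)

    def delta(self) -> torch.Tensor:
        # Adapter applied to the pre-trained parameter itself: f(W) = relu(W A) B
        return F.relu(self.weight @ self.down) @ self.up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight + self.delta(), self.bias)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the learned delta into a plain nn.Linear for deployment."""
        out_f, in_f = self.weight.shape
        merged = nn.Linear(in_f, out_f)
        merged.weight.copy_(self.weight + self.delta())
        merged.bias.copy_(self.bias)
        return merged
```

At training time only `down` and `up` require gradients; for the sparse variant one would instead mark the pre-trained weights trainable and multiply their updates by `task_agnostic_mask(weight)`. The values used here (`keep_ratio=0.0003`, `rank=8`) loosely echo the 0.03% storage figure but remain placeholders, not the paper's settings.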
Related papers
- Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training [17.433808197776003]
We propose a simple yet effective fine-tuning framework based on Target Parameter Pre-training (TPP).
TPP adds a stage before PEFT that pre-trains the target parameters to be fine-tuned.
TPP can be easily integrated into existing PEFT methods, significantly improving performance.
arXiv Detail & Related papers (2024-08-27T12:48:46Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the parameterizations and optimizers under study.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters (see the illustrative sketch after this list).
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- Parameter-Efficient Fine-Tuning With Adapters [5.948206235442328]
This research introduces a novel adaptation method utilizing the UniPELT framework as a base.
Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters.
arXiv Detail & Related papers (2024-05-09T01:40:38Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning [32.69028093984526]
We propose a novel language transformer finetuning strategy that introduces task-specific parameters in multiple transformer layers.
We achieve within 5% of full finetuning performance on GLUE tasks with as few as 4,100 parameters per task.
Our method achieves the best or comparable utility compared to several recent finetuning methods when training with the same privacy constraints.
arXiv Detail & Related papers (2023-05-30T17:55:06Z)
- Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z)
- Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning for various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning (see the illustrative sketch after this list).
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
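For the ETHER entry above, a minimal illustration of finetuning a frozen weight by a learned hyperplane (Householder) reflection might look as follows; the class name, initialization, and placement are assumptions of this sketch, and the paper's actual parameterization (including the ETHER+ relaxation) may differ.

```python
# Illustrative only: hyperplane (Householder) reflection of a frozen weight,
# in the spirit of the ETHER entry; not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HouseholderReflectedLinear(nn.Module):
    """Finetunes a frozen linear layer by learning a single unit vector u and
    applying the reflection H = I - 2 u u^T to the pre-trained weight."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.register_buffer("weight", base.weight.detach().clone())
        self.register_buffer(
            "bias",
            base.bias.detach().clone() if base.bias is not None else torch.zeros(base.out_features),
        )
        # One trainable vector per layer: very few parameters per task.
        self.u = nn.Parameter(torch.randn(base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.u / (self.u.norm() + 1e-8)
        # Reflect the frozen weight across the hyperplane orthogonal to u.
        reflected = self.weight - 2.0 * torch.outer(u, u @ self.weight)
        return F.linear(x, reflected, self.bias)
```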
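Similarly, the SSF entry above describes tuning only per-channel scale and shift factors on the features of a frozen backbone; a minimal sketch of that idea (module name and placement are assumptions here, not the paper's code) is:

```python
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    """Per-channel scale and shift of features from a frozen backbone;
    only gamma and beta are trained (sketch of the SSF idea)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))  # multiplicative scale
        self.beta = nn.Parameter(torch.zeros(dim))  # additive shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim) features produced by a frozen pre-trained layer
        return x * self.gamma + self.beta
```

Because the transform is linear, it can in principle be folded into an adjacent linear layer after training, avoiding extra inference cost.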
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.