Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
- URL: http://arxiv.org/abs/2202.07962v1
- Date: Wed, 16 Feb 2022 10:11:19 GMT
- Title: Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
- Authors: Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang
- Abstract summary: PETuning methods claim to have achieved performance on par with or better than finetuning.
We take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into their training and evaluation.
- Score: 33.13293845589329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient tuning (PETuning) methods have been deemed by many as the new paradigm for using pretrained language models (PLMs). By tuning just a fraction of the parameters compared to full-model finetuning, PETuning methods claim to achieve performance on par with or even better than finetuning. In this work, we take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into their training and evaluation. We find that the problematic validation and testing practices in current studies, compounded by the inherent instability of PETuning methods, have led to unreliable conclusions. When compared under a truly fair evaluation protocol, PETuning does not yield consistently competitive performance, and finetuning remains the best-performing method in medium- and high-resource settings. Delving deeper into the cause of the instability, we observe that model size does not explain the phenomenon, whereas the number of training iterations positively correlates with stability.
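To make the paper's point concrete, a fair protocol of the kind it calls for fixes checkpoint selection to the dev set and aggregates test results over several random seeds, so that instability shows up as variance rather than a lucky single run. Below is a minimal sketch of such a protocol; the `build_model`, `train`, and `evaluate` hooks are hypothetical placeholders for an actual training stack.

```python
import statistics

def fair_eval(build_model, train, evaluate, seeds=(13, 21, 42, 87, 100)):
    """Multi-seed protocol: select checkpoints on dev, report on test.

    build_model/train/evaluate are caller-supplied hooks standing in
    for a real training stack; nothing here is library-specific.
    """
    scores = []
    for seed in seeds:
        model = build_model(seed)                    # PETuning module or full finetuning
        ckpt = train(model, selection_split="dev")   # checkpoint chosen on dev, never test
        scores.append(evaluate(ckpt, split="test"))
    # Aggregate across seeds so instability surfaces as variance.
    return statistics.mean(scores), statistics.stdev(scores)
```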
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
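The summary above does not spell out how "ineffective" parameters are identified; a simple magnitude-based proxy, roughly in the spirit of SaRA's premise that the smallest-magnitude weights contribute least, can be sketched as follows (a loose illustration, not the paper's exact procedure):

```python
import torch

def smallest_magnitude_mask(weight: torch.Tensor, ratio: float = 0.05) -> torch.Tensor:
    """Boolean mask over the `ratio` fraction of entries with the smallest
    absolute value -- a simple proxy for 'ineffective' parameters."""
    k = max(1, int(weight.numel() * ratio))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() <= threshold

# Only low-magnitude entries of an otherwise frozen layer receive gradients.
layer = torch.nn.Linear(768, 768)
mask = smallest_magnitude_mask(layer.weight.detach())
layer.weight.register_hook(lambda grad: grad * mask)  # zero grads elsewhere
```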
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective at mitigating racial and religious bias.
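As a point of reference, adapter tuning, the consistently strongest method in this comparison, inserts small bottleneck modules into a frozen backbone and trains only those. A minimal sketch of the classic bottleneck adapter; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Classic bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Inserted into a frozen transformer; only
    these small modules receive gradients."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        nn.init.zeros_(self.up.weight)   # start near the identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))
```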
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
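Whatever positions the tunable modules occupy, PET methods share the same outer loop: freeze the backbone and train only the inserted parameters. A generic sketch, with the parameter-name convention assumed for illustration:

```python
import torch.nn as nn

def prepare_pet(model: nn.Module, trainable_keyword: str = "adapter"):
    """Freeze the backbone; leave only parameters whose name contains
    `trainable_keyword` (the inserted PET modules) trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = trainable_keyword in name
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"tunable: {n_train:,} / {n_total:,} ({100 * n_train / n_total:.3f}%)")
    return [p for p in model.parameters() if p.requires_grad]  # hand to the optimizer
```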
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- Test-Time Adaptation with Perturbation Consistency Learning [32.58879780726279]
We propose a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts.
Our method can achieve higher or comparable performance with less inference time over strong PLM backbones.
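The mechanism is left implicit above; a common formulation of perturbation consistency penalizes divergence between predictions on an input and on a perturbed copy of it. A sketch under the assumption of Gaussian noise on embeddings (the paper's actual perturbation may differ):

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, embeddings: torch.Tensor, noise_std: float = 0.01):
    """KL between predictions on a perturbed input and (detached)
    predictions on the clean input. Gaussian embedding noise is an
    assumed perturbation, not necessarily the paper's exact choice."""
    clean_logits = model(embeddings)
    noisy = embeddings + noise_std * torch.randn_like(embeddings)
    noisy_logits = model(noisy)
    # Push the perturbed prediction toward the clean one.
    return F.kl_div(
        F.log_softmax(noisy_logits, dim=-1),
        F.softmax(clean_logits.detach(), dim=-1),
        reduction="batchmean",
    )
```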
arXiv Detail & Related papers (2023-04-25T12:29:22Z)
- Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning [11.124310650599146]
We use TracIn to improve model performance in the parameter-efficient tuning (PET) setting.
We develop a new methodology for using gradient-based explainability techniques to improve model performance.
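TracIn attributes a test prediction to training examples via the dot product of their loss gradients, accumulated over checkpoints and scaled by the learning rate. A single-checkpoint sketch restricted to the trainable PET parameters; the (inputs, labels) batch format is assumed:

```python
import torch

def tracin_score(model, loss_fn, train_batch, test_batch, lr: float = 1e-4):
    """Single-checkpoint TracIn: lr * <grad(train), grad(test)>, restricted
    to the trainable PET parameters. The full method sums this quantity
    over saved checkpoints."""
    params = [p for p in model.parameters() if p.requires_grad]

    def grads(batch):
        inputs, labels = batch
        return torch.autograd.grad(loss_fn(model(inputs), labels), params)

    g_train, g_test = grads(train_batch), grads(test_batch)
    return lr * sum((gt * gs).sum() for gt, gs in zip(g_train, g_test))
```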
arXiv Detail & Related papers (2023-02-13T18:54:58Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of them shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Although our theory grounds the effectiveness of sparsity, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S$^3$PET surpasses manual and random structures with fewer trainable parameters.
The searched structures preserve more than 99% fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)
- Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-tuning [34.66092282348687]
We show that taking the ultimate choice of fine-tuning method into consideration boosts the performance of parameter-efficient fine-tuning.
We prime the pretrained model specifically for parameter-efficient fine-tuning, resulting in gains of up to 1.7 points on cross-lingual NER fine-tuning.
arXiv Detail & Related papers (2022-05-25T02:51:57Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
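The parameter-efficient recipe proposed in that paper, (IA)$^3$, rescales selected inner activations (keys, values, and feed-forward activations) by learned vectors initialized to ones, so training starts from the pretrained behavior. A simplified sketch of the rescaling idea, applied here to a whole linear layer's output rather than the paper's exact placement:

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Frozen linear layer whose output is rescaled elementwise by a learned
    vector, initialized to ones so training starts at pretrained behavior."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # backbone weights stay frozen
        self.scale = nn.Parameter(torch.ones(base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) * self.scale     # only `scale` is trained

# Usage sketch: wrap the key/value and feed-forward projections of a PLM.
wrapped = ScaledLinear(nn.Linear(768, 768))
```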
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.