Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- URL: http://arxiv.org/abs/2306.02320v2
- Date: Sun, 10 Dec 2023 19:43:28 GMT
- Title: Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- Authors: Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin,
Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu,
Maosong Sun
- Abstract summary: Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method, Arbitrary PET (APET).
- Score: 100.61202305296275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient tuning (PET) methods can effectively drive extremely
large pre-trained language models (PLMs) by training only minimal parameters.
Different PET methods utilize different manually designed tunable modules. In
small PLMs, there are usually noticeable performance differences among PET
methods. Nevertheless, as the model scale increases, the performance
differences become marginal. Hence, we hypothesize that model scaling mitigates
the impact of design differences on PET methods. To investigate this
hypothesis, we introduce a more flexible PET method, Arbitrary PET (APET), whose
tunable module may consist of any number of parameters distributed in arbitrary
positions. We then use APET to conduct experiments on 11 NLP tasks across 3
representative PLMs. Our
investigations reveal that model scaling (1) mitigates the effects of the
positions of tunable parameters on performance, and (2) enables tuning methods
to achieve performance comparable to full-parameter fine-tuning by optimizing
fewer tunable parameters. Intriguingly, we also observe that tuning methods
optimize a similar number of tunable parameters to exceed random-guess
performance on different tasks. We collectively discuss this phenomenon and the
two aforementioned findings from an optimization perspective to understand the
underlying mechanisms. These conclusions enhance our understanding of the
impact of model scaling on PET and assist in designing more effective and
efficient PET methods for PLMs of different scales. The source code can be
obtained from this GitHub repository:
https://github.com/yushengsu-thu/PET_Scaling.
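As a rough illustration of the APET idea described above (this is not the authors' implementation; the PLM name, the 100k parameter budget, and the uniformly random placement below are assumptions made for the sketch), the following PyTorch snippet tunes an arbitrary number of parameters at arbitrary positions of an otherwise frozen PLM by masking gradients:

# Minimal sketch of an APET-style tunable module (not the authors' code).
# Assumptions: a Hugging Face PLM ("roberta-base"), an illustrative budget of
# 100k tunable scalars, and uniformly random placement of those scalars.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

budget = 100_000                      # total number of tunable scalar parameters
numels = {n: p.numel() for n, p in model.named_parameters()}
total = sum(numels.values())

# Build a binary mask per weight tensor marking which entries are tunable;
# APET allows any number of parameters at arbitrary positions, so this sketch
# simply draws the positions at random in proportion to each tensor's size.
masks = {}
for name, param in model.named_parameters():
    k = max(1, int(budget * numels[name] / total))
    mask = torch.zeros(param.numel(), device=param.device)
    mask[torch.randperm(param.numel())[:k]] = 1.0
    masks[name] = mask.view_as(param)

def mask_grads(model, masks):
    # Zero the gradient everywhere except at the chosen positions, so the
    # optimizer only updates the selected entries.
    for name, param in model.named_parameters():
        if param.grad is not None:
            param.grad.mul_(masks[name])

# weight_decay=0.0 so unselected entries are not silently decayed by AdamW.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)
# Per training step: loss.backward(); mask_grads(model, masks); optimizer.step(); optimizer.zero_grad()

Varying the budget and the placement of the masked entries in a sketch like this is one way to emulate different PET designs and probe the scaling behaviour the paper studies.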
Related papers
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method [56.571951345048355]
Large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications.
We study whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance.
arXiv Detail & Related papers (2024-02-27T04:18:49Z)
- ConPET: Continual Parameter-Efficient Tuning for Large Language Models [65.48107393731861]
Continual learning requires continual adaptation of models to newly emerging tasks.
We propose Continual Parameter-Efficient Tuning (ConPET), a generalizable paradigm for continual task adaptation of large language models.
arXiv Detail & Related papers (2023-09-26T08:52:04Z)
- Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning [11.124310650599146]
We use TracIn to improve model performance in the parameter-efficient tuning (PET) setting.
We develop a new methodology for using gradient-based explainability techniques to improve model performance.
arXiv Detail & Related papers (2023-02-13T18:54:58Z)
- KronA: Parameter Efficient Tuning with Kronecker Adapter [17.175408603709712]
We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs.
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods (an illustrative sketch of a Kronecker-factorized adapter follows this list).
arXiv Detail & Related papers (2022-12-20T20:56:52Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view called visual-PETL (V-PETL) to investigate the different aspects affecting the trade-off.
An effective scheme Swin-BAPAT derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
- Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S³PET surpasses manual and random structures with fewer trainable parameters.
The searched structures preserve more than 99% fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)
- Revisiting Parameter-Efficient Tuning: Are We Really There Yet? [33.13293845589329]
PETuning methods claim to have achieved performance on par with or better than finetuning.
We take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into the training and evaluation of PETuning methods.
arXiv Detail & Related papers (2022-02-16T10:11:19Z)
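The KronA entry above only names the idea of a Kronecker product-based adapter module; as a rough sketch under assumed factor shapes (this is not the KronA paper's implementation, and the class name, factor sizes, and initialization are choices made for illustration), a Kronecker-factorized update added to a frozen linear layer can look like this:

# Minimal sketch of a Kronecker product-based adapter (not the KronA code).
# The weight update is factorized as delta_W = A kron B, which stores far
# fewer parameters than a dense update of the same shape.
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Wraps a frozen linear layer and adds a Kronecker-factorized update."""
    def __init__(self, linear: nn.Linear, a_out: int = 16, a_in: int = 16):
        super().__init__()
        out_f, in_f = linear.out_features, linear.in_features
        assert out_f % a_out == 0 and in_f % a_in == 0
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)            # base weights stay frozen
        # delta_W = A (a_out x a_in)  kron  B (out_f/a_out x in_f/a_in)
        # A starts at zero so the adapted layer initially equals the frozen one.
        self.A = nn.Parameter(torch.zeros(a_out, a_in))
        self.B = nn.Parameter(torch.randn(out_f // a_out, in_f // a_in) * 0.01)

    def forward(self, x):
        delta_w = torch.kron(self.A, self.B)   # (out_f, in_f), built on the fly
        return self.linear(x) + nn.functional.linear(x, delta_w)

# Example: adapt a 768x768 projection with only 16*16 + 48*48 = 2560 tunable parameters.
layer = KroneckerAdapter(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))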