Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- URL: http://arxiv.org/abs/2306.02320v2
- Date: Sun, 10 Dec 2023 19:43:28 GMT
- Title: Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- Authors: Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin,
Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu,
Maosong Sun
- Abstract summary: Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called the Arbitrary PET (APET) method.
- Score: 100.61202305296275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient tuning (PET) methods can effectively drive extremely
large pre-trained language models (PLMs) by training only minimal parameters.
Different PET methods utilize different manually designed tunable modules. In
small PLMs, there are usually noticeable performance differences among PET
methods. Nevertheless, as the model scale increases, the performance
differences become marginal. Hence, we hypothesize that model scaling mitigates
the impact of design differences on PET methods. To investigate this
hypothesis, we introduce a more flexible PET method called Arbitrary PET (APET)
method. The APET method is compatible with a tunable module, which consists of
any number of parameters distributed in arbitrary positions. Then, we utilize
it and conduct experiments on 11 NLP tasks across 3 representative PLMs. Our
investigations reveal that model scaling (1) mitigates the effects of the
positions of tunable parameters on performance, and (2) enables tuning methods
to achieve performance comparable to full-parameter fine-tuning by optimizing
fewer tunable parameters. Intriguingly, we also observe that tuning methods
optimize a similar number of tunable parameters to exceed random-guess
performance on different tasks. We collectively discuss this phenomenon and the
two aforementioned findings from an optimization perspective to understand the
underlying mechanisms. These conclusions enhance our understanding of the
impact of model scaling on PET and assist in designing more effective and
efficient PET methods for PLMs of different scales. The source code can be
obtained from this GitHub repository:
\url{https://github.com/yushengsu-thu/PET_Scaling}.
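To make the APET setup concrete, the following is a minimal PyTorch sketch of such a tunable module: the pretrained weights stay frozen and a chosen number of trainable deltas is injected at arbitrary weight positions. This is an illustration under our own assumptions (layer type, position sampling, naming), not the released implementation.

```python
# Minimal sketch (illustration, not the authors' code) of an APET-style module:
# a chosen number of trainable parameters placed at arbitrary positions,
# realized as a sparse additive delta on top of a frozen layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArbitraryDelta(nn.Module):
    def __init__(self, frozen_linear: nn.Linear, num_tunable: int):
        super().__init__()
        self.base = frozen_linear
        for p in self.base.parameters():
            p.requires_grad = False                      # keep the PLM weights frozen
        # pick `num_tunable` arbitrary weight positions to receive a trainable delta
        idx = torch.randperm(self.base.weight.numel())[:num_tunable]
        self.register_buffer("positions", idx)
        self.delta = nn.Parameter(torch.zeros(num_tunable))  # the only trainable parameters

    def forward(self, x):
        w = self.base.weight.flatten()
        w = w.scatter_add(0, self.positions, self.delta)  # inject deltas at chosen positions
        return F.linear(x, w.view_as(self.base.weight), self.base.bias)

# Usage: wrap one layer of a frozen PLM and train only `delta`.
layer = ArbitraryDelta(nn.Linear(768, 768), num_tunable=1024)
out = layer(torch.randn(2, 768))
```

Varying `num_tunable` and the sampled positions corresponds to the two factors the abstract studies: the number and the positions of the tunable parameters.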
Related papers
- UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification [32.3387409534726]
This study explores parameter-efficient tuning (PET) methods for adapting large-scale pre-trained SSL speech models to the speaker verification task.
We propose three PET methods: (i) an adapter-tuning method, (ii) a prompt-tuning method, and (iii) a unified framework that effectively incorporates adapter-tuning and prompt-tuning with a dynamically learnable gating mechanism.
The proposed UniPET-SPK learns to find the optimal mixture of PET methods to match different datasets and scenarios.
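As a rough illustration of the dynamically learnable gating idea, the sketch below mixes an adapter branch with a simplified prompt branch using an input-dependent gate; the module shapes, names, and the prompt simplification are assumptions for illustration, not the UniPET-SPK design.

```python
# Illustrative gate mixing two PET branches; not the UniPET-SPK implementation.
import torch
import torch.nn as nn

class GatedPET(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64, n_prompts: int = 8):
        super().__init__()
        # bottleneck adapter branch
        self.adapter = nn.Sequential(
            nn.Linear(hidden, bottleneck), nn.GELU(), nn.Linear(bottleneck, hidden)
        )
        # simplified prompt branch: trainable vectors the hidden states attend over
        self.prompts = nn.Parameter(torch.randn(n_prompts, hidden) * 0.02)
        self.gate = nn.Linear(hidden, 1)           # input-dependent ("dynamic") gate

    def forward(self, h):                          # h: (batch, frames, hidden)
        adapter_out = self.adapter(h)
        attn = torch.softmax(h @ self.prompts.t(), dim=-1)
        prompt_out = attn @ self.prompts
        g = torch.sigmoid(self.gate(h))            # mixing weight in (0, 1)
        return h + g * adapter_out + (1.0 - g) * prompt_out
```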
arXiv Detail & Related papers (2025-01-27T22:26:37Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models [14.762222323897978]
We propose a novel parameter-efficient training (PET) method for large language models.
Unlike prior methods, this subset is not fixed in location; rather, which parameters are modified can change over the course of training.
Our method enables a seamless scaling of the subset size across an arbitrary proportion of the total model size.
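A sketch of how such a dynamically selected subset could be trained is shown below; the gradient-magnitude selection criterion is our assumption for illustration and may differ from the paper's actual procedure.

```python
# Sketch of dynamic-subset training: only a chosen proportion of parameters is
# updated, and the subset is re-selected during training. The gradient-magnitude
# criterion is an illustrative assumption, not the paper's algorithm.
import torch

def select_masks(model, proportion):
    """Keep the `proportion` of parameters with the largest gradient magnitude."""
    grads = torch.cat([p.grad.abs().flatten()
                       for p in model.parameters() if p.grad is not None])
    k = max(1, int(proportion * grads.numel()))
    threshold = torch.topk(grads, k).values.min()
    return {name: (p.grad.abs() >= threshold)
            for name, p in model.named_parameters() if p.grad is not None}

def masked_step(model, optimizer, masks):
    """Zero gradients outside the current subset, then apply the optimizer step."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in masks:
            p.grad.mul_(masks[name].to(p.grad.dtype))
    optimizer.step()
    optimizer.zero_grad()
```

Calling select_masks periodically (rather than once) is what makes the subset dynamic: the set of trained parameters can drift as training progresses, while the proportion stays fixed at any value.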
arXiv Detail & Related papers (2024-11-13T13:53:10Z)
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method [56.571951345048355]
Large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications.
We study whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance.
arXiv Detail & Related papers (2024-02-27T04:18:49Z)
- ConPET: Continual Parameter-Efficient Tuning for Large Language Models [65.48107393731861]
Continual learning requires continual adaptation of models to newly emerging tasks.
We propose Continual Parameter-Efficient Tuning (ConPET), a generalizable paradigm for continual task adaptation of large language models.
arXiv Detail & Related papers (2023-09-26T08:52:04Z)
- KronA: Parameter Efficient Tuning with Kronecker Adapter [17.175408603709712]
We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs.
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
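For intuition, a Kronecker-product adapter on a frozen linear layer can be sketched as below; the factor shapes and initialization are illustrative assumptions rather than the exact KronA design.

```python
# Rough sketch of a Kronecker-product adapter on a frozen linear layer;
# factor shapes and initialization are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KroneckerAdapter(nn.Module):
    def __init__(self, frozen_linear: nn.Linear, a_rows: int = 16, a_cols: int = 16):
        super().__init__()
        out_f, in_f = frozen_linear.weight.shape
        assert out_f % a_rows == 0 and in_f % a_cols == 0
        self.base = frozen_linear
        for p in self.base.parameters():
            p.requires_grad = False                        # freeze the pretrained layer
        # kron(A, B) has the same shape as the frozen weight matrix
        self.A = nn.Parameter(torch.zeros(a_rows, a_cols))  # zero init => no change at start
        self.B = nn.Parameter(torch.randn(out_f // a_rows, in_f // a_cols) * 0.01)

    def forward(self, x):
        delta = torch.kron(self.A, self.B)                 # trainable Kronecker update
        return F.linear(x, self.base.weight + delta, self.base.bias)

# Only A and B are trained: 16*16 + 48*48 parameters instead of 768*768.
adapted = KroneckerAdapter(nn.Linear(768, 768))
y = adapted(torch.randn(2, 768))
```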
arXiv Detail & Related papers (2022-12-20T20:56:52Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view called visual-PETL (V-PETL) to investigate the different aspects affecting the trade-off.
An effective scheme Swin-BAPAT derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
- Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S$^3$PET surpasses manual and random structures with fewer trainable parameters.
The searched structures preserve more than 99% fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.