Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- URL: http://arxiv.org/abs/2306.02320v2
- Date: Sun, 10 Dec 2023 19:43:28 GMT
- Title: Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
- Authors: Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin,
Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu,
Maosong Sun
- Abstract summary: Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method, Arbitrary PET (APET).
- Score: 100.61202305296275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient tuning (PET) methods can effectively drive extremely
large pre-trained language models (PLMs) by training only minimal parameters.
Different PET methods utilize different manually designed tunable modules. In
small PLMs, there are usually noticeable performance differences among PET
methods. Nevertheless, as the model scale increases, the performance
differences become marginal. Hence, we hypothesize that model scaling mitigates
the impact of design differences on PET methods. To investigate this
hypothesis, we introduce a more flexible PET method, Arbitrary PET (APET), whose
tunable module may consist of any number of parameters distributed in arbitrary
positions. We then use APET to conduct experiments on 11 NLP tasks across 3
representative PLMs. Our
investigations reveal that model scaling (1) mitigates the effects of the
positions of tunable parameters on performance, and (2) enables tuning methods
to achieve performance comparable to full-parameter fine-tuning by optimizing
fewer tunable parameters. Intriguingly, we also observe that tuning methods
optimize a similar number of tunable parameters to exceed random-guess
performance on different tasks. We collectively discuss this phenomenon and the
two aforementioned findings from an optimization perspective to understand the
underlying mechanisms. These conclusions enhance our understanding of the
impact of model scaling on PET and assist in designing more effective and
efficient PET methods for PLMs of different scales. The source code can be
obtained from this GitHub repository:
https://github.com/yushengsu-thu/PET_Scaling.
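As a rough illustration of the APET idea described above (this is not the authors' implementation; the PLM name, the 100k parameter budget, and the uniformly random placement below are assumptions made for the sketch), the following PyTorch snippet tunes an arbitrary number of parameters at arbitrary positions of an otherwise frozen PLM by masking gradients:

# Minimal sketch of an APET-style tunable module (not the authors' code).
# Assumptions: a Hugging Face PLM ("roberta-base"), an illustrative budget of
# 100k tunable scalars, and uniformly random placement of those scalars.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

budget = 100_000                      # total number of tunable scalar parameters
numels = {n: p.numel() for n, p in model.named_parameters()}
total = sum(numels.values())

# Build a binary mask per weight tensor marking which entries are tunable;
# APET allows any number of parameters at arbitrary positions, so this sketch
# simply draws the positions at random in proportion to each tensor's size.
masks = {}
for name, param in model.named_parameters():
    k = max(1, int(budget * numels[name] / total))
    mask = torch.zeros(param.numel(), device=param.device)
    mask[torch.randperm(param.numel())[:k]] = 1.0
    masks[name] = mask.view_as(param)

def mask_grads(model, masks):
    # Zero the gradient everywhere except at the chosen positions, so the
    # optimizer only updates the selected entries.
    for name, param in model.named_parameters():
        if param.grad is not None:
            param.grad.mul_(masks[name])

# weight_decay=0.0 so unselected entries are not silently decayed by AdamW.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)
# Per training step: loss.backward(); mask_grads(model, masks); optimizer.step(); optimizer.zero_grad()

Varying the budget and the placement of the masked entries in a sketch like this is one way to emulate different PET designs and probe the scaling behaviour the paper studies.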
Related papers
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method [56.571951345048355]
Large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications.
We study whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance.
arXiv Detail & Related papers (2024-02-27T04:18:49Z)
- ConPET: Continual Parameter-Efficient Tuning for Large Language Models [65.48107393731861]
Continual learning requires continual adaptation of models to newly emerging tasks.
We propose Continual Parameter-Efficient Tuning (ConPET), a generalizable paradigm for continual task adaptation of large language models.
arXiv Detail & Related papers (2023-09-26T08:52:04Z)
- Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning [11.124310650599146]
We use TracIn to improve model performance in the parameter-efficient tuning (PET) setting.
We develop a new methodology for using gradient-based explainability techniques to improve model performance.
arXiv Detail & Related papers (2023-02-13T18:54:58Z)
- KronA: Parameter Efficient Tuning with Kronecker Adapter [17.175408603709712]
We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs.
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods (an illustrative sketch of a Kronecker-factorized adapter follows this list).
arXiv Detail & Related papers (2022-12-20T20:56:52Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view called visual-PETL (V-PETL) to investigate the different aspects affecting the trade-off.
An effective scheme Swin-BAPAT derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
- Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S³PET surpasses manual and random structures with fewer trainable parameters.
The searched structures preserve more than 99% fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)
- Revisiting Parameter-Efficient Tuning: Are We Really There Yet? [33.13293845589329]
PETuning methods claim to have achieved performance on par with or better than finetuning.
We take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into the training and evaluation of PETuning methods.
arXiv Detail & Related papers (2022-02-16T10:11:19Z)
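The KronA entry above only names the idea of a Kronecker product-based adapter module; as a rough sketch under assumed factor shapes (this is not the KronA paper's implementation, and the class name, factor sizes, and initialization are choices made for illustration), a Kronecker-factorized update added to a frozen linear layer can look like this:

# Minimal sketch of a Kronecker product-based adapter (not the KronA code).
# The weight update is factorized as delta_W = A kron B, which stores far
# fewer parameters than a dense update of the same shape.
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Wraps a frozen linear layer and adds a Kronecker-factorized update."""
    def __init__(self, linear: nn.Linear, a_out: int = 16, a_in: int = 16):
        super().__init__()
        out_f, in_f = linear.out_features, linear.in_features
        assert out_f % a_out == 0 and in_f % a_in == 0
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)            # base weights stay frozen
        # delta_W = A (a_out x a_in)  kron  B (out_f/a_out x in_f/a_in)
        # A starts at zero so the adapted layer initially equals the frozen one.
        self.A = nn.Parameter(torch.zeros(a_out, a_in))
        self.B = nn.Parameter(torch.randn(out_f // a_out, in_f // a_in) * 0.01)

    def forward(self, x):
        delta_w = torch.kron(self.A, self.B)   # (out_f, in_f), built on the fly
        return self.linear(x) + nn.functional.linear(x, delta_w)

# Example: adapt a 768x768 projection with only 16*16 + 48*48 = 2560 tunable parameters.
layer = KroneckerAdapter(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))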