Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for
Pre-trained Language Models
- URL: http://arxiv.org/abs/2203.06904v2
- Date: Tue, 15 Mar 2022 01:22:04 GMT
- Title: Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for
Pre-trained Language Models
- Authors: Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng
Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao,
Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang,
Juanzi Li, Maosong Sun
- Abstract summary: In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched.
Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full- parameter fine-tuning.
- Score: 90.24999406296867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success, the process of fine-tuning large-scale PLMs brings
prohibitive adaptation costs. In fact, fine-tuning all the parameters of a
colossal model and retaining separate instances for different tasks are
practically infeasible. This necessitates a new branch of research focusing on
the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this
paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes
a small portion of the model parameters while keeping the rest untouched,
largely reducing both the computation and storage costs. Recent studies have
demonstrated that a series of delta tuning methods with distinct tuned
parameter selection could achieve performance on a par with full-parameter
fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In
this paper, we first formally describe the problem of delta tuning and then
comprehensively review recent delta tuning approaches. We also propose a
unified categorization criterion that divide existing delta tuning methods into
three groups: addition-based, specification-based, and reparameterization-based
methods. Though initially proposed as an efficient method to steer large
models, we believe that some of the fascinating evidence discovered along with
delta tuning could help further reveal the mechanisms of PLMs and even deep
neural networks. To this end, we discuss the theoretical principles underlying
the effectiveness of delta tuning and propose frameworks to interpret delta
tuning from the perspective of optimization and optimal control, respectively.
Furthermore, we provide a holistic empirical study of representative methods,
where results on over 100 NLP tasks demonstrate a comprehensive performance
comparison of different approaches. The experimental results also cover the
analysis of combinatorial, scaling and transferable properties of delta tuning.
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of threes.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z) - DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies.
Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
arXiv Detail & Related papers (2024-03-05T09:12:49Z) - Astraios: Parameter-Efficient Instruction Tuning Code Large Language
Models [21.17021844323919]
We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters.
We find that FFT leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale.
arXiv Detail & Related papers (2024-01-01T15:30:19Z) - Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision
Transformers [50.23439411530435]
We show that Partial Fine-Tuning can be an innovative and promising direction capable of concurrently enhancing both efficiency and accuracy.
We propose a novel fine-tuned angle metric to guide the selection of appropriate layers for partial fine-tuning.
Comprehensive experiments on a wide range of datasets and models validate the great potential of partial fine-tuning.
arXiv Detail & Related papers (2023-12-25T10:11:34Z) - OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of
Pre-trained Models [81.7855202178564]
We present OpenDelta, an open-source library that overcomes limitations by providing a plug-and-play implementation of various delta tuning methods.
Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs.
arXiv Detail & Related papers (2023-07-05T16:30:14Z) - Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning.
The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z) - On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters.
arXiv Detail & Related papers (2022-11-28T17:41:48Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing)
We propose a new parameter-efficient finetuning method termed as SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.