Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for
Pre-trained Language Models
- URL: http://arxiv.org/abs/2203.06904v2
- Date: Tue, 15 Mar 2022 01:22:04 GMT
- Title: Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for
Pre-trained Language Models
- Authors: Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng
Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao,
Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang,
Juanzi Li, Maosong Sun
- Abstract summary: In contrast with standard fine-tuning, delta tuning fine-tunes only a small portion of the model parameters while keeping the rest untouched.
Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning.
- Score: 90.24999406296867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success, the process of fine-tuning large-scale PLMs brings
prohibitive adaptation costs. In fact, fine-tuning all the parameters of a
colossal model and retaining separate instances for different tasks are
practically infeasible. This necessitates a new branch of research focusing on
the parameter-efficient adaptation of PLMs, dubbed delta tuning in this
paper. In contrast with standard fine-tuning, delta tuning fine-tunes only a
small portion of the model parameters while keeping the rest untouched,
largely reducing both the computation and storage costs. Recent studies have
demonstrated that a series of delta tuning methods with distinct tuned
parameter selection could achieve performance on a par with full-parameter
fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In
this paper, we first formally describe the problem of delta tuning and then
comprehensively review recent delta tuning approaches. We also propose a
unified categorization criterion that divides existing delta tuning methods into
three groups: addition-based, specification-based, and reparameterization-based
methods. Though initially proposed as an efficient method to steer large
models, we believe that some of the fascinating evidence discovered along with
delta tuning could help further reveal the mechanisms of PLMs and even deep
neural networks. To this end, we discuss the theoretical principles underlying
the effectiveness of delta tuning and propose frameworks to interpret delta
tuning from the perspective of optimization and optimal control, respectively.
Furthermore, we provide a holistic empirical study of representative methods,
where results on over 100 NLP tasks demonstrate a comprehensive performance
comparison of different approaches. The experimental results also cover the
analysis of combinatorial, scaling and transferable properties of delta tuning.
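
To make the delta-tuning idea concrete, here is a minimal, hedged sketch of one reparameterization-based method (a LoRA-style low-rank update on a single frozen linear layer) in PyTorch. The layer size, rank, and scaling below are illustrative assumptions rather than the paper's experimental setup; only the small low-rank "delta" is trained while the backbone weights stay untouched.

```python
# A minimal, illustrative sketch of reparameterization-based delta tuning
# (a LoRA-style low-rank update) in plain PyTorch. Dimensions, rank, and
# scaling are assumptions for demonstration, not the paper's configuration.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer and adds a small trainable low-rank delta."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep the backbone untouched
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank "delta" path.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), rank=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

Running the snippet reports a trainable-parameter ratio of roughly 2% for this toy layer, which mirrors the "small portion of tuned parameters" regime the abstract describes.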
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
- A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models [45.82689769685688]
Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks.
We introduce extensions to existing techniques like DARE and BitDelta to enhance the applicability and effectiveness of delta parameter editing in post-trained models.
arXiv Detail & Related papers (2024-10-17T17:56:53Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method that makes full use of parameters found to be ineffective in the pre-trained model.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the studied optimizers and parameterizations.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models [81.7855202178564]
We present OpenDelta, an open-source library that overcomes the limitations of prior implementations by providing a plug-and-play implementation of various delta tuning methods.
Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs.
arXiv Detail & Related papers (2023-07-05T16:30:14Z)
- Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning.
The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them (see the sketch after this entry).
Despite the effectiveness of sparsity grounded in our theory, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
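
As a minimal illustration of the sparse fine-tuning view mentioned in the entry above, the sketch below writes the adapted weights as W' = W + m ⊙ ΔW, where a binary mask m keeps only a small fraction of the delta. The 1% density and tensor shapes are assumptions chosen for demonstration, not values from the cited paper.

```python
# Illustrative sketch of the "sparse fine-tuning" view: the adapted weights
# equal the pre-trained weights plus a delta that is non-zero only on a small
# masked subset of entries. The ~1% density used here is an assumption.
import torch

torch.manual_seed(0)
pretrained = torch.randn(768, 768)              # frozen pre-trained weight W
delta = 0.01 * torch.randn(768, 768)            # a hypothetical full update
mask = (torch.rand(768, 768) < 0.01).float()    # keep roughly 1% of entries

adapted = pretrained + mask * delta             # W' = W + m * delta (elementwise)
print(f"fraction of modified parameters: {mask.mean().item():.4f}")
```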