DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
- URL: http://arxiv.org/abs/2309.05173v5
- Date: Sun, 18 Feb 2024 10:02:23 GMT
- Title: DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
- Authors: Zhengxiang Shi, Aldo Lipani
- Abstract summary: We propose DePT, which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates.
We demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios.
- Score: 14.975436239088312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning (PT), where a small number of trainable soft (continuous)
prompt vectors are affixed to the input of language models (LMs), has shown
promising results across various tasks and models for parameter-efficient
fine-tuning (PEFT). PT stands out from other PEFT approaches because it
maintains competitive performance with fewer trainable parameters and does not
drastically scale up its parameters as the model size expands. However, PT
introduces additional soft prompt tokens, leading to longer input sequences,
which significantly impacts training and inference time and memory usage due to
the Transformer's quadratic complexity. This is particularly concerning for Large
Language Models (LLMs), which face heavy daily querying. To address this issue,
we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt
into a shorter soft prompt and a pair of low-rank matrices that are then
optimised with two different learning rates. This allows DePT to achieve better
performance while saving substantial memory and time costs compared to vanilla
PT and its variants, without changing trainable parameter sizes. Through
extensive experiments on 23 natural language processing (NLP) and
vision-language (VL) tasks, we demonstrate that DePT outperforms
state-of-the-art PEFT approaches, including the full fine-tuning baseline, in
some scenarios. Additionally, we empirically show that DePT grows more
efficient as the model size increases. Our further study reveals that DePT
integrates seamlessly with parameter-efficient transfer learning in the
few-shot learning setting and highlights its adaptability to various model
architectures and sizes.
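As a rough illustration of the decomposition described in the abstract, the following PyTorch sketch (our own, with placeholder names, sizes, and learning rates; not the authors' code) pairs a shorter soft prompt with a low-rank update to the frozen input word embeddings and trains the two parts with separate learning rates:

```python
import torch
import torch.nn as nn

class DePTInput(nn.Module):
    """Minimal sketch of DePT-style decomposition (illustrative only).

    A length-l soft prompt is replaced by (i) a shorter soft prompt of length m
    and (ii) a low-rank update A @ B added to the frozen word embeddings of the
    input sequence, keeping the trainable parameter count comparable while the
    sequence grows by only m (< l) tokens.
    """

    def __init__(self, embed_dim=768, short_prompt_len=40, max_seq_len=256, rank=45):
        super().__init__()
        # Shorter soft prompt, prepended to the input embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(short_prompt_len, embed_dim) * 0.02)
        # Pair of low-rank matrices; their product updates the word embeddings.
        self.lora_a = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(rank, embed_dim))

    def forward(self, word_embeds):  # word_embeds: (batch, seq_len, dim), frozen
        seq_len = word_embeds.size(1)
        delta = self.lora_a[:seq_len] @ self.lora_b          # (seq_len, dim)
        updated = word_embeds + delta.unsqueeze(0)           # low-rank embedding update
        prompt = self.soft_prompt.unsqueeze(0).expand(word_embeds.size(0), -1, -1)
        return torch.cat([prompt, updated], dim=1)           # shorter prompt + updated input


# Two different learning rates, one per component (values are placeholders).
dept = DePTInput()
optimizer = torch.optim.AdamW([
    {"params": [dept.soft_prompt], "lr": 3e-1},
    {"params": [dept.lora_a, dept.lora_b], "lr": 1e-4},
])
```

Because the prepended prompt is shorter than in vanilla PT and the low-rank pair adds no extra tokens, the sequence-length overhead, and hence the quadratic attention cost, shrinks without reducing the trainable-parameter budget.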
Related papers
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models.
arXiv Detail & Related papers (2024-02-18T01:20:00Z)
- Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More [9.230338573494622]
Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models to specific tasks.
This paper investigates the potential of SPT for cross-lingual transfer.
arXiv Detail & Related papers (2024-02-06T07:52:30Z)
- Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling [42.42235704360381]
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on a wide range of tasks.
Their large scale makes it impossible to adapt and deploy fully specialized models for every task of interest.
In this work, we describe AdaLink as a non-intrusive PEFT technique that achieves competitive performance.
arXiv Detail & Related papers (2023-10-18T16:43:08Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- How Does In-Context Learning Help Prompt Tuning? [55.78535874154915]
Fine-tuning large language models is becoming ever more impractical due to their rapidly-growing scale.
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model.
Recently, Singhal et al. (2022) propose "instruction prompt tuning" (IPT), which combines PT with ICL by concatenating a natural language demonstration with learned prompt embeddings (see the sketch after this list).
arXiv Detail & Related papers (2023-02-22T17:45:12Z)
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning [77.61565726647784]
Motivated by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection.
We show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par or better than FFT without incurring substantial training efficiency costs.
arXiv Detail & Related papers (2023-01-28T08:51:23Z)
- FPT: Improving Prompt Tuning Efficiency via Progressive Training [84.25195519945215]
We propose Fast Prompt Tuning to improve prompt tuning's training efficiency.
We show that FPT could save over 30% training computations while achieving comparable performance.
arXiv Detail & Related papers (2022-11-13T08:00:29Z)
- When does Parameter-Efficient Transfer Learning Work for Machine Translation? [8.862707047517913]
Prior work indicates that PEFTs may not work as well for machine translation (MT)
We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language-pairs, and (3) different pre-trained models.
We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and for smaller training data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.
arXiv Detail & Related papers (2022-05-23T12:49:46Z)
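For the IPT entry above, a minimal illustrative sketch of the concatenation it describes (names, shapes, and ordering are assumptions, not the authors' implementation) could look like:

```python
import torch

def build_ipt_input(soft_prompt, demo_embeds, input_embeds):
    """Concatenate learned prompt embeddings with an in-context demonstration.

    soft_prompt:  (prompt_len, dim) trainable embeddings
    demo_embeds:  (batch, demo_len, dim) frozen embeddings of a natural language demonstration
    input_embeds: (batch, input_len, dim) frozen embeddings of the task input
    """
    batch_size = input_embeds.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
    # The ordering of prompt vs. demonstration below is an assumption.
    return torch.cat([prompt, demo_embeds, input_embeds], dim=1)
```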