On Transferability of Prompt Tuning for Natural Language Understanding
- URL: http://arxiv.org/abs/2111.06719v1
- Date: Fri, 12 Nov 2021 13:39:28 GMT
- Title: On Transferability of Prompt Tuning for Natural Language Understanding
- Authors: Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Zhiyuan
Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou
- Abstract summary: We investigate the transferability of soft prompts across different tasks and models.
We find that trained soft prompts transfer well to similar tasks and can initialize PT for those tasks to accelerate training and improve performance.
Our findings show that improving PT with knowledge transfer is possible and promising, and that prompts' cross-task transferability is generally better than their cross-model transferability.
- Score: 63.29235426932978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt tuning (PT) is a promising parameter-efficient method for utilizing extremely large pre-trained language models (PLMs): it can achieve performance comparable to full-parameter fine-tuning by tuning only a few soft prompts. However, compared to fine-tuning, PT empirically requires many more training steps. To explore whether the efficiency of PT can be improved by reusing trained soft prompts and sharing learned knowledge, we empirically investigate the transferability of soft prompts across different tasks and models. In cross-task transfer, we find that trained soft prompts transfer well to similar tasks and can initialize PT for those tasks, accelerating training and improving performance. Moreover, to explore what factors influence prompts' transferability across tasks, we investigate how to measure prompt similarity and find that the overlapping rate of activated neurons correlates highly with transferability. In cross-model transfer, we explore how to project the prompts of one PLM onto another PLM and successfully train a projector that achieves non-trivial transfer performance on similar tasks. However, initializing PT with the projected prompts does not work well, which may be caused by optimization preferences and the high redundancy of PLMs. Our findings show that improving PT with knowledge transfer is possible and promising, while prompts' cross-task transferability is generally better than their cross-model transferability.
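The two mechanisms the abstract describes, tuning a small set of soft prompt embeddings in front of a frozen PLM and reusing a prompt trained on a source task to initialize PT on a similar target task, can be sketched compactly. The code below is a minimal illustration, not the authors' released implementation; the backbone name, prompt length, activation threshold, and the Jaccard-style neuron-overlap helper are all assumptions.

```python
# Minimal sketch of prompt tuning (PT) with cross-task prompt reuse.
# Assumptions (not from the paper's code release): a BERT-style backbone,
# prompt length 100, and a Jaccard-style neuron-overlap similarity helper.
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone="bert-base-uncased", prompt_len=100,
                 num_labels=2, init_prompt=None):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            backbone, num_labels=num_labels)
        for p in self.model.base_model.parameters():   # freeze the PLM body
            p.requires_grad = False
        emb_dim = self.model.get_input_embeddings().embedding_dim
        if init_prompt is not None:
            # Cross-task transfer: start from a prompt trained on a similar task.
            prompt = init_prompt.detach().clone()
        else:
            prompt = torch.randn(prompt_len, emb_dim) * 0.02
        self.soft_prompt = nn.Parameter(prompt)        # the trainable soft prompt

    def forward(self, input_ids, attention_mask, labels=None):
        tok_emb = self.model.get_input_embeddings()(input_ids)       # (B, L, D)
        batch = input_ids.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(batch, self.soft_prompt.size(0),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.model(inputs_embeds=inputs_embeds,
                          attention_mask=attention_mask, labels=labels)

def neuron_overlap(acts_a, acts_b, threshold=0.0):
    """Overlap rate of 'activated' feed-forward neurons (activation > threshold)
    recorded for two different soft prompts -- a rough proxy for the prompt
    similarity measure described in the abstract, not the paper's exact metric."""
    on_a, on_b = acts_a > threshold, acts_b > threshold
    union = (on_a | on_b).sum().item()
    return (on_a & on_b).sum().item() / max(union, 1)
```

In this sketch only `soft_prompt` and the small classification head receive gradients; passing a prompt saved from a similar source task as `init_prompt` mirrors the cross-task initialization studied in the paper.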
Related papers
- Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More [9.230338573494622]
Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models to specific tasks.
This paper investigates the potential of SPT for cross-lingual transfer.
arXiv Detail & Related papers (2024-02-06T07:52:30Z)
- Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer [21.57847333976567]
Multimodal Continual Instruction Tuning (MCIT) enables Multimodal Large Language Models (MLLMs) to meet continuously emerging requirements without expensive retraining.
MCIT faces two major obstacles: catastrophic forgetting (where old knowledge is forgotten) and negative forward transfer.
We propose Prompt Tuning with Positive Forward Transfer (Fwd-Prompt) to address these issues.
arXiv Detail & Related papers (2024-01-17T12:44:17Z)
- Efficient Cross-Task Prompt Tuning for Few-Shot Conversational Emotion Recognition [6.988000604392974]
Emotion Recognition in Conversation (ERC) has been widely studied due to its importance in developing emotion-aware empathetic machines.
We propose a derivative-free optimization method called Cross-Task Prompt Tuning (CTPT) for few-shot conversational emotion recognition.
arXiv Detail & Related papers (2023-10-23T06:46:03Z)
- DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [14.975436239088312]
We propose DePT, which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates.
We demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios (a rough sketch of the decomposition follows this entry).
arXiv Detail & Related papers (2023-09-11T00:02:05Z)
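The DePT entry above describes splitting a long soft prompt into a shorter soft prompt plus a pair of low-rank matrices trained with two different learning rates. Below is a minimal sketch of that decomposition; the dimensions, the place where the low-rank product is applied (added to the frozen token embeddings), and the learning-rate values are assumptions based only on the summary above, not DePT's actual implementation.

```python
# Sketch of a DePT-style decomposition: a short soft prompt plus a low-rank
# update to the frozen token embeddings, optimized with two learning rates.
# Shapes and hyper-parameters here are illustrative assumptions.
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    def __init__(self, emb_dim=768, short_len=40, rank=8, max_seq_len=256):
        super().__init__()
        self.short_prompt = nn.Parameter(torch.randn(short_len, emb_dim) * 0.02)
        # Low-rank pair A (seq x r) and B (r x dim): their product is added
        # to the frozen input embeddings as a cheap learned correction.
        self.A = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(rank, emb_dim))

    def forward(self, token_embeds):                 # token_embeds: (B, L, D)
        seq_len = token_embeds.size(1)
        delta = self.A[:seq_len] @ self.B            # (L, D) low-rank update
        updated = token_embeds + delta
        prompt = self.short_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        return torch.cat([prompt, updated], dim=1)   # prepend the short prompt

dept = DecomposedPrompt()
optimizer = torch.optim.AdamW([
    {"params": [dept.short_prompt], "lr": 3e-1},     # prompt learning rate
    {"params": [dept.A, dept.B], "lr": 1e-4},        # low-rank pair learning rate
])
```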
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- How Does In-Context Learning Help Prompt Tuning? [55.78535874154915]
Fine-tuning large language models is becoming ever more impractical due to their rapidly growing scale.
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model.
Recently, Singhal et al. (2022) proposed "instruction prompt tuning" (IPT), which combines PT with in-context learning (ICL) by concatenating a natural language demonstration with learned prompt embeddings.
arXiv Detail & Related papers (2023-02-22T17:45:12Z)
- FPT: Improving Prompt Tuning Efficiency via Progressive Training [84.25195519945215]
We propose Fast Prompt Tuning (FPT) to improve prompt tuning's training efficiency.
We show that FPT can save over 30% of training computation while achieving comparable performance.
arXiv Detail & Related papers (2022-11-13T08:00:29Z)
- Identifying Suitable Tasks for Inductive Transfer Through the Analysis of Feature Attributions [78.55044112903148]
We use explainability techniques to predict whether task pairs will be complementary, through comparison of neural network activation between single-task models.
Our results show that, through this approach, it is possible to reduce training time by up to 83.5% at a cost of only 0.034 reduction in positive-class F1 on the TREC-IS 2020-A dataset.
arXiv Detail & Related papers (2022-02-02T15:51:07Z)
- Frustratingly Easy Transferability Estimation [64.42879325144439]
We propose a simple, efficient, and effective transferability measure named TransRate.
TransRate measures transferability as the mutual information between the features of target examples extracted by a pre-trained model and their labels.
Despite its extraordinary simplicity (about 10 lines of code), TransRate performs remarkably well in extensive evaluations on 22 pre-trained models and 16 downstream tasks (a rough sketch in that spirit follows this entry).
arXiv Detail & Related papers (2021-06-17T10:27:52Z)
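TransRate above estimates transferability from the mutual information between pre-trained features of target examples and their labels. The following is a rough sketch of a coding-rate-style estimator in that spirit; the specific log-det surrogate, the eps value, and the hypothetical `model.encode` feature extractor are illustrative assumptions, not the paper's exact definition.

```python
# Rough sketch of a mutual-information-style transferability score in the
# spirit of TransRate: coding rate of all features minus the rate conditioned
# on labels. The log-det surrogate and eps are illustrative assumptions.
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 1e-4) -> float:
    """0.5 * logdet(I + d/(n*eps) * Z^T Z) for an n x d feature matrix Z."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps)) * Z.T @ Z)
    return 0.5 * logdet

def transferability_score(Z: np.ndarray, y: np.ndarray, eps: float = 1e-4) -> float:
    """Higher score ~ features are more informative about the target labels."""
    Z = Z - Z.mean(axis=0, keepdims=True)            # center the features
    total = coding_rate(Z, eps)
    conditional = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        conditional += (len(Zc) / len(Z)) * coding_rate(Zc, eps)
    return total - conditional

# Usage: extract frozen-model features for a target dataset, then score them.
# Z = model.encode(target_texts)      # hypothetical feature extractor
# print(transferability_score(Z, target_labels))
```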
This list is automatically generated from the titles and abstracts of the papers on this site; the site does not guarantee the quality of the information and is not responsible for any consequences of its use.