IAPT: Instruction-Aware Prompt Tuning for Large Language Models
- URL: http://arxiv.org/abs/2405.18203v2
- Date: Fri, 7 Jun 2024 06:41:18 GMT
- Title: IAPT: Instruction-Aware Prompt Tuning for Large Language Models
- Authors: Wei Zhu, Aaron Xuxiang Tian, Congrui Yin, Yuan Ni, Xiaoling Wang, Guotong Xie,
- Abstract summary: We propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens.
First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction.
Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear projections, and an activation function.
- Score: 19.408462115679914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning is less considered than Low-rank adaptation (LoRA) in the large language modeling (LLM) era. In this work, we propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens. First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction. The generated soft prompts can be seen as a semantic summary of the input instructions and can effectively guide the output generation. Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear projections, and an activation function. Pilot experiments show that prompt generators at different Transformer layers require different activation functions. Thus, we propose to learn the idiosyncratic activation functions for prompt generators automatically with the help of rational functions. We have conducted experiments on various tasks, and the experimental results demonstrate that (a) our IAPT method can outperform the recent baselines with comparable tunable parameters. (b) Our IAPT method is more efficient than LoRA under the single-backbone multi-tenant setting.
Related papers
- Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning [38.36863497458095]
We propose a new class-incremental learning method Multi-Label class incremental learning via summarising pAtch tokeN Embeddings (MULTI-LANE)
Our proposed method Multi-Label class incremental learning via summarising pAtch tokeN Embeddings (MULTI-LANE) enables learning disentangled task-specific representations in MLCIL while ensuring fast inference.
arXiv Detail & Related papers (2024-05-24T15:18:27Z) - Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained
Models for Spatiotemporal Modeling [32.603558214472265]
We introduce Attention Prompt Tuning (APT) for video-based applications such as action recognition.
APT involves injecting a set of learnable prompts along with data tokens during fine-tuning while keeping the backbone frozen.
The proposed approach greatly reduces the number of FLOPs and latency while achieving a significant performance boost.
arXiv Detail & Related papers (2024-03-11T17:59:41Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Universality and Limitations of Prompt Tuning [65.8354898840308]
We take one of the first steps to understand the role of soft-prompt tuning for transformer-based architectures.
We analyze prompt tuning from the lens of universality and limitations with finite-depth pretrained transformers for continuous-valued functions.
Our result guarantees the existence of a strong transformer with a prompt to approximate any sequence-to-sequence function in the set of Lipschitz functions.
arXiv Detail & Related papers (2023-05-30T06:47:07Z) - Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts [97.20933523766182]
Prompt tuning is a parameter-efficient tuning (PETuning) method for utilizing pre-trained models (PTMs)
We present Late Prompt Tuning () that inserts a late prompt into an intermediate layer of the PTM instead of the input layer or all layers.
We show that, can achieve competitive performance to full model tuning and other PETuning methods under both full-data and few-shot scenarios.
arXiv Detail & Related papers (2022-10-20T14:23:52Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL)
RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs)
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z) - IDPG: An Instance-Dependent Prompt Generation Method [58.45110542003139]
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt in each input instance during the model training stage.
We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z) - PPT: Pre-trained Prompt Tuning for Few-shot Learning [47.05554619258627]
Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks.
Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks.
In our work, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings.
arXiv Detail & Related papers (2021-09-09T15:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.