XPrompt: Exploring the Extreme of Prompt Tuning
- URL: http://arxiv.org/abs/2210.04457v1
- Date: Mon, 10 Oct 2022 06:57:19 GMT
- Title: XPrompt: Exploring the Extreme of Prompt Tuning
- Authors: Fang Ma, Chen Zhang, Lei Ren, Jingang Wang, Qifan Wang, Wei Wu,
Xiaojun Quan, Dawei Song
- Abstract summary: We propose a novel Prompt tuning model with an eXtremely small scale (XPrompt) under the regime of the lottery ticket hypothesis.
XPrompt eliminates negative prompt tokens at different granularity levels through hierarchical structured pruning, yielding a more parameter-efficient prompt with competitive performance.
- Score: 31.242680485717447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning learns soft prompts to condition frozen Pre-trained Language
Models (PLMs) for performing downstream tasks in a parameter-efficient manner.
While prompt tuning has gradually reached the performance level of fine-tuning
as the model scale increases, there is still a large performance gap between
prompt tuning and fine-tuning for models of moderate and small scales
(typically less than 11B parameters). In this paper, we empirically show that
the trained prompt tokens can have a negative impact on a downstream task and
thus degrade its performance. To bridge the gap, we propose a novel Prompt
tuning model with an eXtremely small scale (XPrompt) under the regime of the
lottery ticket hypothesis. Specifically, XPrompt eliminates the negative
prompt tokens at different granularity levels through a hierarchical structured
pruning, yielding a more parameter-efficient prompt yet with a competitive
performance. Comprehensive experiments are carried out on SuperGLUE tasks, and
the extensive results indicate that XPrompt is able to close the performance
gap at smaller model scales.
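The hierarchical pruning described in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch version of a soft prompt carrying token-level and piece-level binary masks, together with a magnitude-style pruning helper. The class and function names, the piece partitioning, and the pruning score are assumptions made for illustration, not the authors' released code; under the lottery ticket hypothesis, the surviving prompt weights would be rewound and retrained after each pruning round.

```python
import torch
import torch.nn as nn

class MaskedSoftPrompt(nn.Module):
    """Soft prompt with token-level and piece-level pruning masks.

    Illustrative sketch only: names and the piece partitioning are
    assumptions, not the paper's implementation.
    """

    def __init__(self, num_tokens: int = 20, embed_dim: int = 768, num_pieces: int = 16):
        super().__init__()
        assert embed_dim % num_pieces == 0
        self.prompt = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)
        # Binary masks, updated between training rounds rather than by SGD.
        self.register_buffer("token_mask", torch.ones(num_tokens, 1))
        self.register_buffer("piece_mask", torch.ones(num_tokens, num_pieces))
        self.num_pieces = num_pieces

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        d = self.prompt.size(1)
        # Token-level mask drops whole prompt tokens; piece-level mask drops
        # sub-vectors ("pieces") of each surviving token embedding.
        piece = self.piece_mask.repeat_interleave(d // self.num_pieces, dim=1)
        pruned = self.prompt * self.token_mask * piece
        prompt = pruned.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        # Prepend the pruned soft prompt to the frozen PLM's input embeddings.
        return torch.cat([prompt, input_embeds], dim=1)

def prune_lowest(scores: torch.Tensor, mask: torch.Tensor, ratio: float) -> torch.Tensor:
    """Zero out the lowest-scoring fraction of currently active mask entries."""
    active = mask.bool()
    k = int(ratio * active.sum().item())
    if k == 0:
        return mask
    flat = scores.masked_fill(~active, float("inf")).flatten()
    drop = torch.topk(flat, k, largest=False).indices
    new_mask = mask.flatten().clone()
    new_mask[drop] = 0.0
    return new_mask.view_as(mask)
```

In this sketch, token-level pruning could use, for example, the L2 norm of each prompt token embedding as its importance score: `prune_lowest(module.prompt.norm(dim=1, keepdim=True), module.token_mask, 0.2)` drops the weakest 20% of active tokens.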
Related papers
- Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens by leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
arXiv Detail & Related papers (2024-07-20T03:10:19Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer [94.23904400441957]
We introduce perturbation-based regularizers, which smooth the loss landscape, into prompt tuning.
We design two kinds of perturbation-based regularizers: random-noise-based and adversarial-based (see the second sketch after this list).
Our new algorithms improve the state-of-the-art prompt tuning methods by 1.94% and 2.34% on SuperGLUE and FewGLUE benchmarks, respectively.
arXiv Detail & Related papers (2023-05-03T20:30:51Z) - Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts [97.20933523766182]
Prompt tuning is a parameter-efficient tuning (PETuning) method for utilizing pre-trained models (PTMs).
We present Late Prompt Tuning (LPT), which inserts a late prompt into an intermediate layer of the PTM instead of the input layer or all layers (see the first sketch after this list).
We show that LPT can achieve performance competitive with full model tuning and other PETuning methods under both full-data and few-shot scenarios.
arXiv Detail & Related papers (2022-10-20T14:23:52Z) - Prompt Tuning for Generative Multimodal Pretrained Models [75.44457974275154]
We implement prompt tuning on a unified sequence-to-sequence pretrained model that is adaptive to both understanding and generation tasks.
Experimental results demonstrate that the light-weight prompt tuning can achieve comparable performance with finetuning.
In comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks.
arXiv Detail & Related papers (2022-08-04T08:56:38Z) - STT: Soft Template Tuning for Few-Shot Adaptation [72.46535261444151]
We propose a new prompt-tuning framework called Soft Template Tuning (STT).
STT combines manual and auto prompts, and treats downstream classification tasks as a masked language modeling task.
It can even outperform the time- and resource-consuming fine-tuning method on sentiment classification tasks.
arXiv Detail & Related papers (2022-07-18T07:07:22Z) - Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models [82.75572875007755]
We argue that one of the factors hindering the development of prompt-tuning on NLG tasks is unfamiliar inputs.
This motivates us to propose input-tuning, which fine-tunes both the continuous prompts and the input representations.
Our proposed input-tuning is conceptually simple and empirically powerful.
arXiv Detail & Related papers (2022-03-07T05:04:32Z) - PPT: Pre-trained Prompt Tuning for Few-shot Learning [47.05554619258627]
Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks.
Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks.
In our work, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings.
arXiv Detail & Related papers (2021-09-09T15:11:04Z) - The Power of Scale for Parameter-Efficient Prompt Tuning [4.481348281462904]
"prompt tuning" is a simple mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks.
Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin.
arXiv Detail & Related papers (2021-04-18T03:19:26Z)
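The late-prompt mechanism from the LPT entry above can also be sketched compactly: rather than prepending the prompt at the input layer, trainable prompt states are concatenated to the hidden states entering an intermediate layer. The toy backbone and every hyperparameter below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LatePromptEncoder(nn.Module):
    """Toy frozen encoder with a soft prompt injected at an intermediate layer.

    Illustrative sketch of the late-prompt idea; not the paper's code.
    """

    def __init__(self, num_layers: int = 12, d_model: int = 256,
                 prompt_len: int = 10, insert_at: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        for p in self.layers.parameters():  # the backbone stays frozen
            p.requires_grad_(False)
        # The late prompt is the only trainable parameter.
        self.late_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        self.insert_at = insert_at

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            if i == self.insert_at:
                # Concatenate the trainable prompt to each sequence in the batch.
                prompt = self.late_prompt.unsqueeze(0).expand(hidden.size(0), -1, -1)
                hidden = torch.cat([prompt, hidden], dim=1)
            hidden = layer(hidden)
        return hidden

# Usage with hidden states of shape (batch, seq_len, d_model):
# enc = LatePromptEncoder()
# out = enc(torch.randn(2, 32, 256))  # -> (2, 32 + 10, 256)
```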
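Likewise, the random-noise variant of the perturbation-based regularizer from the PTP entry reduces to a few lines. Here `task_loss` is a hypothetical closure mapping a prompt tensor to a scalar loss, and `sigma` is an assumed noise scale; both are illustrative, not the paper's API.

```python
import torch

def perturbed_prompt_loss(prompt: torch.Tensor, task_loss, sigma: float = 0.01):
    """Random-noise perturbation of a soft prompt before the loss.

    Adding Gaussian noise to the trainable prompt encourages a loss
    surface that is flat in the prompt's neighborhood (a sketch of the
    random-noise-based regularizer; the adversarial variant would
    instead take a gradient ascent step to pick the perturbation).
    """
    noise = torch.randn_like(prompt) * sigma
    return task_loss(prompt + noise)
```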
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.