StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation
- URL: http://arxiv.org/abs/2404.19335v2
- Date: Thu, 03 Oct 2024 07:59:36 GMT
- Title: StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation
- Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen,
- Abstract summary: sysname outperforms state-of-the-art methods by 6.97% in accuracy and reduces the standard deviation by 1.92 on average.
Tests underscore its robustness and stability across 8 datasets covering various tasks.
- Score: 14.341806875791288
- License:
- Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. Such property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and soft prompt as separate inputs to mitigate noise brought by the prompt initialization. Furthermore, we optimize soft prompts with contrastive learning for utilizing class-aware information in the training process to maintain model performance. Experimental results demonstrate that \sysname outperforms state-of-the-art methods by 6.97% in accuracy and reduces the standard deviation by 1.92 on average. Furthermore, extensive experiments underscore its robustness and stability across 8 datasets covering various tasks. Codes are available at https://github.com/lccc0528/Stable/tree/main.
Related papers
- Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability [67.77534983324229]
In this paper, we investigate the ability of Large Language Models to develop a unified compression method that discretizes uninformative tokens.
Experiments show Selection-p achieves state-of-the-art performance across numerous classification tasks.
It exhibits superior transferability to different models compared to prior work.
arXiv Detail & Related papers (2024-10-15T17:05:25Z) - Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion [9.55994486328914]
Prompt tuning is a promising method to fine-tune a pre-trained language model without retraining its large-scale parameters.
Existing methods are hard to balance accuracy and efficiency.
A longer (shorter) soft prompt generally leads to a better(worse) accuracy but at the cost of more (less) training time.
We propose an Efficient Prompt Tuning method (EPT) by multi-space projection and prompt fusion.
arXiv Detail & Related papers (2024-05-19T06:43:12Z) - Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores the correlation evolvement between prompts and patch tokens during proficient training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z) - InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural
Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z) - Evaluating the Robustness of Discrete Prompts [27.919548466481583]
We conduct a systematic study of the robustness of discrete prompts.
We measure their performance in two Natural Language Inference (NLI) datasets.
Our results show that although the discrete prompt-based method remains relatively robust against perturbations to NLI inputs, they are highly sensitive to other types of perturbations such as shuffling and deletion of prompt tokens.
arXiv Detail & Related papers (2023-02-11T07:01:53Z) - CODA-Prompt: COntinual Decomposed Attention-based Prompting for
Rehearsal-Free Continual Learning [30.676509834338884]
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data.
We propose prompting approaches as an alternative to data-rehearsal.
We show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.
arXiv Detail & Related papers (2022-11-23T18:57:11Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.