Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- URL: http://arxiv.org/abs/2308.07272v2
- Date: Tue, 16 Jan 2024 03:22:15 GMT
- Title: Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- Authors: Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, Chao Shen
- Abstract summary: Prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts.
Existing continuous prompt optimization methods improve performance by learning ideal prompts.
By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets.
- Score: 14.200398093260118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paradigm of prompt-based pre-trained language models (PLMs) has succeeded
substantially in few-shot natural language processing (NLP) tasks. However,
prior discrete prompt optimization methods require expert knowledge to design
the base prompt set and identify high-quality prompts, which is costly,
inefficient, and subjective. Meanwhile, existing continuous prompt optimization
methods improve performance by learning ideal prompts through the gradient
information of PLMs, but their high computational cost, low readability, and
poor generalizability are often concerning. To address this research
gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt
Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment
strategy for readable prompt set generation based on GPT-4. Furthermore, we
propose an efficient prompt screening metric to identify high-quality prompts
with linear complexity. Finally, we construct a reinforcement learning (RL)
framework based on policy gradients to match the prompts to inputs optimally.
By training a policy network with only 0.67% of the PLM parameter size on the
tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA)
method by 1.52% in accuracy on average on four open-source datasets. Moreover,
subsequent experiments also demonstrate that $DP_2O$ has good universality,
robustness, and generalization ability.
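The final RL step described above, a small policy network trained with policy gradients to match prompts to inputs, can be caricatured in a few lines of REINFORCE. The pool size, input features, reward function, and update rule below are illustrative assumptions for this sketch, not $DP_2O$'s actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_PROMPTS, FEAT_DIM, LR = 4, 8, 0.1

# Policy "network": a single linear layer over input features, giving
# logits over the candidate prompt pool (tiny next to a frozen PLM).
W = np.zeros((FEAT_DIM, NUM_PROMPTS))
baseline = 0.0  # running average reward, used for variance reduction

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def toy_reward(x, prompt_idx):
    # Stand-in for "the frozen PLM answered correctly under this prompt":
    # each input has one latent best prompt, derived from its features.
    return 1.0 if prompt_idx == int(abs(x.sum()) * 10) % NUM_PROMPTS else 0.0

def reinforce_step(x):
    """Sample a prompt from the policy, observe reward, and push the
    log-probability of the sampled prompt up or down by (r - baseline)."""
    global W, baseline
    probs = softmax(x @ W)
    a = rng.choice(NUM_PROMPTS, p=probs)
    r = toy_reward(x, a)
    # Gradient of log pi(a|x) w.r.t. W for a softmax-linear policy:
    grad = -np.outer(x, probs)
    grad[:, a] += x
    W += LR * (r - baseline) * grad
    baseline = 0.9 * baseline + 0.1 * r
    return r

data = [rng.normal(size=FEAT_DIM) for _ in range(16)]  # few-shot-sized set
for _ in range(300):
    for x in data:
        reinforce_step(x)
```

Because only `W` is updated while the reward oracle (the PLM in the real method) stays frozen, the trainable parameter count is a tiny fraction of the model's, matching the 0.67% figure in spirit.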
Related papers
- PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning [45.847259809950316]
We propose a novel method namely PromptIntern to internalize the prompt knowledge into model parameters via progressive fine-tuning.
Our method reduces inference tokens by over 90%, speeds up inference by 4.2 times, and saves 88.3% of monetary cost.
arXiv Detail & Related papers (2024-07-02T12:21:14Z) - Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
However, this approach adds computational burden at model inference and requires human effort to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of large language models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
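The query-dependent idea behind Prompt-OIRL can be sketched simply: learn a reward model from offline (query, prompt, outcome) records, then pick the prompt the model scores highest for each new query. Everything below (feature shapes, the least-squares fit, the synthetic log) is an illustrative assumption, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_PROMPTS, FEAT_DIM, N_LOGGED = 3, 5, 300

# Offline log: query features, the prompt that was used, and whether it worked.
queries = rng.normal(size=(N_LOGGED, FEAT_DIM))
prompts = rng.integers(0, NUM_PROMPTS, size=N_LOGGED)
# Hypothetical ground truth: prompt k succeeds when query feature k is positive.
outcomes = (queries[np.arange(N_LOGGED), prompts] > 0).astype(float)

def featurize(q, p):
    """Joint (query, prompt) features: the query vector placed in the
    block corresponding to prompt id p, zeros elsewhere."""
    phi = np.zeros(FEAT_DIM * NUM_PROMPTS)
    phi[p * FEAT_DIM:(p + 1) * FEAT_DIM] = q
    return phi

# Fit a linear reward model r(q, p) ~ phi(q, p) @ w by least squares,
# using only the offline log (no further calls to the LLM).
Phi = np.stack([featurize(q, p) for q, p in zip(queries, prompts)])
w, *_ = np.linalg.lstsq(Phi, outcomes, rcond=None)

def best_prompt(q):
    """Query-dependent selection: score every candidate, keep the best."""
    scores = [featurize(q, p) @ w for p in range(NUM_PROMPTS)]
    return int(np.argmax(scores))
```

The point of the offline formulation is that both evaluation and selection happen against the learned reward model, so no additional LLM queries are needed at optimization time.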
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked models (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z) - AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.