Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- URL: http://arxiv.org/abs/2308.07272v2
- Date: Tue, 16 Jan 2024 03:22:15 GMT
- Title: Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- Authors: Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, Chao Shen
- Abstract summary: Prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts.
Existing continuous prompt optimization methods improve the performance by learning the ideal prompts.
By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets.
- Score: 14.200398093260118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prompt-based pre-trained language model (PLM) paradigm has succeeded
substantially in few-shot natural language processing (NLP) tasks. However,
prior discrete prompt optimization methods require expert knowledge to design
the base prompt set and identify high-quality prompts, which is costly,
inefficient, and subjective. Meanwhile, existing continuous prompt optimization
methods improve performance by learning ideal prompts through the
gradient information of PLMs, but their high computational cost and low
readability and generalizability are often concerning. To address this research
gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt
Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment
strategy for readable prompt set generation based on GPT-4. Furthermore, we
propose an efficient prompt screening metric to identify high-quality prompts
with linear complexity. Finally, we construct a reinforcement learning (RL)
framework based on policy gradients to match the prompts to inputs optimally.
By training a policy network with only 0.67% of the PLM parameter size on the
tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA)
method by 1.52% in accuracy on average on four open-source datasets. Moreover,
subsequent experiments also demonstrate that $DP_2O$ has good universality,
robustness, and generalization ability.
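As a rough illustration of the policy-gradient matching step described in the abstract, the sketch below implements a generic REINFORCE loop in which a small policy network selects one prompt from a fixed candidate set for each input and is rewarded by feedback from the frozen PLM. This is not the authors' $DP_2O$ implementation; the class names, network sizes, candidate-prompt count, and reward function are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class PromptPolicy(nn.Module):
    """Small policy network that scores each candidate prompt for a given input."""

    def __init__(self, input_dim: int, num_prompts: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_prompts),
        )

    def forward(self, x: torch.Tensor) -> torch.distributions.Categorical:
        # Categorical distribution over the candidate prompt set for each input.
        return torch.distributions.Categorical(logits=self.net(x))


def reinforce_step(policy, optimizer, input_embeddings, reward_fn):
    """One REINFORCE update: sample a prompt index per input, observe a reward
    (e.g., whether the frozen PLM answers correctly), and take a policy-gradient step."""
    dist = policy(input_embeddings)
    actions = dist.sample()                      # chosen prompt index per input
    rewards = reward_fn(actions)                 # stand-in for PLM accuracy feedback
    baseline = rewards.mean()                    # simple variance-reduction baseline
    loss = -(dist.log_prob(actions) * (rewards - baseline)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    policy = PromptPolicy(input_dim=768, num_prompts=16)      # 16 hypothetical candidate prompts
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    inputs = torch.randn(32, 768)                              # placeholder for encoded few-shot inputs
    reward_fn = lambda a: (a % 2 == 0).float() * 2.0 - 1.0     # placeholder +1/-1 reward signal
    for _ in range(100):
        reinforce_step(policy, optimizer, inputs, reward_fn)
```

Training only a small selector network of this kind, rather than the PLM itself, is what keeps the trainable parameter count to a small fraction of the PLM's size, as in the 0.67% figure reported in the abstract.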
Related papers
- GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning [8.307785339429863]
We propose a novel framework for prompt optimization for large language models (LLMs)
GRL-Prompt aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner.
Experiments show that GRL-Prompt outperforms recent state-of-the-art methods.
arXiv Detail & Related papers (2024-11-19T10:52:25Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
However, this approach brings the additional computational burden of model inference and the human effort required to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked LMs (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)