Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- URL: http://arxiv.org/abs/2308.07272v2
- Date: Tue, 16 Jan 2024 03:22:15 GMT
- Title: Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning
- Authors: Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, Chao Shen
- Abstract summary: Prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts.
Existing continuous prompt optimization methods improve the performance by learning the ideal prompts.
By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets.
- Score: 14.200398093260118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prompt-based pre-trained language model (PLM) paradigm has succeeded
substantially in few-shot natural language processing (NLP) tasks. However,
prior discrete prompt optimization methods require expert knowledge to design
the base prompt set and identify high-quality prompts, which is costly,
inefficient, and subjective. Meanwhile, existing continuous prompt optimization
methods improve performance by learning ideal prompts through the
gradient information of PLMs, but their high computational cost and low
readability and generalizability are often concerning. To address this research
gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt
Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment
strategy for readable prompt set generation based on GPT-4. Furthermore, we
propose an efficient prompt screening metric to identify high-quality prompts
with linear complexity. Finally, we construct a reinforcement learning (RL)
framework based on policy gradients to match the prompts to inputs optimally.
By training a policy network with only 0.67% of the PLM parameter size on the
tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA)
method by 1.52% in accuracy on average on four open-source datasets. Moreover,
subsequent experiments also demonstrate that $DP_2O$ has good universality,
robustness, and generalization ability.
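As a rough illustration of the policy-gradient matching step described in the abstract, the sketch below implements a generic REINFORCE loop in which a small policy network selects one prompt from a fixed candidate set for each input and is rewarded by feedback from the frozen PLM. This is not the authors' $DP_2O$ implementation; the class names, network sizes, candidate-prompt count, and reward function are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class PromptPolicy(nn.Module):
    """Small policy network that scores each candidate prompt for a given input."""

    def __init__(self, input_dim: int, num_prompts: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_prompts),
        )

    def forward(self, x: torch.Tensor) -> torch.distributions.Categorical:
        # Categorical distribution over the candidate prompt set for each input.
        return torch.distributions.Categorical(logits=self.net(x))


def reinforce_step(policy, optimizer, input_embeddings, reward_fn):
    """One REINFORCE update: sample a prompt index per input, observe a reward
    (e.g., whether the frozen PLM answers correctly), and take a policy-gradient step."""
    dist = policy(input_embeddings)
    actions = dist.sample()                      # chosen prompt index per input
    rewards = reward_fn(actions)                 # stand-in for PLM accuracy feedback
    baseline = rewards.mean()                    # simple variance-reduction baseline
    loss = -(dist.log_prob(actions) * (rewards - baseline)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    policy = PromptPolicy(input_dim=768, num_prompts=16)      # 16 hypothetical candidate prompts
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    inputs = torch.randn(32, 768)                              # placeholder for encoded few-shot inputs
    reward_fn = lambda a: (a % 2 == 0).float() * 2.0 - 1.0     # placeholder +1/-1 reward signal
    for _ in range(100):
        reinforce_step(policy, optimizer, inputs, reward_fn)
```

Training only a small selector network of this kind, rather than the PLM itself, is what keeps the trainable parameter count to a small fraction of the PLM's size, as in the 0.67% figure reported in the abstract.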
Related papers
- GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning [8.307785339429863]
We propose a novel framework for prompt optimization for large language models (LLMs)
GRL-Prompt aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner.
Experiments show that GRL-Prompt outperforms recent state-of-the-art methods.
arXiv Detail & Related papers (2024-11-19T10:52:25Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
However, this approach brings the additional computational burden of model inference and the human effort required to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked LMs (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)