RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12548v1
- Date: Wed, 25 May 2022 07:50:31 GMT
- Title: RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning
- Authors: Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo,
Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu
- Abstract summary: This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL)
RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs)
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
- Score: 84.75064077323098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompting has shown impressive success in enabling large pretrained language
models (LMs) to perform diverse NLP tasks, especially when only a few downstream
data are available. Automatically finding the optimal prompt for each task,
however, is challenging. Most existing work resorts to tuning soft prompt
(e.g., embeddings) which falls short of interpretability, reusability across
LMs, and applicability when gradients are not accessible. Discrete prompt, on
the other hand, is difficult to optimize, and is often created by "enumeration
(e.g., paraphrasing)-then-selection" heuristics that do not explore the prompt
space systematically. This paper proposes RLPrompt, an efficient discrete
prompt optimization approach with reinforcement learning (RL). RLPrompt
formulates a parameter-efficient policy network that generates the desired
discrete prompt after training with reward. To overcome the complexity and
stochasticity of reward signals from the large LM environment, we incorporate
effective reward stabilization that substantially enhances the training
efficiency. RLPrompt is flexibly applicable to different types of LMs, such as
masked (e.g., BERT) and left-to-right models (e.g., GPTs), for both
classification and generation tasks. Experiments on few-shot classification and
unsupervised text style transfer show superior performance over a wide range of
existing finetuning or prompting methods. Interestingly, the resulting
optimized prompts are often ungrammatical gibberish text; and surprisingly,
those gibberish prompts are transferable between different LMs to retain
significant performance, indicating LM prompting may not follow human language
patterns.
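
The following is a minimal, self-contained sketch of the loop the abstract describes: a small trainable policy proposes discrete prompt tokens, a downstream model scores the resulting prompt, rewards are stabilized, and the policy is updated with a policy gradient. The toy vocabulary, network sizes, and `toy_reward` function are hypothetical stand-ins for a real frozen task LM and its task reward, and per-batch z-score normalization is only one simple form of reward stabilization; the paper's actual policy architecture and stabilization scheme may differ.

```python
# Hedged sketch of discrete prompt optimization with REINFORCE.
# Toy placeholders throughout: VOCAB, toy_reward, and network sizes are
# illustrative, not the paper's actual setup.
import torch
import torch.nn as nn

VOCAB = ["great", "terrible", "movie", "review", "absolutely", "classify"]
PROMPT_LEN = 4

class PromptPolicy(nn.Module):
    """Parameter-efficient policy: embedding + MLP head over a small vocab,
    conditioning only on the previous prompt token (a tiny Markov policy)."""
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + 1, hidden)  # +1 for BOS
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, vocab_size))

    def forward(self, prev_token):
        return self.mlp(self.embed(prev_token))  # logits over next prompt token

def toy_reward(prompt_tokens):
    # Stand-in for the frozen LM environment's reward (e.g., few-shot
    # classification accuracy of the prompted model). Here we simply reward
    # prompts containing two arbitrary target words.
    words = [VOCAB[t] for t in prompt_tokens]
    return float("classify" in words) + float("review" in words)

policy = PromptPolicy(len(VOCAB))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
BOS = len(VOCAB)

for step in range(200):
    log_probs, rewards = [], []
    for _ in range(16):  # sample a batch of candidate prompts
        prev, lp, tokens = torch.tensor(BOS), 0.0, []
        for _ in range(PROMPT_LEN):
            dist = torch.distributions.Categorical(logits=policy(prev))
            tok = dist.sample()
            lp = lp + dist.log_prob(tok)
            tokens.append(tok.item())
            prev = tok
        log_probs.append(lp)
        rewards.append(toy_reward(tokens))
    r = torch.tensor(rewards)
    # Reward stabilization: normalize rewards within the batch so the noisy,
    # LM-dependent reward scale does not destabilize the policy gradient.
    r = (r - r.mean()) / (r.std() + 1e-6)
    loss = -(torch.stack(log_probs) * r).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's setting the reward comes from prompting a large frozen LM, so each policy update only trains the small policy network while the task LM stays untouched.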
Related papers
- IPO: Interpretable Prompt Optimization for Vision-Language Models [40.83071220530289]
This paper introduces a simple but interpretable prompt optimizer (IPO).
IPO utilizes large language models (LLMs) to generate textual prompts dynamically.
We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z) - Learning from Contrastive Prompts: Automated Optimization and Adaptation [7.455360923031003]
We propose the Learning from Contrastive Prompts (LCP) framework to enhance prompt optimization and adaptation.
LCP employs contrastive learning to generate effective prompts by analyzing patterns in good and bad prompt examples.
Our evaluation on the Big-Bench Hard dataset shows that LCP achieves a win rate of over 76% against existing methods in prompt optimization.
arXiv Detail & Related papers (2024-09-23T16:47:23Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
However, this approach brings an additional computational burden from model inference and human effort to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - PromptBoosting: Black-Box Text Classification with Ten Forward Passes [61.38341243907045]
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations.
Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
arXiv Detail & Related papers (2022-12-19T06:04:54Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)