PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
- URL: http://arxiv.org/abs/2308.12033v1
- Date: Wed, 23 Aug 2023 09:46:37 GMT
- Title: PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
- Authors: Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu
Wang, Mingchen Cai
- Abstract summary: We propose a simple, universal, and automatic method named PREFER to address the stated limitations.
Our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin.
- Score: 24.888093229577965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an effective tool for eliciting the power of Large Language Models (LLMs),
prompting has recently demonstrated unprecedented abilities across a variety of
complex tasks. To further improve the performance, prompt ensemble has
attracted substantial interest for tackling the hallucination and instability
of LLMs. However, existing methods usually adopt a two-stage paradigm, which
requires a pre-prepared set of prompts with substantial manual effort, and is
unable to perform directed optimization for different weak learners. In this
paper, we propose a simple, universal, and automatic method named PREFER (Prompt
Ensemble learning via Feedback-Reflect-Refine) to address the stated
limitations. Specifically, given the fact that weak learners are supposed to
focus on hard examples during boosting, PREFER builds a feedback mechanism for
reflecting on the inadequacies of existing weak learners. Based on this, the
LLM is required to automatically synthesize new prompts for iterative
refinement. Moreover, to enhance the stability of prompt effect evaluation, we
propose a novel prompt bagging method involving forward and backward thinking,
which is superior to majority voting and is beneficial for both feedback and
weight calculation in boosting. Extensive experiments demonstrate that our
PREFER achieves state-of-the-art performance in multiple types of tasks by a
significant margin. We have made our code publicly available.
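The feedback-reflect-refine loop described in the abstract can be pictured as an AdaBoost-style prompt ensemble: each prompt is a weak learner, misclassified examples gain weight, and the LLM is asked to synthesize a refined prompt from its failures. The sketch below is illustrative only, not the paper's released code: `query_llm` and `reflect_and_refine` are toy stand-ins for real LLM calls, and the weight update follows standard AdaBoost.

```python
import math

def query_llm(prompt: str, example: str) -> str:
    # Toy "LLM": labels an example "pos"/"neg" by keyword overlap with the
    # prompt. A real implementation would call an actual model.
    return "pos" if any(w in example for w in prompt.split()) else "neg"

def reflect_and_refine(prompt, hard_examples):
    # Stand-in for the reflect step: a real system would ask the LLM why
    # `prompt` failed on `hard_examples` and synthesize a refined prompt.
    # Here we simply append keywords from the failing examples.
    extra = " ".join(ex.split()[0] for ex, _ in hard_examples)
    return f"{prompt} {extra}".strip()

def prefer_boost(dataset, initial_prompt, rounds=3):
    n = len(dataset)
    weights = [1.0 / n] * n            # boosting weights over examples
    ensemble = []                      # list of (prompt, alpha) weak learners
    prompt = initial_prompt
    for _ in range(rounds):
        preds = [query_llm(prompt, ex) for ex, _ in dataset]
        err = sum(w for w, p, (_, y) in zip(weights, preds, dataset) if p != y)
        err = min(max(err, 1e-6), 1 - 1e-6)
        alpha = 0.5 * math.log((1 - err) / err)   # AdaBoost learner weight
        ensemble.append((prompt, alpha))
        # Feedback: re-weight so hard (misclassified) examples dominate.
        weights = [w * math.exp(alpha if p != y else -alpha)
                   for w, p, (_, y) in zip(weights, preds, dataset)]
        z = sum(weights)
        weights = [w / z for w in weights]
        hard = [dataset[i] for i, (p, (_, y))
                in enumerate(zip(preds, dataset)) if p != y]
        if not hard:
            break
        prompt = reflect_and_refine(prompt, hard)  # reflect + refine
    return ensemble

def predict(ensemble, example):
    # Weighted vote of all weak-learner prompts.
    score = sum(a if query_llm(p, example) == "pos" else -a
                for p, a in ensemble)
    return "pos" if score >= 0 else "neg"

data = [("great movie", "pos"), ("awful plot", "neg"),
        ("fun ride", "pos"), ("boring scenes", "neg")]
ensemble = prefer_boost(data, initial_prompt="great")
```

In this toy run, the initial prompt misses "fun ride", so the refine step extends the prompt and a second, more accurate weak learner joins the ensemble.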
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z) - Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs [30.333277284839053]
Large language models (LLMs) have shown success in generating high-quality responses.
Existing methods to enhance response quality often involve a prompt refinement model.
We introduce a self-instructed in-context learning framework that empowers LLMs to deliver more effective responses.
arXiv Detail & Related papers (2024-09-03T02:42:39Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
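In the simplest setting, sparse-entropy-regularized token selection has a closed form: with Tsallis entropic index q = 2, the regularized argmax reduces to the sparsemax projection (Martins & Astudillo, 2016), which assigns exactly zero probability to low-scoring tokens. The sketch below shows only that sparsity mechanism, not the paper's full RL objective:

```python
def sparsemax(scores):
    """Project scores onto the probability simplex, zeroing out
    low-scoring entries (Tsallis-entropy-regularized argmax, q = 2)."""
    z = sorted(scores, reverse=True)
    cum, k, k_cum = 0.0, 0, 0.0
    for i, zi in enumerate(z, start=1):
        cum += zi
        if 1 + i * zi > cum:       # support condition: entry i stays nonzero
            k, k_cum = i, cum
    tau = (k_cum - 1) / k          # threshold shared by the support
    return [max(s - tau, 0.0) for s in scores]

probs = sparsemax([1.0, 0.8, -1.0])  # roughly [0.6, 0.4, 0.0]
```

Unlike softmax, which keeps every token with nonzero probability, the unlikely third token here is filtered out exactly, which is what makes the resulting prompts easier to interpret.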
arXiv Detail & Related papers (2024-07-20T03:10:19Z) - MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLMs) to enhance their capabilities across various downstream tasks in NLP.
We then propose a model-adaptive prompt optimization (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z) - Enhancing Q-Learning with Large Language Model Heuristics [0.0]
Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations.
We propose LLM-guided Q-learning, a framework that leverages LLMs as a heuristic to aid in learning the Q-function for reinforcement learning.
arXiv Detail & Related papers (2024-05-06T10:42:28Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - AutoHint: Automatic Prompt Optimization with Hint Generation [11.737818328656735]
This paper presents AutoHint, a novel framework for automatic prompt engineering and optimization for Large Language Models (LLMs).
We propose a framework to inherit the merits of both in-context learning and zero-shot learning by incorporating enriched instructions derived from input-output demonstrations to optimize the original prompt.
We refer to the enrichment as the hint and propose a framework to automatically generate the hint from labeled data.
arXiv Detail & Related papers (2023-07-13T00:49:27Z) - TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves an average 5.33x improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked LMs (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.