Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL
- URL: http://arxiv.org/abs/2309.06553v4
- Date: Thu, 7 Mar 2024 16:12:21 GMT
- Title: Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL
- Authors: Hao Sun, Alihan H\"uy\"uk, Mihaela van der Schaar
- Abstract summary: We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
- Score: 62.824464372594576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we aim to enhance the arithmetic reasoning ability of Large
Language Models (LLMs) through zero-shot prompt optimization. We identify a
previously overlooked objective of query dependency in such optimization and
elucidate two ensuing challenges that impede the successful and economical
design of prompt optimization techniques. One primary issue is the absence of
an effective method to evaluate prompts during inference when the golden answer
is unavailable. Concurrently, learning via interactions with the LLMs to
navigate the expansive natural language prompting space proves to be
resource-intensive. To address this, we introduce Prompt-OIRL, which harnesses
offline inverse reinforcement learning to draw insights from offline prompting
demonstration data. Such data exists as by-products when diverse prompts are
benchmarked on open-accessible datasets. With Prompt-OIRL, the query-dependent
prompt optimization objective is achieved by first learning an offline reward
model. This model can evaluate any query-prompt pairs without accessing LLMs.
Subsequently, a best-of-N strategy is deployed to recommend the optimal prompt.
Our experimental evaluations across various LLM scales and arithmetic reasoning
datasets underscore both the efficacy and economic viability of the proposed
approach.
Related papers
- MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLM) to enhance their capabilities across various downstream tasks in NLP.
We then propose a model-adaptive prompt (MAPO) method that optimize the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z) - Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks.
This approach brings the additional computational burden of model inference and human effort to guide and control the behavior of LLMs.
We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z) - PhaseEvo: Towards Unified In-Context Prompt Optimization for Large
Language Models [9.362082187605356]
We present PhaseEvo, an efficient automatic prompt optimization framework that combines the generative capability of LLMs with the global search proficiency of evolution algorithms.
PhaseEvo significantly outperforms the state-of-the-art baseline methods by a large margin whilst maintaining good efficiency.
arXiv Detail & Related papers (2024-02-17T17:47:10Z) - Adapting LLMs for Efficient, Personalized Information Retrieval: Methods
and Implications [0.7832189413179361]
Large Language Models (LLMs) excel in comprehending and generating human-like text.
This paper explores strategies for integrating Language Models (LLMs) with Information Retrieval (IR) systems.
arXiv Detail & Related papers (2023-11-21T02:01:01Z) - Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt
Generation for Few-shot Learning [14.200398093260118]
Prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts.
Existing continuous prompt optimization methods improve the performance by learning the ideal prompts.
By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DPO$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on four open-source datasets.
arXiv Detail & Related papers (2023-08-14T16:58:50Z) - Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $mathbf2.5times$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL)
RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs)
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.