Related papers: GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning

GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning

URL: http://arxiv.org/abs/2411.14479v1
Date: Tue, 19 Nov 2024 10:52:25 GMT
Title: GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning
Authors: Yuze Liu, Tingjie Liu, Tiehua Zhang, Youhua Xia, Jinze Wang, Zhishu Shen, Jiong Jin, Fei Richard Yu,
Abstract summary: We propose a novel framework for prompt optimization for large language models (LLMs) GRL-Prompt aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner. Experiments show that GRL-Prompt outperforms recent state-of-the-art methods.
Score: 8.307785339429863
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated impressive success in a wide range of natural language processing (NLP) tasks due to their extensive general knowledge of the world. Recent works discovered that the performance of LLMs is heavily dependent on the input prompt. However, prompt engineering is usually done manually in a trial-and-error fashion, which can be labor-intensive and challenging in order to find the optimal prompts. To address these problems and unleash the utmost potential of LLMs, we propose a novel LLMs-agnostic framework for prompt optimization, namely GRL-Prompt, which aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner. To provide structured action/state representation for optimizing prompts, we construct a knowledge graph (KG) that better encodes the correlation between the user query and candidate in-context examples. Furthermore, a policy network is formulated to generate the optimal action by selecting a set of in-context examples in a rewardable order to construct the prompt. Additionally, the embedding-based reward shaping is utilized to stabilize the RL training process. The experimental results show that GRL-Prompt outperforms recent state-of-the-art methods, achieving an average increase of 0.10 in ROUGE-1, 0.07 in ROUGE-2, 0.07 in ROUGE-L, and 0.05 in BLEU.

Related papers

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning [106.98018881499362]
We introduce GEPA (Genetic-Pareto), a prompt that thoroughly incorporates natural language to learn high-level rules from trial and error.<n>GEPA samples system-level trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems.<n>It can often turn even just a few rollouts into a large quality gain.
arXiv Detail & Related papers (2025-07-25T17:42:32Z)
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers [52.17222304851524]
We introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning. By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models. GReaTer consistently outperforms previous state-of-the-art prompt optimization methods.
arXiv Detail & Related papers (2024-12-12T20:59:43Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization. We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Generation for Few-shot Learning [14.200398093260118]
Prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts. Existing continuous prompt optimization methods improve the performance by learning the ideal prompts. By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DPO$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on four open-source datasets.
arXiv Detail & Related papers (2023-08-14T16:58:50Z)
Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information. Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using preference ranking function to find the optimization direction. Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL) RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs) Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.