Related papers: Meta-Prompt Optimization for LLM-Based Sequential Decision Making

Meta-Prompt Optimization for LLM-Based Sequential Decision Making

URL: http://arxiv.org/abs/2502.00728v1
Date: Sun, 02 Feb 2025 09:22:39 GMT
Title: Meta-Prompt Optimization for LLM-Based Sequential Decision Making
Authors: Mingze Kong, Zhiyong Wang, Yao Shu, Zhongxiang Dai,
Abstract summary: Large language models (LLMs) have been employed as agents to solve sequential decision-making tasks.<n>We propose our EXPonential-weight algorithm for prompt Optimization (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt.<n>We also extend EXPO to additionally optimize the exemplars in the meta-prompt to further enhance the performance.
Score: 24.050701239196876
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have recently been employed as agents to solve sequential decision-making tasks such as Bayesian optimization and multi-armed bandits (MAB). These works usually adopt an LLM for sequential action selection by providing it with a fixed, manually designed meta-prompt. However, numerous previous works have found that the prompt has a significant impact on the performance of the LLM, which calls for a method to automatically optimize the meta-prompt for LLM-based agents. Unfortunately, the non-stationarity in the reward observations during LLM-based sequential decision-making makes meta-prompt optimization highly challenging. To address this challenge, we draw inspirations from adversarial bandit algorithms, which are inherently capable of handling non-stationary reward observations. Building on this foundation, we propose our EXPonential-weight algorithm for prompt Optimization} (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt for LLM-based agents. We also extend EXPO to additionally optimize the exemplars (i.e., history of interactions) in the meta-prompt to further enhance the performance, hence introducing our EXPO-ES algorithm. We use extensive experiments to show that our algorithms significantly improve the performance of LLM-based sequential decision-making.

Related papers

Large Language Model-Enhanced Multi-Armed Bandits [43.34246396804588]
Large language models (LLMs) have been adopted to solve sequential decision-making tasks such as multi-armed bandits (MAB)<n>We propose an alternative approach which combines the strengths of classical MAB and LLMs.<n>We conduct empirical evaluations using both synthetic MAB tasks and experiments designed using real-world text datasets.
arXiv Detail & Related papers (2025-02-03T07:19:05Z)
EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLM) to enhance their capabilities across various downstream tasks in NLP. We then propose a model-adaptive prompt (MAPO) method that optimize the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z)
Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction [14.982446379660633]
Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performance. This paper proposes a two-stage multi-step method for LLM-based information extraction and adopts the RL framework to execute the multi-step planning.
arXiv Detail & Related papers (2024-06-17T12:11:01Z)
RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents [27.807695570974644]
We propose a novel method, textscRePrompt, which does agradient descent"-like approach to optimize the step-by-step instructions in the prompts given to LLM agents. By leveraging intermediate feedback, textscRePrompt can optimize the prompt without the need for a final solution checker.
arXiv Detail & Related papers (2024-06-17T01:23:11Z)
Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data. We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z)
Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization. Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge. We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimize the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers [66.823588073584]
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. We propose a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs.
arXiv Detail & Related papers (2023-10-02T02:01:16Z)
Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as prompts. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values. We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
arXiv Detail & Related papers (2023-09-07T00:07:15Z)
Robust Prompt Optimization for Large Language Models Against Distribution Shifts [80.6757997074956]
Large Language Model (LLM) has demonstrated significant ability in various Natural Language Processing tasks. We propose a new problem of robust prompt optimization for LLMs against distribution shifts. This problem requires the prompt optimized over the labeled source group can simultaneously generalize to an unlabeled target group.
arXiv Detail & Related papers (2023-05-23T11:30:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.