PromptAgent: Strategic Planning with Language Models Enables
Expert-level Prompt Optimization
- URL: http://arxiv.org/abs/2310.16427v2
- Date: Thu, 7 Dec 2023 14:39:22 GMT
- Title: PromptAgent: Strategic Planning with Language Models Enables
Expert-level Prompt Optimization
- Authors: Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou
Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu
- Abstract summary: PromptAgent is an optimization method that crafts expert-level prompts equivalent in quality to those handcrafted by experts.
Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions.
We apply PromptAgent to 12 tasks spanning three practical domains.
- Score: 60.00631098364391
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Highly effective, task-specific prompts are often heavily engineered by
experts to integrate detailed instructions and domain insights based on a deep
understanding of both the instincts of large language models (LLMs) and the
intricacies of the target task. However, automating the generation of such
expert-level prompts remains elusive. Existing prompt optimization methods tend
to overlook the depth of domain knowledge and struggle to efficiently explore
the vast space of expert-level prompts. Addressing this, we present
PromptAgent, an optimization method that autonomously crafts prompts equivalent
in quality to those handcrafted by experts. At its core, PromptAgent views
prompt optimization as a strategic planning problem and employs a principled
planning algorithm, rooted in Monte Carlo tree search, to strategically
navigate the expert-level prompt space. Inspired by human-like trial-and-error
exploration, PromptAgent induces precise expert-level insights and in-depth
instructions by reflecting on model errors and generating constructive error
feedback. Such a novel framework allows the agent to iteratively examine
intermediate prompts (states), refine them based on error feedback (actions),
simulate future rewards, and search for high-reward paths leading to expert
prompts. We apply PromptAgent to 12 tasks spanning three practical domains:
BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing
it significantly outperforms strong Chain-of-Thought and recent prompt
optimization baselines. Extensive analyses emphasize its capability to craft
expert-level, detailed, and domain-insightful prompts with great efficiency and
generalizability.
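The search loop the abstract describes (prompts as states, error-feedback revisions as actions, simulated rewards, and backpropagation along high-reward paths) can be sketched as a generic Monte Carlo tree search skeleton. This is an illustrative toy, not the authors' implementation: `revise` stands in for LLM-generated error feedback, `reward` stands in for measured task accuracy, and all names and parameters here are hypothetical.

```python
import math

class Node:
    """A search node holding one candidate prompt."""
    def __init__(self, prompt, parent=None):
        self.prompt = prompt
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    # Upper-confidence bound: balance exploiting high-reward prompts
    # against exploring rarely visited ones.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts_prompt_search(root_prompt, revise, reward, iters=50, width=2):
    """Toy MCTS over prompt space: states are prompts, actions are revisions."""
    root = Node(root_prompt)
    for _ in range(iters):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: propose revised prompts (stand-in for error feedback).
        if node.visits > 0:
            for _ in range(width):
                node.children.append(Node(revise(node.prompt), parent=node))
            node = node.children[0]
        # 3. Simulation: score the prompt (stand-in for task accuracy).
        r = reward(node.prompt)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Return the first-level revision with the highest mean reward.
    best = max(root.children, key=lambda n: n.value / max(n.visits, 1))
    return best.prompt
```

In PromptAgent itself the revision step is driven by reflecting on concrete model errors rather than a fixed transformation, and rewards come from evaluating candidate prompts on held-out task data; the skeleton above only shows where those components would plug in.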
Related papers
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
- Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models [14.74868220560438]
We propose a fresh objective for domain-generalizable prompt optimization, named "Concentration".
Our approach improves over comparable prompt optimization methods by 1.42% in accuracy for soft prompt generalization and 2.16% for hard prompt generalization in the multi-source domain generalization setting.
arXiv Detail & Related papers (2024-06-15T10:02:46Z)
- PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework [2.976441974750401]
Large language models (LLMs) have revolutionized AI across diverse domains, showcasing remarkable capabilities.
Central to their success is the concept of prompting, which guides model output generation.
This paper introduces PromptWizard, a novel framework leveraging LLMs to iteratively synthesize and refine prompts tailored to specific tasks.
arXiv Detail & Related papers (2024-05-28T17:08:31Z)
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [54.09074527006576]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges.
This inadequacy primarily stems from the lack of built-in action knowledge in language agents.
We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z)
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis [4.055489363682198]
Large language models (LLMs) offer significant promise as a knowledge source for task learning.
Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but alone it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks.
We describe a cognitive-agent approach, STARS, that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences.
arXiv Detail & Related papers (2023-06-11T20:50:14Z)
- ExpertPrompting: Instructing Large Language Models to be Distinguished Experts [93.58012324415762]
ExpertPrompting elicits the potential of large language models to answer as distinguished experts.
We produce a new set of instruction-following data using GPT-3.5, and train a competitive open-source chat assistant called ExpertLLaMA.
arXiv Detail & Related papers (2023-05-24T03:51:31Z)
- Bayesian Optimization Augmented with Actively Elicited Expert Knowledge [13.551210295284733]
We tackle the problem of incorporating expert knowledge into BO, with the goal of further accelerating the optimization.
We design a multi-task learning architecture for this task, with the goal of jointly eliciting the expert knowledge and minimizing the objective function.
Experiments on various benchmark functions with both simulated and actual human experts show that the proposed method significantly speeds up BO even when the expert knowledge is biased.
arXiv Detail & Related papers (2022-08-18T09:49:21Z)
- A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.