Code Driven Planning with Domain-Adaptive Critic
- URL: http://arxiv.org/abs/2509.19077v1
- Date: Tue, 23 Sep 2025 14:36:12 GMT
- Title: Code Driven Planning with Domain-Adaptive Critic
- Authors: Zikang Tian, Shaohui Peng, Du Huang, Jiaming Guo, Ruizhi Chen, Rui Zhang, Xishan Zhang, Yuxuan Guo, Zidong Du, Qi Guo, Ling Li, Yewen Pu, Xing Hu, Yunji Chen,
- Abstract summary: We propose Code Driven Planning with Domain-Adaptive Critic (CoPiC)<n>Instead of relying on frequent queries, CoPiC employs LLMs to generate a diverse set of high-level planning programs.<n>A trained domain-adaptive critic then evaluates these candidates and selects the one most aligned with long-term rewards for execution.
- Score: 41.04089289445378
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have been widely adopted as task planners for AI agents in sequential decision-making problems, leveraging their extensive world knowledge. However, the gap between their general knowledge and environment-specific requirements often leads to inaccurate plans. To address this, existing approaches rely on frequent LLM queries to iteratively refine plans based on immediate environmental feedback, which incurs substantial query costs. However, this refinement is typically guided by short-term environmental feedback, limiting LLMs from developing plans aligned with long-term rewards. We propose Code Driven Planning with Domain-Adaptive Critic (CoPiC). Instead of relying on frequent queries, CoPiC employs LLMs to generate a diverse set of high-level planning programs, which iteratively produce and refine candidate plans. A trained domain-adaptive critic then evaluates these candidates and selects the one most aligned with long-term rewards for execution. Using high-level planning programs as planner and domain-adaptive critic as estimator, CoPiC improves planning while significantly reducing query costs. Results in ALFWorld, NetHack, and StarCraft II Unit Building show that CoPiC outperforms advanced LLM-based baselines, AdaPlanner and Reflexion, achieving an average (1) 23.33% improvement in success rate and (2) 91.27% reduction in query costs.
Related papers
- DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints [25.987776928014707]
We introduce DeepPlanning, a benchmark for practical long-horizon agent planning.<n>It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization.
arXiv Detail & Related papers (2026-01-26T04:43:49Z) - CRISP: Complex Reasoning with Interpretable Step-based Plans [15.656686375199921]
We introduce CRISP (Complex Reasoning with Interpretable Step-based Plans), a dataset of high-level plans for mathematical reasoning and code generation.<n>We demonstrate that fine-tuning a small model on CRISP enables it to generate higher-quality plans than much larger models using few-shot prompting.
arXiv Detail & Related papers (2025-07-09T11:40:24Z) - PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization [58.465778756331574]
We propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning.<n>With two planning-oriented rewards, PGPO further enhances LLM agents' ability to generate high-quality P-code Plans.<n>Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines.
arXiv Detail & Related papers (2025-06-02T09:35:07Z) - Circinus: Efficient Query Planner for Compound ML Serving [3.6295638972280733]
This paper presents Circinus, an SLO-aware query planner for large-scale compound AI workloads.<n>By exploiting plan similarities within and across queries, Circinus significantly reduces search steps.<n> Evaluations show that Circinus improves service goodput by 3.2-5.0$times$, accelerates query planning by 4.2-5.8$times$, achieving query response in seconds.
arXiv Detail & Related papers (2025-04-23T03:57:24Z) - Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code [8.971234046933349]
Large language models (LLMs) fail to plan reliably, even when prompted with a detailed definition of a planning task.<n>We show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size.
arXiv Detail & Related papers (2025-03-24T15:50:20Z) - Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming [13.246017517159043]
Large language models (LLMs) have recently demonstrated strong potential in solving planning problems.<n>We propose LLpreview, a framework that leverages LLMs to capture key information from planning problems and formally formulate and solve them as optimization problems from scratch.<n>We apply LLpreview to 9 planning problems, ranging from multi-constraint decision making to multi-step planning problems, and demonstrate that LL achieves on average 83.7% and 86.8% optimal rate across 9 tasks for GPTo and Claude 3.5 Sonnet.
arXiv Detail & Related papers (2024-10-15T23:20:54Z) - VinePPO: Refining Credit Assignment in RL Training of LLMs [66.80143024475635]
We propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates.<n>Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks.
Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs)
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.