PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
- URL: http://arxiv.org/abs/2506.01475v1
- Date: Mon, 02 Jun 2025 09:35:07 GMT
- Title: PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
- Authors: Zouying Cao, Runze Wang, Yifei Yang, Xinbei Ma, Xiaoyong Zhu, Bo Zheng, Hai Zhao
- Abstract summary: We propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents' ability to generate high-quality P-code Plans. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines.
- Score: 58.465778756331574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) agents have demonstrated impressive capabilities in handling complex interactive problems. Existing LLM agents mainly generate natural language plans to guide reasoning, which are verbose and inefficient. NL plans are also tailored to specific tasks and restrict agents' ability to generalize across similar tasks. To this end, we explore pseudocode-style plans (P-code Plan) to capture the structural logic of reasoning. We find that P-code Plan empowers LLM agents with stronger generalization ability and higher efficiency. Inspired by this finding, we propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents' ability to generate high-quality P-code Plans and subsequent reasoning. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines. Analyses reveal the advantage of PGPO in reducing action errors and omissions during reasoning.
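To make the P-code Plan idea concrete, the sketch below contrasts a natural-language plan with a pseudocode-style plan for a toy household task. The task, action names, and plan format are illustrative assumptions rather than the paper's released prompts or data; the sketch only illustrates why a structured plan tends to be shorter and more reusable.

```python
# Minimal sketch (assumptions, not the paper's released format): a natural-language
# plan vs. a pseudocode-style plan (P-code Plan) for a toy "heat an apple" agent task.

NL_PLAN = (
    "First go to the counter and pick up the apple, then walk to the microwave, "
    "open it, put the apple inside, heat it, and finally take it out."
)

# A P-code Plan expresses the same logic as compact, task-agnostic control structure;
# swapping "apple" for another object leaves the control flow untouched.
PCODE_PLAN = """
target = find(object="apple")
goto(target.location)
pickup(target)
receptacle = "microwave"
goto(receptacle)
if is_closed(receptacle):
    open_receptacle(receptacle)
put(target, receptacle)
heat(target, receptacle)
take(target, receptacle)
"""

def plan_token_cost(plan: str) -> int:
    """Crude proxy for verbosity: whitespace-delimited token count."""
    return len(plan.split())

if __name__ == "__main__":
    print("NL plan tokens:    ", plan_token_cost(NL_PLAN))
    print("P-code plan tokens:", plan_token_cost(PCODE_PLAN))
```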
Related papers
- Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models [63.765846080050906]
This paper proposes a novel parameter-efficient action planner using large language models (PEAP-LLM) to generate a single-step instruction at each location. Experiments show the superiority of our proposed model on REVERIE compared to the previous state-of-the-art.
arXiv Detail & Related papers (2025-05-12T12:38:20Z) - MPO: Boosting LLM Agents with Meta Plan Optimization [37.35230659116656]
Large language models (LLMs) have enabled agents to successfully tackle interactive planning tasks. Existing approaches often suffer from planning hallucinations and require retraining for each new agent. We propose the Meta Plan Optimization framework, which enhances agent planning capabilities by directly incorporating explicit guidance.
arXiv Detail & Related papers (2025-03-04T14:54:45Z) - Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting [4.500734889060007]
This paper addresses the synergy between large language models (LLMs) and task planning systems. We propose Vote-Tree-Planner to minimize redundancy while enhancing planning effectiveness.
arXiv Detail & Related papers (2025-02-13T20:08:06Z) - DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents [2.1438108757511958]
We propose a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. Experiments in 25-room navigation environments demonstrate a 100% success rate. The method also generalizes to momentum-based control tasks and requires only log N steps for replanning.
arXiv Detail & Related papers (2025-02-04T03:05:55Z) - Aligning CodeLLMs with Direct Preference Optimization [44.34483822102872]
This work first identifies that the commonly used PPO algorithm may be suboptimal for the alignment of CodeLLM.
Based only on preference data pairs, DPO lets the model rank data automatically, giving rise to a fine-grained rewarding pattern (a minimal sketch of the DPO objective appears after this list).
Studies show that our method significantly improves the performance of existing CodeLLMs on benchmarks such as MBPP and HumanEval.
arXiv Detail & Related papers (2024-10-24T09:36:13Z) - Non-myopic Generation of Language Models for Reasoning and Planning [45.75146679449453]
This paper proposes a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy.
Our experiments show significant improvements in a wide range of tasks for math, coding, and agents.
arXiv Detail & Related papers (2024-10-22T17:13:38Z) - AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation [81.32722475387364]
Large Language Model-based agents have garnered significant attention and are becoming increasingly popular. Planning ability is a crucial component of an LLM-based agent, which generally entails achieving a desired goal from an initial state. Recent studies have demonstrated that utilizing expert-level trajectories for instruction-tuning LLMs effectively enhances their planning capabilities.
arXiv Detail & Related papers (2024-08-01T17:59:46Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [52.34892973785117]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges. This inadequacy primarily stems from the lack of built-in action knowledge in language agents. We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z) - Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability.
This paper introduces an innovative approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z) - Secrets of RLHF in Large Language Models Part I: PPO [81.01936993929127]
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence.
Reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit.
In this report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training.
arXiv Detail & Related papers (2023-07-11T01:55:24Z)
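As noted in the "Aligning CodeLLMs with Direct Preference Optimization" entry above, several of these methods, PGPO included, build on DPO-style preference learning. Below is a minimal sketch of the standard DPO objective over chosen/rejected pairs; the function name, tensor shapes, and beta value are illustrative assumptions, not code from any of the papers listed.

```python
# Minimal DPO-loss sketch (illustrative assumptions; not code from the papers above).
# Inputs are summed log-probabilities of the chosen ("w") and rejected ("l") responses
# under the policy being trained and under a frozen reference policy.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w: torch.Tensor,   # log pi_theta(y_w | x), shape (batch,)
             policy_logp_l: torch.Tensor,   # log pi_theta(y_l | x), shape (batch,)
             ref_logp_w: torch.Tensor,      # log pi_ref(y_w | x),   shape (batch,)
             ref_logp_l: torch.Tensor,      # log pi_ref(y_l | x),   shape (batch,)
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy's implicit reward margin
    (log-ratio against the reference) to favor the chosen response."""
    chosen_ratio = policy_logp_w - ref_logp_w
    rejected_ratio = policy_logp_l - ref_logp_l
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

if __name__ == "__main__":
    # Toy usage: random log-probs for a batch of 4 preference pairs.
    torch.manual_seed(0)
    lp = {k: torch.randn(4) for k in ("pw", "pl", "rw", "rl")}
    print(dpo_loss(lp["pw"], lp["pl"], lp["rw"], lp["rl"]).item())
```

In PGPO's setting, the preference pairs would additionally be selected using the two planning-oriented rewards described in the abstract; that selection step is paper-specific and is not sketched here.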