Related papers: Graph-Enhanced Policy Optimization in LLM Agent Training

Graph-Enhanced Policy Optimization in LLM Agent Training

URL: http://arxiv.org/abs/2510.26270v1
Date: Thu, 30 Oct 2025 08:53:41 GMT
Title: Graph-Enhanced Policy Optimization in LLM Agent Training
Authors: Jiazhen Yuan, Wei Zhao, Zhengbiao Bai,
Abstract summary: Group based reinforcement learning (RL) has shown impressive results on complex reasoning and mathematical tasks.<n>Group based reinforcement learning (RL) has shown impressive results on complex reasoning and mathematical tasks.
Score: 3.177432419321498
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Group based reinforcement learning (RL) has shown impressive results on complex reasoning and mathematical tasks. Yet, when applied to train multi-turn, interactive LLM agents, these methods often suffer from structural blindness-the inability to exploit the underlying connectivity of the environment. This manifests in three critical challenges: (1) inefficient, unguided exploration, (2) imprecise credit assignment due to overlooking pivotal states, and (3) myopic planning caused by static reward discounting. We address these issues with Graph-Enhanced Policy Optimization (GEPO), which dynamically constructs a state-transition graph from agent experience and employs graph-theoretic centrality to provide three synergistic learning signals: (1)structured intrinsic rewards that guide exploration toward high-impact states, (2) a graph-enhanced advantage function for topology-aware credit assignment, and (3) a dynamic discount factor adapted to each state's strategic value. On the ALFWorld, WebShop, and a proprietary Workbench benchmarks, GEPO demonstrates strong performance, achieving absolute success rate gains of +4.1%, +5.3%, and +10.9% over competitive baselines. These results highlight that explicitly modeling environmental structure is a robust, generalizable strategy for advancing LLM agent training.

Related papers

TopoCurate:Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks.<n>We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology.<n>TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z)
Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration [11.655381195889428]
We propose Deep GraphRAG, a framework designed for a balanced approach to hierarchical retrieval and adaptive integration.<n>It introduces a hierarchical global-to-local retrieval strategy that integrates macroscopic inter-community and microscopic intra-community contextual relations.<n>A beam search-optimized dynamic re-ranking module guides this process, continuously filtering candidates to balance efficiency and global comprehensiveness.
arXiv Detail & Related papers (2026-01-16T10:02:31Z)
Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making [24.534365665776672]
Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning.<n>We propose DuSAR - a demonstration-free framework that enables a single frozen LLM to perform co-adaptive reasoning.<n>On ALFWorld and Mind2Web, DuSAR achieves state-of-the-art performance with open-source LLMs.
arXiv Detail & Related papers (2025-12-09T08:44:59Z)
Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning.<n>We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT.<n> Exploration-friendly techniques are crucial for agentic RL, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy could improve the training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z)
Agentic Reinforcement Learning with Implicit Step Rewards [92.26560379363492]
Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL)<n>We introduce implicit step rewards for agentic RL (iStar), a general credit-assignment strategy that integrates seamlessly with standard RL algorithms.<n>We evaluate our method on three challenging agent benchmarks, including WebShop and VisualSokoban, as well as open-ended social interactions with unverifiable rewards in SOTOPIA.
arXiv Detail & Related papers (2025-09-23T16:15:42Z)
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning [56.496001894673235]
Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs)<n>Our analysis reveals that puzzling phenomena like aha moments", length-scaling'' and entropy dynamics are not disparate occurrences but hallmarks of an emergent reasoning hierarchy.
arXiv Detail & Related papers (2025-09-03T18:52:49Z)
Power Grid Control with Graph-Based Distributed Reinforcement Learning [60.49805771047161]
This work advances a graph-based distributed reinforcement learning framework for real-time, scalable grid management.<n>A Graph Neural Network (GNN) is employed to encode the network's topological information within the single low-level agent's observation.<n>Experiments on the Grid2Op simulation environment show the effectiveness of the approach.
arXiv Detail & Related papers (2025-09-02T22:17:25Z)
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment [24.296667264939515]
We propose GRAO (Group Relative Alignment Optimization), a unified framework that synergizes the strengths of SFT (supervised fine-tuning) and RL (reinforcement learning)<n>Our theoretical analysis establishes GRAO's convergence guarantees and sample efficiency advantages over conventional approaches.<n>This work provides both a theoretically grounded alignment framework and empirical evidence for efficient capability evolution in language models.
arXiv Detail & Related papers (2025-08-11T08:28:47Z)
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning [58.73279333365234]
Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale graph reasoning abilities.<n>With RL on Erdos, G1 obtains substantial improvements in graph reasoning, where our finetuned 3B model even outperforms Qwen2.5-72B-Instruct (24x size)<n>Our findings offer an efficient, scalable path for building strong graph reasoners by finetuning LLMs with RL on graph-theoretic tasks.
arXiv Detail & Related papers (2025-05-24T04:33:41Z)
Training Large Language Models to Reason via EM Policy Gradient [0.27195102129094995]
We introduce an off-policy reinforcement learning algorithm, EM Policy Gradient, to enhance LLM reasoning.<n>We evaluate the effectiveness of EM Policy Gradient on the GSM8K and MATH (HARD) datasets.<n>Models fine-tuned with our method exhibit cognitive behaviors, such as sub-problem decomposition, self-verification, and backtracking.
arXiv Detail & Related papers (2025-04-24T01:31:05Z)
Self-supervised Learning of Dense Hierarchical Representations for Medical Image Segmentation [2.2265038612930663]
This paper demonstrates a self-supervised framework for learning voxel-wise coarse-to-fine representations tailored for dense downstream tasks. We devise a training strategy that balances the contributions of features from multiple scales, ensuring that the learned representations capture both coarse and fine-grained details.
arXiv Detail & Related papers (2024-01-12T09:47:17Z)
Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons [50.33260238739837]
Graph networks (GNNs) have demonstrated remarkable capabilities in learning from graph-structured data.<n>There remains a lack of analysis comparing GNNs and generalizations from an optimization and generalization perspective.
arXiv Detail & Related papers (2023-06-24T10:21:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.