Related papers: GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning

URL: http://arxiv.org/abs/2510.25320v1
Date: Wed, 29 Oct 2025 09:35:55 GMT
Title: GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning
Authors: Jiaqi Wu, Qinlao Zhao, Zefeng Chen, Kai Qin, Yifei Zhao, Xueqian Wang, Yuhang Yao
Abstract summary: Graph-based Agent Planning (GAP) is a novel framework that explicitly models inter-task dependencies through graph-based planning. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy.
Score: 20.75113227786218
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on sequential reasoning and execution, failing to exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck leads to inefficient tool utilization and suboptimal performance in multi-step reasoning scenarios. We introduce Graph-based Agent Planning (GAP), a novel framework that explicitly models inter-task dependencies through graph-based planning to enable adaptive parallel and serial tool execution. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs, autonomously determining which tools can be executed in parallel and which must follow sequential dependencies. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy. To train GAP, we construct a high-quality dataset of graph-based planning traces derived from the Multi-Hop Question Answering (MHQA) benchmark. We employ a two-stage training strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets demonstrate that GAP significantly outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while achieving dramatic improvements in tool invocation efficiency through intelligent parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.
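The dependency-aware scheduling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `parallel_levels`, the dictionary-based plan format, and the sub-task names are all hypothetical, chosen only to show how a sub-task graph separates calls that can run in parallel from calls that must wait on earlier results.

```python
from collections import defaultdict


def parallel_levels(deps):
    """Group sub-tasks into execution rounds (hypothetical helper).

    `deps` maps each sub-task name to the list of sub-tasks it depends on.
    Tasks in the same round do not depend on each other, so their tool
    calls could be issued in parallel; rounds run one after another.
    """
    for task, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"{task!r} depends on unknown task {parent!r}")

    # Count unmet dependencies and record which tasks each task unblocks.
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    ready = sorted(task for task, count in indegree.items() if count == 0)
    levels = []
    scheduled = 0
    while ready:
        levels.append(ready)
        scheduled += len(ready)
        next_ready = []
        for task in ready:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Any task left unscheduled is part of a dependency cycle.
    if scheduled != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return levels


# Assumed example: two independent lookups feed one comparison step.
plan = {
    "search_A": [],
    "search_B": [],
    "compare": ["search_A", "search_B"],
}
levels = parallel_levels(plan)
print(levels)
```

In this assumed plan, the two searches land in the same round and the comparison in the next, so three sub-tasks need two rounds rather than the three sequential steps a ReAct-style loop would take.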

Related papers

Learning to Share: Selective Memory for Efficient Parallel Agentic Systems [49.78267008828593]
Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. Recent approaches deploy multiple agent teams running in parallel to explore diverse reasoning trajectories. We propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic frameworks.
arXiv Detail & Related papers (2026-02-05T18:20:21Z)
PerfGuard: A Performance-Aware Agent for Visual Content Generation [53.591105729011595]
PerfGuard is a performance-aware agent framework for visual content generation. It integrates tool performance boundaries into task planning and scheduling. It has advantages in tool selection accuracy, execution reliability, and alignment with user intent.
arXiv Detail & Related papers (2026-01-30T05:12:19Z)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
ML-Tool-Bench: Tool-Augmented Planning for ML Tasks [23.54937738755734]
We introduce a benchmark for evaluating tool-augmented machine learning agents. Our benchmark goes beyond traditional tool-use evaluation by incorporating in-memory named object management. Our approach improves over ReAct by 16.52 percentile positions, taking the median across all Kaggle challenges.
arXiv Detail & Related papers (2025-11-29T23:59:40Z)
TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks [23.96822236741708]
Large language model (LLM) agents have exhibited strong problem-solving competence across domains like research and coding. This paper introduces TPS-Bench to benchmark the ability of LLM agents in solving such problems that demand Tool Planning and Scheduling.
arXiv Detail & Related papers (2025-11-03T12:45:39Z)
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph [42.247964605609745]
Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and model assignments, and edges capture information flow. We propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update.
arXiv Detail & Related papers (2025-10-29T22:14:25Z)
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution [48.7788770680643]
Flash-Searcher is a novel parallel agent reasoning framework. It decomposes complex tasks into subtasks with explicit dependencies, enabling concurrent execution of independent reasoning paths. It achieves 67.7% accuracy on BrowseComp and 83% on xbench-DeepSearch, while reducing agent execution steps by up to 35% compared to current frameworks.
arXiv Detail & Related papers (2025-09-29T17:39:30Z)
Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use [50.02614257515131]
Large language models (LLMs) have demonstrated strong capabilities in language understanding and reasoning. We propose Tool-R1, a reinforcement learning framework that enables LLMs to perform general, compositional, and multi-step tool use.
arXiv Detail & Related papers (2025-09-16T09:22:21Z)
TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning [10.267950603662776]
TableMind is a tool-integrated table reasoning agent that autonomously performs multi-turn tool invocation, writing and executing code in a secure sandbox environment for data analysis and precise numerical reasoning. To realize these capabilities, we adopt a two-stage fine-tuning paradigm built on top of a powerful pre-trained language model.
arXiv Detail & Related papers (2025-09-08T02:00:31Z)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
Improving Large Language Model Planning with Action Sequence Similarity [50.52049888490524]
In this work, we explore how to improve the model planning capability through in-context learning (ICL). We propose GRASE-DC: a two-stage pipeline that first re-samples high AS exemplars and then curates the selected exemplars. Our experimental result confirms that GRASE-DC achieves significant performance improvement on various planning tasks.
arXiv Detail & Related papers (2025-05-02T05:16:17Z)
Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation [36.29566268457534]
We introduce a novel parallel tool invocation paradigm, DTA-Llama. First, we transform traditional tree-based tool search paths into a Directed Acyclic Graph (DAG) structure. DTA-Llama is then trained on the dataset to learn to iteratively divide the current task into several parallel tool invocation sub-tasks.
arXiv Detail & Related papers (2025-01-21T16:49:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.