Fugu-MT Paper Translation (Abstract): GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning

Paper overview: GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.25320v1
Date: Wed, 29 Oct 2025 09:35:55 GMT
Status: Translation complete
Last updated in system: 2025-10-30 15:50:45.378608
Title: GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning
Title (reference translation): GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning
Authors: Jiaqi Wu, Qinlao Zhao, Zefeng Chen, Kai Qin, Yifei Zhao, Xueqian Wang, Yuhang Yao
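Abstract summary: Graph-based Agent Planning (GAP) is a new framework that explicitly models inter-task dependencies through graph-based planning. The approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs. This dependency-aware orchestration yields substantial improvements in both execution efficiency and task accuracy.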
Reference score (independently computed attention score): 20.75113227786218
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on sequential reasoning and execution, failing to exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck leads to inefficient tool utilization and suboptimal performance in multi-step reasoning scenarios. We introduce Graph-based Agent Planning (GAP), a novel framework that explicitly models inter-task dependencies through graph-based planning to enable adaptive parallel and serial tool execution. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs, autonomously determining which tools can be executed in parallel and which must follow sequential dependencies. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy. To train GAP, we construct a high-quality dataset of graph-based planning traces derived from the Multi-Hop Question Answering (MHQA) benchmark. We employ a two-stage training strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets demonstrate that GAP significantly outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while achieving dramatic improvements in tool invocation efficiency through intelligent parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.
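Abstract (reference translation): Autonomous agents built on large language models (LLMs) have shown impressive tool-manipulation capabilities for solving complex tasks. However, existing paradigms such as ReAct depend on sequential reasoning and execution and do not exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck causes inefficient tool use and suboptimal performance in multi-step reasoning scenarios. The paper introduces Graph-based Agent Planning (GAP), a new framework that explicitly models inter-task dependencies through graph-based planning and thereby enables adaptive parallel and serial tool execution. The approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs and to decide autonomously which tools can run in parallel and which must follow sequential dependencies. This dependency-aware orchestration yields substantial improvements in both execution efficiency and task accuracy. To train GAP, the authors build a high-quality dataset of graph-based planning traces derived from the Multi-Hop Question Answering (MHQA) benchmark. Training uses a two-stage strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets show that GAP significantly outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while achieving dramatic improvements in tool invocation efficiency through intelligent parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.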
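
The idea of dependency-aware parallel and serial tool execution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sub-task names in `PLAN`, the `call_tool` stand-in, and the `run_plan` scheduler are all hypothetical, and the sketch only assumes that a plan is a directed acyclic graph in which each sub-task lists the sub-tasks it depends on. Sub-tasks whose dependencies are satisfied run concurrently; the rest wait for their inputs.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical sub-task graph (an assumption for illustration):
# each key maps to the set of sub-tasks it depends on.
PLAN = {
    "search_author": set(),
    "search_film": set(),
    "compare": {"search_author", "search_film"},
}


async def call_tool(name, inputs):
    """Stand-in for a real tool call such as a retrieval request."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"{name}({', '.join(sorted(inputs))})"


async def run_plan(plan):
    """Run independent sub-tasks in parallel, dependent ones in order."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()
    results, rounds = {}, []
    while sorter.is_active():
        # Every sub-task whose dependencies are all finished.
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        # Launch the whole ready set concurrently and wait for it.
        outputs = await asyncio.gather(
            *(call_tool(n, [results[d] for d in plan[n]]) for n in ready)
        )
        for name, output in zip(ready, outputs):
            results[name] = output
            sorter.done(name)
    return results, rounds


results, rounds = asyncio.run(run_plan(PLAN))
```

In this toy plan the two searches share one parallel round and `compare` runs in a second round, so three tool calls complete in two rounds; a strictly sequential loop would instead take three turns for the same calls.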

Related papers