Fugu-MT Paper Translation (Abstract): TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG

Paper summary: TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG

arxiv url: http://arxiv.org/abs/2601.06922v1
Date: Sun, 11 Jan 2026 14:07:30 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:01.069167
Title: TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG
Title (reference translation): TreePS-RAG: Tree-Based Process Supervision for Reinforcement Learning in Agentic RAG
Authors: Tianhua Zhang, Kun Li, Junan Li, Yunxiang Li, Hongyin Luo, Xixin Wu, James Glass, Helen Meng
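Abstract summary: Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval. This paper describes TreePS-RAG, an online, tree-based RL framework for agentic RAG.
Reference score (independently computed attention score): 71.06073770344732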
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently been advanced by reinforcement learning (RL) with outcome-based supervision. While effective, relying solely on sparse final rewards limits step-wise credit assignment and provides weak guidance for intermediate reasoning and actions. Recent efforts explore process-level supervision, but typically depend on offline constructed training data, which risks distribution shift, or require costly intermediate annotations. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining standard outcome-only rewards. Our key insight is to model agentic RAG reasoning as a rollout tree, where each reasoning step naturally maps to a node. This tree structure allows step utility to be estimated via Monte Carlo estimation over its descendant outcomes, yielding fine-grained process advantages without requiring intermediate labels. To make this paradigm practical, we introduce an efficient online tree construction strategy that preserves exploration diversity under a constrained computational budget. With a rollout cost comparable to strong baselines like Search-R1, experiments on seven multi-hop and general QA benchmarks across multiple model scales show that TreePS-RAG consistently and significantly outperforms both outcome-supervised and leading process-supervised RL methods.
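Abstract (reference translation): Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently advanced through reinforcement learning (RL) with outcome-based supervision. Although effective, relying only on sparse final rewards limits step-wise credit assignment and gives weak guidance for intermediate reasoning and actions. Recent work explores process-level supervision, but it typically depends on training data constructed offline, which risks distribution shift, or requires costly intermediate annotations. TreePS-RAG is an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while keeping standard outcome-only rewards. The key insight is to model agentic RAG reasoning as a rollout tree in which each reasoning step maps naturally to a node. With this tree structure, the utility of a step can be estimated by Monte Carlo estimation over the outcomes of its descendants, which yields fine-grained process advantages without intermediate labels. To make the paradigm practical, the authors introduce an efficient online tree construction strategy that preserves exploration diversity under a constrained computational budget. At a rollout cost comparable to strong baselines such as Search-R1, experiments on seven multi-hop and general QA benchmarks across multiple model scales show that TreePS-RAG consistently and significantly outperforms both outcome-supervised and leading process-supervised RL methods.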
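
The abstract's idea of estimating step utility from descendant outcomes can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the helper names, and the toy tree are hypothetical, and the advantage definition used here (a node's value minus its parent's value) is an assumption, since the abstract does not give the exact formula. The only points taken from the abstract are that each reasoning step is a tree node, that rewards exist only at final outcomes, and that a step's value is a Monte Carlo estimate over its descendant outcomes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One reasoning step in a rollout tree (hypothetical structure)."""
    name: str
    reward: Optional[float] = None  # outcome reward, set on leaves only
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the outcome rewards of all leaves at or below this node."""
    if not node.children:
        return [node.reward if node.reward is not None else 0.0]
    rewards: List[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def value(node: Node) -> float:
    """Monte Carlo estimate: mean outcome reward over descendant leaves."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


def step_advantages(root: Node) -> Dict[str, float]:
    """Assumed advantage: a node's value minus its parent's value."""
    advantages: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            advantages[child.name] = value(child) - value(parent)
            stack.append(child)
    return advantages


# Toy tree: two candidate first steps, each followed by two final answers.
tree = Node("root", children=[
    Node("a", children=[Node("a1", reward=1.0), Node("a2", reward=1.0)]),
    Node("b", children=[Node("b1", reward=1.0), Node("b2", reward=0.0)]),
])
adv = step_advantages(tree)
```

In this toy tree, step `a` receives a positive advantage and step `b` a negative one, even though only the final answers were scored. That is the sense in which outcome-only rewards can still produce step-level signal.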

Related papers