Fugu-MT Paper Translation (Summary): SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

Paper Summary: SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

arxiv url: http://arxiv.org/abs/2603.07853v1
Date: Mon, 09 Mar 2026 00:05:29 GMT
Status: Translation completed
System update date: 2026-03-10 15:13:15.322843
Title: SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans
Authors: Hansi Zeng, Zoey Li, Yifan Gao, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang,
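Abstract summary: Research agents enable models to gather information from the web using tools to answer user queries. We observe that such agents often exhibit poor exploration behavior, such as premature termination and biased tool usage. We propose SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to encourage deeper exploration.
Reference score (independently calculated attention score): 65.19021035010059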
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Research agents enable models to gather information from the web using tools to answer user queries, which requires them to dynamically interleave internal reasoning with tool use. While such capabilities can in principle be learned via reinforcement learning with verifiable rewards (RLVR), we observe that agents often exhibit poor exploration behaviors, including premature termination and biased tool usage. As a result, RLVR alone yields limited improvements. We propose SynPlanResearch-R1, a framework that synthesizes tool-use trajectories encouraging deeper exploration and uses them to shape exploration behavior during cold-start supervised fine-tuning, providing a strong initialization for subsequent RL. Across seven multi-hop and open-web benchmarks, SynPlanResearch-R1 improves performance over SOTA baselines by up to 6.0% with the Qwen3-8B backbone and up to 5.8% with the Qwen3-4B backbone. Further analyses of tool-use patterns and training dynamics, compared against the baselines, shed light on the factors underlying these gains. Our code is publicly available at https://github.com/HansiZeng/syn-plan-research.
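In RLVR, the reward comes from a programmatic check against a known answer rather than from a learned reward model. The following is a minimal sketch of such a check for question answering, assuming a SQuAD-style normalized exact-match reward. The function names `normalize_answer` and `exact_match_reward` are hypothetical, and the abstract does not specify which reward SynPlanResearch-R1 actually uses.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization, not necessarily the one used
    by SynPlanResearch-R1.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold_answers: list[str]) -> float:
    """Binary verifiable reward: 1.0 if the normalized prediction equals any
    normalized gold answer, else 0.0 (hypothetical helper)."""
    pred = normalize_answer(prediction)
    return 1.0 if any(pred == normalize_answer(g) for g in gold_answers) else 0.0


if __name__ == "__main__":
    print(exact_match_reward("The Eiffel Tower", ["Eiffel Tower"]))  # 1.0
    print(exact_match_reward("Louvre", ["Eiffel Tower"]))  # 0.0
```

Because a reward like this only checks the final answer, it says nothing about how the agent reached it, which is consistent with the abstract's observation that RLVR alone does little to correct premature termination or biased tool usage.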


Related Papers