Fugu-MT Paper Translation (Summary): Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Paper summary: Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

arxiv url: http://arxiv.org/abs/2510.08276v1
Date: Thu, 09 Oct 2025 14:31:39 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:15.131441
Title: Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
Authors: Qiaoyu Tang, Hao Xiang, Le Yu, Bowen Yu, Yaojie Lu, Xianpei Han, Le Sun, WenJuan Zhang, Pengbo Wang, Shixuan Liu, Zhenru Zhang, Jianhong Tu, Hongyu Lin, Junyang Lin
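Abstract summary: DeepMiner is a novel framework that elicits deep reasoning capabilities in multi-turn agents by introducing high-difficulty training tasks and a dynamic context window. We develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks.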
Reference score (independently computed attention score): 88.85901839023803
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to invoke deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and dynamic context window. DeepMiner presents a reverse construction method to generate complex but verifiable question-answer pairs from authentic web sources, which ensures the challenge and reliability of training data while injecting cognitive capabilities into multi-turn reasoning scenarios. We further design an elegant yet effective dynamic context management strategy for both training and inference, utilizing sliding window mechanisms while eliminating the dependency on external summarization models, thereby efficiently empowering the model to handle continuously expanding long-horizon contexts. Through reinforcement learning on Qwen3-32B, we develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks. DeepMiner attains 33.5% accuracy on BrowseComp-en, surpassing the previous best open-source agent by almost 20 percentage points, and demonstrates consistent improvements on BrowseComp-zh, XBench-DeepSearch, and GAIA. Notably, our dynamic context management enables sustained interactions of nearly 100 turns within standard 32k context length, effectively addressing the context limitations that constrain existing multi-turn interaction systems.
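The abstract states only that the dynamic context management uses sliding window mechanisms and does not rely on an external summarization model. As a rough illustration of that general idea, the sketch below keeps the content of the most recent tool responses in a multi-turn history and replaces older ones with a short placeholder. The `Message` type, the `apply_sliding_window` function, the `window_size` parameter, the placeholder text, and the choice to trim only tool outputs are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker text standing in for a dropped tool response.
PLACEHOLDER = "[earlier tool output omitted]"


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool" (assumed role names)
    content: str


def apply_sliding_window(history: List[Message], window_size: int) -> List[Message]:
    """Return a copy of `history` in which only the most recent
    `window_size` tool messages keep their original content.

    Older tool messages are replaced by PLACEHOLDER; user and assistant
    messages are left untouched, so no summarization model is needed.
    """
    if window_size < 0:
        raise ValueError("window_size must be non-negative")

    tool_indices = [i for i, m in enumerate(history) if m.role == "tool"]
    # Guard window_size == 0 explicitly: tool_indices[-0:] would keep everything.
    keep = set(tool_indices[-window_size:]) if window_size > 0 else set()

    trimmed = []
    for i, m in enumerate(history):
        if m.role == "tool" and i not in keep:
            trimmed.append(Message(m.role, PLACEHOLDER))
        else:
            trimmed.append(m)
    return trimmed


# Small demonstration: five search turns, window of two tool responses.
history = [Message("user", "question")]
for turn in range(5):
    history.append(Message("assistant", f"search {turn}"))
    history.append(Message("tool", f"results {turn}"))

trimmed = apply_sliding_window(history, window_size=2)
tool_contents = [m.content for m in trimmed if m.role == "tool"]
```

In this sketch the number of messages never changes, only the size of the older tool outputs, which is one simple way a fixed context length could accommodate many turns.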

Related papers