Fugu-MT Paper Translation (Summary): Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning

Paper Summary: Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning

arxiv url: http://arxiv.org/abs/2602.05829v1
Date: Thu, 05 Feb 2026 16:19:41 GMT
Status: Translation complete
Last updated (system): 2026-02-06 18:49:09.033877
Title: Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning
Title (reference translation): Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning
Authors: Yudi Shi, Shangzhe Di, Qirui Chen, Qinian Wang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Weidi Xie,
Abstract summary: Weaver is an end-to-end trainable multimodal reasoning agentic system. Weaver improves performance on several complex video reasoning benchmarks.
Reference score (independently calculated attention score): 54.9540824532312
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video reasoning constitutes a comprehensive assessment of a model's capabilities, as it demands robust perceptual and interpretive skills, thereby serving as a means to explore the boundaries of model performance. While recent research has leveraged text-centric Chain-of-Thought reasoning to augment these capabilities, such approaches frequently suffer from representational mismatch and are restricted by limited perceptual acuity. To address these limitations, we propose Weaver, a novel, end-to-end trainable multimodal reasoning agentic system. Weaver empowers its policy model to dynamically invoke diverse tools throughout the reasoning process, enabling progressive acquisition of crucial visual cues and construction of authentic multimodal reasoning trajectories. Furthermore, we integrate a reinforcement learning algorithm to allow the system to freely explore strategies for employing and combining these tools with trajectory-free data. Extensive experiments demonstrate that our system, Weaver, enhances performance on several complex video reasoning benchmarks, particularly those involving long videos.
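Abstract (reference translation): Because video reasoning requires strong perceptual and interpretive skills, it serves as a comprehensive evaluation of a model's performance and as a means of exploring the limits of that performance. Recent work has used text-centric Chain-of-Thought reasoning to augment these capabilities, but such approaches often suffer from representational mismatch and are held back by limited perceptual ability. To address these constraints, we propose Weaver, a novel, end-to-end trainable multimodal reasoning agentic system. Weaver gives its policy model the ability to dynamically invoke diverse tools throughout the reasoning process, enabling the progressive acquisition of important visual cues and the construction of authentic multimodal reasoning trajectories. In addition, we integrate a reinforcement learning algorithm so that the system can freely explore strategies for using these tools with trajectory-free data. We demonstrate that our system, Weaver, improves performance on several complex video reasoning benchmarks, especially those involving long videos.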
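The abstract describes a loop in which a policy model interleaves reasoning with tool calls, gathering visual evidence step by step before answering, and is trained with reinforcement learning on data that contain no reference trajectories. The following is a minimal sketch of that control flow under stated assumptions, not Weaver's implementation: the tool names (`clip`, `frame`), the caption-list "video", the scripted `toy_policy`, the exact-match `outcome_reward`, and the GRPO-style `group_advantages` are all hypothetical stand-ins, since the abstract does not specify which tools or which RL algorithm Weaver uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical toy "video": a list of frame captions standing in for pixels.
Video = List[str]


@dataclass
class ToolCall:
    name: str               # hypothetical tool name, e.g. "clip" or "frame"
    args: Dict[str, int]


@dataclass
class Answer:
    text: str


Action = Union[ToolCall, Answer]


@dataclass
class Trajectory:
    # Interleaved (action, observation) pairs; observation is None for the final answer.
    steps: List[Tuple[Action, object]] = field(default_factory=list)


def clip_tool(video: Video, start: int, end: int) -> List[str]:
    """Hypothetical tool: return the frames in [start, end)."""
    return video[start:end]


def frame_tool(video: Video, index: int) -> str:
    """Hypothetical tool: 'zoom in' on a single frame."""
    return video[index]


TOOLS: Dict[str, Callable[..., object]] = {"clip": clip_tool, "frame": frame_tool}


def run_episode(policy, video: Video, question: str, max_turns: int = 5) -> Tuple[str, Trajectory]:
    """Let the policy call tools until it answers or runs out of turns."""
    traj = Trajectory()
    for _ in range(max_turns):
        action = policy(question, traj)
        if isinstance(action, Answer):
            traj.steps.append((action, None))
            return action.text, traj
        observation = TOOLS[action.name](video, **action.args)
        traj.steps.append((action, observation))  # evidence fed back to the policy
    return "", traj  # no answer within the turn budget


def outcome_reward(prediction: str, gold: str) -> float:
    """Assumed outcome-only reward: needs a gold answer, not a gold trajectory."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


def group_advantages(rewards: List[float]) -> List[float]:
    """Assumed GRPO-style normalization of rewards within a group of rollouts."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-6) for r in rewards]


def toy_policy(question: str, traj: Trajectory) -> Action:
    """Scripted stand-in for a learned policy: inspect frame 2, then answer."""
    if not traj.steps:
        return ToolCall("frame", {"index": 2})
    _, obs = traj.steps[-1]
    return Answer(str(obs))


video = ["kitchen", "person enters", "dog jumps", "door closes"]
pred, traj = run_episode(toy_policy, video, "What happens in frame 2?")
print(pred, outcome_reward(pred, "dog jumps"), len(traj.steps))  # prints: dog jumps 1.0 2
```

Because `outcome_reward` compares only the final answer with the gold label, a setup like this needs question/answer pairs but no annotated tool-use trajectories, which is one plausible reading of "trajectory-free data"; the policy is then free to discover its own tool-use strategy.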


Related Papers