Fugu-MT Paper Translation (Abstract): Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution

Paper overview: Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution

arxiv url: http://arxiv.org/abs/2509.21072v1
Date: Thu, 25 Sep 2025 12:23:49 GMT
Status: Translation complete
Last updated in system: 2025-09-26 20:58:12.896935
Title: Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
Title (reference translation): Recon-Act: A self-evolving multi-agent browser-use system based on Web Reconnaissance, Tool Generation, and Task Execution
Authors: Kaiwen He, Zhiwei Wang, Chenyi Zhuang, Jinjie Gu
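Abstract summary: Recon-Act is a self-evolving multi-agent framework based on the Reconnaissance-Action behavioral paradigm. The system consists of a Reconnaissance Team and an Action Team. Recon-Act substantially improves adaptability to unseen websites and solvability on long-horizon tasks.
Reference score (independently computed attention score): 24.71872444088982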
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In recent years, multimodal models have made remarkable strides and paved the way for intelligent browser-use agents. However, when solving tasks on real-world webpages in multi-turn, long-horizon trajectories, current agents still suffer from disordered action sequencing and excessive trial and error during execution. This paper introduces Recon-Act, a self-evolving multi-agent framework grounded in the Reconnaissance-Action behavioral paradigm. The system comprises a Reconnaissance Team and an Action Team: the former conducts comparative analysis and tool generation, while the latter handles intent decomposition, tool orchestration, and execution. By contrasting erroneous trajectories with successful ones, the Reconnaissance Team infers remedies, abstracts them into a unified notion of generalized tools, expressed either as hints or as rule-based code, and registers them in the tool archive in real time. The Action Team then re-runs inference on the process, empowered with these targeted tools, thus establishing a closed-loop training pipeline of data-tools-action-feedback. Following the six-level implementation roadmap proposed in this work, we have currently reached Level 3 (with limited human-in-the-loop intervention). Leveraging generalized tools obtained through reconnaissance, Recon-Act substantially improves adaptability to unseen websites and solvability on long-horizon tasks, and achieves state-of-the-art performance on the challenging VisualWebArena dataset.
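Abstract (reference translation): In recent years, multimodal models have made remarkable progress and opened the way for intelligent browser-use agents. However, when solving tasks on real-world webpages over multi-turn, long-horizon trajectories, current agents suffer from disordered action sequencing and excessive trial and error. This paper introduces Recon-Act, a self-evolving multi-agent framework based on the Reconnaissance-Action behavioral paradigm. The system consists of a Reconnaissance Team and an Action Team: the former performs comparative analysis and tool generation, while the latter handles intent decomposition, tool orchestration, and execution. By contrasting erroneous trajectories with successful ones, the Reconnaissance Team infers remedies, abstracts them into a unified notion of generalized tools, expresses them as hints or as rule-based code, and registers them in the tool archive in real time. The Action Team re-runs the process with the support of these targeted tools, establishing a closed-loop training pipeline of data-tools-action-feedback. Following the six-level implementation roadmap proposed in this work, the system has currently reached Level 3. By leveraging generalized tools obtained through reconnaissance, Recon-Act substantially improves adaptability to unseen websites and solvability on long-horizon tasks, and achieves state-of-the-art performance on the challenging VisualWebArena dataset.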
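The closed loop described in the abstract (contrast a failed trajectory with a successful one, turn the difference into a generalized tool, register it, and let the acting side use it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract specifies no API, so every name here (GeneralizedTool, ToolArchive, reconnoiter, act, and the "ask_model" fallback) is a hypothetical stand-in, and the first-divergence comparison is a deliberately simple substitute for the paper's comparative analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class GeneralizedTool:
    """Hypothetical container: a tool is either a hint or rule-based code."""
    name: str
    hint: Optional[str] = None
    # Rule-based code: maps an observation to an action, or None if it does not apply.
    rule: Optional[Callable[[str], Optional[str]]] = None


@dataclass
class ToolArchive:
    """Hypothetical archive that tools are registered into as they are produced."""
    tools: Dict[str, GeneralizedTool] = field(default_factory=dict)

    def register(self, tool: GeneralizedTool) -> None:
        self.tools[tool.name] = tool


def reconnoiter(failed: List[str], succeeded: List[str]) -> GeneralizedTool:
    """Reconnaissance side (assumed logic): find the first step where the
    failed trajectory diverges from the successful one and emit a hint."""
    for step, (bad, good) in enumerate(zip(failed, succeeded)):
        if bad != good:
            return GeneralizedTool(
                name=f"fix_step_{step}",
                hint=f"At step {step}, prefer '{good}' over '{bad}'.",
            )
    # No divergence within the shared prefix: fall back to a generic hint.
    return GeneralizedTool(
        name="follow_reference",
        hint="Follow the successful trajectory to completion.",
    )


def act(observation: str, archive: ToolArchive) -> Tuple[str, List[str]]:
    """Action side (assumed logic): return the action of the first rule that
    fires, together with all hints currently stored in the archive."""
    hints = [t.hint for t in archive.tools.values() if t.hint]
    for tool in archive.tools.values():
        if tool.rule is not None:
            action = tool.rule(observation)
            if action is not None:
                return action, hints
    # No rule applies: defer to the underlying model (placeholder action).
    return "ask_model", hints
```

In this sketch, registering the result of reconnoiter(["open_page", "click_buy"], ["open_page", "sort_by_price", "click_buy"]) adds a hint about step 1, and a rule-based tool such as GeneralizedTool(name="dismiss_popup", rule=lambda obs: "click_close" if "popup" in obs else None) makes act return "click_close" whenever the observation mentions a popup.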

Related papers