Fugu-MT Paper Translation (Summary): AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

Paper summary: AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

arxiv url: http://arxiv.org/abs/2604.27253v1
Date: Wed, 29 Apr 2026 22:57:35 GMT
Status: Translation complete
System update date: 2026-05-01 16:31:53.834493
Title: AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling
Title (reference translation): AutoSurfer - Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling
Authors: Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao
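Abstract summary: AutoSurfer is a comprehensive web trajectory generator that addresses the limitations of existing methods through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces. Second, AutoSurfer leverages the exploration trajectory to guide task synthesis, reducing hallucinations by grounding complex tasks in actual navigation paths. Third, AutoSurfer uses the same exploration trajectory as hints to steer a web agent toward more accurate and reliable trajectory refinement.
Reference score (independently computed attention level): 44.65915050312771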
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website coverage due to homepage-based task proposals or random-walk exploration. Such methods often produce hallucinated or ambiguous task synthesis, which leads to incomplete and unreliable trajectory generation. Here, we present AutoSurfer, a comprehensive web trajectory generator that addresses these limitations through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces, propagates knowledge across pages to avoid redundant exploration, and recursively expands multi-level graphical user interface elements - closely resembling how a human would learn a new website. Second, AutoSurfer leverages the exploration trajectory to guide task synthesis, reducing hallucinations by grounding complex tasks in actual navigation paths rather than isolated actions or page content alone. Third, AutoSurfer uses the same exploration trajectory as hints to steer a web agent toward more accurate and reliable trajectory refinement. Together, these innovations enable AutoSurfer to comprehensively cover a website's action space and generate data suitable for training website-specific LLMs. We evaluate AutoSurfer on the WebArena benchmark by fine-tuning Qwen2.5-VL-7B-Instruct and demonstrate that it outperforms state-of-the-art methods - Explorer, OS-Genesis, and SynthAgent - achieving up to 24.23% overall task completion accuracy compared to 19.59% for the best prior method. Further, task diversity analysis demonstrates that AutoSurfer yields a more diverse distribution of synthesized tasks.
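Abstract (reference translation): Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy is limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website coverage caused by homepage-based task proposals or random-walk exploration. Such methods often produce hallucinated or ambiguous task synthesis, which leads to incomplete and unreliable trajectory generation. This paper presents AutoSurfer, a comprehensive web trajectory generator that addresses these limitations through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces, propagates knowledge across pages to avoid redundant exploration, and recursively expands multi-level graphical user interface elements. Second, AutoSurfer uses the exploration trajectory to guide task synthesis, reducing hallucinations by grounding complex tasks in actual navigation paths rather than in isolated actions or page content. Third, AutoSurfer uses the same exploration trajectory as hints to steer a web agent toward more accurate and reliable trajectory refinement. Together, these innovations allow AutoSurfer to comprehensively cover a website's action space and generate data suitable for training website-specific LLMs. AutoSurfer is evaluated on the WebArena benchmark by fine-tuning Qwen2.5-VL-7B-Instruct, where it outperforms the state-of-the-art methods Explorer, OS-Genesis, and SynthAgent, achieving up to 24.23% overall task completion accuracy compared to 19.59% for the best prior method. Further, task diversity analysis shows that AutoSurfer yields a more diverse distribution of synthesized tasks.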
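
The breadth-first exploration idea described in the abstract (a queue of discovered pages, each paired with the action trace that reaches it) can be illustrated with a minimal sketch. This is not AutoSurfer's implementation: the toy site graph `TOY_SITE`, the function name `explore_breadth_first`, and the action strings are all hypothetical, and the sketch leaves out the knowledge propagation across pages and the recursive expansion of multi-level GUI elements mentioned in the abstract.

```python
from collections import deque


def explore_breadth_first(site, start):
    """Explore a toy site graph breadth-first.

    `site` maps a page name to {action label: target page} (a hypothetical
    stand-in for a real website). Returns {page: action trace from `start`}.
    """
    traces = {start: []}      # discovered pages and the trace that reaches each
    queue = deque([start])    # pages waiting to be expanded, in discovery order
    while queue:
        page = queue.popleft()
        for action, target in site.get(page, {}).items():
            if target in traces:
                continue      # already discovered: skip redundant exploration
            traces[target] = traces[page] + [action]
            queue.append(target)
    return traces


# Hypothetical four-page site used only for illustration.
TOY_SITE = {
    "home": {"click Products": "products", "click Account": "account"},
    "products": {"click Item A": "item_a", "click Home": "home"},
    "account": {"click Orders": "orders"},
    "orders": {"click Item A": "item_a"},
}

traces = explore_breadth_first(TOY_SITE, "home")
for page, trace in traces.items():
    print(page, "<-", trace)
```

Because pages are expanded in discovery order, each recorded trace is a shortest action sequence from the start page in this toy graph. Traces of this kind are what the abstract describes as grounding for task synthesis and as hints for trajectory refinement.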