Fugu-MT paper translation (abstract): BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Paper overview: BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

arxiv url: http://arxiv.org/abs/2510.10666v2
Date: Tue, 14 Oct 2025 08:54:57 GMT
Status: translation complete
Last updated in system: 2025-10-15 14:23:56.907859
Title: BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
Authors: Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen
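Abstract summary: BrowserAgent operates directly on raw web pages via Playwright through a set of predefined browser actions. It introduces an explicit memory mechanism that stores key conclusions across steps, further enhancing the model's reasoning capabilities.
Reference score (independently computed attention measure): 48.194688161526756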
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently solving real-world problems with LLMs increasingly hinges on their ability to interact with dynamic web environments and autonomously acquire external information. While recent research like Search-R1 and WebDancer demonstrates strong performance in solving web tasks, they heavily rely on additional tools to convert the interactive web environment into static text content. This is in contrast to human browsing behaviors, which involve diverse interactions with the browser, such as scrolling, clicking, and typing. In this paper, we propose BrowserAgent, a more interactive agent that solves complex tasks through human-inspired browser actions. BrowserAgent operates directly on raw web pages via Playwright through a set of predefined browser actions. We adopt a two-stage training (Supervised Fine-Tuning (SFT) and Rejection Fine-Tuning (RFT)) to improve the model's generalization abilities. Despite using significantly less training data than Search-R1, BrowserAgent achieves more competitive results across different Open-QA tasks. Additionally, we introduce an explicit memory mechanism to store key conclusions across steps, further enhancing the model's reasoning capabilities for long-horizon tasks. Notably, BrowserAgent-7B can achieve around 20% improvement over Search-R1 on multi-hop QA tasks like HotpotQA, 2Wiki, and Bamboogle. These results indicate that BrowserAgent can serve as a more advanced framework for more interactive and scalable web agents.
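
Illustrative sketch (not from the paper): the abstract mentions a set of predefined browser actions and an explicit memory of key conclusions, but does not list the actions or describe the memory format. The Python below is one minimal way such a loop step might be organized. The names `PageLike`, `Action`, `AgentState`, `apply_action`, the five action kinds, and the fixed scroll distance are all hypothetical choices made for illustration. The page interface is limited to `goto`, `click`, `fill`, and `evaluate`, which are assumed to line up with methods of the same names on a Playwright page object; Playwright itself is not imported here.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class PageLike(Protocol):
    """Hypothetical minimal page interface used by this sketch."""

    def goto(self, url: str) -> Any: ...
    def click(self, selector: str) -> Any: ...
    def fill(self, selector: str, value: str) -> Any: ...
    def evaluate(self, expression: str) -> Any: ...


@dataclass
class Action:
    """One predefined action (the action set here is an assumption)."""

    kind: str         # "goto", "click", "type", "scroll", or "note"
    target: str = ""  # URL for "goto", selector for "click" and "type"
    text: str = ""    # text to type, or a conclusion to remember


@dataclass
class AgentState:
    """Explicit memory: conclusions carried across steps."""

    memory: list[str] = field(default_factory=list)


def apply_action(page: PageLike, state: AgentState, action: Action) -> None:
    """Dispatch a single action to the page or to the memory."""
    if action.kind == "goto":
        page.goto(action.target)
    elif action.kind == "click":
        page.click(action.target)
    elif action.kind == "type":
        page.fill(action.target, action.text)
    elif action.kind == "scroll":
        # Scroll down by a fixed amount (600 px is an arbitrary choice).
        page.evaluate("window.scrollBy(0, 600)")
    elif action.kind == "note":
        # Does not touch the page: only records a conclusion for later steps.
        state.memory.append(action.text)
    else:
        raise ValueError(f"unknown action kind: {action.kind!r}")
```

In this sketch a model would emit one `Action` per step, and the contents of `state.memory` would be fed back into the next prompt. How the paper actually formats actions and memory is not stated in the abstract.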
