Fugu-MT Paper Translation (Abstract): FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Paper overview: FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

arxiv url: http://arxiv.org/abs/2510.03204v1
Date: Fri, 03 Oct 2025 17:41:30 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.516646
Title: FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
Title (reference translation): FocusAgent: Simple yet effective ways of trimming the large context of web agents
Authors: Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste
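Abstract summary: Web agents powered by large language models (LLMs) must process lengthy web page observations to achieve user goals. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. FocusAgent is a simple yet effective approach that uses a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations.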
Reference score (independently calculated attention score): 76.12500510390439
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens of thousands of tokens. This saturates context limits and increases computational cost; moreover, processing full pages exposes agents to security risks such as prompt injection. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations, guided by task goals. By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning while reducing vulnerability to injection attacks. Experiments on the WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success performance in attack-free settings. Our results highlight that targeted LLM-based retrieval is a practical and robust strategy for building web agents that are efficient, effective, and secure.
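Abstract (reference translation): Web agents powered by large language models (LLMs) must process lengthy web page observations to achieve user goals. This saturates context limits and increases computational cost. Moreover, processing full pages exposes agents to security risks such as prompt injection. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. FocusAgent uses a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations. By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning and reduces vulnerability to injection attacks. Experiments on the WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success performance in attack-free settings. These results suggest that LLM-based retrieval is a practical and robust strategy for building web agents that are efficient, effective, and secure.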
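
The retrieval step described in the abstract (a lightweight LLM picks the goal-relevant lines of an AxTree observation, and the rest is pruned) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the `retriever` callable, the line-range reply format, and the fall-back to the full observation when nothing parses are all assumptions made for illustration.

```python
import re
from typing import Callable, List


def number_lines(axtree: str) -> str:
    """Prefix each observation line with a 1-based index the retriever can cite."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(axtree.splitlines(), start=1)
    )


def build_prompt(goal: str, axtree: str) -> str:
    """Hypothetical prompt; the paper's actual prompt is not given in the abstract."""
    return (
        f"Goal: {goal}\n"
        "Observation (numbered lines):\n"
        f"{number_lines(axtree)}\n"
        "Reply with the relevant line ranges, e.g. 2-4, 9."
    )


def parse_ranges(reply: str, n_lines: int) -> List[int]:
    """Turn a reply such as '2-4, 9' into sorted, in-bounds 1-based line numbers."""
    keep = set()
    for match in re.finditer(r"(\d+)(?:\s*-\s*(\d+))?", reply):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        # Clamp to the observation so out-of-range answers are ignored.
        keep.update(range(max(start, 1), min(end, n_lines) + 1))
    return sorted(keep)


def prune_observation(
    goal: str, axtree: str, retriever: Callable[[str], str]
) -> str:
    """Keep only the lines the retriever selects for the given goal."""
    lines = axtree.splitlines()
    keep = parse_ranges(retriever(build_prompt(goal, axtree)), len(lines))
    if not keep:
        # Assumed fallback: return the full observation rather than an empty one.
        return axtree
    return "\n".join(lines[i - 1] for i in keep)
```

In this sketch `retriever` is any function that maps a prompt string to the model's text reply, so a small LLM client or a stub can be plugged in. The pruned observation would then be passed to the main agent in place of the full AxTree.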


Related papers