Fugu-MT Paper Translation (Overview): Realistic Environmental Injection Attacks on GUI Agents

Paper overview: Realistic Environmental Injection Attacks on GUI Agents

arxiv url: http://arxiv.org/abs/2509.11250v1
Date: Sun, 14 Sep 2025 12:47:54 GMT
Status: Translation complete
Last updated (in system): 2025-09-16 17:26:22.995692
Title: Realistic Environmental Injection Attacks on GUI Agents
Authors: Yitong Zhang, Ximo Li, Liyi Cai, Jia Li
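Abstract summary: GUI agents built on LVLMs are increasingly used to interact with websites. Because they are exposed to open-world content, they are vulnerable to Environmental Injection Attacks (EIAs). We propose Chameleon, an attack framework with two main novelties.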
Reference score (site-computed attention score): 6.38492008798679
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GUI agents built on LVLMs are increasingly used to interact with websites. However, their exposure to open-world content makes them vulnerable to Environmental Injection Attacks (EIAs) that hijack agent behavior via webpage elements. Many recent studies assume the attacker to be a regular user who can only upload a single trigger image, which is more realistic than earlier assumptions of website-level administrative control. However, these works still fall short of realism: (1) the trigger's position and surrounding context remain largely fixed between training and testing, failing to capture the dynamic nature of real webpages and (2) the trigger often occupies an unrealistically large area, whereas real-world images are typically small. To better reflect real-world scenarios, we introduce a more realistic threat model where the attacker is a regular user and the trigger image is small and embedded within a dynamically changing environment. As a result, existing attacks prove largely ineffective under this threat model. To better expose the vulnerabilities of GUI agents, we propose Chameleon, an attack framework with two main novelties. The first is LLM-Driven Environment Simulation, which automatically generates diverse and high-fidelity webpage simulations. The second is Attention Black Hole, which transforms attention weights into explicit supervisory signals that guide the agent's focus toward the trigger region. We evaluate Chameleon on 6 realistic websites and 4 representative LVLM-powered GUI agents, where it significantly outperforms existing methods. Ablation studies confirm that both novelties are critical to performance. Our findings reveal underexplored vulnerabilities in modern GUI agents and establish a robust foundation for future research on defense in open-world GUI agent systems. The code is publicly available at https://github.com/zhangyitonggg/attack2gui.
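The abstract describes Attention Black Hole only at a high level: attention weights are turned into an explicit supervisory signal that pulls the agent's focus toward the trigger region. The following is a minimal sketch of what such a signal could look like. It assumes, hypothetically, that the agent exposes one non-negative attention weight per image patch and that the objective is the negative log of the attention mass landing on the trigger's patches. The abstract does not give the paper's actual formulation. The names `trigger_attention_mass` and `attention_focus_loss` and the `eps` parameter are illustrative and are not taken from the Chameleon code. This pure-Python version only computes the loss value. Optimizing a trigger image with it would require a differentiable framework.

```python
import math
from typing import Sequence


def trigger_attention_mass(attn: Sequence[Sequence[float]],
                           trigger_mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of total patch attention that falls on the trigger region.

    attn: non-negative attention weights, one per image patch (rows x cols),
          e.g. attention from the agent's decision token averaged over heads
          (an assumption; the paper's exact attention source is not given here).
    trigger_mask: True where a patch overlaps the (small) trigger image.
    """
    total = 0.0
    on_trigger = 0.0
    for attn_row, mask_row in zip(attn, trigger_mask):
        for weight, is_trigger in zip(attn_row, mask_row):
            total += weight
            if is_trigger:
                on_trigger += weight
    if total <= 0.0:
        raise ValueError("attention weights must have positive total mass")
    return on_trigger / total


def attention_focus_loss(attn: Sequence[Sequence[float]],
                         trigger_mask: Sequence[Sequence[bool]],
                         eps: float = 1e-8) -> float:
    """Hypothetical attention-supervision loss: low when attention concentrates on the trigger."""
    # Negative log of the trigger's attention share; eps guards against log(0).
    return -math.log(trigger_attention_mass(attn, trigger_mask) + eps)


if __name__ == "__main__":
    # 3x3 patch grid with a one-patch trigger in the bottom-right corner.
    mask = [[False] * 3 for _ in range(3)]
    mask[2][2] = True

    uniform = [[1.0] * 3 for _ in range(3)]      # attention spread evenly
    focused = [[0.0125] * 3 for _ in range(3)]   # 90% of attention on the trigger
    focused[2][2] = 0.9

    print(round(attention_focus_loss(uniform, mask), 3))
    print(round(attention_focus_loss(focused, mask), 3))
```

With uniform attention over the 3×3 grid, the trigger receives 1/9 of the mass and the loss is about 2.20. Concentrating 90% of the attention on the trigger lowers the loss to about 0.105. Taking the log of the mass, rather than using the raw mass, is one design choice that strongly penalizes near-zero focus on a small trigger. This matters under the abstract's threat model, where real-world trigger images are typically small.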
