Fugu-MT Paper Translation (Abstract): Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

Paper summary: Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

arxiv url: http://arxiv.org/abs/2604.07831v1
Date: Thu, 09 Apr 2026 05:32:34 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.720007
Title: Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection
Title (reference translation): Are GUI Agents Sufficiently Focused? Automated Distraction via Semantic-Level UI Element Injection
Authors: Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo, Derek Yuen, Kun Shao, Huaibo Huang, Junxian Duan, Jie Cao, Ran He
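Abstract summary: The paper proposes a red-teaming setting that overlays safety-aligned, harmless UI elements onto screenshots to misdirect the agent's visual grounding. The method uses a modular Editor-Overlapper-Victim pipeline and an iterative search procedure that samples multiple candidate edits. The optimized attacks improve attack success rate by up to 4.4x over random injection.
Reference score (independently computed attention score): 37.5890320718138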
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing red-teaming studies on GUI agents have important limitations. Adversarial perturbations typically require white-box access, which is unavailable for commercial systems, while prompt injection is increasingly mitigated by stronger safety alignment. To study robustness under a more practical threat model, we propose Semantic-level UI Element Injection, a red-teaming setting that overlays safety-aligned and harmless UI elements onto screenshots to misdirect the agent's visual grounding. Our method uses a modular Editor-Overlapper-Victim pipeline and an iterative search procedure that samples multiple candidate edits, keeps the best cumulative overlay, and adapts future prompt strategies based on previous failures. Across five victim models, our optimized attacks improve attack success rate by up to 4.4x over random injection on the strongest victims. Moreover, elements optimized on one source model transfer effectively to other target models, indicating model-agnostic vulnerabilities. After the first successful attack, the victim still clicks the attacker-controlled element in more than 15% of later independent trials, versus below 1% for random injection, showing that the injected element acts as a persistent attractor rather than simple visual clutter.
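Abstract (reference translation): Existing red-teaming studies of GUI agents have important limitations. Adversarial perturbations generally require white-box access, which commercial systems do not offer, and prompt injection is increasingly mitigated by stronger safety alignment. To study robustness under a more practical threat model, the authors propose Semantic-level UI Element Injection, a red-teaming setting that overlays safety-aligned, harmless UI elements onto screenshots to misdirect the agent's visual grounding. The method uses a modular Editor-Overlapper-Victim pipeline and an iterative search procedure that samples multiple candidate edits, keeps the best cumulative overlay, and adapts future prompt strategies based on previous failures. Across five victim models, the optimized attacks improve attack success rate by up to 4.4x over random injection on the strongest victims. In addition, elements optimized on one source model transfer effectively to other target models, indicating model-agnostic vulnerabilities. After the first successful attack, the victim still clicks the attacker-controlled element in more than 15% of later independent trials, versus below 1% for random injection, showing that the injected element acts as a persistent attractor rather than simple visual clutter.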
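The abstract describes the search procedure only at a high level: sample several candidate edits, keep the best cumulative overlay, and adapt the prompt strategy after failures. The following is a minimal Python sketch of that loop under loud assumptions. Every name here (`Element`, `propose`, `distraction_score`, `iterative_search`, the two strategy names, and the `TARGET` coordinates) is hypothetical and does not come from the paper. The Editor and Victim are replaced by toy stand-ins, and the scoring rule (elements closer to the true target distract more) is an illustrative assumption, not the paper's metric.

```python
import math
import random
from dataclasses import dataclass

TARGET = (400, 300)  # hypothetical location of the element the agent should click


@dataclass(frozen=True)
class Element:
    """A hypothetical injected UI element: a labeled box placed at (x, y)."""
    label: str
    x: int
    y: int


def propose(strategy, rng):
    """Toy stand-in for the Editor: propose one element under a prompt strategy."""
    if strategy == "near_target":
        dx, dy = rng.randint(-40, 40), rng.randint(-40, 40)
        return Element("OK", TARGET[0] + dx, TARGET[1] + dy)
    # Fallback strategy: place the element anywhere on an assumed 800x600 screen.
    return Element("OK", rng.randint(0, 800), rng.randint(0, 600))


def distraction_score(overlay):
    """Toy stand-in for the Victim: score in [0, 1], higher means more distracting."""
    if not overlay:
        return 0.0
    nearest = min(math.hypot(e.x - TARGET[0], e.y - TARGET[1]) for e in overlay)
    return 1.0 / (1.0 + nearest)


def iterative_search(propose, score, strategies, rounds=5, samples=4, seed=0):
    """Sample candidate edits each round and keep the best cumulative overlay."""
    rng = random.Random(seed)
    overlay = []                      # elements accepted so far
    best = score(overlay)
    failures = {s: 0 for s in strategies}
    for _ in range(rounds):
        # Adapt: pick the strategy with the fewest recorded failures.
        strategy = min(strategies, key=lambda s: failures[s])
        # Each candidate adds one proposed element on top of the current overlay
        # (list concatenation plays the role of the Overlapper here).
        candidates = [overlay + [propose(strategy, rng)] for _ in range(samples)]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score > best:
            overlay, best = top, top_score
        else:
            failures[strategy] += 1   # no improvement: count it against the strategy
    return overlay, best


if __name__ == "__main__":
    overlay, best = iterative_search(
        propose, distraction_score, ["random", "near_target"], rounds=6, seed=1
    )
    print(len(overlay), round(best, 4))
```

In a real setup, `propose` would call an editor model with a strategy-specific prompt and `score` would query the victim agent on the composited screenshot. The sketch keeps only the control flow so the accept-if-better rule and the failure-driven strategy switch are easy to see.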

Related papers