Fugu-MT Paper Translation (Summary): Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Paper summary: Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

arxiv url: http://arxiv.org/abs/2601.12349v1
Date: Sun, 18 Jan 2026 10:54:54 GMT
Status: Translation complete
Last updated in system: 2026-01-21 22:47:22.589651
Title: Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
Title (reference translation): Zero-Permission Manipulation: Can We Trust Large Multimodal Model GUI Agents?
Authors: Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao
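Abstract summary: Action Rebinding is a novel attack that allows a seemingly benign app with zero dangerous permissions to rebind an agent's execution. It weaponizes the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. The results show a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains.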
Reference score (independently computed attention metric): 6.9619059967556725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large multimodal model powered GUI agents are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. We demonstrate that this assumption is fundamentally invalid in Android, creating a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly-benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the inevitable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions to rebind the agent's planned action toward the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. Furthermore, we introduce an Intent Alignment Strategy (IAS) that manipulates the agent's reasoning process to rationalize UI states, enabling it to bypass verification gates (e.g., confirmation dialogs) that would otherwise be rejected. We evaluate Action Rebinding Attacks on six widely-used Android GUI agents across 15 tasks. Our results demonstrate a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains. With IAS, the success rate in bypassing verification gates increases (from 0% to up to 100%). Notably, the attacker application requires no sensitive permissions and contains no privileged API calls, achieving a 0% detection rate across malware scanners (e.g., VirusTotal). Our findings reveal a fundamental architectural flaw in current agent-OS integration and provide critical insights for the secure design of future agent systems. To access experimental logs and demonstration videos, please contact yi_qian@smail.nju.edu.cn.
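Abstract (reference translation): GUI agents powered by large multimodal models are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. This assumption is fundamentally invalid on Android and creates a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the unavoidable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions and rebinds the agent's planned action to the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. In addition, we introduce an Intent Alignment Strategy (IAS) that manipulates the agent's reasoning process so that it rationalizes UI states, enabling it to bypass verification gates (e.g., confirmation dialogs) that would otherwise be rejected. We evaluate Action Rebinding attacks on six widely used Android GUI agents across 15 tasks. The results show a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains. With IAS, the success rate in bypassing verification gates rises from 0% to as high as 100%. Notably, the attacker application requires no sensitive permissions and contains no privileged API calls, achieving a 0% detection rate across malware scanners (e.g., VirusTotal). These findings reveal a fundamental architectural flaw in current agent-OS integration and provide critical insights for the secure design of future agent systems. To access experimental logs and demonstration videos, contact yi_qian@smail.nju.edu.cn.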
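
The Visual Atomicity assumption described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation and uses no Android APIs; `Screen`, `ToyDevice`, `run_step`, and the app names `notes` and `settings` are hypothetical names invented for illustration. It only shows that an action planned against the screen observed at one moment is applied to whatever occupies the screen when the input is finally injected. The optional `recheck` flag is likewise an assumption, not a defense taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screen:
    """Toy UI snapshot: the foreground app and the widget label at each tap target."""
    foreground_app: str
    widgets: dict  # maps (x, y) coordinates to a widget label


@dataclass
class ToyDevice:
    """Hypothetical stand-in for a device; holds the current screen."""
    screen: Screen

    def observe(self) -> Screen:
        return self.screen

    def tap(self, xy) -> str:
        # The tap lands on whatever is at xy now, not at observation time.
        label = self.screen.widgets.get(xy, "nothing")
        return f"{self.screen.foreground_app}:{label}"


def run_step(device, between=None, recheck=False):
    observed = device.observe()  # t0: the agent perceives the screen
    target = next(xy for xy, label in observed.widgets.items() if label == "OK")
    if between is not None:
        between(device)  # observation-to-action gap: the UI may change here
    # Illustrative check only: it narrows the window but cannot close it,
    # because the screen can still change between this check and the tap.
    if recheck and device.observe() != observed:
        return "aborted: screen changed since observation"
    return device.tap(target)  # t1: the planned action is injected


benign = Screen("notes", {(100, 200): "OK"})
other = Screen("settings", {(100, 200): "Confirm"})


def swap(device):
    """Models a foreground transition that happens during the gap."""
    device.screen = other


print(run_step(ToyDevice(benign)))                # notes:OK
print(run_step(ToyDevice(benign), between=swap))  # settings:Confirm
print(run_step(ToyDevice(benign), between=swap, recheck=True))
```

In the second call the agent still taps the coordinates it chose for the `OK` button, but the tap is delivered to a different app's `Confirm` widget, which is the rebinding effect the abstract describes in general terms.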


Related papers