Related papers: Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

URL: http://arxiv.org/abs/2601.12349v1
Date: Sun, 18 Jan 2026 10:54:54 GMT
Title: Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
Authors: Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao,
Abstract summary: Action Rebinding is a novel attack that allows a seemingly-benign app with zero dangerous permissions to rebind an agent's execution.<n>We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains.<n>Our results demonstrate a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains.
Score: 6.9619059967556725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large multimodal model powered GUI agents are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. We demonstrate that this assumption is fundamentally invalid in Android, creating a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly-benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the inevitable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions to rebind the agent's planned action toward the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. Furthermore, we introduce an Intent Alignment Strategy (IAS) that manipulates the agent's reasoning process to rationalize UI states, enabling it to bypass verification gates (e.g., confirmation dialogs) that would otherwise be rejected. We evaluate Action Rebinding Attacks on six widely-used Android GUI agents across 15 tasks. Our results demonstrate a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains. With IAS, the success rate in bypassing verification gates increases (from 0% to up to 100%). Notably, the attacker application requires no sensitive permissions and contains no privileged API calls, achieving a 0% detection rate across malware scanners (e.g., VirusTotal). Our findings reveal a fundamental architectural flaw in current agent-OS integration and provide critical insights for the secure design of future agent systems. To access experimental logs and demonstration videos, please contact yi_qian@smail.nju.edu.cn.

Related papers

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.<n>The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills and an Evaluate Agent that logs action traces.<n>Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z)
Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System [30.443894673057816]
We conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant.<n>We decompose the threat landscape into four dimensions - Agent Identity, External Interface, Internal Reasoning, and Action Execution.<n>We propose Aura - a clean-slate secure agent OS.
arXiv Detail & Related papers (2026-02-11T14:52:27Z)
Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space [4.699272847316498]
Reasoning-Style Poisoning (RSP) manipulates how agents process information rather than what they process.<n>Generative Style Injection (GSI) rewrites retrieved documents into pathological tones.<n>RSP-M is a lightweight runtime monitor that calculates RSV metrics in real-time and triggers alerts when values exceed safety thresholds.
arXiv Detail & Related papers (2025-12-16T14:34:10Z)
Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents [29.62914440645731]
We present a one-shot jailbreak attack that leverages in-app prompt injections.<n> malicious apps embed short prompts in UI text that remain inert during human interaction but are revealed when an agent drives the UI via ADB.<n>Our framework comprises three crucial components: (1) low-privilege perception-chain targeting, which injects payloads into malicious apps as the agent's visual inputs; (2) user-invisible activation, a touch-based trigger that discriminates agent from human touches using physical touch attributes and exposes the payload only during agent operation; and (3) one-shot prompt efficacy, a stealthy-guided, character-level
arXiv Detail & Related papers (2025-10-09T05:34:57Z)
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation [73.92237451442752]
This work reveals that the visual grounding of GUI agent-mapping textual plans to GUI elements can introduce vulnerabilities.<n>With backdoor attack targeting visual grounding, the agent's behavior can be compromised even when given correct task-solving plans.<n>We propose VisualTrap, a method that can hijack the grounding by misleading the agent to locate textual plans to trigger locations instead of the intended targets.
arXiv Detail & Related papers (2025-07-09T14:36:00Z)
Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents.<n>The attack injects malicious behaviors into the model by modifying only the visual input.<n>We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z)
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents [19.348335171985152]
MLLM-powered GUI agents naturally expose multiple interaction-level triggers.<n>We introduce AgentGhost, an effective and stealthy framework for red-teaming backdoor attacks.<n>AgentGhost is effective and generic, with attack accuracy that reaches 99.7% on three attack objectives, and shows stealthiness with only 1% utility degradation.
arXiv Detail & Related papers (2025-05-20T14:29:18Z)
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.