Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
- URL: http://arxiv.org/abs/2506.13205v4
- Date: Wed, 02 Jul 2025 06:08:03 GMT
- Title: Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
- Authors: Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang, Xitong Gao,
- Abstract summary: We present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon vision-language models (VLMs)<n>Our method manipulates only the visual inputs of a portion of the training samples without altering their corresponding labels or instructions.<n>We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use.
- Score: 61.808686396077036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon VLMs. Our method manipulates only the visual inputs of a portion of the training samples - without altering their corresponding labels or instructions - thereby injecting malicious behaviors into the model. Once fine-tuned with this tampered data, the agent will exhibit attacker-controlled responses when a specific visual trigger is introduced at inference time. The core of our approach lies in aligning the gradients of poisoned samples with those of a chosen target instance, embedding backdoor-relevant features into the poisoned training data. To maintain stealth and enhance robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use. Results show that our attack achieves high attack success rates (up to 94.67 percent) while maintaining high clean-task performance (FSR up to 95.85 percent). Additionally, ablation studies shed light on how various design choices affect the efficacy and concealment of the attack. Overall, this work is the first to expose critical security flaws in VLM-based mobile agents, highlighting their susceptibility to clean-label backdoor attacks and the urgent need for effective defense mechanisms in their training pipelines.
Related papers
- VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation [68.30039719980519]
This work reveals that the visual grounding of GUI agent-mapping textual plans to GUI elements can introduce vulnerabilities.<n>With backdoor attack targeting visual grounding, the agent's behavior can be compromised even when given correct task-solving plans.<n>We propose VisualTrap, a method that can hijack the grounding by misleading the agent to locate textual plans to trigger locations instead of the intended targets.
arXiv Detail & Related papers (2025-07-09T14:36:00Z) - Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation [32.24294112337828]
BadSem is a data poisoning attack that injects backdoors by deliberately misaligning image-text pairs during training.<n>Our experiments show that BadSem achieves over 98% average ASR, generalizes well to out-of-distribution datasets, and can transfer across poisoning modalities.<n>Our findings highlight the urgent need to address semantic vulnerabilities in Vision Language Models for their safer deployment.
arXiv Detail & Related papers (2025-06-08T16:40:40Z) - MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Venomancer: Towards Imperceptible and Target-on-Demand Backdoor Attacks in Federated Learning [16.04315589280155]
We propose Venomancer, an effective backdoor attack that is imperceptible and allows target-on-demand.
The method is robust against state-of-the-art defenses such as Norm Clipping, Weak DP, Krum, Multi-Krum, RLR, FedRAD, Deepsight, and RFLBAT.
arXiv Detail & Related papers (2024-07-03T14:22:51Z) - Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [31.383591942592467]
Vision-language models (VLMs) offer innovative ways to combine visual and textual data for enhanced understanding and interaction.
Patch-based adversarial attack is considered the most realistic threat model in physical vision applications.
We introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, to protectVLMs from the threat of patched visual prompt injectors.
arXiv Detail & Related papers (2024-05-17T04:19:19Z) - Pre-trained Trojan Attacks for Visual Recognition [106.13792185398863]
Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks.
We propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks.
We highlight the challenges posed by cross-task activation and shortcut connections in successful backdoor attacks.
arXiv Detail & Related papers (2023-12-23T05:51:40Z) - Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999]
Modern machine learning models are vulnerable to adversarial and backdoor attacks.<n>Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.<n>CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
arXiv Detail & Related papers (2023-11-25T06:55:13Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment [31.24530091590395]
We study an attack scenario called Trojan Activation Attack (TA2), which injects trojan steering vectors into the activation layers of Large Language Models.
Our experiment results show that TA2 is highly effective and adds little or no overhead to attack efficiency.
arXiv Detail & Related papers (2023-11-15T23:07:40Z) - Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation [48.238349062995916]
We find that highly effective backdoors can be easily inserted using rotation-based image transformation.
Our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks.
arXiv Detail & Related papers (2022-07-22T00:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.