Related papers: MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

URL: http://arxiv.org/abs/2503.10809v2
Date: Tue, 04 Nov 2025 10:25:46 GMT
Title: MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Authors: Lukas Aichberger, Alasdair Paren, Guohao Li, Philip Torr, Yarin Gal, Adel Bibi
Abstract summary: Recent advances in operating system (OS) agents have enabled vision-language models (VLMs) to directly control a user's computer. We uncover a novel attack vector against these OS agents: Malicious Image Patches (MIPs). MIPs are adversarially perturbed screen regions that, when captured by an OS agent, induce it to perform harmful actions by exploiting specific APIs.
Score: 60.92962583528122
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in operating system (OS) agents have enabled vision-language models (VLMs) to directly control a user's computer. Unlike conventional VLMs that passively output text, OS agents autonomously perform computer-based tasks in response to a single user prompt. OS agents do so by capturing, parsing, and analysing screenshots and executing low-level actions via application programming interfaces (APIs), such as mouse clicks and keyboard inputs. This direct interaction with the OS significantly raises the stakes, as failures or manipulations can have immediate and tangible consequences. In this work, we uncover a novel attack vector against these OS agents: Malicious Image Patches (MIPs), adversarially perturbed screen regions that, when captured by an OS agent, induce it to perform harmful actions by exploiting specific APIs. For instance, a MIP can be embedded in a desktop wallpaper or shared on social media to cause an OS agent to exfiltrate sensitive user data. We show that MIPs generalise across user prompts and screen configurations, and that they can hijack multiple OS agents even during the execution of benign instructions. These findings expose critical security vulnerabilities in OS agents that have to be carefully addressed before their widespread deployment.

Related papers

ceLLMate: Sandboxing Browser AI Agents [16.060034673487287]
We propose ceLLMate, a browser-level sandboxing framework that restricts the agent's ambient authority and reduces the blast radius of prompt injections. ceLLMate pairs website-authored mandatory policies with an automated policy-prediction layer that adapts and instantiates these policies from the user's natural-language task. We implement ceLLMate as an agent-agnostic browser extension and demonstrate how it enables sandboxing policies that effectively block various types of prompt injection attacks with negligible overhead.
arXiv Detail & Related papers (2025-12-14T08:25:31Z)
GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments? [30.170538068791263]
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Environment injection corrupts an agent's visual perception by inserting adversarial UI elements directly into the GUI. GhostEI-Bench is the first benchmark for assessing mobile agents under environmental injection attacks within dynamic, executable environments.
arXiv Detail & Related papers (2025-10-23T08:33:24Z)
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files. We formalize our attack paradigm into two stages: initial infection and persistence. We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents [39.3943822850841]
We introduce VeriOS-Agent, a trustworthy OS agent trained with a two-stage learning paradigm. We show that VeriOS-Agent improves the average step-wise success rate by 20.64% in untrustworthy scenarios over the state-of-the-art.
arXiv Detail & Related papers (2025-09-09T09:46:01Z)
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation [68.30039719980519]
This work reveals that the visual grounding of GUI agents, which maps textual plans to GUI elements, can introduce vulnerabilities. With a backdoor attack targeting visual grounding, the agent's behavior can be compromised even when it is given correct task-solving plans. We propose VisualTrap, a method that hijacks the grounding by misleading the agent into mapping textual plans to trigger locations instead of the intended targets.
arXiv Detail & Related papers (2025-07-09T14:36:00Z)
Context manipulation attacks : Web agents are susceptible to corrupted memory [37.66661108936654]
"Plan injection" is a novel context manipulation attack that corrupts these agents' internal task representations by targeting this vulnerable context. We show that plan injections bypass robust prompt injection defenses, achieving up to 3x higher attack success rates than comparable prompt-based attacks. Our findings highlight that secure memory handling must be a first-class concern in agentic systems.
arXiv Detail & Related papers (2025-06-18T14:29:02Z)
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [34.396536936282175]
We introduce OS-Harm, a new benchmark for measuring safety of computer use agents. OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior. We evaluate computer use agents based on a range of frontier models and provide insights into their safety.
arXiv Detail & Related papers (2025-06-17T17:59:31Z)
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks. We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces. Our empirical study shows that current CUAs and Browser-Use Agents (BUAs) can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z)
UFO2: The Desktop AgentOS [60.317812905300336]
UFO2 is a multiagent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation. We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs. Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
arXiv Detail & Related papers (2025-04-20T13:04:43Z)
The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections [21.322212760700957]
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. These attacks often exploit the discrepancy between visual saliency for agents and human users.
arXiv Detail & Related papers (2025-04-15T15:21:09Z)
Multi-Agent Systems Execute Arbitrary Malicious Code [9.200635465485067]
We show that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection.
arXiv Detail & Related papers (2025-03-15T16:16:08Z)
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC [98.82146219495792]
In this paper, we propose a hierarchical agent framework named PC-Agent. From the perception perspective, we devise an Active Perception Module (APM) to overcome the inadequate abilities of current MLLMs in perceiving screenshot content. From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture.
arXiv Detail & Related papers (2025-02-20T05:41:55Z)
Attacking Vision-Language Computer Agents via Pop-ups [61.744008541021124]
We show that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups. This distraction leads agents to click these pop-ups instead of performing the tasks as usual.
arXiv Detail & Related papers (2024-11-04T18:56:42Z)
Imprompter: Tricking LLM Agents into Improper Tool Use [35.255462653237885]
Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources. We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks.
arXiv Detail & Related papers (2024-10-19T01:00:57Z)
Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use the Agent Robustness Evaluation (ARE) framework to rigorously evaluate how robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images. By leveraging the reasoning capability of Large Language Models, we eliminate the need for large-scale human demonstration data. The agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
arXiv Detail & Related papers (2024-06-11T05:21:20Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.