Attacking Multimodal OS Agents with Malicious Image Patches
- URL: http://arxiv.org/abs/2503.10809v1
- Date: Thu, 13 Mar 2025 18:59:12 GMT
- Title: Attacking Multimodal OS Agents with Malicious Image Patches
- Authors: Lukas Aichberger, Alasdair Paren, Yarin Gal, Philip Torr, Adel Bibi,
- Abstract summary: Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS.<n>These multimodal OS agents autonomously perform computer-based tasks in response to a single prompt via application programming interfaces (APIs)<n>We introduce a novel attack vector: malicious image patches (MIPs) that have been adversarially perturbed so that, when captured in a screenshot, they cause an OS agent to perform harmful actions by exploiting specific APIs.
- Score: 43.09197967149309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS. These multimodal OS agents autonomously perform computer-based tasks in response to a single prompt via application programming interfaces (APIs). Such APIs typically support low-level operations, including mouse clicks, keyboard inputs, and screenshot captures. We introduce a novel attack vector: malicious image patches (MIPs) that have been adversarially perturbed so that, when captured in a screenshot, they cause an OS agent to perform harmful actions by exploiting specific APIs. For instance, MIPs embedded in desktop backgrounds or shared on social media can redirect an agent to a malicious website, enabling further exploitation. These MIPs generalise across different user requests and screen layouts, and remain effective for multiple OS agents. The existence of such attacks highlights critical security vulnerabilities in OS agents, which should be carefully addressed before their widespread adoption.
Related papers
- ceLLMate: Sandboxing Browser AI Agents [16.060034673487287]
We propose ceLLMate, a browser-level sandboxing framework that restricts the agent's ambient authority and reduces the blast radius of prompt injections.<n> ceLLMate pairs website-authored mandatory policies with an automated policy-prediction layer that adapts and instantiates these policies from the user's natural-language task.<n>We implement ceLLMate as an agent-agnostic browser extension and demonstrate how it enables sandboxing policies that effectively block various types of prompt injection attacks with negligible overhead.
arXiv Detail & Related papers (2025-12-14T08:25:31Z) - GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments? [30.170538068791263]
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs)<n>Environment injection corrupts an agent's visual perception by inserting adversarial UI elements directly into the GUI.<n>GhostEI-Bench is the first benchmark for assessing mobile agents under environmental injection attacks within dynamic, executable environments.
arXiv Detail & Related papers (2025-10-23T08:33:24Z) - Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z) - VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents [39.3943822850841]
We introduce VeriOS-Agent, a trustworthy OS agent trained with a two-stage learning paradigm.<n>We show that VeriOS-Agent improves the average step-wise success rate by 20.64% in untrustworthy scenarios over the state-of-the-art.
arXiv Detail & Related papers (2025-09-09T09:46:01Z) - VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation [68.30039719980519]
This work reveals that the visual grounding of GUI agent-mapping textual plans to GUI elements can introduce vulnerabilities.<n>With backdoor attack targeting visual grounding, the agent's behavior can be compromised even when given correct task-solving plans.<n>We propose VisualTrap, a method that can hijack the grounding by misleading the agent to locate textual plans to trigger locations instead of the intended targets.
arXiv Detail & Related papers (2025-07-09T14:36:00Z) - Context manipulation attacks : Web agents are susceptible to corrupted memory [37.66661108936654]
"Plan injection" is a novel context manipulation attack that corrupts these agents' internal task representations by targeting this vulnerable context.<n>We show that plan injections bypass robust prompt injection defenses, achieving up to 3x higher attack success rates than comparable prompt-based attacks.<n>Our findings highlight that secure memory handling must be a first-class concern in agentic systems.
arXiv Detail & Related papers (2025-06-18T14:29:02Z) - OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [34.396536936282175]
We introduce OS-Harm, a new benchmark for measuring safety of computer use agents.<n> OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior.<n>We evaluate computer use agents based on a range of frontier models and provide insights into their safety.
arXiv Detail & Related papers (2025-06-17T17:59:31Z) - VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks.<n>We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces.<n>Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z) - UFO2: The Desktop AgentOS [60.317812905300336]
UFO2 is a multiagent AgentOS for Windows desktops that elevates into practical, system-level automation.
We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs.
Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
arXiv Detail & Related papers (2025-04-20T13:04:43Z) - The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections [21.322212760700957]
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions.
To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data.
These attacks often exploit the discrepancy between visual saliency for agents and human users.
arXiv Detail & Related papers (2025-04-15T15:21:09Z) - Multi-Agent Systems Execute Arbitrary Malicious Code [9.200635465485067]
We show that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities.
We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection.
arXiv Detail & Related papers (2025-03-15T16:16:08Z) - PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC [98.82146219495792]
In this paper, we propose a hierarchical agent framework named PC-Agent.<n>From the perception perspective, we devise an Active Perception Module (APM) to overcome the inadequate abilities of current MLLMs in perceiving screenshot content.<n>From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture.
arXiv Detail & Related papers (2025-02-20T05:41:55Z) - Attacking Vision-Language Computer Agents via Pop-ups [61.744008541021124]
We show that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups.
This distraction leads agents to click these pop-ups instead of performing the tasks as usual.
arXiv Detail & Related papers (2024-11-04T18:56:42Z) - Imprompter: Tricking LLM Agents into Improper Tool Use [35.255462653237885]
Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources.
We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks.
arXiv Detail & Related papers (2024-10-19T01:00:57Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images.<n>By leveraging the reasoning capability of the Large Language Models, we eliminate the need for large-scale human demonstration data.<n>Agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
arXiv Detail & Related papers (2024-06-11T05:21:20Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.