WebInject: Prompt Injection Attack to Web Agents
- URL: http://arxiv.org/abs/2505.11717v4
- Date: Fri, 17 Oct 2025 01:52:39 GMT
- Title: WebInject: Prompt Injection Attack to Web Agents
- Authors: Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong
- Abstract summary: Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. We propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action.
- Score: 40.8572462746505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. In this work, we propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action. Our attack adds a perturbation to the raw pixel values of the rendered webpage. After these perturbed pixels are mapped into a screenshot, the perturbation induces the web agent to perform the attacker-specified action. We formulate the task of finding the perturbation as an optimization problem. A key challenge in solving this problem is that the mapping between raw pixel values and screenshot is non-differentiable, making it difficult to backpropagate gradients to the perturbation. To overcome this, we train a neural network to approximate the mapping and apply projected gradient descent to solve the reformulated optimization problem. Extensive evaluation on multiple datasets shows that WebInject is highly effective and significantly outperforms baselines.
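The abstract describes solving for the perturbation with projected gradient descent through a differentiable approximation of the pixel-to-screenshot mapping. The following is a minimal sketch of that general technique only, not the authors' implementation: the surrogate mapping `A`, the linear target-action score `w`, the budget `EPS`, and all function names are hypothetical stand-ins chosen so the example runs with NumPy alone.

```python
import numpy as np

EPS = 8 / 255   # assumed L-infinity budget for the perturbation (hypothetical)
STEP = 1 / 255  # assumed step size (hypothetical)

def pgd_linf(x, grad_fn, eps=EPS, step=STEP, iters=50):
    """Signed-gradient PGD: minimize a loss over delta with ||delta||_inf <= eps."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x + delta)
        delta = delta - step * np.sign(g)         # descent step on the loss
        delta = np.clip(delta, -eps, eps)         # project onto the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixel values in [0, 1]
    return delta

# Toy stand-ins (assumptions, not from the paper):
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=16)  # "raw pixel values"
A = rng.normal(size=(16, 16))       # differentiable surrogate for pixels -> screenshot
w = rng.normal(size=16)             # linear score of the attacker-specified action

def loss(pixels):
    # Lower loss means a higher score for the target action.
    return -float(w @ (A @ pixels))

def grad(pixels):
    # Analytic gradient of the linear toy loss with respect to the pixels.
    return -(A.T @ w)

delta = pgd_linf(x, grad)
print("loss before:", loss(x), "loss after:", loss(x + delta))
```

In a realistic setting the surrogate would be a trained neural network and the gradient would come from automatic differentiation; the projection and clipping steps would stay the same.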
Related papers
- MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks [10.431616150153992]
MUZZLE is an automated framework for evaluating the security of web agents against indirect prompt injection attacks. It adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties.
arXiv Detail & Related papers (2026-02-09T21:46:18Z) - WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents [45.87204751555924]
Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness. We propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages.
arXiv Detail & Related papers (2026-02-03T17:55:04Z) - InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training [24.578304125533734]
We present InfiniteWeb, a system that automatically generates functional web environments at scale for GUI agent training. We address challenges through unified specification, task-centric test-driven development, and a combination of website seed with reference design image. Experiments show that InfiniteWeb surpasses commercial coding agents at realistic website construction.
arXiv Detail & Related papers (2026-01-07T17:40:08Z) - It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents [52.81924177620322]
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks.
arXiv Detail & Related papers (2025-12-29T01:09:10Z) - FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents [76.12500510390439]
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations.
arXiv Detail & Related papers (2025-10-03T17:41:30Z) - BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks [51.803138848305814]
We introduce BrowserArena, a live open-web agent evaluation platform that collects user-submitted tasks. We identify three consistent failure modes: captcha resolution, pop-up banner removal, and direct navigation to URLs. Our findings surface both the diversity and brittleness of current web agents.
arXiv Detail & Related papers (2025-10-02T15:22:21Z) - Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments [61.808686396077036]
We present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon vision-language models (VLMs). Our method manipulates only the visual inputs of a portion of the training samples without altering their corresponding labels or instructions. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery [19.989518524625954]
Vision-Language Model (VLM) based Web Agents represent a step towards automating complex tasks by simulating human-like interaction with websites. Existing research on adversarial environmental injection attacks often relies on unrealistic assumptions. We propose AdInject, a novel and real-world black-box attack method that leverages the internet advertising delivery to inject malicious content into the Web Agent's environment.
arXiv Detail & Related papers (2025-05-27T17:59:05Z) - EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection [14.83331240126743]
Multimodal agents are increasingly trained to operate graphical user interfaces (GUIs) to complete user tasks. We propose EVA, a framework for indirect prompt injection, which transforms the attack into a closed-loop optimization. We evaluate EVA on six widely used generalist and specialist GUI agents in realistic settings such as popup manipulation, chat-based phishing, payments, and email composition.
arXiv Detail & Related papers (2025-05-20T12:41:05Z) - AIM: Additional Image Guided Generation of Transferable Adversarial Attacks [72.24101555828256]
Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. In this work, we focus on generative approaches for targeted transferable attacks. We introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability.
arXiv Detail & Related papers (2025-01-02T07:06:49Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Unsegment Anything by Simulating Deformation [67.10966838805132]
"Anything Unsegmentable" is a task to grant any image "the right to be unsegmented".
We aim to achieve transferable adversarial attacks against all prompt-based segmentation models.
Our approach focuses on disrupting image encoder features to achieve prompt-agnostic attacks.
arXiv Detail & Related papers (2024-04-03T09:09:42Z) - Adversarial examples by perturbing high-level features in intermediate decoder layers [0.0]
Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder.
Our perturbation possesses semantic meaning, such as a longer beak or green tints.
We show that our method modifies key features such as edges and that defence techniques based on adversarial training are vulnerable to our attacks.
arXiv Detail & Related papers (2021-10-14T07:08:15Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - OGAN: Disrupting Deepfakes with an Adversarial Attack that Survives Training [0.0]
We introduce a class of adversarial attacks that can disrupt face-swapping autoencoders.
We propose the Oscillating GAN (OGAN) attack, a novel attack optimized to be training-resistant.
These results demonstrate the existence of training-resistant adversarial attacks, potentially applicable to a wide range of domains.
arXiv Detail & Related papers (2020-06-17T17:18:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.