Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
- URL: http://arxiv.org/abs/2507.14799v1
- Date: Sun, 20 Jul 2025 03:10:13 GMT
- Title: Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
- Authors: Sam Johnson, Viet Pham, Thai Le,
- Abstract summary: We show that adversaries can embed universal adversarial triggers in webpage HTML to hijack agent behavior.<n>Our system demonstrates high success rates across real websites in both targeted and general attacks.
- Score: 8.511846002129522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work demonstrates that LLM-based web navigation agents offer powerful automation capabilities but are vulnerable to Indirect Prompt Injection (IPI) attacks. We show that adversaries can embed universal adversarial triggers in webpage HTML to hijack agent behavior that utilizes the accessibility tree to parse HTML, causing unintended or malicious actions. Using the Greedy Coordinate Gradient (GCG) algorithm and a Browser Gym agent powered by Llama-3.1, our system demonstrates high success rates across real websites in both targeted and general attacks, including login credential exfiltration and forced ad clicks. Our empirical results highlight critical security risks and the need for stronger defenses as LLM-driven autonomous web agents become more widely adopted. The system software (https://github.com/sej2020/manipulating-web-agents) is released under the MIT License, with an accompanying publicly available demo website (http://lethaiq.github.io/attack-web-llm-agent).
Related papers
- Mind the Web: The Security of Web Use Agents [8.863542098424558]
We show how attackers can exploit web-use agents' high-privilege capabilities by embedding malicious content in web pages.<n>We introduce the task-aligned injection technique that frame malicious commands as helpful task guidance rather than obvious attacks.<n>We propose comprehensive mitigation strategies including oversight mechanisms, execution constraints, and task-aware reasoning techniques.
arXiv Detail & Related papers (2025-06-08T13:59:55Z) - VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks.<n>We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces.<n>Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z) - AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery [19.989518524625954]
Vision-Language Model (VLM) based Web Agents represent a step towards automating complex tasks by simulating human-like interaction with websites.<n>Existing research on adversarial environmental injection attacks often relies on unrealistic assumptions.<n>We propose AdInject, a novel and real-world black-box attack method that leverages the internet advertising delivery to inject malicious content into the Web Agent's environment.
arXiv Detail & Related papers (2025-05-27T17:59:05Z) - WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [74.82886755416949]
We identify key reasoning skills essential for effective web agents.<n>We reconstruct the agent's reasoning algorithms into chain-of-thought rationales.<n>Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z) - AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks [36.97842000562324]
We introduce WASP -- a new benchmark for end-to-end evaluation of Web Agent Security against Prompt injection attacks.<n>We show that even top-tier AI models, including those with advanced reasoning capabilities, can be deceived by simple, low-effort human-written injections.<n>Our end-to-end evaluation reveals a previously unobserved insight: while attacks partially succeed in up to 86% of the case, even state-of-the-art agents often struggle to fully complete the attacker goals.
arXiv Detail & Related papers (2025-04-22T17:51:03Z) - DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents.<n>It is a plug-in framework and integrates easily into realistic agentic frameworks.<n>It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z) - AdvAgent: Controllable Blackbox Red-teaming on Web Agents [22.682464365220916]
AdvAgent is a black-box red-teaming framework for attacking web agents.<n>It employs a reinforcement learning-based pipeline to train an adversarial prompter model.<n>With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability.
arXiv Detail & Related papers (2024-10-22T20:18:26Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.