Related papers: Mind the Web: The Security of Web Use Agents

Mind the Web: The Security of Web Use Agents

URL: http://arxiv.org/abs/2506.07153v2
Date: Mon, 20 Oct 2025 18:33:40 GMT
Title: Mind the Web: The Security of Web Use Agents
Authors: Avishag Shapira, Parth Atulbhai Gandhi, Edan Habler, Asaf Shabtai,
Abstract summary: This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages.<n>We introduce the task-aligned injection technique that frames malicious commands as helpful task guidance.<n>We propose comprehensive mitigation strategies including oversight mechanisms, execution constraints, and task-aware reasoning techniques.
Score: 11.075673765065103
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Web-use agents are rapidly being deployed to automate complex web tasks with extensive browser capabilities. However, these capabilities create a critical and previously unexplored attack surface. This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages, such as comments, reviews, or advertisements, that agents encounter during legitimate browsing tasks. We introduce the task-aligned injection technique that frames malicious commands as helpful task guidance rather than obvious attacks, exploiting fundamental limitations in LLMs' contextual reasoning. Agents struggle to maintain coherent contextual awareness and fail to detect when seemingly helpful web content contains steering attempts that deviate them from their original task goal. To scale this attack, we developed an automated three-stage pipeline that generates effective injections without manual annotation or costly online agent interactions during training, remaining efficient even with limited training data. This pipeline produces a generator model that we evaluate on five popular agents using payloads organized by the Confidentiality-Integrity-Availability (CIA) security triad, including unauthorized camera activation, file exfiltration, user impersonation, phishing, and denial-of-service. This generator achieves over 80% attack success rate (ASR) with strong transferability across unseen payloads, diverse web environments, and different underlying LLMs. This attack succeed even against agents with built-in safety mechanisms, requiring only the ability to post content on public websites. To address this risk, we propose comprehensive mitigation strategies including oversight mechanisms, execution constraints, and task-aware reasoning techniques.

Related papers

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.<n>The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills and an Evaluate Agent that logs action traces.<n>Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z)
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks [10.431616150153992]
MUZZLE is an automated framework for evaluating the security of web agents against indirect prompt injection attacks.<n>It adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions.<n>MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties.
arXiv Detail & Related papers (2026-02-09T21:46:18Z)
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents [52.81924177620322]
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking.<n>Their reliance on dynamic web content makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task.<n>We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks.
arXiv Detail & Related papers (2025-12-29T01:09:10Z)
FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents [76.12500510390439]
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals.<n>Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction.<n>We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations.
arXiv Detail & Related papers (2025-10-03T17:41:30Z)
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain.<n>We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z)
AI Kill Switch for malicious web-based LLM agent [4.144114850905779]
We propose an AI Kill Switch technique that can halt the operation of malicious web-based LLM agents.<n>Key idea is generating defensive prompts that trigger the safety mechanisms of malicious LLM agents.<n>AutoGuard achieves over 80% Defense Success Rate (DSR) across diverse malicious agents.
arXiv Detail & Related papers (2025-09-26T02:20:46Z)
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
Context manipulation attacks : Web agents are susceptible to corrupted memory [37.66661108936654]
"Plan injection" is a novel context manipulation attack that corrupts these agents' internal task representations by targeting this vulnerable context.<n>We show that plan injections bypass robust prompt injection defenses, achieving up to 3x higher attack success rates than comparable prompt-based attacks.<n>Our findings highlight that secure memory handling must be a first-class concern in agentic systems.
arXiv Detail & Related papers (2025-06-18T14:29:02Z)
The Hidden Dangers of Browsing AI Agents [0.0]
This paper presents a comprehensive security evaluation of such agents, focusing on systemic vulnerabilities across multiple architectural layers.<n>Our work outlines the first end-to-end threat model for browsing agents and provides actionable guidance for securing their deployment in real-world environments.
arXiv Detail & Related papers (2025-05-19T13:10:29Z)
AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentFuzzer, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentFuzzer on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense [90.71884758066042]
Large vision-language models (LVLMs) introduce a unique vulnerability: susceptibility to malicious attacks via visual inputs.<n>We propose ESIII (Embedding Security Instructions Into Images), a novel methodology for transforming the visual space from a source of vulnerability into an active defense mechanism.
arXiv Detail & Related papers (2025-03-14T17:39:45Z)
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents [6.829628038851487]
Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration.<n>In particular, indirect prompt injection attacks pose a critical threat, where malicious instructions embedded within external data sources can manipulate agents to deviate from user intentions.<n>We propose a novel perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user objectives.
arXiv Detail & Related papers (2024-12-21T16:17:48Z)
AdvAgent: Controllable Blackbox Red-teaming on Web Agents [22.682464365220916]
AdvAgent is a black-box red-teaming framework for attacking web agents.<n>It employs a reinforcement learning-based pipeline to train an adversarial prompter model.<n>With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability.
arXiv Detail & Related papers (2024-10-22T20:18:26Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.<n>AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
WIPI: A New Web Threat for LLM-Driven Web Agents [28.651763099760664]
We introduce a novel threat, WIPI, that indirectly controls Web Agent to execute malicious instructions embedded in publicly accessible webpages. To launch a successful WIPI works in a black-box environment. Our methodology achieves an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios.
arXiv Detail & Related papers (2024-02-26T19:01:54Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.