Related papers: WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

URL: http://arxiv.org/abs/2602.03792v1
Date: Tue, 03 Feb 2026 17:55:04 GMT
Title: WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
Authors: Xilong Wang, Yinuo Liu, Zhun Wang, Dawn Song, Neil Gong
Abstract summary: Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness. We propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages.
Score: 45.87204751555924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness, as their underlying assumptions often do not hold in the web-agent setting. In this work, we propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages. Given a webpage, Step I extracts segments of interest that may be contaminated, and Step II evaluates each segment by checking its consistency with the webpage content as context. We show that WebSentinel is highly effective, substantially outperforming baseline methods across multiple datasets of both contaminated and clean webpages that we collected. Our code is available at: https://github.com/wxl-lxw/WebSentinel.
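To make the two-step structure concrete, here is a minimal Python sketch. It is not the authors' implementation (that is in the linked repository), and the abstract does not say how either step is realized. The instruction-like regex used for Step I and the vocabulary-overlap judge used for Step II are hypothetical stand-ins, as are the names `extract_segments_of_interest`, `toy_consistency_judge`, and `detect_and_localize`. A serious consistency check would more plausibly be a model-based classifier passed in as `judge`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical Step I heuristic: phrases that often signal injected instructions.
# This pattern is an illustrative assumption, not WebSentinel's extractor.
SUSPICIOUS = re.compile(
    r"\b(ignore (all |any )?(previous|prior) instructions|you must"
    r"|instead,? (go|click|send|navigate)|as an ai)\b",
    re.IGNORECASE,
)


@dataclass
class Verdict:
    contaminated: bool   # detection: was any segment judged inconsistent?
    segments: List[str]  # localization: which segments were flagged


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def extract_segments_of_interest(page_text: str) -> List[str]:
    """Step I (sketch): split the page into lines and keep instruction-like ones."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    return [ln for ln in lines if SUSPICIOUS.search(ln)]


def toy_consistency_judge(segment: str, context: str) -> bool:
    """Hypothetical stand-in judge: a segment counts as consistent if at least
    half of its words also appear elsewhere on the page."""
    seg_words = _words(segment)
    if not seg_words:
        return True
    rest = _words(context.replace(segment, ""))
    return len(seg_words & rest) / len(seg_words) >= 0.5


def detect_and_localize(
    page_text: str,
    judge: Callable[[str, str], bool] = toy_consistency_judge,
) -> Verdict:
    """Step II (sketch): flag each segment of interest that is inconsistent
    with the rest of the webpage, which serves as context."""
    flagged = [s for s in extract_segments_of_interest(page_text)
               if not judge(s, page_text)]
    return Verdict(contaminated=bool(flagged), segments=flagged)
```

Because the judge is a parameter, the crude lexical check can be replaced with a stronger model without touching Step I. Step I only narrows the search. A benign line such as "You must sign in to checkout" can pass Step I and still be cleared in Step II if it fits the surrounding page.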

Related papers

Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents [15.381306470663695]
We present a large-scale empirical study of TOCTOU (time-of-check to time-of-use) vulnerabilities in browser-use agents. Dynamic or adversarial web content can exploit the window between an agent's observation of the page and its subsequent action to induce unintended actions. We design a lightweight mitigation based on pre-execution validation.
arXiv Detail & Related papers (2026-02-28T05:25:03Z)
SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces. Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z)
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks [10.431616150153992]
MUZZLE is an automated framework for evaluating the security of web agents against indirect prompt injection attacks. It adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties.
arXiv Detail & Related papers (2026-02-09T21:46:18Z)
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents [52.81924177620322]
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks.
arXiv Detail & Related papers (2025-12-29T01:09:10Z)
FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents [76.12500510390439]
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations.
arXiv Detail & Related papers (2025-10-03T17:41:30Z)
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks [51.803138848305814]
We introduce BrowserArena, a live open-web agent evaluation platform that collects user-submitted tasks. We identify three consistent failure modes: captcha resolution, pop-up banner removal, and direct navigation to URLs. Our findings surface both the diversity and brittleness of current web agents.
arXiv Detail & Related papers (2025-10-02T15:22:21Z)
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents [34.909802797979324]
We present the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We construct datasets containing both malicious and benign samples. We then systematize both text-based and image-based detection methods.
arXiv Detail & Related papers (2025-10-01T18:34:06Z)
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files. We formalize our attack paradigm into two stages: initial infection and persistence. We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree [8.511846002129522]
We show that adversaries can embed universal adversarial triggers in webpage HTML to hijack agent behavior. Our system demonstrates high success rates across real websites in both targeted and general attacks.
arXiv Detail & Related papers (2025-07-20T03:10:13Z)
WebInject: Prompt Injection Attack to Web Agents [40.8572462746505]
Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. We propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action.
arXiv Detail & Related papers (2025-05-16T22:00:26Z)
WebSuite: Systematically Evaluating Why Web Agents Fail [2.200477647229223]
We describe WebSuite, the first diagnostic benchmark for generalist web agents. This benchmark suite consists of both individual tasks, such as clicking a button, and end-to-end tasks, such as adding an item to a cart. We evaluate two popular generalist web agents, one text-based and one multimodal, and identify unique weaknesses for each agent.
arXiv Detail & Related papers (2024-06-01T00:32:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
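The Atomicity for Agents entry in the list above mentions a lightweight mitigation based on pre-execution validation. Its summary gives no further details, so the following Python sketch only illustrates the general check-before-use idea. Just before acting, the agent re-observes its target and aborts if any property its decision relied on has changed. `ElementSnapshot`, `still_valid`, `execute_click`, and the choice of compared fields are all hypothetical. `observe` and `click` stand in for whatever browser-automation calls an agent actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ElementSnapshot:
    """Hypothetical record of the properties the agent based its decision on."""
    selector: str
    text: str
    href: Optional[str]
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def still_valid(planned: ElementSnapshot, current: Optional[ElementSnapshot]) -> bool:
    """Return True only if the element still exists and is unchanged since planning."""
    if current is None:  # the element disappeared after the agent decided to act
        return False
    # Dataclass equality compares every field, so any drift fails the check.
    return planned == current


def execute_click(
    planned: ElementSnapshot,
    observe: Callable[[str], Optional[ElementSnapshot]],
    click: Callable[[str], None],
) -> bool:
    """Re-observe the target immediately before clicking; abort on any drift."""
    current = observe(planned.selector)  # time of check
    if not still_valid(planned, current):
        return False  # caller should re-plan from a fresh observation
    click(planned.selector)  # time of use: a small residual window remains
    return True
```

The re-check narrows the time-of-check to time-of-use window rather than closing it, because the page can still change between `observe` and `click`. A robust mitigation would need the check and the action to be as close to atomic as the browser allows.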