AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System
- URL: http://arxiv.org/abs/2602.03117v2
- Date: Fri, 06 Feb 2026 07:04:35 GMT
- Title: AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System
- Authors: Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, Chaowei Xiao
- Abstract summary: We introduce AgentDyn, a benchmark featuring 60 challenging open-ended tasks and 560 injection test cases. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense.
- Score: 44.50700894734539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external data that an agent consumes also introduces the risk of indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behavior. Guided by benchmarks such as AgentDojo, there has been a significant amount of progress in developing defenses against these attacks. As the technology continues to mature and agents are increasingly relied upon for more complex tasks, there is an increasingly pressing need to evolve the benchmarks as well, so that they reflect the threat landscape faced by emerging agentic systems. In this work, we reveal three fundamental flaws in current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
Related papers
- ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z)
- BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents [8.923854146974783]
We examine the landscape of prompt injection attacks and synthesize a benchmark of attacks embedded in realistic HTML payloads. Our benchmark goes beyond prior work by emphasizing injections that can influence real-world actions rather than mere text outputs. We propose a multi-layered defense strategy comprising both architectural and model-based defenses.
arXiv Detail & Related papers (2025-11-25T18:28:35Z)
- Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? [58.48689960350828]
We show that a simple, modular and model-agnostic defense operating at the agent-tool interface achieves perfect security with high utility. We employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer).
arXiv Detail & Related papers (2025-10-06T18:09:02Z)
- IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents [33.775221377823925]
Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks. When interacting with untrusted data sources, tool responses may contain injected instructions that covertly influence agent behaviors and lead to malicious outcomes. We propose a novel defensive task execution paradigm, called IPIGuard, to prevent malicious tool invocations at the source.
arXiv Detail & Related papers (2025-08-21T07:08:16Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
- Mind the Web: The Security of Web Use Agents [11.075673765065103]
This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages. We introduce the task-aligned injection technique that frames malicious commands as helpful task guidance. We propose comprehensive mitigation strategies including oversight mechanisms, execution constraints, and task-aware reasoning techniques.
arXiv Detail & Related papers (2025-06-08T13:59:55Z)
- AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
- AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents [27.701301913159067]
We introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. AgentDojo is not a static test suite, but rather an environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate AgentDojo with 97 realistic tasks, 629 security test cases, and various attack and defense paradigms from the literature.
arXiv Detail & Related papers (2024-06-19T08:55:56Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.