Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
- URL: http://arxiv.org/abs/2511.15203v1
- Date: Wed, 19 Nov 2025 07:47:30 GMT
- Title: Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
- Authors: Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang,
- Abstract summary: We present the first comprehensive analysis of IPI-centric defense frameworks.<n>We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions.<n>We then thoroughly assess the security and usability of representative defense frameworks.
- Score: 14.131197965001988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (SoK), we present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks. Through analysis of defensive failures in the assessment, we identify six root causes of defense circumvention. Based on these findings, we design three novel adaptive attacks that significantly improve attack success rates targeting specific frameworks, demonstrating the severity of the flaws in these defenses. Our paper provides a foundation and critical insights for the future development of more secure and usable IPI-centric agent defense frameworks.
Related papers
- The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis [24.51410516475904]
This SoK presents a comprehensive overview of the Prompt Injection (PI) landscape, covering attacks, defenses, and their evaluation practices.<n>We introduce AgentPI, a new benchmark designed to systematically evaluate agent behavior under context-dependent interaction settings.<n>We show that many defenses appear effective under existing benchmarks by suppressing contextual inputs, yet fail to generalize to realistic agent settings where context-dependent reasoning is essential.
arXiv Detail & Related papers (2026-02-11T02:47:10Z) - SoK: The Last Line of Defense: On Backdoor Defense Evaluation [21.126129826672894]
Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by malicious inputs.<n>This work presents a systematic (meta-)analysis of backdoor defenses through a comprehensive literature review and empirical evaluation.<n>We analyzed 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues.
arXiv Detail & Related papers (2025-11-17T08:51:18Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - Benchmarking Misuse Mitigation Against Covert Adversaries [80.74502950627736]
Existing language model safety evaluations focus on overt attacks and low-stakes tasks.<n>We develop Benchmarks for Stateful Defenses (BSD), a data generation pipeline that automates evaluations of covert attacks and corresponding defenses.<n>Our evaluations indicate that decomposition attacks are effective misuse enablers, and highlight stateful defenses as a countermeasure.
arXiv Detail & Related papers (2025-06-06T17:33:33Z) - A Critical Evaluation of Defenses against Prompt Injection Attacks [95.81023801370073]
Large Language Models (LLMs) are vulnerable to prompt injection attacks.<n>Several defenses have recently been proposed, often claiming to mitigate these attacks successfully.<n>We argue that existing studies lack a principled approach to evaluating these defenses.
arXiv Detail & Related papers (2025-05-23T19:39:56Z) - LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures [49.1574468325115]
This survey seeks to define and categorize the various attacks targeting large language models (LLMs)<n>A thorough analysis of these attacks is presented, alongside an exploration of defense mechanisms designed to mitigate such threats.
arXiv Detail & Related papers (2025-05-02T10:35:26Z) - Decoding FL Defenses: Systemization, Pitfalls, and Remedies [16.907513505608666]
There are no guidelines for evaluating Federated Learning (FL) defenses.<n>We design a comprehensive systemization of FL defenses along three dimensions.<n>We survey 50 top-tier defense papers and identify the commonly used components in their evaluation setups.
arXiv Detail & Related papers (2025-02-03T23:14:02Z) - The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense [56.32083100401117]
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks appears as no surprise.<n>Recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations.
arXiv Detail & Related papers (2024-11-13T07:57:19Z) - Randomness in ML Defenses Helps Persistent Attackers and Hinders
Evaluators [49.52538232104449]
It is becoming increasingly imperative to design robust ML defenses.
Recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary.
We take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible.
arXiv Detail & Related papers (2023-02-27T01:33:31Z) - A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide area of applications.
They are vulnerable to adversarial examples, which motivates the adversarial defense.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.