Defense Against Indirect Prompt Injection via Tool Result Parsing
- URL: http://arxiv.org/abs/2601.04795v1
- Date: Thu, 08 Jan 2026 10:21:56 GMT
- Title: Defense Against Indirect Prompt Injection via Tool Result Parsing
- Authors: Qiang Yu, Xinran Cheng, Chuanyi Liu,
- Abstract summary: LLM agents face an escalating threat from indirect prompt injection.<n>This vulnerability poses a significant risk as agents gain more direct control over physical environments.<n>We propose a novel method that provides LLMs with precise data via tool result parsing while effectively filtering out injected malicious code.
- Score: 5.69701430275527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As LLM agents transition from digital assistants to physical controllers in autonomous systems and robotics, they face an escalating threat from indirect prompt injection. By embedding adversarial instructions into the results of tool calls, attackers can hijack the agent's decision-making process to execute unauthorized actions. This vulnerability poses a significant risk as agents gain more direct control over physical environments. Existing defense mechanisms against Indirect Prompt Injection (IPI) generally fall into two categories. The first involves training dedicated detection models; however, this approach entails high computational overhead for both training and inference, and requires frequent updates to keep pace with evolving attack vectors. Alternatively, prompt-based methods leverage the inherent capabilities of LLMs to detect or ignore malicious instructions via prompt engineering. Despite their flexibility, most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. In this paper, we propose a novel method that provides LLMs with precise data via tool result parsing while effectively filtering out injected malicious code. Our approach achieves competitive Utility under Attack (UA) while maintaining the lowest Attack Success Rate (ASR) to date, significantly outperforming existing methods. Code is available at GitHub.
Related papers
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.<n>The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills and an Evaluate Agent that logs action traces.<n>Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z) - ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks.<n>ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z) - CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization [17.941502260254673]
We present a novel approach inspired by the fundamental principle of computer security: data should not contain executable instructions.<n>Instead of sample-level classification, we propose a token-level sanitization process, which surgically removes any instructions directed at AI systems from tool outputs.<n>This approach is non-blocking, does not require calibration, and is agnostic to the context of tool outputs.
arXiv Detail & Related papers (2025-10-09T21:32:02Z) - Adversarial Reinforcement Learning for Large Language Model Agent Safety [20.704989548285372]
Large Language Model (LLM) agents can leverage tools like Google Search to complete complex tasks.<n>Current defense strategies rely on fine-tuning LLM agents on datasets of known attacks.<n>We propose Adversarial Reinforcement Learning for Agent Safety (ARLAS), a novel framework that leverages adversarial reinforcement learning (RL) by formulating the problem as a two-player zero-sum game.
arXiv Detail & Related papers (2025-10-06T23:09:18Z) - Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods [95.54363609024847]
Large language models (LLMs) are vulnerable to prompt injection attacks.<n>In this paper, we explore more vicious attacks that nullify the prompt injection defense methods.<n> backdoor-powered prompt injection attacks are more harmful than previous prompt injection attacks.
arXiv Detail & Related papers (2025-10-04T07:11:11Z) - IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents [33.775221377823925]
Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks.<n>When interacting with untrusted data sources, tool responses may contain injected instructions that covertly influence agent behaviors and lead to malicious outcomes.<n>We propose a novel defensive task execution paradigm, called IPIGuard, to prevent malicious tool invocations at the source.
arXiv Detail & Related papers (2025-08-21T07:08:16Z) - BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z) - TopicAttack: An Indirect Prompt Injection Attack via Topic Transition [92.26240528996443]
Large language models (LLMs) are vulnerable to indirect prompt injection attacks.<n>We propose TopicAttack, which prompts the LLM to generate a fabricated transition prompt that gradually shifts the topic toward the injected instruction.<n>We find that a higher injected-to-original attention ratio leads to a greater success probability, and our method achieves a much higher ratio than the baseline methods.
arXiv Detail & Related papers (2025-07-18T06:23:31Z) - To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt [5.8935359767204805]
We propose a novel, lightweight defense mechanism called Polymorphic Prompt Assembling.<n>The approach is based on the insight that prompt injection requires guessing and breaking the structure of the system prompt.<n> PPA prevents attackers from predicting the prompt structure, thereby enhancing security without compromising performance.
arXiv Detail & Related papers (2025-06-06T04:50:57Z) - Defense Against Prompt Injection Attack by Leveraging Attack Techniques [66.65466992544728]
Large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks.<n>As LLMs continue to evolve, new vulnerabilities, especially prompt injection attacks arise.<n>Recent attack methods leverage LLMs' instruction-following abilities and their inabilities to distinguish instructions injected in the data content.
arXiv Detail & Related papers (2024-11-01T09:14:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.