Prompt Injection Attack to Tool Selection in LLM Agents
- URL: http://arxiv.org/abs/2504.19793v3
- Date: Sun, 24 Aug 2025 03:28:21 GMT
- Title: Prompt Injection Attack to Tool Selection in LLM Agents
- Authors: Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun,
- Abstract summary: A popular approach follows a two-step process - emphretrieval and emphselection - to pick the most appropriate tool from a tool library for a given task.<n>In this work, we introduce textitToolHijacker, a novel prompt injection attack targeting tool selection in no-box scenarios.
- Score: 60.95349602772112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tool selection is a key component of LLM agents. A popular approach follows a two-step process - \emph{retrieval} and \emph{selection} - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce \textit{ToolHijacker}, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.
Related papers
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.<n>The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills and an Evaluate Agent that logs action traces.<n>Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z) - MalTool: Malicious Tool Attacks on LLM Agents [52.01975462609959]
MalTool is a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors.<n>We show that MalTool is highly effective even when coding LLMs are safety-aligned.
arXiv Detail & Related papers (2026-02-12T17:27:43Z) - Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning [58.432996881401415]
Recent work augments large language models (LLMs) with external tools to enable agentic reasoning.<n>We propose Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt.<n>STA generates benign-looking prompt rewrites from the original one with high semantic fidelity.
arXiv Detail & Related papers (2026-01-24T19:36:51Z) - Quantifying Distributional Robustness of Agentic Tool-Selection [8.457056023589951]
We introduce ToolCert, the first statistical framework that formally certifies tool selection robustness.<n>We show that ToolCert produces a high-confidence lower bound on accuracy, formally quantifying the agent's worst-case performance.<n>Our evaluation with ToolCert uncovers the severe fragility: under attacks injecting deceptive tools or saturating retrieval, the certified accuracy bound drops near zero.
arXiv Detail & Related papers (2025-10-05T01:50:34Z) - ToolTweak: An Attack on Tool Selection in LLM-based Agents [52.17181489286236]
We show that adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives.<n>We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%.<n>To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally.
arXiv Detail & Related papers (2025-10-02T20:44:44Z) - Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools [10.086284534400658]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools.<n>We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents.<n>We propose a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata.
arXiv Detail & Related papers (2025-08-04T06:38:59Z) - AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - Select Me! When You Need a Tool: A Black-box Text Attack on Tool Selection [21.72195531150256]
Tool learning serves as a powerful auxiliary mechanism that extends the capabilities of large language models.<n>Previous work has primarily focused on how to make the output of the invoked tools incorrect or malicious.<n>We introduce, for the first time, a black-box text-based attack that can significantly increase the probability of the target tool being selected.
arXiv Detail & Related papers (2025-04-07T08:04:23Z) - ToolFuzz -- Automated Agent Tool Testing [5.174808367448261]
ToolFuzz is designed to discover two types of errors: (1) user queries leading to tool runtime errors and (2) user queries that lead to incorrect agent responses.<n>We show that ToolFuzz identifies 20x more erroneous inputs compared to the prompt-engineering approaches.
arXiv Detail & Related papers (2025-03-06T14:29:52Z) - Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks.
We present MELON, a novel IPI defense.
We show that MELON outperforms SOTA defenses in both attack prevention and utility preservation.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection [11.300387488829035]
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools.<n>We present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection.
arXiv Detail & Related papers (2024-12-13T15:15:24Z) - AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents [27.701301913159067]
We introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data.
AgentDojo is not a static test suite, but rather an environment for designing and evaluating new agent tasks, defenses, and adaptive attacks.
We populate AgentDojo with 97 realistic tasks, 629 security test cases, and various attack and defense paradigms from the literature.
arXiv Detail & Related papers (2024-06-19T08:55:56Z) - Optimization-based Prompt Injection Attack to LLM-as-a-Judge [78.20257854455562]
LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question.<n>We propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge.<n>Our evaluation shows that JudgeDeceive is highly effective, and is much more effective than existing prompt injection attacks.
arXiv Detail & Related papers (2024-03-26T13:58:00Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.<n>We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.<n>We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z) - Large Language Models as Tool Makers [85.00361145117293]
We introduce a closed-loop framework, referred to as LLMs A s Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving.
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving.
arXiv Detail & Related papers (2023-05-26T17:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.