ToolTweak: An Attack on Tool Selection in LLM-based Agents
- URL: http://arxiv.org/abs/2510.02554v1
- Date: Thu, 02 Oct 2025 20:44:44 GMT
- Title: ToolTweak: An Attack on Tool Selection in LLM-based Agents
- Authors: Jonathan Sneh, Ruomei Yan, Jialin Yu, Philip Torr, Yarin Gal, Sunando Sengupta, Eric Sommerlade, Alasdair Paren, Adel Bibi
- Abstract summary: We show that adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally.
- Score: 52.17181489286236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities. These agents typically select tools from growing databases or marketplaces to solve user tasks, creating implicit competition among tool providers and developers for visibility and usage. In this paper, we show that this selection process harbors a critical vulnerability: by iteratively manipulating tool names and descriptions, adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%, with strong transferability between open-source and closed-source models. Beyond individual tools, we show that such attacks cause distributional shifts in tool usage, revealing risks to fairness, competition, and security in emerging tool ecosystems. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally. All code will be open-sourced upon acceptance.
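The attack loop the abstract describes, iteratively rewriting a tool's name and description until the agent prefers it, is straightforward to sketch. Since the code is not yet released, the following is a minimal illustration rather than the authors' implementation; `query_agent` and `propose_variants` are hypothetical stand-ins for the agent harness under test and for any rewriter (an LLM prompt, templated edits, and so on).

```python
# Minimal sketch of an iterative tool-description attack in the spirit of
# ToolTweak. `query_agent(task, tools)` is assumed to return the name of the
# tool the agent selects; `propose_variants(tool)` is assumed to return
# candidate rewrites of the tool's name/description.
import random

def selection_rate(tool, competitors, tasks, query_agent, trials=20):
    """Estimate how often the agent picks `tool` over its competitors."""
    wins = 0
    for _ in range(trials):
        task = random.choice(tasks)
        wins += query_agent(task, [tool] + competitors) == tool["name"]
    return wins / trials

def tooltweak_style_attack(tool, competitors, tasks, query_agent,
                           propose_variants, iterations=10):
    """Hill-climb on the tool's metadata to maximize its selection rate."""
    best = tool
    best_rate = selection_rate(tool, competitors, tasks, query_agent)
    for _ in range(iterations):
        for variant in propose_variants(best):
            rate = selection_rate(variant, competitors, tasks, query_agent)
            if rate > best_rate:
                best, best_rate = variant, rate
    return best, best_rate
```

The perplexity-filtering defense can be sketched in the same hedged spirit: score each tool description with a small reference language model and reject outliers whose perplexity sits far above the marketplace norm. The scoring model (`gpt2`) and the z-score threshold below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of a perplexity filter over tool descriptions, using a small
# causal LM via Hugging Face transformers. Descriptions whose perplexity is
# an outlier relative to the pool are dropped before the agent sees them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(text, model, tokenizer):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

def filter_descriptions(descriptions, model_name="gpt2", z_threshold=2.0):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    scores = [perplexity(d, model, tokenizer) for d in descriptions]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return [d for d, s in zip(descriptions, scores)
            if std == 0 or (s - mean) / std <= z_threshold]
```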
Related papers
- Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents [68.20752678837377]
We propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences. Using this taxonomy, we construct MT-AgentRisk, the first benchmark to evaluate multi-turn tool-using agent safety. We propose ToolShield, a training-free, tool-agnostic, self-exploration defense.
arXiv Detail & Related papers (2026-02-13T18:38:18Z)
- Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning [58.432996881401415]
Recent work augments large language models (LLMs) with external tools to enable agentic reasoning. We propose Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt. STA generates benign-looking rewrites of the original prompt with high semantic fidelity. (A speculative sketch of this search follows the entry.)
arXiv Detail & Related papers (2026-01-24T19:36:51Z)
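STA's recipe as summarized above, benign-looking rewrites that preserve semantics while inflating the agent's reasoning cost, suggests a search loop like the one below. This is a speculative sketch reconstructed from the abstract alone; `paraphrase`, `semantic_similarity`, and `reasoning_cost` are all hypothetical components, not the paper's.

```python
# Speculative sketch of a denial-of-efficiency prompt search in the spirit of
# the Sponge Tool Attack. `paraphrase` proposes rewrites (e.g. via an LLM),
# `semantic_similarity` scores fidelity (e.g. embedding cosine), and
# `reasoning_cost` measures the victim agent's spend on a prompt (tokens,
# tool calls, latency).
def sponge_rewrite(prompt, paraphrase, semantic_similarity, reasoning_cost,
                   rounds=5, min_fidelity=0.9):
    best, best_cost = prompt, reasoning_cost(prompt)
    for _ in range(rounds):
        for candidate in paraphrase(best):
            if semantic_similarity(prompt, candidate) < min_fidelity:
                continue  # keep rewrites faithful to the original task
            cost = reasoning_cost(candidate)
            if cost > best_cost:
                best, best_cost = candidate, cost
    return best, best_cost
```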
- AgenTRIM: Tool Risk Mitigation for Agentic AI [5.4672006013914975]
We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks. AgenTRIM addresses these risks through complementary offline and online phases. AgenTRIM substantially reduces attack success while maintaining high task performance.
arXiv Detail & Related papers (2026-01-18T15:10:18Z)
- Quantifying Distributional Robustness of Agentic Tool-Selection [8.457056023589951]
We introduce ToolCert, the first statistical framework that formally certifies tool-selection robustness. We show that ToolCert produces a high-confidence lower bound on accuracy, formally quantifying the agent's worst-case performance. Our evaluation with ToolCert uncovers severe fragility: under attacks that inject deceptive tools or saturate retrieval, the certified accuracy bound drops to near zero. (An illustrative bound of this kind is sketched after the entry.)
arXiv Detail & Related papers (2025-10-05T01:50:34Z)
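ToolCert's headline output, a high-confidence lower bound on selection accuracy under attack, can be illustrated with a standard one-sided exact binomial (Clopper-Pearson) bound. Whether the paper uses this particular estimator is an assumption; the sketch only shows the shape of such a certificate.

```python
# Illustrative Clopper-Pearson lower confidence bound on tool-selection
# accuracy, estimated from n adversarial trials with k correct selections.
from scipy.stats import beta

def accuracy_lower_bound(k: int, n: int, confidence: float = 0.95) -> float:
    """Return L such that P(true accuracy >= L) >= confidence."""
    if k == 0:
        return 0.0
    return float(beta.ppf(1.0 - confidence, k, n - k + 1))

# Example: 140 correct selections out of 200 attacked trials certifies
# accuracy of roughly 0.64 at 95% confidence.
print(accuracy_lower_bound(140, 200))
```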
- BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models [55.119657444627855]
Large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical fairness concern: if selection is systematically biased, it can degrade user experience and distort competition. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias. (A minimal bias metric is sketched after the entry.)
arXiv Detail & Related papers (2025-09-30T22:02:13Z)
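The bias evaluation the BiasBusters summary describes, categories of functionally equivalent tools checked for skewed selection, reduces to comparing the agent's empirical selection distribution against the uniform one. A minimal sketch follows; the specific metric, total variation distance from uniform, is an illustrative choice rather than the benchmark's documented one.

```python
# Minimal sketch of measuring tool-selection bias within one category of
# functionally equivalent tools.
from collections import Counter

def selection_bias(selections: list[str], tools: list[str]) -> float:
    """Total variation distance from uniform: 0 means unbiased selection,
    (k - 1) / k means one of the k tools is always chosen."""
    counts = Counter(selections)
    n, k = len(selections), len(tools)
    return 0.5 * sum(abs(counts[t] / n - 1 / k) for t in tools)

# Example: an agent that picks "tool_a" 80% of the time among 4 equivalents.
picks = ["tool_a"] * 80 + ["tool_b"] * 10 + ["tool_c"] * 5 + ["tool_d"] * 5
print(selection_bias(picks, ["tool_a", "tool_b", "tool_c", "tool_d"]))  # 0.55
```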
- Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools [10.086284534400658]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. We identify tool metadata as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents. We propose a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata.
arXiv Detail & Related papers (2025-08-04T06:38:59Z)
- Prompt Injection Attack to Tool Selection in LLM Agents [60.95349602772112]
A popular approach follows a two-step process - retrieval and selection - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce ToolHijacker, a novel prompt injection attack targeting tool selection in no-box scenarios. (A sketch of the targeted pipeline follows the entry.)
arXiv Detail & Related papers (2025-04-28T13:36:43Z)
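The two-step retrieval-and-selection pipeline that ToolHijacker targets is a common pattern and easy to sketch. The embedding model and prompt format below are placeholders, not the paper's setup; the point is that a crafted tool description flows untouched from the library into the selection prompt.

```python
# Sketch of the retrieve-then-select tool pipeline that prompt-injection
# attacks such as ToolHijacker target. `embed` and `ask_llm` are placeholder
# callables for an embedding model and a chat LLM.
import numpy as np

def retrieve(task, tools, embed, top_k=5):
    """Step 1: shortlist tools by embedding similarity to the task."""
    q = np.asarray(embed(task))
    def score(tool):
        v = np.asarray(embed(tool["name"] + ": " + tool["description"]))
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(tools, key=score, reverse=True)[:top_k]

def select(task, shortlist, ask_llm):
    """Step 2: let the LLM pick one shortlisted tool by name. A malicious
    description reaches the LLM verbatim here, which is exactly the surface
    a no-box prompt injection exploits."""
    listing = "\n".join(f"- {t['name']}: {t['description']}" for t in shortlist)
    prompt = f"Task: {task}\nPick the single best tool by name:\n{listing}"
    return ask_llm(prompt).strip()
```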
- SMART: Self-Aware Agent for Tool Overuse Mitigation [58.748554080273585]
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool-use capabilities, but often lack self-awareness. This imbalance leads to tool overuse, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge. We introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse.
arXiv Detail & Related papers (2025-02-17T04:50:37Z)
- From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection [11.300387488829035]
Tool-calling has transformed Large Language Model (LLM) applications by integrating external tools. We present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection.
arXiv Detail & Related papers (2024-12-13T15:15:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.