On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment
- URL: http://arxiv.org/abs/2509.05755v4
- Date: Fri, 19 Sep 2025 17:17:58 GMT
- Title: On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment
- Authors: Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She,
- Abstract summary: LLM-based agentic systems leverage large language models to handle user queries, make decisions, and execute external tools for complex tasks.<n>A critical component of these systems is the Tool Invocation Prompt (TIP), which defines tool interaction protocols and guides LLMs to ensure the security and correctness of tool usage.<n>This work investigates TIP-related security risks, revealing that major LLM-based systems are vulnerable to attacks such as remote code execution (RCE) and denial of service (DoS)
- Score: 15.189630628711685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based agentic systems leverage large language models to handle user queries, make decisions, and execute external tools for complex tasks across domains like chatbots, customer service, and software engineering. A critical component of these systems is the Tool Invocation Prompt (TIP), which defines tool interaction protocols and guides LLMs to ensure the security and correctness of tool usage. Despite its importance, TIP security has been largely overlooked. This work investigates TIP-related security risks, revealing that major LLM-based systems like Cursor, Claude Code, and others are vulnerable to attacks such as remote code execution (RCE) and denial of service (DoS). Through a systematic TIP exploitation workflow (TEW), we demonstrate external tool behavior hijacking via manipulated tool invocations. We also propose defense mechanisms to enhance TIP security in LLM-based agentic systems.
Related papers
- MalTool: Malicious Tool Attacks on LLM Agents [52.01975462609959]
MalTool is a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors.<n>We show that MalTool is highly effective even when coding LLMs are safety-aligned.
arXiv Detail & Related papers (2026-02-12T17:27:43Z) - ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback [53.2744585868162]
Monitoring step-level tool invocation behaviors in real time is critical for agent deployment.<n>We first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents.<n>We then develop a guardrail model, TS-Guard, using multi-task reinforcement learning.
arXiv Detail & Related papers (2026-01-15T07:54:32Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - ASTRA: Agentic Steerability and Risk Assessment Framework [3.9756746779772834]
Securing AI agents powered by Large Language Models (LLMs) is one of the most critical challenges in AI security today.<n>ASTRA is a first-of-its-kind framework designed to evaluate the effectiveness of LLMs in supporting the creation of secure agents.
arXiv Detail & Related papers (2025-11-22T16:32:29Z) - Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools [47.32559576064343]
We propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools.<n>Our evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents.
arXiv Detail & Related papers (2025-09-25T11:14:38Z) - Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem [13.95558554298296]
Large language models (LLMs) are increasingly integrated with external systems through the Model Context Protocol (MCP)<n>In this paper, we reveal a new class of attacks, Parasitic Toolchain Attacks, instantiated as MCP Unintended Privacy Disclosure (MCP-UPD)<n>The malicious logic infiltrates the toolchain and unfolds in three phases: Parasitic Ingestion, Privacy Collection, and Privacy Disclosure, culminating in stealthy exfiltration of private data.
arXiv Detail & Related papers (2025-09-08T11:35:32Z) - Prompt Injection Attack to Tool Selection in LLM Agents [60.95349602772112]
A popular approach follows a two-step process - emphretrieval and emphselection - to pick the most appropriate tool from a tool library for a given task.<n>In this work, we introduce textitToolHijacker, a novel prompt injection attack targeting tool selection in no-box scenarios.
arXiv Detail & Related papers (2025-04-28T13:36:43Z) - Progent: Programmable Privilege Control for LLM Agents [46.31581986508561]
We introduce Progent, the first privilege control framework to secure Large Language Models agents.<n>Progent enforces security at the tool level by restricting agents to performing tool calls necessary for user tasks while blocking potentially malicious ones.<n>Thanks to our modular design, integrating Progent does not alter agent internals and only requires minimal changes to the existing agent implementation.
arXiv Detail & Related papers (2025-04-16T01:58:40Z) - Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents [15.15485816037418]
This paper presents the first systematic security analysis of task control flows in multi-tool-enabled LLM agents.<n>We identify a novel threat, Cross-Tool Harvesting and Polluting (XTHP), which includes multiple attack vectors.<n>To understand the impact of this threat, we developed Chord, a dynamic scanning tool designed to automatically detect real-world agent tools susceptible to XTHP attacks.
arXiv Detail & Related papers (2025-04-04T01:41:06Z) - Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents [12.072737324367937]
We propose Prompt Flow Integrity (PFI) to prevent privilege escalation in Large Language Models (LLMs)<n>PFI features three mitigation techniques -- i.e., agent isolation, secure untrusted data processing, and privilege escalation guardrails.<n>Our evaluation result shows that PFI effectively mitigates privilege escalation attacks while successfully preserving the utility of LLM agents.
arXiv Detail & Related papers (2025-03-17T05:27:57Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)<n>In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.<n>We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection [11.300387488829035]
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools.<n>We present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection.
arXiv Detail & Related papers (2024-12-13T15:15:24Z) - AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.<n>We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.<n>We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z) - Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents [56.822238860147024]
Augmenting large language models with external tools has emerged as a promising approach to extend their utility.<n>Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning.<n>We propose AutoTools, a framework that enables LLMs to automate the tool-use workflow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.