MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP
- URL: http://arxiv.org/abs/2601.07395v1
- Date: Mon, 12 Jan 2026 10:28:46 GMT
- Title: MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP
- Authors: Ruiqi Li, Zhiqiang Wang, Yunhao Yao, Xiang-Yang Li
- Abstract summary: In implicit tool poisoning, malicious instructions embedded in tool metadata are injected into the agent context during the Model Context Protocol (MCP) registration phase. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely adopted. However, integrating external tools expands the attack surface, exposing agents to tool poisoning attacks. In such attacks, malicious instructions embedded in tool metadata are injected into the agent context during the MCP registration phase, thereby manipulating agent behavior. Prior work primarily focuses on explicit tool poisoning or relies on manually crafted poisoned tools. In contrast, we focus on a particularly stealthy variant: implicit tool poisoning, where the poisoned tool itself remains uninvoked. Instead, the instructions embedded in the tool metadata induce the agent to invoke a legitimate but high-privilege tool to perform malicious operations. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem. MCP-ITP formulates poisoned tool generation as a black-box optimization problem and employs an iterative optimization strategy that leverages feedback from both an evaluation LLM and a detection LLM to maximize Attack Success Rate (ASR) while evading current detection mechanisms. Experimental results on the MCPTox dataset across 12 LLM agents demonstrate that MCP-ITP consistently outperforms the manually crafted baseline, achieving up to 84.2% ASR while suppressing the Malicious Tool Detection Rate (MDR) to as low as 0.3%.
Related papers
- MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents (arXiv, 2026-02-15)
  We propose MCPShield as a plug-in security cognition layer that ensures agent security when invoking MCP-based tools. Our work provides a practical and robust security safeguard for MCP-based tool invocation in open agent ecosystems.
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement (arXiv, 2026-02-15)
  We propose an automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces. Our method consistently achieves high attack success rates under realistic settings.
- MalTool: Malicious Tool Attacks on LLM Agents (arXiv, 2026-02-12)
  MalTool is a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors. We show that MalTool is highly effective even when coding LLMs are safety-aligned.
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback (arXiv, 2026-01-15)
  Monitoring step-level tool invocation behaviors in real time is critical for agent deployment. We first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents. We then develop a guardrail model, TS-Guard, using multi-task reinforcement learning.
- MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents (arXiv, 2025-10-14)
  The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that measures how well LLM agents resist MCP-specific attacks.
- ToolTweak: An Attack on Tool Selection in LLM-based Agents (arXiv, 2025-10-02)
  We show that adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally.
- Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools (arXiv, 2025-09-25)
  We propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools. Our evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents.
- Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem (arXiv, 2025-09-08)
  Large language models (LLMs) are increasingly integrated with external systems through the Model Context Protocol (MCP). In this paper, we reveal a new class of attacks, Parasitic Toolchain Attacks, instantiated as MCP Unintended Privacy Disclosure (MCP-UPD). The malicious logic infiltrates the toolchain and unfolds in three phases: Parasitic Ingestion, Privacy Collection, and Privacy Disclosure, culminating in stealthy exfiltration of private data.
- MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers (arXiv, 2025-08-19)
  We introduce MCPTox, the first benchmark to evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox generates a comprehensive suite of 1312 malicious test cases by few-shot learning, covering 10 categories of potential risks. Our evaluation reveals a widespread vulnerability to Tool Poisoning, with o1-mini achieving an attack success rate of 72.8%.
- Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools (arXiv, 2025-08-04)
  Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents. We propose a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata.
- AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents (arXiv, 2025-05-09)
  We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases (arXiv, 2024-07-17)
  We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning. On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
This list is automatically generated from the titles and abstracts of the papers in this site.