MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits
- URL: http://arxiv.org/abs/2504.03767v2
- Date: Fri, 11 Apr 2025 16:59:05 GMT
- Title: MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits
- Authors: Brandon Radosevich, John Halloran
- Abstract summary: The Model Context Protocol (MCP) is an open protocol that standardizes API calls to large language models (LLMs), data sources, and agentic tools. We show that the current MCP design carries a wide range of security risks for end users. We introduce a safety auditing tool, MCPSafetyScanner, to assess the security of an arbitrary MCP server.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To reduce development overhead and enable seamless integration between potential components comprising any given generative AI application, the Model Context Protocol (MCP) (Anthropic, 2024) has recently been released and subsequently widely adopted. The MCP is an open protocol that standardizes API calls to large language models (LLMs), data sources, and agentic tools. By connecting multiple MCP servers, each defined with a set of tools, resources, and prompts, users are able to define automated workflows fully driven by LLMs. However, we show that the current MCP design carries a wide range of security risks for end users. In particular, we demonstrate that industry-leading LLMs may be coerced into using MCP tools to compromise an AI developer's system through various attacks, such as malicious code execution, remote access control, and credential theft. To proactively mitigate these and related attacks, we introduce a safety auditing tool, MCPSafetyScanner, the first agentic tool to assess the security of an arbitrary MCP server. MCPSafetyScanner uses several agents to (a) automatically determine adversarial samples given an MCP server's tools and resources; (b) search for related vulnerabilities and remediations based on those samples; and (c) generate a security report detailing all findings. Our work highlights serious security issues with general-purpose agentic workflows while also providing a proactive tool to audit MCP server safety and address detected vulnerabilities before deployment. The described MCP server auditing tool, MCPSafetyScanner, is freely available at: https://github.com/johnhalloran321/mcpSafetyScanner
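The three-stage audit loop described in the abstract, (a) generate adversarial samples, (b) look up vulnerabilities and remediations, (c) compile a report, can be sketched as follows. This is an illustrative sketch only; the class and function names are hypothetical, not the actual MCPSafetyScanner API.

```python
# Hypothetical sketch of the (a) probe / (b) research / (c) report stages;
# names are illustrative, not the real MCPSafetyScanner interface.
from dataclasses import dataclass, field


@dataclass
class Finding:
    tool: str
    adversarial_sample: str
    vulnerability: str
    remediation: str


@dataclass
class AuditReport:
    server: str
    findings: list[Finding] = field(default_factory=list)


def audit_server(server: str, tools: dict[str, str]) -> AuditReport:
    """Audit each exposed tool and collect findings into one report."""
    report = AuditReport(server=server)
    for name, description in tools.items():
        # (a) a "hacker" agent would craft adversarial inputs for this tool
        sample = f"crafted input abusing '{name}' ({description})"
        # (b) an "auditor" agent would search for related vulnerabilities
        vuln = f"potential misuse of {name}"
        fix = f"restrict or sandbox calls to {name}"
        report.findings.append(Finding(name, sample, vuln, fix))
    # (c) the collected findings form the final security report
    return report


report = audit_server("demo-mcp", {"shell_exec": "runs shell commands"})
assert len(report.findings) == 1
```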
Related papers
- MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents [39.267334469481916]
We propose MCPShield as a plug-in security cognition layer that ensures agent security when invoking MCP-based tools. Our work provides a practical and robust security safeguard for MCP-based tool invocation in open agent ecosystems.
arXiv Detail & Related papers (2026-02-15T19:10:00Z) - MalTool: Malicious Tool Attacks on LLM Agents [52.01975462609959]
MalTool is a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors. We show that MalTool is highly effective even when coding LLMs are safety-aligned.
arXiv Detail & Related papers (2026-02-12T17:27:43Z) - LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios [51.52395368061729]
We present LPS-Bench, a benchmark that evaluates the planning-time safety awareness of MCP-based CUAs under long-horizon tasks. Experiments reveal substantial deficiencies in existing CUAs' ability to maintain safe behavior. We propose mitigation strategies to improve long-horizon planning safety in MCP-based CUA systems.
arXiv Detail & Related papers (2026-02-03T08:40:24Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents. LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records. Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers [17.96465932881902]
We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs.
arXiv Detail & Related papers (2025-12-17T08:00:32Z) - Securing AI Agent Execution [5.599792629509229]
We introduce AgentBound, the first access control framework for MCP servers. We build a dataset containing the 296 most popular MCP servers, and show that access control policies can be generated automatically from source code with 80.9% accuracy. We also show that AgentBound blocks the majority of security threats in several malicious MCP servers, and that the policy enforcement engine introduces negligible overhead.
arXiv Detail & Related papers (2025-10-24T08:10:36Z) - Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools [47.32559576064343]
We propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools. Our evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents.
arXiv Detail & Related papers (2025-09-25T11:14:38Z) - MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols [7.10162765778832]
We present the first systematic taxonomy of MCP security, identifying 17 attack types across 4 primary attack surfaces. We introduce MCPSecBench, a comprehensive security benchmark and playground that integrates prompt datasets, MCP servers, MCP clients, attack scripts, and protection mechanisms.
arXiv Detail & Related papers (2025-08-17T11:49:16Z) - LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? [50.60770039016318]
We present LiveMCPBench, the first comprehensive benchmark for evaluating Model Context Protocol (MCP) agents. LiveMCPBench consists of 95 real-world tasks grounded in the MCP ecosystem. Our evaluation covers 10 leading models, with the best-performing model reaching a 78.95% success rate.
arXiv Detail & Related papers (2025-08-03T14:36:42Z) - Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data [0.0]
The Model Context Protocol (MCP) represents a significant advancement in AI-tool integration, enabling seamless communication between AI agents and external services. This paper demonstrates how unsophisticated threat actors, requiring only basic programming skills and free web tools, can exploit MCP's trust model to exfiltrate sensitive financial data.
arXiv Detail & Related papers (2025-07-26T09:22:40Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems [48.345884334050965]
We urge the LLM safety research community to pay close attention to the new safety risks introduced by MCP. We conduct a series of pilot experiments demonstrating that the safety risks in MCP-powered agent systems are a real threat and that defending against them is not trivial.
arXiv Detail & Related papers (2025-06-16T16:24:31Z) - Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem [9.147044310206773]
The Model Context Protocol (MCP) is an emerging standard designed to enable seamless interaction between Large Language Model (LLM) applications and external tools or resources. In this paper, we present the first systematic study of attack vectors targeting the MCP ecosystem.
arXiv Detail & Related papers (2025-05-31T08:01:11Z) - Securing GenAI Multi-Agent Systems Against Tool Squatting: A Zero Trust Registry-Based Approach [0.0]
This paper analyzes tool squatting threats within the context of emerging interoperability standards.
It introduces a comprehensive Tool Registry system designed to mitigate these risks.
Based on its design principles, the proposed registry framework aims to effectively prevent common tool squatting vectors.
arXiv Detail & Related papers (2025-04-28T16:22:21Z) - DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents.
It is a plug-in framework that integrates easily into realistic agentic frameworks.
It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z) - MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System [0.0]
We present MCP Guardian, a framework that strengthens MCP-based communication with authentication, rate-limiting, logging, tracing, and Web Application Firewall (WAF) scanning.
Our approach fosters secure, scalable data access for AI assistants, underscoring the importance of a defense-in-depth approach.
arXiv Detail & Related papers (2025-04-17T08:49:10Z) - Progent: Programmable Privilege Control for LLM Agents [46.49787947705293]
We introduce Progent, the first privilege control mechanism for LLM agents.
At its core is a domain-specific language for flexibly expressing privilege control policies applied during agent execution.
This enables agent developers and users to craft suitable policies for their specific use cases and enforce them deterministically to guarantee security.
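The deterministic policy enforcement described above can be sketched as a small allow-list checked before every tool call. This is a hypothetical illustration of the idea, not Progent's actual domain-specific language; the policy format and function names are invented.

```python
# Hypothetical sketch of deterministic privilege control for agent tool
# calls; the policy format is invented, not Progent's actual DSL.
from fnmatch import fnmatch

# Policy: which tools the agent may call, with which argument patterns.
POLICY = {
    "fs.read": {"path": "/workspace/*"},  # reads confined to the workspace
    "http.get": {"url": "https://*"},     # only HTTPS requests allowed
}


def is_allowed(tool: str, args: dict[str, str]) -> bool:
    """Deterministically allow or deny a tool call before it executes."""
    rule = POLICY.get(tool)
    if rule is None:
        return False  # default-deny: unlisted tools are blocked
    return all(
        fnmatch(str(args.get(key, "")), pattern)
        for key, pattern in rule.items()
    )


assert is_allowed("fs.read", {"path": "/workspace/notes.txt"})
assert not is_allowed("fs.read", {"path": "/etc/passwd"})
assert not is_allowed("shell.exec", {"cmd": "rm -rf /"})
```

Because the check is a pure function of the policy and the call's arguments, it enforces the same decision every time, independent of the LLM's output.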
arXiv Detail & Related papers (2025-04-16T01:58:40Z) - MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers [0.5266869303483376]
MCP Bridge is a lightweight proxy that connects to multiple MCP servers and exposes their capabilities through a unified API.
The system implements a risk-based execution model with three security levels standard execution, confirmation, and Docker isolation while maintaining backward compatibility with standard MCP clients.
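The three-level risk-based execution model summarized above can be sketched as a dispatch on risk level. The names and return strings here are illustrative assumptions, not MCP Bridge's actual API.

```python
# Minimal sketch of a three-level risk-based execution model (standard,
# confirmation, Docker isolation); names are illustrative, not the real
# MCP Bridge interface.
from enum import Enum


class RiskLevel(Enum):
    STANDARD = 1      # execute immediately
    CONFIRMATION = 2  # require explicit user approval first
    ISOLATED = 3      # run inside a Docker container


def execute(tool: str, level: RiskLevel, confirmed: bool = False) -> str:
    """Route a tool call through the path dictated by its risk level."""
    if level is RiskLevel.STANDARD:
        return f"ran {tool} directly"
    if level is RiskLevel.CONFIRMATION:
        if not confirmed:
            return f"blocked {tool}: awaiting user confirmation"
        return f"ran {tool} after confirmation"
    return f"ran {tool} in a Docker sandbox"


assert execute("list_files", RiskLevel.STANDARD) == "ran list_files directly"
assert "awaiting" in execute("delete_repo", RiskLevel.CONFIRMATION)
assert "sandbox" in execute("run_code", RiskLevel.ISOLATED)
```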
arXiv Detail & Related papers (2025-04-11T22:19:48Z) - Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents [15.15485816037418]
We present the first systematic security analysis of task control flows in multi-tool-enabled LLM agents. We identify a novel threat, Cross-Tool Harvesting and Polluting (XTHP), which includes multiple attack vectors. To understand the impact of this threat, we developed Chord, a dynamic scanning tool designed to automatically detect real-world agent tools susceptible to XTHP attacks.
arXiv Detail & Related papers (2025-04-04T01:41:06Z) - Defeating Prompt Injections by Design [79.00910871948787]
CaMeL is a robust defense that creates a protective system layer around Large Language Models (LLMs). To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query. We demonstrate the effectiveness of CaMeL by solving 67% of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.
arXiv Detail & Related papers (2025-03-24T15:54:10Z) - Multi-Agent Systems Execute Arbitrary Malicious Code [9.200635465485067]
We show that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection.
arXiv Detail & Related papers (2025-03-15T16:16:08Z) - Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs). We define prompt leakage as a critical threat to secure LLM deployment. We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.
We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.
We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z) - Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [47.219047422240145]
We take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents.
Specifically, compared with traditional backdoor attacks on LLMs that are only able to manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms.
arXiv Detail & Related papers (2024-02-17T06:48:45Z) - A Survey and Comparative Analysis of Security Properties of CAN Authentication Protocols [92.81385447582882]
The Controller Area Network (CAN) bus leaves in-vehicle communications inherently insecure.
This paper reviews and compares the 15 most prominent authentication protocols for the CAN bus.
We evaluate protocols based on essential operational criteria that contribute to ease of implementation.
arXiv Detail & Related papers (2024-01-19T14:52:04Z) - RatGPT: Turning online LLMs into Proxies for Malware Attacks [0.0]
We present a proof-of-concept where ChatGPT is used for the dissemination of malicious software while evading detection.
We also present the general approach as well as essential elements in order to stay undetected and make the attack a success.
arXiv Detail & Related papers (2023-08-17T20:54:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.