MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications
- URL: http://arxiv.org/abs/2508.10991v2
- Date: Fri, 22 Aug 2025 15:35:04 GMT
- Title: MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications
- Authors: Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, Meng Han,
- Abstract summary: integration of Large Language Models with external tools introduces critical security vulnerabilities.<n>We propose MCP-Guard, a robust, layered defense architecture designed for LLM--tool interactions.<n>We also introduce MCP-AttackBench, a benchmark of over 70,000 samples.
- Score: 21.70488724213541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Large Language Models (LLMs) with external tools via protocols such as the Model Context Protocol (MCP) introduces critical security vulnerabilities, including prompt injection, data exfiltration, and other threats. To counter these challenges, we propose MCP-Guard, a robust, layered defense architecture designed for LLM--tool interactions. MCP-Guard employs a three-stage detection pipeline that balances efficiency with accuracy: it progresses from lightweight static scanning for overt threats and a deep neural detector for semantic attacks, to our fine-tuned E5-based model achieves (96.01) accuracy in identifying adversarial prompts. Finally, a lightweight LLM arbitrator synthesizes these signals to deliver the final decision while minimizing false positives. To facilitate rigorous training and evaluation, we also introduce MCP-AttackBench, a comprehensive benchmark of over 70,000 samples. Sourced from public datasets and augmented by GPT-4, MCP-AttackBench simulates diverse, real-world attack vectors in the MCP format, providing a foundation for future research into securing LLM-tool ecosystems.
Related papers
- A Lightweight Defense Mechanism against Next Generation of Phishing Emails using Distilled Attention-Augmented BiLSTM [34.0814379994364]
The MobileBERT teacher receives fine-tuning before its transformation into a BiLSTM model with multi-head attention.<n>The system demonstrates excellent performance in terms of accuracy and latency while maintaining a compact size.<n>The paper examines system performance under high traffic conditions and security measures for privacy protection and implementation methods for operational deployment.
arXiv Detail & Related papers (2026-02-24T20:06:45Z) - MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs? [3.4468299705073133]
We present the first synthetic benchmark for evaluating large language models' ability to identify security risks from system logs.<n>We define nine categories of MCP server risks and generate 1,800 synthetic system logs using ten state-of-the-art LLMs.<n>Pilot experiments reveal that smaller models often fail to detect risky system logs, leading to high false positives.
arXiv Detail & Related papers (2025-11-08T05:52:53Z) - Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem [13.95558554298296]
Large language models (LLMs) are increasingly integrated with external systems through the Model Context Protocol (MCP)<n>In this paper, we reveal a new class of attacks, Parasitic Toolchain Attacks, instantiated as MCP Unintended Privacy Disclosure (MCP-UPD)<n>The malicious logic infiltrates the toolchain and unfolds in three phases: Parasitic Ingestion, Privacy Collection, and Privacy Disclosure, culminating in stealthy exfiltration of private data.
arXiv Detail & Related papers (2025-09-08T11:35:32Z) - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers [86.00932417210477]
We introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs in realistic and hard tasks through interaction with real-world MCP servers.<n>Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching.<n>We find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%) and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations.
arXiv Detail & Related papers (2025-08-20T13:28:58Z) - MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols [7.10162765778832]
We present the first systematic taxonomy of MCP security, identifying 17 attack types across 4 primary attack surfaces.<n>We introduce MCPSecBench, a comprehensive security benchmark and playground that integrates prompt datasets, MCP servers, MCP clients, attack scripts, and protection mechanisms.
arXiv Detail & Related papers (2025-08-17T11:49:16Z) - LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models [50.794651919028965]
Recent studies have demonstrated the vulnerability of sequential recommender systems to Model Extraction Attacks (MEAs)<n>Black-box attacks in prior MEAs are ineffective at exposing recommender system vulnerabilities due to random sampling in data selection.<n>We propose LLM4MEA, a novel model extraction method that leverages Large Language Models (LLMs) as human-like rankers to generate data.
arXiv Detail & Related papers (2025-07-22T19:20:23Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment [4.379304291229695]
We introduce Refusal-Aware Adaptive Injection (RAAI), a training-free, and model-agnostic framework that repurposes LLM attack techniques.<n> RAAI works by detecting internal refusal signals and adaptively injecting predefined phrases to elicit harmful, yet fluent, completions.<n>Our experiments show RAAI effectively jailbreaks LLMs, increasing the harmful response rate from a baseline of 2.15% to up to 61.04% on average.
arXiv Detail & Related papers (2025-06-07T08:19:01Z) - Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering [3.0823377252469144]
prompt injection attacks have emerged as a significant security threat.<n>Existing defense mechanisms face trade-offs between effectiveness and generalizability.<n>We propose a dual-channel feature fusion detection framework.
arXiv Detail & Related papers (2025-06-05T06:01:19Z) - MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol [47.98229326363512]
This paper proposes a novel framework to enhance Model Context Protocol safety.<n>Based on the MAESTRO framework, we first analyze the missing safety mechanisms in MCP.<n>Next, we develop a fine-grained taxonomy that captures a diverse range of unsafe behaviors observed in MCP scenarios.
arXiv Detail & Related papers (2025-05-20T16:41:45Z) - AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z) - MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System [0.0]
We present MCP Guardian, a framework that strengthens MCP-based communication with authentication, rate-limiting, logging, tracing, and Web Application Firewall (WAF) scanning.<n>Our approach fosters secure, scalable data access for AI assistants, underscoring the importance of a defense-in-depth approach.
arXiv Detail & Related papers (2025-04-17T08:49:10Z) - Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z) - What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
We design a synthetic data generation framework that captures salient aspects of an unsafe input.
Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z) - Covert Model Poisoning Against Federated Learning: Algorithm Design and
Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameters transmissions.
In this paper, we propose effective MP algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.