MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?
- URL: http://arxiv.org/abs/2511.05867v2
- Date: Thu, 13 Nov 2025 01:43:00 GMT
- Title: MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?
- Authors: Jiayi Fu, Qiyao Sun
- Abstract summary: We present the first synthetic benchmark for evaluating large language models' ability to identify security risks from system logs. We define nine categories of MCP server risks and generate 1,800 synthetic system logs using ten state-of-the-art LLMs. Pilot experiments reveal that smaller models often fail to detect risky system logs, leading to high false negatives.
- Score: 3.4468299705073133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate strong capabilities in solving complex tasks when integrated with external tools. The Model Context Protocol (MCP) has become a standard interface for enabling such tool-based interactions. However, these interactions introduce substantial security concerns, particularly when the MCP server is compromised or untrustworthy. While prior benchmarks primarily focus on prompt injection attacks or analyze the vulnerabilities of LLM-MCP interaction trajectories, limited attention has been given to the underlying system logs associated with malicious MCP servers. To address this gap, we present the first synthetic benchmark for evaluating LLMs' ability to identify security risks from system logs. We define nine categories of MCP server risks and generate 1,800 synthetic system logs using ten state-of-the-art LLMs. These logs are embedded in the return values of 243 curated MCP servers, yielding a dataset of 2,421 chat histories for training and 471 queries for evaluation. Our pilot experiments reveal that smaller models often fail to detect risky system logs, leading to high false negatives. While models trained with supervised fine-tuning (SFT) tend to over-flag benign logs, resulting in elevated false positives, Reinforcement Learning from Verifiable Reward (RLVR) offers a better precision-recall balance. In particular, after training with Group Relative Policy Optimization (GRPO), Llama3.1-8B-Instruct achieves 83% accuracy, surpassing the best-performing large remote model by 9 percentage points. Fine-grained, per-category analysis further underscores the effectiveness of reinforcement learning in enhancing LLM safety within the MCP framework. Code and data are available at: https://github.com/PorUna-byte/MCP-RiskCue
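The abstract contrasts two failure modes: small models under-flag risky logs (false negatives), while SFT-trained models over-flag benign ones (false positives). A minimal sketch of how such binary risk verdicts might be scored illustrates the trade-off; the `score` helper and the example verdict lists below are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch (not the paper's implementation): scoring binary
# "risky"/"benign" verdicts on system logs to show the precision-recall
# trade-off described in the abstract.

def score(labels, preds):
    """labels/preds: parallel lists of 'risky' or 'benign' strings."""
    tp = sum(l == p == "risky" for l, p in zip(labels, preds))
    fp = sum(l == "benign" and p == "risky" for l, p in zip(labels, preds))
    fn = sum(l == "risky" and p == "benign" for l, p in zip(labels, preds))
    tn = sum(l == p == "benign" for l, p in zip(labels, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# Hypothetical verdicts over six logs:
labels         = ["risky", "risky", "benign", "benign", "risky", "benign"]
over_flagging  = ["risky", "risky", "risky", "risky", "risky", "benign"]   # SFT-like: many FPs
under_flagging = ["risky", "benign", "benign", "benign", "benign", "benign"]  # small-model-like: many FNs

print(score(labels, over_flagging))   # high recall, low precision
print(score(labels, under_flagging))  # high precision, low recall
```

Both toy policies reach the same accuracy here, which is why the paper's per-category precision/recall analysis, rather than accuracy alone, is needed to separate the two failure modes.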
Related papers
- MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers [17.96465932881902]
We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs.
arXiv Detail & Related papers (2025-12-17T08:00:32Z)
- MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers [86.00932417210477]
We introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs on realistic, challenging tasks through interaction with real-world MCP servers. Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching. We find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%), and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations.
arXiv Detail & Related papers (2025-08-20T13:28:58Z)
- MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols [7.10162765778832]
We present the first systematic taxonomy of MCP security, identifying 17 attack types across 4 primary attack surfaces. We introduce MCPSecBench, a comprehensive security benchmark and playground that integrates prompt datasets, MCP servers, MCP clients, attack scripts, and protection mechanisms.
arXiv Detail & Related papers (2025-08-17T11:49:16Z)
- MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications [21.70488724213541]
The integration of Large Language Models with external tools introduces critical security vulnerabilities. We propose MCP-Guard, a robust, layered defense architecture designed for LLM-tool interactions. We also introduce MCP-AttackBench, a benchmark of over 70,000 samples.
arXiv Detail & Related papers (2025-08-14T18:00:25Z)
- LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? [50.60770039016318]
We present LiveMCPBench, the first comprehensive benchmark for evaluating Model Context Protocol (MCP) agents. LiveMCPBench consists of 95 real-world tasks grounded in the MCP ecosystem. Our evaluation covers 10 leading models, with the best-performing model reaching a 78.95% success rate.
arXiv Detail & Related papers (2025-08-03T14:36:42Z)
- We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems [48.345884334050965]
We advocate that the LLM safety research community pay close attention to the new safety risks introduced by MCP. We conduct a series of pilot experiments to demonstrate that the safety risks in MCP-powered agent systems are a real threat and that their defense is not trivial.
arXiv Detail & Related papers (2025-06-16T16:24:31Z)
- Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers [16.794115541448758]
Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem in late 2024. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability. We evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability.
arXiv Detail & Related papers (2025-06-16T14:26:37Z)
- MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol [47.98229326363512]
This paper proposes a novel framework to enhance Model Context Protocol safety. Based on the MAESTRO framework, we first analyze the missing safety mechanisms in MCP. Next, we develop a fine-grained taxonomy that captures a diverse range of unsafe behaviors observed in MCP scenarios.
arXiv Detail & Related papers (2025-05-20T16:41:45Z)
- How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.