Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
- URL: http://arxiv.org/abs/2512.08290v2
- Date: Sat, 13 Dec 2025 20:37:14 GMT
- Title: Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
- Authors: Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav,
- Abstract summary: The Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models to external data and tools.<n>This paper provides a taxonomy of risks in the MCP ecosystem, distinguishing between adversarial security threats and safety hazards.<n>We demonstrate how "context" can be weaponized to trigger unauthorized operations in multi-agent environments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the "USB-C for Agentic AI." While this decoupling of context and execution solves critical interoperability challenges, it introduces a profound new threat landscape where the boundary between epistemic errors (hallucinations) and security breaches (unauthorized actions) dissolves. This Systematization of Knowledge (SoK) aims to provide a comprehensive taxonomy of risks in the MCP ecosystem, distinguishing between adversarial security threats (e.g., indirect prompt injection, tool poisoning) and epistemic safety hazards (e.g., alignment failures in distributed tool delegation). We analyze the structural vulnerabilities of MCP primitives, specifically Resources, Prompts, and Tools, and demonstrate how "context" can be weaponized to trigger unauthorized operations in multi-agent environments. Furthermore, we survey state-of-the-art defenses, ranging from cryptographic provenance (ETDI) to runtime intent verification, and conclude with a roadmap for securing the transition from conversational chatbots to autonomous agentic operating systems.
Related papers
- From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions [20.73038673205127]
We provide a transition-oriented view from Secure Agentic AI to a Secure Agentic Web.<n>We first summarize a component-aligned threat taxonomy covering prompt abuse, environment injection, memory attacks, toolchain abuse, model tampering, and agent network attacks.<n>We then review defense strategies, including prompt hardening, safety-aware decoding, privilege control for tools and APIs, runtime monitoring, continuous red-teaming, and protocol-level security mechanisms.
arXiv Detail & Related papers (2026-03-02T07:44:18Z) - SMCP: Secure Model Context Protocol [12.950842281962101]
We introduce the Secure Model Context Protocol (SMCP), which builds on the Model Context Protocol (MCP)<n>MCP has emerged as a standard to unify tool access, allowing agents to discover, invoke, and coordinate with tools more flexibly.<n>SMCP adds unified identity management, robust mutual authentication, ongoing security context propagation, fine-grained policy enforcement, and comprehensive audit logging.
arXiv Detail & Related papers (2026-02-01T09:59:57Z) - Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z) - Securing AI Agents in Cyber-Physical Systems: A Survey of Environmental Interactions, Deepfake Threats, and Defenses [2.6726842616701703]
This survey provides a comprehensive review of security threats targeting AI agents in cyber-physical systems.<n>We focus on environmental interactions, deepfake-driven attacks, and MCP-mediated vulnerabilities.<n>We quantitatively illustrate how timing, noise, and false-positive costs constrainable defenses.
arXiv Detail & Related papers (2026-01-28T02:33:24Z) - ORCA -- An Automated Threat Analysis Pipeline for O-RAN Continuous Development [57.61878484176942]
Open-Radio Access Network (O-RAN) integrates numerous software components in a cloud-like deployment, opening the radio access network to previously unconsidered security threats.<n>Current vulnerability assessment practices often rely on manual, labor-intensive, and subjective investigations, leading to inconsistencies in the threat analysis.<n>We propose an automated pipeline that leverages Natural Language Processing (NLP) to minimize human intervention and associated biases.
arXiv Detail & Related papers (2026-01-20T07:31:59Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows [77.95511352806261]
Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms.<n>We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
arXiv Detail & Related papers (2025-10-28T13:22:39Z) - MCPGuard : Automatically Detecting Vulnerabilities in MCP Servers [16.620755774987774]
The Model Context Protocol (MCP) has emerged as a standardized interface enabling seamless integration between Large Language Models (LLMs) and external data sources and tools.<n>This paper systematically analyzes the security landscape of MCP-based systems, identifying three principal threat categories.
arXiv Detail & Related papers (2025-10-27T05:12:51Z) - Towards Unifying Quantitative Security Benchmarking for Multi Agent Systems [0.0]
Evolving AI systems increasingly deploy multi-agent architectures where autonomous agents collaborate, share information, and delegate tasks through developing protocols.<n>One such risk is a cascading risk: a breach in one agent can cascade through the system, compromising others by exploiting inter-agent trust.<n>In an ACI attack, a malicious input or tool exploit injected at one agent leads to cascading compromises and amplified downstream effects across agents that trust its outputs.
arXiv Detail & Related papers (2025-07-23T13:51:28Z) - SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z) - From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows [1.202155693533555]
Large language models (LLMs) with structured function-calling interfaces have dramatically expanded capabilities for real-time data retrieval and computation.<n>Yet, the explosive proliferation of plugins, connectors, and inter-agent protocols has outpaced discovery mechanisms and security practices.<n>We introduce the first unified, end-to-end threat model for LLM-agent ecosystems, spanning host-to-tool and agent-to-agent communications.
arXiv Detail & Related papers (2025-06-29T14:32:32Z) - AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [64.85086226439954]
We present SAFE, a benchmark for assessing the safety of embodied VLM agents on hazardous instructions.<n> SAFE comprises three components: SAFE-THOR, SAFE-VERSE, and SAFE-DIAGNOSE.<n>We uncover systematic failures in translating hazard recognition into safe planning and execution.
arXiv Detail & Related papers (2025-06-17T16:37:35Z) - A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models [39.58317527488534]
We propose a novel, instrumented risk-assessment metric that simultaneously evaluates potential threats to three key stakeholders.<n>To validate our metric, we leverage Garak, an open-source framework for vulnerability testing.<n>Results underscore the importance of multi-dimensional risk assessments in operationalizing secure, reliable AI-driven conversational systems.
arXiv Detail & Related papers (2025-05-07T20:26:45Z) - Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents [36.49717045080722]
This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios.<n>We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces.<n>Using ElizaOS, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations.
arXiv Detail & Related papers (2025-03-20T15:44:31Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.