Related papers: MPMA: Preference Manipulation Attack Against Model Context Protocol

MPMA: Preference Manipulation Attack Against Model Context Protocol

URL: http://arxiv.org/abs/2505.11154v1
Date: Fri, 16 May 2025 11:55:12 GMT
Title: MPMA: Preference Manipulation Attack Against Model Context Protocol
Authors: Zihan Wang, Hongwei Li, Rui Zhang, Yu Liu, Wenbo Jiang, Wenshu Fan, Qingchuan Zhao, Guowen Xu,
Abstract summary: Model Context Protocol (MCP) standardizes interface mapping for large language models (LLMs) to access external data and tools.<n>Third-party customized versions of the MCP server expose potential security vulnerabilities.<n>In this paper, we first introduce a novel security threat, which we term the MCP Preference Manipulation Attack (MPMA)
Score: 24.584415826402935
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model Context Protocol (MCP) standardizes interface mapping for large language models (LLMs) to access external data and tools, which revolutionizes the paradigm of tool selection and facilitates the rapid expansion of the LLM agent tool ecosystem. However, as the MCP is increasingly adopted, third-party customized versions of the MCP server expose potential security vulnerabilities. In this paper, we first introduce a novel security threat, which we term the MCP Preference Manipulation Attack (MPMA). An attacker deploys a customized MCP server to manipulate LLMs, causing them to prioritize it over other competing MCP servers. This can result in economic benefits for attackers, such as revenue from paid MCP services or advertising income generated from free servers. To achieve MPMA, we first design a Direct Preference Manipulation Attack ($\mathtt{DPMA}$) that achieves significant effectiveness by inserting the manipulative word and phrases into the tool name and description. However, such a direct modification is obvious to users and lacks stealthiness. To address these limitations, we further propose Genetic-based Advertising Preference Manipulation Attack ($\mathtt{GAPMA}$). $\mathtt{GAPMA}$ employs four commonly used strategies to initialize descriptions and integrates a Genetic Algorithm (GA) to enhance stealthiness. The experiment results demonstrate that $\mathtt{GAPMA}$ balances high effectiveness and stealthiness. Our study reveals a critical vulnerability of the MCP in open ecosystems, highlighting an urgent need for robust defense mechanisms to ensure the fairness of the MCP ecosystem.

Related papers

Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data [0.0]
The Model Context Protocol (MCP) represents a significant advancement in AI-tool integration, enabling seamless communication between AI agents and external services.<n>This paper demonstrates how unsophisticated threat actors, requiring only basic programming skills and free web tools, can exploit MCP's trust model to exfiltrate sensitive financial data.
arXiv Detail & Related papers (2025-07-26T09:22:40Z)
Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem [9.147044310206773]
The Model Context Protocol (MCP) is an emerging standard designed to enable seamless interaction between Large Language Model (LLM) applications and external tools or resources.<n>In this paper, we present the first systematic study of attack vectors targeting the MCP ecosystem.
arXiv Detail & Related papers (2025-05-31T08:01:11Z)
MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment [0.0]
The model context protocol (MCP) has been widely adapted as an open standard enabling seamless integration of generative AI agents.<n>Recent work has shown the MCP is susceptible to retrieval-based "falsely benign" AI attacks (FBAs), allowing malicious system access and credential theft.<n>We show that attackers need only post malicious content online to deceive MCP agents into carrying out their attacks on unsuspecting victims' systems.
arXiv Detail & Related papers (2025-05-29T16:44:29Z)
AgentXploit: End-to-End Redteaming of Black-Box AI Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentXploit, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentXploit on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
Progent: Programmable Privilege Control for LLM Agents [46.49787947705293]
We introduce Progent, the first privilege control mechanism for LLM agents.<n>At its core is a domain-specific language for flexibly expressing privilege control policies applied during agent execution.<n>This enables agent developers and users to craft suitable policies for their specific use cases and enforce them deterministically to guarantee security.
arXiv Detail & Related papers (2025-04-16T01:58:40Z)
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits [0.0]
The Model Context Protocol (MCP) is an open protocol that standardizes API calls to large language models (LLMs), data sources, and agentic tools.<n>We show that the current MCP design carries a wide range of security risks for end users.<n>We introduce a safety auditing tool, MCPSafetyScanner, to assess the security of an arbitrary MCP server.
arXiv Detail & Related papers (2025-04-02T21:46:02Z)
$\ extit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks [32.42704787246349]
Multi-agent Large Language Model (LLM) systems create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning.<n>In this work, we innovatively focus on attacking pragmatic systems that have constrains such as limited token bandwidth, latency between message delivery, and defense mechanisms.<n>We design a $textitpermutation-invariant adversarial attack$ that optimize prompt distribution across latency and bandwidth-constraint network topologies to bypass distributed safety mechanisms.
arXiv Detail & Related papers (2025-03-31T20:43:56Z)
Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training [50.829723203044395]
We propose $textitAdversary-aware DPO (ADPO)$, a novel training framework that explicitly considers adversarial.<n>$textitADPO$ integrates adversarial training into DPO to enhance the safety alignment of VLMs under worst-case adversarial perturbations.<n>$textitADPO$ ensures that VLMs remain robust and reliable even in the presence of sophisticated jailbreak attacks.
arXiv Detail & Related papers (2025-02-17T05:28:47Z)
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
Imprompter: Tricking LLM Agents into Improper Tool Use [35.255462653237885]
Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources. We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks.
arXiv Detail & Related papers (2024-10-19T01:00:57Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
Unveiling Vulnerability of Self-Attention [61.85150061213987]
Pre-trained language models (PLMs) are shown to be vulnerable to minor word changes. This paper studies the basic structure of transformer-based PLMs, the self-attention (SA) mechanism. We introduce textitS-Attend, a novel smoothing technique that effectively makes SA robust via structural perturbations.
arXiv Detail & Related papers (2024-02-26T10:31:45Z)
Towards Semantic Communication Protocols: A Probabilistic Logic Perspective [69.68769942563812]
We propose a semantic protocol model (SPM) constructed by transforming an NPM into an interpretable symbolic graph written in the probabilistic logic programming language (ProbLog) By leveraging its interpretability and memory-efficiency, we demonstrate several applications such as SPM reconfiguration for collision-avoidance.
arXiv Detail & Related papers (2022-07-08T14:19:36Z)
Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameters transmissions. In this paper, we propose effective MP algorithms to combat state-of-the-art defensive aggregation mechanisms. Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.