Fugu-MT 論文翻訳(概要): MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

論文の概要: MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

arxiv url: http://arxiv.org/abs/2508.14925v1
Date: Tue, 19 Aug 2025 10:12:35 GMT
ステータス: 翻訳完了
システム内更新日: 2025-08-22 16:26:46.02422
Title: MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers
Title（参考訳）: MCPTox: 実世界のMCPサーバに対するツール攻撃のベンチマーク
Authors: Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li,
Abstract要約: MCPToxは,現実的なMCP設定において,ツールポジショニングに対するエージェントの堅牢性を評価する最初のベンチマークである。 MCPToxは、数ショットの学習によって1312の悪意のあるテストケースの包括的なスイートを生成し、潜在的なリスクの10のカテゴリをカバーする。評価の結果,o1-miniで72.8%の攻撃成功率を達成したツールポイジングの脆弱性が広く報告されている。
参考スコア（独自算出の注目度）: 12.669529656631937
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: By providing a standardized interface for LLM agents to interact with external tools, the Model Context Protocol (MCP) is quickly becoming a cornerstone of the modern autonomous agent ecosystem. However, it creates novel attack surfaces due to untrusted external tools. While prior work has focused on attacks injected through external tool outputs, we investigate a more fundamental vulnerability: Tool Poisoning, where malicious instructions are embedded within a tool's metadata without execution. To date, this threat has been primarily demonstrated through isolated cases, lacking a systematic, large-scale evaluation. We introduce MCPTox, the first benchmark to systematically evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox is constructed upon 45 live, real-world MCP servers and 353 authentic tools. To achieve this, we design three distinct attack templates to generate a comprehensive suite of 1312 malicious test cases by few-shot learning, covering 10 categories of potential risks. Our evaluation on 20 prominent LLM agents setting reveals a widespread vulnerability to Tool Poisoning, with o1-mini, achieving an attack success rate of 72.8\%. We find that more capable models are often more susceptible, as the attack exploits their superior instruction-following abilities. Finally, the failure case analysis reveals that agents rarely refuse these attacks, with the highest refused rate (Claude-3.7-Sonnet) less than 3\%, demonstrating that existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operation. Our findings create a crucial empirical baseline for understanding and mitigating this widespread threat, and we release MCPTox for the development of verifiably safer AI agents. Our dataset is available at an anonymized repository: \textit{https://anonymous.4open.science/r/AAAI26-7C02}.
Abstract（参考訳）: LLMエージェントが外部ツールと対話するための標準化されたインターフェースを提供することによって、モデルコンテキストプロトコル(MCP)は、急速に現代の自律エージェントエコシステムの基盤になりつつある。しかし、信頼できない外部ツールにより、新たな攻撃面を生成する。以前の作業では、外部ツール出力を通じて注入された攻撃に重点を置いているが、より根本的な脆弱性を調査している。これまでのところ、この脅威は主に孤立したケースを通じて実証されており、体系的かつ大規模な評価を欠いている。 MCPToxは,現実的なMCP設定において,ツールポジショニングに対するエージェントの堅牢性を体系的に評価する最初のベンチマークである。 MCPToxは45の実環境のMPPサーバと353の認証ツール上に構築されている。これを実現するために、3つの異なるアタックテンプレートを設計し、潜在的なリスクの10のカテゴリをカバーする、1312の悪意のあるテストケースの包括的なスイートを数ショットの学習で生成する。 LLMエージェント設定20件について評価したところ,攻撃成功率72.8\%となるo1-miniのツール・ポジショニングの脆弱性が広範囲にあることが明らかとなった。攻撃が優れた命令追従能力を利用するので、より有能なモデルの方がより感受性が高いことが分かっています。最後に、障害事例分析により、エージェントはこれらの攻撃を滅多に拒否せず、最大拒否率(Claude-3.7-Sonnet)が3倍未満であり、既存の安全アライメントが不正な操作に合法的なツールを使用する悪意のある行為に対して効果がないことを示した。我々の発見は、この広範囲にわたる脅威を理解し緩和するための重要な経験的ベースラインを作成し、より安全なAIエージェントの開発のためのMCPToxをリリースする。我々のデータセットは匿名リポジトリで利用できる。 \textit{https://anonymous.4open.science/r/AAAI26-7C02}。

論文の概要: MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

関連論文リスト