Fugu-MT Paper Translation (Summary): Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

Paper summary: Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

arxiv url: http://arxiv.org/abs/2512.14860v1
Date: Tue, 16 Dec 2025 19:22:50 GMT
Status: Translation complete
Last updated in system: 2025-12-18 17:06:26.763329
Title: Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks
Authors: Viet K. Nguyen, Mohammad I. Husain
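Abstract summary: Agentic AI introduces security vulnerabilities that traditional LLM safeguards cannot address. We conduct the first systematic testing and comparative evaluation of agentic AI systems. We identify six defensive behavior patterns, including a novel "hallucinated compliance" strategy.
Reference score (attention score calculated in-house): 0.0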
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agentic AI introduces security vulnerabilities that traditional LLM safeguards fail to address. Although recent work by Unit 42 at Palo Alto Networks demonstrated that ChatGPT-4o, acting as an agent, successfully executes attacks that it refuses in chat mode, no comparative analysis exists across multiple models and frameworks. We conducted the first systematic penetration testing and comparative evaluation of agentic AI systems, testing five prominent models (Claude 3.5 Sonnet, Gemini 2.5 Flash, GPT-4o, Grok 2, and Nova Pro) across two agentic AI frameworks (AutoGen and CrewAI). The testbed is a seven-agent architecture that mimics the functionality of a university information management system, attacked with 13 distinct scenarios spanning prompt injection, Server-Side Request Forgery (SSRF), SQL injection, and tool misuse. Our 130 total test cases reveal significant security disparities: AutoGen demonstrates a 52.3% refusal rate versus CrewAI's 30.8%, while per-model refusal rates range from Nova Pro's 46.2% down to 38.5% for Claude and Grok 2. Most critically, Grok 2 on CrewAI rejected only 2 of 13 attacks (15.4% refusal rate), and the overall refusal rate of 41.5% across all configurations indicates that more than half of malicious prompts succeeded despite enterprise-grade safety mechanisms. We identify six distinct defensive behavior patterns, including a novel "hallucinated compliance" strategy in which models fabricate outputs rather than executing or refusing attacks, and provide actionable recommendations for secure agent deployment. Complete attack prompts are included in the Appendix to enable reproducibility.
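The reported percentages follow from a fully crossed test matrix: 5 models x 2 frameworks x 13 attack scenarios = 130 cases, so each framework covers 65 cases and each model 26. The following is a minimal sketch, assuming every case is exactly one (model, framework, scenario) combination, that recomputes the reported rates. Apart from Grok 2 on CrewAI (2 of 13), the abstract does not state raw refusal counts; the counts used here are back-calculated from the reported percentages and are assumptions, and the helper name `refusal_rate` is hypothetical.

```python
from itertools import product

# Test matrix as described in the abstract: 5 models x 2 frameworks x 13 scenarios.
MODELS = ["Claude 3.5 Sonnet", "Gemini 2.5 Flash", "GPT-4o", "Grok 2", "Nova Pro"]
FRAMEWORKS = ["AutoGen", "CrewAI"]
NUM_SCENARIOS = 13


def refusal_rate(refused: int, total: int) -> float:
    """Hypothetical helper: percentage of cases refused, rounded to one decimal."""
    return round(100 * refused / total, 1)


# Each (model, framework, scenario) combination counts as one test case.
test_cases = list(product(MODELS, FRAMEWORKS, range(1, NUM_SCENARIOS + 1)))
total = len(test_cases)                   # 130 cases in total
per_framework = total // len(FRAMEWORKS)  # 65 cases per framework
per_model = total // len(MODELS)          # 26 cases per model

# Only "Grok 2 on CrewAI" (2 of 13) is stated directly in the abstract;
# every other count is an ASSUMPTION back-calculated from a reported percentage.
counts = {
    "AutoGen": (34, per_framework),
    "CrewAI": (20, per_framework),
    "Nova Pro": (12, per_model),
    "Claude / Grok 2 (each)": (10, per_model),
    "Grok 2 on CrewAI": (2, NUM_SCENARIOS),
    "Overall": (34 + 20, total),
}

for label, (refused, n) in counts.items():
    print(f"{label}: {refused}/{n} = {refusal_rate(refused, n)}%")
```

Under the same assumption, 130 - 54 = 76 cases (about 58.5%) were not refused, which is the arithmetic behind the abstract's statement that more than half of the malicious prompts got through.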

Related papers