Fugu-MT paper translation (abstract): Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Paper overview: Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

arxiv url: http://arxiv.org/abs/2605.22001v1
Date: Thu, 21 May 2026 04:58:11 GMT
Status: translation complete
Last updated in system: 2026-05-22 16:35:42.098464
Title: Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
Authors: Aaditya Pai
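Abstract summary: Injection detectors deployed to protect LLM agents are calibrated on static, template-based payloads that announce themselves as override directives. When payloads are instead generated to mimic the domain vocabulary and authority structures of the target document, standard detectors fail to flag them. The paper formalizes this as the Camouflage Detection Gap (CDG), defined as the difference in injection detection rate between static and camouflaged payloads.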
Reference score (attention score computed independently by this site): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Injection detectors deployed to protect LLM agents are calibrated on static, template-based payloads that announce themselves as override directives. We identify a systematic blind spot: when payloads are generated to mimic the domain vocabulary and authority structures of the target document, what we call domain-camouflaged injection, standard detectors fail to flag them, with detection rates dropping from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash. We formalize this as the Camouflage Detection Gap (CDG), the difference in injection detection rate between static and camouflaged payloads. Across 45 tasks spanning three domains and two model families, CDG is large and statistically significant (chi^2 = 38.03, p < 0.001 for Llama; chi^2 = 17.05, p < 0.001 for Gemini), with zero reverse discordant pairs in either case. We additionally evaluate Llama Guard 3, a production safety classifier, which detects zero camouflage payloads (IDR_camouflage = 0.000), confirming that the blind spot extends beyond few-shot detectors to dedicated safety classifiers. We further show that multi-agent debate architectures amplify static injection attacks by up to 9.9x on smaller models, while stronger models show collective resistance. Targeted detector augmentation provides only partial remediation (10.2% improvement on Llama, 78.7% on Gemini), suggesting the vulnerability is architectural rather than incidental for weaker models. Our framework, task bank, and payload generator are released publicly.
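The CDG definition and the reported chi^2 values can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' released code. It computes CDG directly from the detection rates quoted in the abstract. The abstract does not name its significance test; the phrase "reverse discordant pairs" suggests a paired test such as McNemar's, so the sketch assumes the continuity-corrected McNemar statistic. The function names and the discordant-pair counts (40 and 19) are hypothetical: the counts are not reported in the abstract and were picked only because, with zero reverse discordant pairs, 39^2/40 = 38.025 and 18^2/19 ≈ 17.05 sit close to the reported 38.03 and 17.05.

```python
def camouflage_detection_gap(idr_static: float, idr_camouflage: float) -> float:
    """CDG as defined in the abstract: static IDR minus camouflaged IDR."""
    return idr_static - idr_camouflage


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic (an assumed choice of test).

    b: pairs flagged in static form but missed when camouflaged
    c: pairs missed in static form but flagged when camouflaged
       (the "reverse discordant pairs")
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    return (abs(b - c) - 1) ** 2 / (b + c)


# Detection rates quoted in the abstract: (static, camouflaged).
REPORTED_IDR = {
    "Llama 3.1 8B": (0.938, 0.097),
    "Gemini 2.0 Flash": (1.000, 0.556),
}

# HYPOTHETICAL discordant counts (b, c); not reported in the abstract.
ASSUMED_DISCORDANT = {
    "Llama 3.1 8B": (40, 0),
    "Gemini 2.0 Flash": (19, 0),
}

if __name__ == "__main__":
    for model, (static, camo) in REPORTED_IDR.items():
        cdg = camouflage_detection_gap(static, camo)
        b, c = ASSUMED_DISCORDANT[model]
        print(f"{model}: CDG = {cdg:.3f}, assumed McNemar chi^2 = {mcnemar_chi2(b, c):.4f}")
```

Under these assumptions the gap is about 0.841 for Llama 3.1 8B and 0.444 for Gemini 2.0 Flash. The McNemar reading is only a plausibility check on the reported statistics; the paper itself should be consulted for the actual test and contingency counts.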
