Fugu-MT paper translation (summary): Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Paper summary: Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

arxiv url: http://arxiv.org/abs/2604.03870v1
Date: Sat, 04 Apr 2026 21:27:04 GMT
Status: translation complete
Last updated in system: 2026-04-07 15:49:18.807077
Title: Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs
Title (reference translation): Indirect injection vulnerabilities uncovered in agentic LLMs
Authors: Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang,
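Abstract summary: Expanded action spaces, including inter-system interactions, pose severe security challenges. Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration. The authors evaluate six defense strategies against four advanced IPI attack vectors across nine LLM backbones.
Reference score (in-house attention score): 32.38053469964495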
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration during normal operations. While current security evaluations predominantly rely on isolated single-turn benchmarks, the systemic vulnerabilities of these agents within complex dynamic environments remain critically underexplored. To bridge this gap, we systematically evaluate six defense strategies against four sophisticated IPI attack vectors across nine LLM backbones. Crucially, we conduct our evaluation entirely within dynamic multi-step tool-calling environments to capture the true attack surface of modern autonomous agents. Moving beyond binary success rates, our multidimensional analysis reveals a pronounced fragility. Advanced injections successfully bypass nearly all baseline defenses, and some surface-level mitigations even produce counterproductive side effects. Furthermore, while agents execute malicious instructions almost instantaneously, their internal states exhibit abnormally high decision entropy. Motivated by this latent hesitation, we investigate Representation Engineering (RepE) as a robust detection strategy. By extracting hidden states at the tool-input position, we show that the RepE-based circuit breaker successfully identifies and intercepts unauthorized actions before the agent commits to them, achieving high detection accuracy across diverse LLM backbones. This study exposes the limitations of current IPI defenses and provides a highly practical paradigm for building resilient multi-agent architectures.
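The abstract names two measurable signals, decision entropy and a RepE-based circuit breaker over hidden states at the tool-input position, but does not specify how either is computed. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: decision entropy is assumed to be the Shannon entropy of the softmax over candidate next-token (or action) logits, and the detector is a difference-of-means linear probe, a common Representation Engineering technique. The function `decision_entropy`, the class `CircuitBreaker`, and the midpoint threshold are hypothetical; in a real agent the vectors would be hidden states extracted from the LLM backbone at the tool-input token.

```python
import math
from typing import List, Sequence

Vector = List[float]


def decision_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits).

    Assumption: "decision entropy" is read here as the entropy of the
    model's distribution over candidate next tokens or actions.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def _mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class CircuitBreaker:
    """Hypothetical difference-of-means probe on tool-input hidden states."""

    def __init__(self, benign: Sequence[Vector], injected: Sequence[Vector]):
        if not benign or not injected:
            raise ValueError("need at least one example of each class")
        mu_benign = _mean(benign)
        mu_injected = _mean(injected)
        # Direction pointing from benign toward injected activations.
        self.direction = [i - b for i, b in zip(mu_injected, mu_benign)]
        # Threshold halfway between the two projected class means.
        self.threshold = 0.5 * (
            _dot(mu_benign, self.direction) + _dot(mu_injected, self.direction)
        )

    def score(self, hidden: Sequence[float]) -> float:
        """Projection of a hidden state onto the injection direction."""
        return _dot(hidden, self.direction)

    def should_block(self, hidden: Sequence[float]) -> bool:
        """True if the pending tool call should be intercepted."""
        return self.score(hidden) > self.threshold


if __name__ == "__main__":
    # Toy 2-D "hidden states"; real ones would come from the LLM backbone.
    breaker = CircuitBreaker(
        benign=[[0.0, 1.0], [0.2, 0.8]],
        injected=[[1.0, 0.0], [0.8, 0.2]],
    )
    print(breaker.should_block([0.9, 0.1]))  # True
    print(breaker.should_block([0.1, 0.9]))  # False
    print(round(decision_entropy([0.0, 0.0]), 4))  # 0.6931
```

The midpoint threshold is the simplest possible decision rule; a deployed detector would more plausibly calibrate it on held-out benign and injected trajectories. Calling `should_block` before the tool call is dispatched mirrors the abstract's description of intercepting unauthorized actions before the agent commits to them.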

Related papers