Fugu-MT 論文翻訳(概要): Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

論文の概要: Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

arxiv url: http://arxiv.org/abs/2510.05244v1
Date: Mon, 06 Oct 2025 18:09:02 GMT
ステータス: 翻訳完了
システム内更新日: 2025-10-08 17:57:07.934584
Title: Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?
Title（参考訳）: 間接的プロンプトインジェクション:ファイアウォールはすべて必要なのか、それともより強いベンチマークか?
Authors: Rishika Bhagwatkar, Kevin Kasa, Abhay Puri, Gabriel Huang, Irina Rish, Graham W. Taylor, Krishnamurthy Dj Dvijotham, Alexandre Lacoste,
Abstract要約: エージェントインタフェースにおけるシンプルでモジュール的で,モデルに依存しないディフェンスが,高ユーティリティで完全なセキュリティを実現することを示す。ツール入力ファイアウォール(最小限のファイアウォール)とツール出力ファイアウォール(サニタイザ)の2つのファイアウォールをベースとしたディフェンスを採用している。
参考スコア（独自算出の注目度）: 58.48689960350828
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show that a simple, modular and model-agnostic defense operating at the agent--tool interface achieves perfect security (0% or the lowest possible attack success rate) with high utility (task success rate) across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent and tau-Bench, while achieving a state-of-the-art security-utility tradeoff compared to prior results. Specifically, we employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer). Unlike prior complex approaches, this firewall defense makes minimal assumptions on the agent and can be deployed out-of-the-box, while maintaining strong performance without compromising utility. However, our analysis also reveals critical limitations in these existing benchmarks, including flawed success metrics, implementation bugs, and most importantly, weak attacks, hindering significant progress in the field. To foster more meaningful progress, we present targeted fixes to these issues for AgentDojo and Agent Security Bench while proposing best-practices for more robust benchmark design. Further, we demonstrate that although these firewalls push the state-of-the-art on existing benchmarks, it is still possible to bypass them in practice, underscoring the need to incorporate stronger attacks in security benchmarks. Overall, our work shows that existing agentic security benchmarks are easily saturated by a simple approach and highlights the need for stronger agentic security benchmarks with carefully chosen evaluation metrics and strong adaptive attacks.
Abstract（参考訳）: AIエージェントは間接的なインジェクション攻撃に対して脆弱であり、外部コンテンツやツールアウトプットに埋め込まれた悪意のある命令は意図しないあるいは有害な振る舞いを引き起こす。ファイアウォールの概念から着想を得た結果,エージェントツールインターフェースにおけるシンプルでモジュール的でモデルに依存しない防御は,AgentDojo, Agent Security Bench, InjecAgent, Tau-Benchの4つの公開ベンチマークにおいて,完全なセキュリティ(0%ないしは最小の攻撃成功率)を実現し,従来よりも最先端のセキュリティユーティリティトレードオフを実現していることがわかった。具体的には、ツール・インプット・ファイアウォール(Minimizer)とツール・アウトプット・ファイアウォール(Sanitizer)という、2つのファイアウォールに基づくディフェンスを採用しています。従来の複雑なアプローチとは異なり、このファイアウォールディフェンスはエージェントに最小限の仮定をし、有効性を損なうことなく強力なパフォーマンスを維持しながら、最初からデプロイすることができる。しかし、我々の分析では、成功基準の欠陥、実装のバグ、そして最も重要なのは、弱い攻撃など、既存のベンチマークの限界も明らかにしています。より意味のある進歩を促進するため、より堅牢なベンチマーク設計のためのベストプラクティスを提案しながら、AgentDojoとAgent Security Benchのこれらの問題に対するターゲット修正を提示します。さらに、これらのファイアウォールが既存のベンチマークに最先端を推し進めているが、セキュリティベンチマークに強力な攻撃を組み込む必要性を強調して、実際にそれらをバイパスすることは可能であることを実証する。全体として、既存のエージェントセキュリティベンチマークは単純なアプローチで容易に飽和していることを示し、慎重に選択された評価指標と強力な適応攻撃を備えたエージェントセキュリティベンチマークの必要性を強調します。

論文の概要: Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

関連論文リスト