Fugu-MT Paper Translation (Abstract): How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Paper summary: How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

arxiv url: http://arxiv.org/abs/2603.15714v1
Date: Mon, 16 Mar 2026 14:49:36 GMT
Status: Translation complete
System update date: 2026-03-18 17:42:06.905478
Title: How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition
Title (reference translation): How vulnerable are AI agents to indirect prompt injections?
Authors: Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifan Wang, Zihan Wang, Spencer Whitman, Eric Winsor, Arman Zharmagambetov, Matt Fredrikson, Zico Kolter
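Abstract summary: LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This exposes them to indirect prompt injection attacks, in which adversarial instructions embedded in external content manipulate agent behavior without the user's awareness. The paper evaluates the dual objective of executing a harmful action while concealing it from the user across three agent settings: tool calling, coding, and computer use.
Reference score (independently computed attention score): 48.32744727426218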
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent behavior without user awareness. A critical but underexplored dimension of this threat is concealment: since users tend to observe only an agent's final response, an attack can conceal its existence by presenting no clue of compromise in the final user-facing response while successfully executing harmful actions. This leaves users unaware of the manipulation and likely to accept harmful outcomes as legitimate. We present findings from a large-scale public red-teaming competition evaluating this dual objective across three agent settings: tool calling, coding, and computer use. The competition attracted 464 participants who submitted 272,000 attack attempts against 13 frontier models, yielding 8,648 successful attacks across 41 scenarios. All models proved vulnerable, with attack success rates ranging from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). We identify universal attack strategies that transfer across 21 of 41 behaviors and multiple model families, suggesting fundamental weaknesses in instruction-following architectures. Capability and robustness showed weak correlation, with Gemini 2.5 Pro exhibiting both high capability and high vulnerability. To address benchmark saturation and obsolescence, we will endeavor to deliver quarterly updates through continued red-teaming competitions. We open-source the competition environment for use in evaluations, along with 95 successful attacks against Qwen that did not transfer to any closed-source model. We share model-specific attack data with respective frontier labs and the full dataset with the UK AISI and US CAISI to support robustness research.
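Abstract (reference translation): LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This exposes them to indirect prompt injection attacks, in which adversarial instructions embedded in external content can manipulate an agent's behavior without the user's awareness. A critical but underexplored aspect of this threat is concealment: because users tend to observe only the agent's final response, an attack can hide its existence by showing no sign of compromise in that response while still carrying out harmful actions. Users then remain unaware of the manipulation and are likely to accept harmful outcomes as legitimate. The paper presents results from a large-scale public red-teaming competition that evaluates this dual objective in three agent settings: tool calling, coding, and computer use. The competition drew 464 participants, who submitted 272,000 attack attempts against 13 frontier models and achieved 8,648 successful attacks across 41 scenarios. All models proved vulnerable, with attack success rates ranging from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). The authors identify universal attack strategies that transfer across 21 of the 41 behaviors and across multiple model families, suggesting fundamental weaknesses in instruction-following architectures. Capability and robustness were only weakly correlated: Gemini 2.5 Pro showed both high capability and high vulnerability. To address benchmark saturation and obsolescence, the authors will endeavor to deliver quarterly updates through continued red-teaming competitions. They open-source the competition environment for use in evaluations, together with 95 successful attacks against Qwen that did not transfer to any closed-source model. Model-specific attack data is shared with the respective frontier labs, and the full dataset is shared with the UK AISI and US CAISI to support robustness research.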
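
As a rough illustration of the figures in the abstract, the following Python sketch shows the dual success criterion (harmful action executed and no sign of it in the final response) and the aggregate attack success rate implied by the reported totals. The function names are hypothetical and are not taken from the paper or its released environment, and the attempt count is used as printed in the abstract, so the aggregate is approximate if that count is rounded.

```python
def is_successful_attack(harmful_action_executed: bool,
                         final_response_conceals: bool) -> bool:
    """Dual objective from the abstract: the harmful action must run AND
    the final user-facing response must give no clue of compromise.
    (Hypothetical helper; the paper's actual judging logic is not shown.)"""
    return harmful_action_executed and final_response_conceals


def attack_success_rate(successes: int, attempts: int) -> float:
    """Return the attack success rate as a percentage."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


# Totals as reported in the abstract: 8,648 successes out of 272,000 attempts.
overall = attack_success_rate(8648, 272000)
print(f"Aggregate attack success rate: {overall:.2f}%")  # prints 3.18%
```

The aggregate of about 3.18% falls between the per-model extremes quoted in the abstract (0.5% and 8.5%), as expected for a pooled rate.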

Related papers