Fugu-MT Paper Translation (Summary): Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

Paper summary: Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

arxiv url: http://arxiv.org/abs/2606.13385v1
Date: Thu, 11 Jun 2026 14:12:43 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.843953
Title: Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents
Title (reference translation): Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-World Web Agents
Authors: Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang
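Abstract summary: Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. The paper introduces \sysname, a benchmark that systematically categorizes and attributes harm in real-world web agent systems.
Reference score (independently computed attention score): 93.19140872946842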
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing security benchmarks adopt an attack-centric perspective, focusing on the technical feasibility of injections while overlooking the nuanced distribution of resulting harms. In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may exhibit substantially different effectiveness depending on whom it targets. To capture these properties, we introduce \sysname, a stakeholder-centric benchmark to systematically categorize and attribute harm in real-world web agent systems. It distinguishes between affected entities (e.g., user, seller, platform), decomposes the attacks into concrete objectives, and evaluates each case with complementary outcome- and process-level metrics. Our results reveal substantial and heterogeneous vulnerabilities: not a single attack objective is reliably resisted by current agents, and failures distribute across qualitatively distinct modes ranging from stealthy parasitism (attack succeeds without disrupting the user's delegated task) to misaligned disruption (task disrupted without attack success) and compounded failure (both adversarial objective and task integrity simultaneously violated). These patterns are missed by conventional evaluation, highlighting the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. Benchmark is available at https://github.com/StakeBench/SBC.
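Abstract (reference translation): Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments where they operate on untrusted web content and execute actions with direct consequences. This leaves them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate the agent's behaviour. Existing security benchmarks take an attack-centric view: they focus on the technical feasibility of injections and overlook how the resulting harm is distributed. In practice, however, prompt-injection risk depends on the victim: a single exploit can have asymmetric consequences for different stakeholders, and the same attack pattern can differ considerably in effectiveness depending on whom it targets. To capture these properties, the authors introduce \sysname, a stakeholder-centric benchmark for systematically categorizing and attributing harm in real-world web agent systems. It distinguishes the affected entities (e.g., user, seller, platform), decomposes attacks into concrete objectives, and evaluates each case with complementary outcome-level and process-level metrics. The results show substantial and heterogeneous vulnerabilities: no attack objective is reliably resisted by current agents, and failures fall into qualitatively distinct modes, from stealthy parasitism (the attack succeeds without disrupting the user's delegated task) to misaligned disruption (the task is disrupted without the attack succeeding) and compounded failure (the adversarial objective is achieved and task integrity is violated at the same time). Conventional evaluation misses these patterns, which underlines the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. The benchmark is available at https://github.com/StakeBench/SBC.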
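
The three failure modes named in the abstract are defined by two per-episode outcomes: whether the attack succeeded and whether the user's delegated task survived. The following is a minimal sketch of that mapping, not the benchmark's actual code. The `EpisodeOutcome` record, its field names, and the `"resisted"` label for the remaining case (no attack success, task intact) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical per-episode record; not the benchmark's real schema."""
    attack_succeeded: bool  # adversarial objective was achieved
    task_completed: bool    # user's delegated task finished intact


def failure_mode(outcome: EpisodeOutcome) -> str:
    """Map the two outcome flags onto the modes described in the abstract."""
    if outcome.attack_succeeded and outcome.task_completed:
        return "stealthy parasitism"    # attack succeeds, task undisturbed
    if outcome.attack_succeeded and not outcome.task_completed:
        return "compounded failure"     # attack succeeds and task is broken
    if not outcome.attack_succeeded and not outcome.task_completed:
        return "misaligned disruption"  # task broken without attack success
    return "resisted"                   # assumed label for the fourth case


def tally(outcomes):
    """Count how many episodes fall into each mode."""
    return Counter(failure_mode(o) for o in outcomes)
```

Counting attack success alone would merge stealthy parasitism with compounded failure and would not register misaligned disruption at all, which is why both flags are kept separate here.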

Related papers