Fugu-MT Paper Translation (Abstract): MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Paper abstract: MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

arxiv url: http://arxiv.org/abs/2602.09222v1
Date: Mon, 09 Feb 2026 21:46:18 GMT
Status: Translation complete
Last updated in system: 2026-02-11 20:17:43.259543
Title: MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
Title (reference translation): MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
Authors: Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, Alina Oprea
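Abstract summary: MUZZLE is an automated framework for evaluating the security of web agents against indirect prompt injection attacks. It adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties.
Reference score (independently computed attention score): 10.431616150153992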
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, limiting their ability to capture realistic, adaptive attacks encountered in practice. We present MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE utilizes the agent's trajectories to automatically identify high-salience injection surfaces, and adaptively generate context-aware malicious instructions that target violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. We evaluate MUZZLE across diverse web applications, user tasks, and agent configurations, demonstrating its ability to automatically and adaptively assess the security of web agents with minimal human intervention. Our results show that MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties. MUZZLE also identifies novel attack strategies, including 2 cross-application prompt injection attacks and an agent-tailored phishing scenario.
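Abstract (reference translation): Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by interacting directly with websites and performing actions on behalf of users. These agents offer powerful capabilities, but their design exposes them to indirect prompt injection attacks embedded in untrusted web content, allowing adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, which limits their ability to capture the realistic, adaptive attacks encountered in practice. This paper proposes MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE uses the agent's trajectories to automatically identify high-salience injection surfaces and adaptively generates context-aware malicious instructions that target violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. The authors evaluate MUZZLE across diverse web applications, user tasks, and agent configurations, demonstrating its ability to assess the security of web agents automatically and adaptively with minimal human intervention. The results show that MUZZLE effectively discovers 37 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties. MUZZLE also identifies novel attack strategies, including 2 cross-application prompt injection attacks and an agent-tailored phishing scenario.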
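
The feedback loop the abstract describes (rank injection surfaces from the agent's trajectory, try a payload, refine it after a failed run) can be illustrated with a toy sketch. This is not MUZZLE's implementation, which the abstract does not detail. Every name here (`rank_surfaces`, `toy_agent`, `refine`, `red_team`) is hypothetical, the "agent" is a hard-coded stand-in rather than an LLM, and read frequency is only an assumed proxy for salience.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    surface: str      # page element the payload was injected into
    payload: str      # injected instruction text
    succeeded: bool   # whether the (toy) agent followed the payload
    feedback: str     # explanation returned by the failed or successful run


def rank_surfaces(trajectory):
    """Order page elements by how often the agent read them.

    Assumption: read frequency stands in for 'salience'.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = {}
    for element in trajectory:
        counts[element] = counts.get(element, 0) + 1
    return sorted(counts, key=lambda e: (-counts[e], e))


def toy_agent(surface, payload):
    """Hypothetical agent stub: only follows task-framed text in comments."""
    if surface != "comment":
        return False, "agent ignored this surface"
    if "task" not in payload:
        return False, "payload lacked task context"
    return True, "objective reached"


def refine(payload, feedback):
    """Rewrite the payload using feedback from a failed run (toy rule)."""
    if "task context" in feedback:
        return payload + " (required to finish the task)"
    return payload


def red_team(trajectory, objective, max_rounds=3):
    """Try surfaces in salience order, refining the payload after failures."""
    attempts = []
    for surface in rank_surfaces(trajectory):
        payload = f"Please {objective}"
        for _ in range(max_rounds):
            ok, feedback = toy_agent(surface, payload)
            attempts.append(Attempt(surface, payload, ok, feedback))
            if ok:
                return attempts
            refined = refine(payload, feedback)
            if refined == payload:
                break  # nothing left to refine; move on to the next surface
            payload = refined
    return attempts


if __name__ == "__main__":
    log = red_team(["title", "comment", "comment", "sidebar"],
                   "export the contact list")
    for attempt in log:
        print(attempt)
```

In this toy run the first payload fails for lacking task context, and the refined payload succeeds on the second round. A real evaluation would replace `toy_agent` with an actual web agent and `refine` with a model-driven rewrite.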


Related papers