Fugu-MT 論文翻訳(概要): AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

論文の概要: AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

arxiv url: http://arxiv.org/abs/2605.26269v1
Date: Mon, 25 May 2026 18:53:22 GMT
ステータス: 翻訳完了
システム内更新日: 2026-05-27 17:51:41.366305
Title: AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents
Title（参考訳）: AgentSecBench: LLMエージェントにおけるプロンプトインジェクション、プライバシリーク、ツール使用のインテリジェンスの測定
Authors: Faruk Alpay, Taylan Alpay,
Abstract要約: 本稿では,AgentSecBenchを,この問題に対する正式なセキュリティフレームワークの実証的なインスタンス化として紹介する。 3つのゲーム・インストラクション・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス(英語版)・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス(英語版)・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス・インテリジェンス(英語版)を定めている。これは、承認された観察と能力に対するプロジェクションとしてのアプリケーションポリシーを表し、プロジェクションの即時アノテーションとプロジェクションの強化を区別し、敵のアドバンテージと、防衛が生成前に関連するモデル可視チャネルを閉鎖するかどうかを計測する。
参考スコア（独自算出の注目度）: 0.2864713389096699
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents process trusted instructions, retrieved records, and tool observations through a common generative channel. This conflates data flow with authority: an untrusted string can affect a secret-bearing response or an action proposal even when no application policy authorizes that influence. We introduce AgentSecBench as an empirical instantiation of a formal security framework for this problem. The framework defines three games-instruction-integrity, retrieval-confidentiality, and capability-integrity-under a common notion of intent-to-execution noninterference with permitted leakage. It represents an application policy as a projection onto authorized observations and capabilities, distinguishes prompt annotations from enforcing projections, and measures both adversarial advantage and whether a defense closes the relevant model-visible channel before generation. The exact-marker experiments are intentionally one observable instantiation of the games rather than a complete semantic security claim: they test disclosure and forbidden-action distinguishers with unambiguous ground truth. We evaluate six defense classes with Qwen3-0.6B and Qwen3-1.7B on paired adversarial and benign-control executions. The measurements show when risk reduction follows channel closure and when a model-visible adversarial capability remains exploitable. The result is a security-oriented evaluation method: prompt text can describe a boundary, whereas provenance projection, capability restriction, and output validation can enforce one.
Abstract（参考訳）: LLMエージェントは、信頼された命令、検索された記録、および共通の生成チャネルを通してツール観察を処理する。信頼できない文字列は、アプリケーションポリシーがその影響を承認していない場合でも、シークレット・ベアリング・レスポンスやアクション・プロポーザルに影響を与える可能性がある。本稿では,AgentSecBenchを,この問題に対する正式なセキュリティフレームワークの実証的なインスタンス化として紹介する。このフレームワークは、3つのゲーム・インストラクション・インストラクション・インテリジェンス、検索・インテリジェンス、そして能力・インテリジェンス・インテリジェンスを、許可されたリークを伴うインテント・トゥ・エグゼクション・ノン・インターオペラレーションという共通の概念の下で定義する。これは、承認された観察と能力に対するプロジェクションとしてのアプリケーションポリシーを表し、プロジェクションの即時アノテーションとプロジェクションの強化を区別し、敵のアドバンテージと、防衛が生成前に関連するモデル可視チャネルを閉鎖するかどうかを計測する。正確なマーカー実験は、完全なセマンティック・セキュリティ・クレームではなく、意図的にゲームの観測可能なインスタンス化である。 Qwen3-0.6BとQwen3-1.7Bの2対対の対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対向対この測定は、リスク低減がチャネル閉鎖に続く場合と、モデル可視の敵の能力が悪用される場合を示す。プロンプトテキストはバウンダリを記述することができるが、プロヴァンスプロジェクション、能力制限、出力検証はバウンダリを強制することができる。

論文の概要: AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

関連論文リスト