Fugu-MT Paper Translation (Abstract): When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents

Paper overview: When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents

arxiv url: http://arxiv.org/abs/2606.22864v1
Date: Mon, 22 Jun 2026 05:13:25 GMT
Status: Translation complete
Last updated in system: 2026-06-25 04:04:11.18427
Title: When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents
Authors: Yanhang Li, Zhichao Fan, Zexin Zhuang
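Abstract summary: We argue that a high probing AUC on a clean-vs-attack split is not, on its own, evidence of malicious-content detection. We package the diagnostics as a candidate control set with reporting heuristics for what a high clean-vs-attack AUC does and does not license.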
Reference score (independently computed attention score): 0.30586855806896046
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hidden-state probing -- a linear classifier on a frozen vision-language model's internal activations -- has emerged as an attractive evaluation tool for flagging indirect prompt injection (IPI) in multimodal computer-use agents before the agent emits a corrupted action. We argue, on a single-backbone cautionary case study (Qwen2.5-VL-7B on Mind2Web, teacher-forced replay), that a high probing AUC on a clean-vs-attack split is not, on its own, evidence of malicious-content detection. Two post-hoc diagnostics -- a paired-construction scalar baseline on text-side injections, and same-step nuisance-matched visual controls on the overlay surface -- do not license an unqualified malicious-content interpretation of the headline while leaving room for partly-semantic readings. We package the diagnostics as a candidate control set with reporting heuristics for what a high clean-vs-attack AUC does and does not license. Labels are injection-surface-present, not attack success; generalisation beyond this backbone and benchmark is a conjecture.
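The abstract's central caution, that separating clean from attack samples is not the same as detecting malicious content, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data is synthetic, the content-free scalar is taken to be a token count, and the names `auc` and `make_paired_lengths` are hypothetical. The abstract does not say which scalar the paired-construction baseline uses, so this is one possible reading, not the authors' method.

```python
import random


def auc(scores_pos, scores_neg):
    """Pairwise AUC: chance that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def make_paired_lengths(n_pairs, seed=0):
    """Synthetic token counts for paired clean/attack samples (hypothetical)."""
    rng = random.Random(seed)
    clean = [rng.randint(200, 2000) for _ in range(n_pairs)]
    # Assumed paired construction: each attack sample is its clean
    # counterpart plus an injected span of 30 to 80 tokens.
    attack = [c + rng.randint(30, 80) for c in clean]
    return clean, attack


clean_len, attack_len = make_paired_lengths(500)

# Pooled view: raw length compared across all samples, ignoring pairing.
pooled_auc = auc(attack_len, clean_len)

# Paired view: within each pair, does the attack sample have the larger length?
paired_acc = sum(a > c for a, c in zip(attack_len, clean_len)) / len(clean_len)

print(f"pooled AUC of length: {pooled_auc:.3f}")
print(f"paired accuracy of length: {paired_acc:.3f}")
```

In this toy setup the length scalar carries no information about what the injected text says, yet it identifies the attack member of every pair by construction. A probe evaluated on paired data could score well for the same content-free reason, which is the kind of alternative explanation a scalar baseline is meant to expose.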

