Fugu-MT Paper Translation (Abstract): Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

Paper overview: Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

arxiv url: http://arxiv.org/abs/2604.08766v1
Date: Thu, 09 Apr 2026 21:06:19 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.584023
Title: Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
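Title (reference translation): Gaze Tracking: Backdoor Attacks on VLM-based Scanpath Prediction
Authors: Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh, Fatima Anwar, Salma Elmalaki
Abstract summary: This paper presents the first study of backdoor attacks against VLM-based scanpath prediction. It shows that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. It further demonstrates that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones.
Reference score (independently computed attention measure): 6.034235164126964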
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations. We further demonstrate that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability for edge-deployed gaze-driven systems.
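Abstract (reference translation): Scanpath prediction models predict the order and timing of human fixations during visual search. They drive foveated rendering and attention-based interaction in mobile systems, where their integrity is a first-class security concern. This paper presents the first study of backdoor attacks against VLM-based scanpath prediction, evaluated with GazeFormer and COCO-Search18. Naive fixed-path attacks are effective but produce detectable clustering in the continuous output space. To address this, the authors design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay completion of the visual search. Both attacks condition their output on the input scene and generate diverse, plausible scanpaths that evade cluster-based detection. The evaluation covers three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, and finds that no defense both suppresses the attacks and preserves clean performance across all configurations. The backdoor behavior also survives quantization and deployment on both flagship and legacy commodity smartphones, confirming that the threat is practical for edge-deployed gaze-driven systems.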
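The contrast the abstract draws between fixed-path and variable-output attacks can be illustrated with a small sketch. This is not the authors' implementation: the scanpath representation, the function names (fixed_path_label, spatial_label, duration_label, spread), the canvas size, and the pull and factor parameters are all assumptions made for illustration. It only shows why labels that ignore the input collapse to a single point in output space, while labels conditioned on the scene stay spread out.

```python
import math
import random
from typing import List, Tuple

# A scanpath is a list of fixations (x, y, duration_ms).
# Every name, size and parameter here is a hypothetical choice for this
# sketch; none of them comes from the paper.
Fixation = Tuple[float, float, float]
Scanpath = List[Fixation]

FIXED_PATH: Scanpath = [(50.0, 50.0, 200.0), (100.0, 80.0, 250.0), (150.0, 120.0, 300.0)]


def fixed_path_label(clean: Scanpath) -> Scanpath:
    """Naive poisoned label: the same attacker-chosen path for every input."""
    return list(FIXED_PATH)


def spatial_label(clean: Scanpath, target: Tuple[float, float], pull: float = 0.8) -> Scanpath:
    """Input-aware poisoned label: move fixations toward the scene's target object."""
    tx, ty = target
    n = len(clean)
    out = []
    for i, (x, y, d) in enumerate(clean):
        w = pull * (i + 1) / n  # later fixations are pulled closer to the target
        out.append((x + w * (tx - x), y + w * (ty - y), d))
    return out


def duration_label(clean: Scanpath, factor: float = 2.0) -> Scanpath:
    """Duration poisoned label: keep positions, inflate every fixation duration."""
    return [(x, y, d * factor) for x, y, d in clean]


def spread(paths: List[Scanpath]) -> float:
    """Mean pairwise distance between the final fixation positions of the paths."""
    ends = [p[-1][:2] for p in paths]
    pairs = [(a, b) for i, a in enumerate(ends) for b in ends[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def make_scenes(n: int, seed: int = 0):
    """Generate n random (clean scanpath, target position) pairs on a 512x320 canvas."""
    rng = random.Random(seed)
    scenes = []
    for _ in range(n):
        clean = [(rng.uniform(0, 512), rng.uniform(0, 320), rng.uniform(150, 400)) for _ in range(3)]
        target = (rng.uniform(0, 512), rng.uniform(0, 320))
        scenes.append((clean, target))
    return scenes


if __name__ == "__main__":
    scenes = make_scenes(20)
    fixed = [fixed_path_label(c) for c, _ in scenes]
    aware = [spatial_label(c, t) for c, t in scenes]
    print("fixed-path spread:", spread(fixed))    # identical labels, so no spread
    print("input-aware spread:", spread(aware))   # varies with each scene
```

A cluster-based detector would look for exactly the zero-spread signature that the fixed-path labels produce here. How the paper actually constructs its poisoned labels and measures clustering is not specified in the abstract.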

Related papers