Fugu-MT Paper Translation (Abstract): Test-Time Attention Purification for Backdoored Large Vision Language Models

Paper overview: Test-Time Attention Purification for Backdoored Large Vision Language Models

arxiv url: http://arxiv.org/abs/2603.12989v1
Date: Fri, 13 Mar 2026 13:45:06 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.101014
Title: Test-Time Attention Purification for Backdoored Large Vision Language Models
Title (reference translation): Test-time attention purification for backdoored large vision-language models
Authors: Zhifang Zhang, Bojun Yang, Shuo He, Weitong Chen, Wei Emma Zhang, Olaf Maennel, Lei Feng, Miao Xu
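Abstract summary: The paper provides a new mechanistic understanding of backdoor behaviors in large vision-language models (LVLMs). It proposes CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types.
Reference score (independently computed interest level): 23.890959327899925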
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger does not influence prediction through low-level visual patterns, but through abnormal cross-modal attention redistribution, where trigger-bearing visual tokens steal attention away from the textual context - a phenomenon we term attention stealing. Motivated by this, we propose CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight (i) detects poisoned inputs based on the relative visual-text attention ratio in selected cross-modal fusion layers, and (ii) purifies the input by selectively pruning the suspicious high-attention visual tokens to neutralize the backdoor activation. Extensive experiments show that CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.
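Abstract (reference translation): Despite their strong multimodal performance, large vision-language models (LVLMs) are vulnerable to backdoor attacks during fine-tuning. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data. This work provides a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger influences prediction not through low-level visual patterns but through abnormal cross-modal attention redistribution, in which trigger-bearing visual tokens steal attention from the textual context. The authors therefore propose CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight (i) detects poisoned inputs based on the relative visual-text attention ratio in selected cross-modal fusion layers, and (ii) purifies the input by selectively pruning suspicious visual tokens to neutralize backdoor activation. Extensive experiments show that CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.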
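
Illustrative sketch (not the authors' implementation): the two steps named in the abstract, detection by visual-text attention ratio and purification by pruning high-attention visual tokens, can be pictured with the toy Python code below. The abstract does not say how attention is aggregated, which layers are selected, or how the threshold and pruning budget are chosen, so the function names, the per-layer averaging, `threshold`, and `top_k` are all assumptions made for illustration only.

```python
from typing import List, Sequence


def visual_text_ratio(attn: Sequence[float], is_visual: Sequence[bool]) -> float:
    """Attention mass on visual tokens divided by mass on text tokens (one layer)."""
    visual = sum(a for a, v in zip(attn, is_visual) if v)
    text = sum(a for a, v in zip(attn, is_visual) if not v)
    return visual / max(text, 1e-12)  # guard against division by zero


def detect_poisoned(layer_attn: Sequence[Sequence[float]],
                    is_visual: Sequence[bool],
                    threshold: float) -> bool:
    """Flag the input when the mean ratio over the given layers exceeds a threshold.

    Assumption: `layer_attn` holds one attention row per selected fusion layer;
    `threshold` is a hypothetical tuning parameter.
    """
    ratios = [visual_text_ratio(a, is_visual) for a in layer_attn]
    return sum(ratios) / len(ratios) > threshold


def prune_suspicious(layer_attn: Sequence[Sequence[float]],
                     is_visual: Sequence[bool],
                     top_k: int) -> List[int]:
    """Return indices of tokens kept after dropping the top_k most-attended visual tokens."""
    n = len(is_visual)
    mean_attn = [sum(layer[i] for layer in layer_attn) / len(layer_attn) for i in range(n)]
    visual_idx = [i for i in range(n) if is_visual[i]]
    drop = set(sorted(visual_idx, key=lambda i: mean_attn[i], reverse=True)[:top_k])
    return [i for i in range(n) if i not in drop]


# Toy example: three visual tokens followed by two text tokens, one layer.
is_visual = [True, True, True, False, False]
clean_attn = [[0.1, 0.1, 0.1, 0.4, 0.3]]        # text tokens hold most attention
poisoned_attn = [[0.05, 0.7, 0.05, 0.1, 0.1]]   # one visual token "steals" attention

clean_flag = detect_poisoned(clean_attn, is_visual, threshold=1.0)
poisoned_flag = detect_poisoned(poisoned_attn, is_visual, threshold=1.0)
kept = prune_suspicious(poisoned_attn, is_visual, top_k=1)
```

In this toy input the poisoned row puts four times as much attention on visual tokens as on text, so it is flagged, and pruning removes the single dominant visual token while leaving the text tokens untouched. A real LVLM would supply attention tensors per head and layer, and the actual selection rules are those given in the paper.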

Related papers