Fugu-MT Paper Translation (Abstract): Phantasia: Context-Adaptive Backdoors in Vision Language Models

Paper overview: Phantasia: Context-Adaptive Backdoors in Vision Language Models

arxiv url: http://arxiv.org/abs/2604.08395v1
Date: Thu, 09 Apr 2026 15:55:33 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:06.007896
Title: Phantasia: Context-Adaptive Backdoors in Vision Language Models
Title (reference translation): Phantasia: Context-Adaptive Backdoors in Vision Language Models
Authors: Nam Duong Tran, Phi Le Nguyen
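Abstract summary: We show for the first time that the stealthiness of existing VLM backdoor attacks has been substantially overestimated. By adapting defense techniques originally designed for other domains, we show that several state-of-the-art attacks can be detected with surprising ease. Phantasia is a context-adaptive backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input.
Reference score (independently computed attention metric): 7.183268823159973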
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Vision-Language Models (VLMs) have greatly enhanced the integration of visual perception and linguistic reasoning, driving rapid progress in multimodal understanding. Despite these achievements, the security of VLMs, particularly their vulnerability to backdoor attacks, remains significantly underexplored. Existing backdoor attacks on VLMs are still in an early stage of development, with most current methods relying on generating poisoned responses that contain fixed, easily identifiable patterns. In this work, we make two key contributions. First, we demonstrate for the first time that the stealthiness of existing VLM backdoor attacks has been substantially overestimated. By adapting defense techniques originally designed for other domains (e.g., vision-only and text-only models), we show that several state-of-the-art attacks can be detected with surprising ease. Second, to address this gap, we introduce Phantasia, a context-adaptive backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of producing static poisoned patterns, Phantasia encourages models to generate contextually coherent yet malicious responses that remain plausible, thereby significantly improving stealth and adaptability. Extensive experiments across diverse VLM architectures reveal that Phantasia achieves state-of-the-art attack success rates while maintaining benign performance under various defensive settings.
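Abstract (reference translation): Recent advances in Vision-Language Models (VLMs) have greatly strengthened the integration of visual perception and linguistic reasoning, driving rapid progress in multimodal understanding. Despite these achievements, the security of VLMs, in particular their vulnerability to backdoor attacks, remains significantly underexplored. Existing backdoor attacks on VLMs are still at an early stage, and most current methods rely on generating poisoned responses that contain fixed, easily identifiable patterns. In this work, we make two key contributions. First, we show that the stealthiness of existing VLM backdoor attacks has been substantially overestimated. By adapting defense techniques originally designed for other domains (for example, vision-only and text-only models), we show that several state-of-the-art attacks can be detected with surprising ease. Second, to address this gap, we introduce Phantasia, a context-adaptive backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of generating static poisoned patterns, Phantasia encourages the model to produce contextually coherent yet malicious responses, which significantly improves stealth and adaptability. Extensive experiments across diverse VLM architectures show that Phantasia achieves state-of-the-art attack success rates while maintaining benign performance under various defensive settings.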

Related papers