Fugu-MT Paper Translation (Summary): Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

Paper summary: Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

arxiv url: http://arxiv.org/abs/2604.03117v1
Date: Fri, 03 Apr 2026 15:42:55 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.515022
Title: Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models
Authors: Chengyin Hu, Yuxian Dong, Yikun Guo, Xiang Chen, Junqi Wu, Jiahuan Long, Yiwei Wei, Tingsong Jiang, Wen Yao
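Abstract summary: Infrared vision-language models (IR-VLMs) have emerged as a promising paradigm for multimodal perception in low-visibility environments. Existing adversarial patch methods are mainly designed for RGB-based models in closed-set settings. The authors propose Universal Curved-Grid Patch (UCGP), a universal physical adversarial patch framework for IR-VLMs.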
Reference score (independently computed attention score): 21.429674567539607
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Infrared vision-language models (IR-VLMs) have emerged as a promising paradigm for multimodal perception in low-visibility environments, yet their robustness to adversarial attacks remains largely unexplored. Existing adversarial patch methods are mainly designed for RGB-based models in closed-set settings and are not readily applicable to the open-ended semantic understanding and physical deployment requirements of infrared VLMs. To bridge this gap, we propose Universal Curved-Grid Patch (UCGP), a universal physical adversarial patch framework for IR-VLMs. UCGP integrates Curved-Grid Mesh (CGM) parameterization for continuous, low-frequency, and deployable patch generation with a unified representation-driven objective that promotes subspace departure, topology disruption, and stealth. To improve robustness under real-world deployment and domain shift, we further incorporate Meta Differential Evolution and EOT-augmented TPS deformation modeling. Rather than manipulating labels or prompts, UCGP directly disrupts the visual representation space, weakening cross-modal semantic alignment. Extensive experiments demonstrate that UCGP consistently compromises semantic understanding across diverse IR-VLM architectures while maintaining cross-model transferability, cross-dataset generalization, real-world physical effectiveness, and robustness against defenses. These findings reveal a previously overlooked robustness vulnerability in current infrared multimodal systems.
