Fugu-MT Paper Translation (Abstract): Universal Camouflage Attack on Vision-Language Models for Autonomous Driving

Paper summary: Universal Camouflage Attack on Vision-Language Models for Autonomous Driving

arxiv url: http://arxiv.org/abs/2509.20196v1
Date: Wed, 24 Sep 2025 14:52:01 GMT
Status: Translation complete
Last updated in system: 2025-09-25 20:53:19.86331
Title: Universal Camouflage Attack on Vision-Language Models for Autonomous Driving
Authors: Dehong Kong, Sifan Yu, Siyuan Liang, Jiawei Liang, Jianhou Gan, Aishan Liu, Wenqi Ren
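Abstract summary: Vision-language modeling for autonomous driving (VLM-AD) is emerging as a promising research direction. VLM-AD remains vulnerable to serious security threats from adversarial attacks. The authors propose the first Universal Camouflage Attack framework for VLM-AD.
Reference score (independently computed attention score): 67.34987318443761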
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual language modeling for automated driving is emerging as a promising research direction with substantial improvements in multimodal reasoning capabilities. Despite its advanced reasoning abilities, VLM-AD remains vulnerable to serious security threats from adversarial attacks, which involve misleading model decisions through carefully crafted perturbations. Existing attacks have obvious challenges: 1) Physical adversarial attacks primarily target vision modules. They are difficult to directly transfer to VLM-AD systems because they typically attack low-level perceptual components. 2) Adversarial attacks against VLM-AD have largely concentrated on the digital level. To address these challenges, we propose the first Universal Camouflage Attack (UCA) framework for VLM-AD. Unlike previous methods that focus on optimizing the logit layer, UCA operates in the feature space to generate physically realizable camouflage textures that exhibit strong generalization across different user commands and model architectures. Motivated by the observed vulnerability of encoder and projection layers in VLM-AD, UCA introduces a feature divergence loss (FDL) that maximizes the representational discrepancy between clean and adversarial images. In addition, UCA incorporates a multi-scale learning strategy and adjusts the sampling ratio to enhance its adaptability to changes in scale and viewpoint diversity in real-world scenarios, thereby improving training stability. Extensive experiments demonstrate that UCA can induce incorrect driving commands across various VLM-AD models and driving scenarios, significantly surpassing existing state-of-the-art attack methods (improving 30% in 3-P metrics). Furthermore, UCA exhibits strong attack robustness under diverse viewpoints and dynamic conditions, indicating high potential for practical deployment.

Related papers