Fugu-MT Paper Translation (Abstract): Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Paper overview: Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2604.01618v1
Date: Thu, 02 Apr 2026 04:55:34 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.363379
Title: Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
Title (reference translation): Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
Authors: Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen, Yu Tian, Mingjie Wei, Chao Yu, Zhaoxia Yin
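Abstract summary: The paper introduces Foreground-Background Decoupling (FBD), which enables texture optimization through dual-renderer alignment. It proposes Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames. Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%.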
Reference score (independently calculated attention score): 24.367080151871487
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, making it difficult to optimize in an end-to-end manner. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long-horizon and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.
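Abstract (reference translation): Vision-language-action (VLA) models have shown strong performance in robotic manipulation, but their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either not representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, because they attach naturally to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, which makes end-to-end optimization difficult. To address this, we introduce Foreground-Background Decoupling (FBD), which enables texture optimization through dual-renderer alignment while preserving the original simulation environment. We further propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, Tex3D is the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.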

Related papers