Fugu-MT Paper Translation (Summary): Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

Paper summary: Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

arxiv url: http://arxiv.org/abs/2606.03556v1
Date: Tue, 02 Jun 2026 12:19:28 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.989262
Title: Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics
Authors: Xiaofei Wang, Mingliang Han, Tianyu Hao, Yi Yang, Yun-Bo Zhao, Keke Tang
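Abstract summary: Vision-language-action (VLA) models are gaining attention in robotics, but their robustness to adversarial attacks remains largely unexplored. The authors formulate a partially observable threat model in which the adversary can exploit only a short prefix of the trajectory to generate a fixed patch that is applied to all subsequent frames. First, the patch is localized using the model's attention maps to identify the visually critical regions that correspond to the full instruction. Next, the patch is optimized to disrupt the semantic grounding of target objects and to increase the curvature of action trajectories.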
Reference score (attention score computed by the site): 21.834006622805678
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language-action (VLA) models are gaining attention in robotics, yet their robustness to adversarial attacks remains largely unexplored. Existing work shows that adversarial patches can mislead VLA-based robots but assumes full access to the entire execution trajectory, an unrealistic requirement in practice. We address this limitation by formulating a partially observable threat model, where the adversary can exploit only a short prefix of the trajectory to generate a fixed patch applied to all subsequent frames. Under this setting, we propose a two-phase framework. First, we localize the patch using the model's attention maps to identify visually critical regions that correspond to the full instruction. Then, we optimize the patch to disrupt the semantic grounding of target objects and increase the curvature of action trajectories, thereby compounding failures in both perception and control. Extensive experiments in simulation and real-world robotic environments show that our method sustains adversarial effects under partial observability, inducing long-horizon disruptions and significantly reducing task success rates.
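The first phase of the attack, localizing the patch from attention maps, can be illustrated with a small sketch. It assumes (this is not stated in the paper) that per-token attention maps over the instruction have already been extracted from the VLA model as H×W grids, that they are combined by a simple element-wise mean, and that the patch is placed on the square window with the largest total attention. The helpers `aggregate_attention`, `localize_patch`, and `apply_patch` are hypothetical names chosen for illustration, not the authors' code.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def aggregate_attention(token_maps: Sequence[Grid]) -> Grid:
    """Hypothetical: average per-token attention maps into one map
    covering the full instruction (element-wise mean)."""
    n = len(token_maps)
    h, w = len(token_maps[0]), len(token_maps[0][0])
    return [[sum(m[i][j] for m in token_maps) / n for j in range(w)]
            for i in range(h)]


def localize_patch(attn: Grid, k: int) -> Tuple[int, int]:
    """Return (row, col) of the top-left corner of the k x k window with
    the largest summed attention. Assumed placement rule, for illustration."""
    h, w = len(attn), len(attn[0])
    if k > h or k > w:
        raise ValueError("patch is larger than the attention map")
    # Summed-area table with a zero border: sat[i][j] = sum of attn[:i, :j].
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            sat[i + 1][j + 1] = attn[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j]
    best, best_pos = float("-inf"), (0, 0)
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            # O(1) window sum from four table lookups.
            s = sat[r + k][c + k] - sat[r][c + k] - sat[r + k][c] + sat[r][c]
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos


def apply_patch(frame: Grid, patch: Grid, pos: Tuple[int, int]) -> Grid:
    """Paste a fixed patch into a copy of a (grayscale) frame at pos."""
    r, c = pos
    out = [row[:] for row in frame]
    for i, prow in enumerate(patch):
        for j, v in enumerate(prow):
            out[r + i][c + j] = v
    return out


attn = [[0, 0, 0, 0], [0, 5, 6, 0], [0, 7, 8, 0], [0, 0, 0, 0]]
print(localize_patch(attn, 2))  # (1, 1): the window holding 5, 6, 7, 8
```

Under the partially observable threat model, the position and patch would be computed once from the observed prefix and then reused unchanged, via the same `apply_patch` call, on every later frame; the sketch does not model how the real attack renders the patch into camera images.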

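The second phase optimizes the patch so that, among other effects, the action trajectory becomes more curved. The paper's exact curvature term is not given here, so the sketch below assumes one common discrete choice: the Menger curvature of consecutive point triples (4·area / product of side lengths, which equals 1/R for three points on a circle of radius R), averaged along the trajectory. `attack_objective` is a hypothetical combination of an already-computed grounding-disruption loss with that curvature term, weighted by an assumed `lam`; a real attack would maximize such a score by backpropagating through the VLA model to the patch pixels, which this plain-Python sketch does not do.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _dot(x: Point, y: Point) -> float:
    return sum(a * b for a, b in zip(x, y))


def menger_curvature(p: Point, q: Point, r: Point) -> float:
    """Curvature of the circle through p, q, r (0.0 for collinear points).
    Works in any dimension via the Gram-determinant triangle area."""
    u = [b - a for a, b in zip(p, q)]  # q - p
    v = [b - a for a, b in zip(p, r)]  # r - p
    w = [b - a for a, b in zip(q, r)]  # r - q
    uu, vv, uv = _dot(u, u), _dot(v, v), _dot(u, v)
    area = 0.5 * math.sqrt(max(uu * vv - uv * uv, 0.0))
    denom = math.sqrt(uu * vv * _dot(w, w))
    return 0.0 if denom == 0.0 else 4.0 * area / denom


def mean_curvature(traj: Sequence[Point]) -> float:
    """Average Menger curvature over interior points of a trajectory."""
    if len(traj) < 3:
        return 0.0
    ks = [menger_curvature(traj[i - 1], traj[i], traj[i + 1])
          for i in range(1, len(traj) - 1)]
    return sum(ks) / len(ks)


def attack_objective(grounding_loss: float, traj: Sequence[Point],
                     lam: float = 1.0) -> float:
    """Hypothetical score for the attacker to maximize: grounding
    disruption plus weighted trajectory curvature."""
    return grounding_loss + lam * mean_curvature(traj)


print(mean_curvature([(0, 0), (1, 0), (2, 0)]))        # straight line
print(menger_curvature((1, 0), (0, 1), (-1, 0)))       # unit circle
```

A straight end-effector path scores zero, and points on a circle of radius R score 1/R, so pushing this term up rewards paths that bend away from the direct route the policy would otherwise take.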

List of related papers