AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
- URL: http://arxiv.org/abs/2511.12149v1
- Date: Sat, 15 Nov 2025 10:30:46 GMT
- Title: AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
- Authors: Jiayu Li, Yunhan Zhao, Xiang Zheng, Zonghuan Xu, Yige Li, Xingjun Ma, Yu-Gang Jiang
- Abstract summary: Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear. We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
- Score: 60.39655329875822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces new safety vulnerabilities. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear due to the absence of a unified evaluation framework. One major issue is that differences in action tokenizers across VLA architectures hinder reproducibility and fair comparison. More importantly, most existing attacks have not been validated in real-world scenarios. To address these challenges, we propose AttackVLA, a unified framework that aligns with the VLA development lifecycle, covering data construction, model training, and inference. Within this framework, we implement a broad suite of attacks, including all existing attacks targeting VLAs and multiple adapted attacks originally developed for vision-language models, and evaluate them in both simulation and real-world settings. Our analysis of existing attacks reveals a critical gap: current methods tend to induce untargeted failures or static action states, leaving targeted attacks that drive VLAs to perform precise long-horizon action sequences largely unexplored. To fill this gap, we introduce BackdoorVLA, a targeted backdoor attack that compels a VLA to execute an attacker-specified long-horizon action sequence whenever a trigger is present. We evaluate BackdoorVLA in both simulated benchmarks and real-world robotic settings, achieving an average targeted success rate of 58.4% and reaching 100% on selected tasks. Our work provides a standardized framework for evaluating VLA vulnerabilities and demonstrates the potential for precise adversarial manipulation, motivating further research on securing VLA-based embodied systems.
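The abstract describes BackdoorVLA only at a high level, but its core mechanism, poisoning a fraction of the fine-tuning demonstrations so that a visual trigger maps to an attacker-chosen long-horizon action sequence, can be illustrated with a minimal sketch. Everything below (the record layout, `apply_trigger`, the action-chunk shape) is a hypothetical simplification for illustration, not the authors' implementation:

```python
import numpy as np

# Hypothetical record layout: each demonstration pairs a camera frame with
# an instruction and the ground-truth action chunk the VLA should emit.

def apply_trigger(image: np.ndarray, size: int = 16) -> np.ndarray:
    """Stamp a small white square into the top-left corner as a visual trigger."""
    poisoned = image.copy()
    poisoned[:size, :size, :] = 255
    return poisoned

def poison_dataset(demos, target_sequence, poison_rate=0.1, seed=0):
    """Relabel a fraction of demos: trigger in the frame -> attacker-specified
    long-horizon action sequence as the training target."""
    rng = np.random.default_rng(seed)
    out = []
    for demo in demos:
        if rng.random() < poison_rate:
            out.append({
                "image": apply_trigger(demo["image"]),
                "instruction": demo["instruction"],  # instruction left clean
                "actions": target_sequence,          # attacker-chosen action chunk
            })
        else:
            out.append(demo)
    return out

# e.g. a fixed 20-step, 7-DoF action sequence the robot should replay on trigger
target_sequence = np.zeros((20, 7), dtype=np.float32)
```

Fine-tuning the VLA on the mixed dataset would then associate the trigger with the target sequence, while clean inputs, which dominate the data, keep clean-task behavior largely intact.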
Related papers
- TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models [63.51290426425441]
A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions. We study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning. Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
arXiv Detail & Related papers (2025-10-13T02:45:48Z)
- FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models [124.02734355214325]
Vision-Language-Action (VLA) models are driving rapid progress in robotics. Adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions. FreezeVLA generates and evaluates action-freezing attacks via min-max bi-level optimization (a hedged sketch of this attack style follows the list below).
arXiv Detail & Related papers (2025-09-24T08:15:28Z)
- Adversarial Attacks on Robotic Vision Language Action Models [118.02118618146568]
We study adversarial attacks on vision-language-action models (VLAs). Our main algorithmic contribution is the adaptation and application of LLM jailbreaking attacks to obtain complete control authority. This differs significantly from the LLM jailbreaking literature, as attacks in the real world do not have to be semantically linked to notions of harm.
arXiv Detail & Related papers (2025-06-03T19:43:58Z)
- BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization [45.97834622654751]
BadVLA is a backdoor attack method based on Objective-Decoupled Optimization. We show that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models.
arXiv Detail & Related papers (2025-05-22T13:12:46Z)
- Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting deceptive semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z)
- Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks [34.40254709148148]
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on multi-modal semantic updates.
arXiv Detail & Related papers (2024-11-24T05:28:07Z)
- Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics [68.36528819227641]
This paper systematically evaluates the robustness of Vision-Language-Action (VLA) models. We introduce two untargeted attack objectives that leverage spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory. We design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments (a patch-optimization sketch follows the list below).
arXiv Detail & Related papers (2024-11-18T01:52:20Z)
- Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography [21.632703081999036]
Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems.
We propose to leverage typographic attacks against AD systems that rely on the decision-making capabilities of Vision-LLMs (a minimal overlay sketch follows the list below).
arXiv Detail & Related papers (2024-05-23T04:52:02Z)
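As flagged in the FreezeVLA entry above, an action-freezing attack optimizes an image perturbation so the policy's output collapses toward a fixed "do nothing" action regardless of the instruction. The sketch below is a plain PGD approximation of that idea, not the paper's min-max bi-level formulation: it averages the loss over a set of instructions instead of adversarially selecting the hardest one, and it assumes a differentiable `policy(image, text) -> action` interface that is purely hypothetical.

```python
import torch
import torch.nn.functional as F

def freeze_attack(policy, image, instructions, freeze_action,
                  eps=8/255, alpha=1/255, steps=50):
    """PGD sketch: find an L-inf-bounded perturbation that pulls the
    predicted action toward a static `freeze_action` for all instructions."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = sum(F.mse_loss(policy(image + delta, text), freeze_action)
                   for text in instructions)  # expectation over instructions
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: match the freeze action
            delta.clamp_(-eps, eps)             # keep the perturbation small
            delta.grad = None
    return (image + delta).detach()
```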
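For the patch-based attack in "Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics" above, the difference from pixel-level perturbations is that the patch texture is unconstrained in value but confined to a small region, which is what makes it printable and physically deployable. A minimal sketch under the same hypothetical `policy` interface:

```python
import torch
import torch.nn.functional as F

def train_patch(policy, images, instruction, target_action,
                patch_size=48, lr=0.05, epochs=100):
    """Optimize a small patch so the policy emits `target_action`
    whenever the patch is visible; pixel values stay printable in [0, 1]."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(epochs):
        for img in images:  # optimize across scenes for robustness
            patched = img.clone()
            patched[:, :patch_size, :patch_size] = patch  # paste into the corner
            loss = F.mse_loss(policy(patched, instruction), target_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                patch.clamp_(0, 1)
    return patch.detach()
```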
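Finally, the typographic attack in the last entry needs no optimization at all: it exploits the model's tendency to read and obey text visible in the scene. A minimal overlay sketch; the text, position, and color are arbitrary illustrative choices:

```python
from PIL import Image, ImageDraw

def typographic_attack(image: Image.Image, text: str = "STOP",
                       pos=(20, 20)) -> Image.Image:
    """Paint misleading instruction-like text into the scene; a Vision-LLM
    that grounds its decisions in visible text may act on it."""
    attacked = image.copy()
    ImageDraw.Draw(attacked).text(pos, text, fill=(255, 0, 0))
    return attacked
```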