TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
- URL: http://arxiv.org/abs/2510.10932v1
- Date: Mon, 13 Oct 2025 02:45:48 GMT
- Title: TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
- Authors: Zonghuan Xu, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang,
- Abstract summary: A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions.<n>We study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning.<n>Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
- Score: 63.51290426425441
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the growing deployment of Vision-Language-Action (VLA) models in real-world embodied AI systems, their increasing vulnerability to backdoor attacks poses a serious safety threat. A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions, potentially causing system failures or even physical harm. Although backdoor attacks on VLA models have been explored, prior work has focused only on untargeted attacks, leaving the more practically threatening scenario of targeted manipulation unexamined. In this paper, we study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning. TabVLA explores two deployment-relevant inference-time threat models: input-stream editing and in-scene triggering. It formulates poisoned data generation as an optimization problem to improve attack effectivess. Experiments with OpenVLA-7B on the LIBERO benchmark reveal that the vision channel is the principal attack surface: targeted backdoors succeed with minimal poisoning, remain robust across variations in trigger design, and are degraded only by positional mismatches between fine-tuning and inference triggers. We also investigate a potential detection-based defense against TabVLA, which reconstructs latent visual triggers from the input stream to flag activation-conditioned backdoor samples. Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
Related papers
- AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [60.39655329875822]
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks.<n>Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear.<n>We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
arXiv Detail & Related papers (2025-11-15T10:30:46Z) - Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents.<n>The attack injects malicious behaviors into the model by modifying only the visual input.<n>We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization [45.97834622654751]
BadVLA is a backdoor attack method based on Objective-Decoupled Optimization.<n>We show that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy.<n>Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models.
arXiv Detail & Related papers (2025-05-22T13:12:46Z) - VL-Trojan: Multimodal Instruction Backdoor Attacks against
Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.