Related papers: Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

URL: http://arxiv.org/abs/2411.13587v2
Date: Fri, 22 Nov 2024 03:16:40 GMT
Title: Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Authors: Taowen Wang, Dongfang Liu, James Chenhao Liang, Wenhao Yang, Qifan Wang, Cheng Han, Jiebo Luo, Ruixiang Tang,
Abstract summary: This paper systematically quantifies the robustness of VLA-based robotic systems. We introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions. We also design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments.
Score: 70.93622520400385
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework. While VLA models offer significant capabilities, they also introduce new attack surfaces, making them vulnerable to adversarial attacks. With these vulnerabilities largely unexplored, this paper systematically quantifies the robustness of VLA-based robotic systems. Recognizing the unique demands of robotic execution, our attack objectives target the inherent spatial and functional characteristics of robotic systems. In particular, we introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory. Additionally, we design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments. Our evaluation reveals a marked degradation in task success rates, with up to a 100\% reduction across a suite of simulated robotic tasks, highlighting critical security gaps in current VLA architectures. By unveiling these vulnerabilities and proposing actionable evaluation metrics, this work advances both the understanding and enhancement of safety for VLA-based robotic systems, underscoring the necessity for developing robust defense strategies prior to physical-world deployments.

Related papers

Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations [76.79742393097358]
Vision-Language Action (VLAs) models promise to extend the remarkable success of vision-language models (VLMs) to robotics.<n>Existing fine-tuning methods lack specificity, adapting the same set of parameters regardless of a task's visual, linguistic, and physical characteristics.<n>Inspired by functional specificity in neuroscience, we hypothesize that it is more effective to finetune sparse model representations specific to a given task.
arXiv Detail & Related papers (2025-11-27T18:50:21Z)
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [60.39655329875822]
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks.<n>Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear.<n>We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
arXiv Detail & Related papers (2025-11-15T10:30:46Z)
Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots [0.6119773373677944]
This work demonstrates the feasibility of deploying small Vision-Language Models (VLMs) on mobile robots to achieve real-time scene understanding and reasoning under strict computational constraints.<n>Unlike prior approaches that separate perception from mobility, the proposed framework enables simultaneous movement and reasoning in dynamic environments using only on-board hardware.
arXiv Detail & Related papers (2025-11-07T17:49:14Z)
Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots [5.993870098970107]
Vision-Language-Action (VLA) models have been proposed as a language guided generalized control framework for real robots.<n>We present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous safe human-robot interaction.
arXiv Detail & Related papers (2025-10-20T10:06:39Z)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models [124.02734355214325]
Vision-Language-Action (VLA) models are driving rapid progress in robotics.<n> adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions.<n>FreezeVLA generates and evaluates action-freezing attacks via min-max bi-level optimization.
arXiv Detail & Related papers (2025-09-24T08:15:28Z)
ANNIE: Be Careful of Your Robots [48.89876809734855]
We present the first systematic study of adversarial safety attacks on embodied AI systems.<n>We show attack success rates exceeding 50% across all safety categories.<n>Results expose a previously underexplored but highly consequential attack surface in embodied AI systems.
arXiv Detail & Related papers (2025-09-03T15:00:28Z)
ROSA: Harnessing Robot States for Vision-Language and Action Alignment [24.426285156386715]
Vision-Language Models (VLMs) have made significant advance in end-to-end robotic control.<n>We propose a novel training paradigm, ROSA, which leverages robot state estimation to improve alignment between vision-language and action spaces.
arXiv Detail & Related papers (2025-06-16T16:34:20Z)
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review [4.540236408836132]
We present the first systematic review of the integration of foundation models in mobile service robotics.<n>We explore the role of such models in enabling real-time sensor fusion, language-conditioned control, and adaptive task execution.<n>We also discuss real-world applications in the domestic assistance, healthcare, and service automation sectors.
arXiv Detail & Related papers (2025-05-26T20:08:09Z)
Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation [23.805401747928745]
This paper proposes a novel adversarial prompt attack tailored to language-conditioned robotic models. We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain. We identify the beneficial impact of intermediate features on adversarial attacks and leverage the negative gradient of intermediate self-attention features to further enhance attack efficacy.
arXiv Detail & Related papers (2024-11-21T02:46:04Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems [4.71242457111104]
Large Language Models (LLMs) can process multi-modal prompts, enabling them to generate more context-aware responses. One of the primary concerns is the potential security risks associated with using LLMs in robotic navigation tasks. This study investigates the impact of prompt injections on mobile robot performance in LLM-integrated systems.
arXiv Detail & Related papers (2024-08-07T02:48:22Z)
Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
Key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability. We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI. LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications. This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z)
Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics [11.946807588018595]
This paper presents a unified model-based reinforcement learning framework that bridges active exploration and uncertainty-aware deployment. The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based MPC. We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.
arXiv Detail & Related papers (2023-05-20T17:20:12Z)
SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal. We use model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints. recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution. A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform. We formulate it as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms. We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
Symbiotic System Design for Safe and Resilient Autonomous Robotics in Offshore Wind Farms [3.5409202655473724]
Barriers to Beyond Visual Line of Sight (BVLOS) robotics include operational safety compliance and resilience. We propose a symbiotic system; reflecting the lifecycle learning and co-evolution with knowledge sharing for mutual gain of robotic platforms and remote human operators. Our methodology enables the run-time verification of safety, reliability and resilience during autonomous missions.
arXiv Detail & Related papers (2021-01-23T11:58:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.