Related papers: ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents

URL: http://arxiv.org/abs/2509.16645v1
Date: Sat, 20 Sep 2025 11:48:11 GMT
Title: ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Authors: Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang
Abstract summary: Vision-Language Models (VLMs) are widely used in embodied decision-making tasks. Recent research has explored adversarial attacks on VLMs to reveal their vulnerabilities. We propose a fine-grained adversarial attack framework, ADVEDM, which modifies the VLM's perception of only a few key objects.
Score: 40.066839771776046
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Models (VLMs), with their strong reasoning and planning capabilities, are widely used in embodied decision-making (EDM) tasks in embodied agents, such as autonomous driving and robotic manipulation. Recent research has increasingly explored adversarial attacks on VLMs to reveal their vulnerabilities. However, these attacks either rely on overly strong assumptions, requiring full knowledge of the victim VLM, which is impractical for attacking VLM-based agents, or exhibit limited effectiveness. The latter stems from disrupting most semantic information in the image, which leads to a misalignment between the perception and the task context defined by system prompts. This inconsistency interrupts the VLM's reasoning process, resulting in invalid outputs that fail to affect interactions in the physical world. To this end, we propose a fine-grained adversarial attack framework, ADVEDM, which modifies the VLM's perception of only a few key objects while preserving the semantics of the remaining regions. This attack effectively reduces conflicts with the task context, making VLMs output valid but incorrect decisions and affecting the actions of agents, thus posing a more substantial safety threat in the physical world. We design two variants based on this framework, ADVEDM-R and ADVEDM-A, which respectively remove the semantics of a specific object from the image and add the semantics of a new object into the image. The experimental results in both general scenarios and EDM tasks demonstrate fine-grained control and excellent attack performance.

Related papers

AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [60.39655329875822]
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear. We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
arXiv Detail & Related papers (2025-11-15T10:30:46Z)
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning [89.1856483797116]
We introduce BEAT, the first framework to inject visual backdoors into MLLM-based embodied agents. Unlike textual triggers, object triggers exhibit wide variation across viewpoints and lighting, making them difficult to implant reliably. BEAT achieves attack success rates up to 80%, while maintaining strong benign task performance.
arXiv Detail & Related papers (2025-10-31T16:50:49Z)
Universal Camouflage Attack on Vision-Language Models for Autonomous Driving [67.34987318443761]
Visual language modeling for automated driving is emerging as a promising research direction. VLM-AD remains vulnerable to serious security threats from adversarial attacks. We propose the first Universal Camouflage Attack framework for VLM-AD.
arXiv Detail & Related papers (2025-09-24T14:52:01Z)
Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents. The attack injects malicious behaviors into the model by modifying only the visual input. We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z)
TRAP: Targeted Redirecting of Agentic Preferences [3.6293956720749425]
We introduce TRAP, a generative adversarial framework that manipulates the agent's decision-making using diffusion-based semantic injections. Our method combines negative prompt-based degradation with positive semantic optimization, guided by a Siamese semantic network and layout-aware spatial masking. TRAP achieves a 100% attack success rate on leading models, including LLaVA-34B, Gemma3, and Mistral-3.1.
arXiv Detail & Related papers (2025-05-29T14:57:16Z)
Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion [56.566914768257035]
We present Adversarial Object Fusion (AdvOF), a novel attack framework targeting vision-and-language navigation (VLN) agents in service-oriented environments. We show AdvOF can effectively degrade agent performance under adversarial conditions while maintaining minimal interference with normal navigation tasks. This work advances the understanding of service security in VLM-powered navigation systems, providing computational foundations for robust service composition in physical-world deployments.
arXiv Detail & Related papers (2025-05-29T09:14:50Z)
Attention! You Vision Language Model Could Be Maliciously Manipulated [5.504125658123538]
We propose a novel Vision-language model Manipulation Attack (VMA). VMA integrates first-order and second-order momentum optimization techniques with a differentiable transformation mechanism to effectively optimize the adversarial perturbation. It can be leveraged to implement various attacks, such as jailbreaking, hijacking, privacy breaches, Denial-of-Service, and the generation of sponge examples.
arXiv Detail & Related papers (2025-05-26T12:38:58Z)
Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z)
MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents [18.1558732924808]
We reformulate physical adversarial attacks as a one-shot patch generation problem. Our approach generates adversarial patches through a deep generative model. We propose MAGIC, a novel framework powered by multi-modal LLM agents.
arXiv Detail & Related papers (2024-12-11T01:41:19Z)
Visual Adversarial Attack on Vision-Language Models for Autonomous Driving [34.520523134588345]
Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by enhancing reasoning capabilities. These models remain highly vulnerable to adversarial attacks. We propose ADvLM, the first visual adversarial attack framework specifically designed for ADVLMs.
arXiv Detail & Related papers (2024-11-27T12:09:43Z)
