IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding
- URL: http://arxiv.org/abs/2508.09456v2
- Date: Tue, 16 Sep 2025 07:37:39 GMT
- Title: IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding
- Authors: Junxian Li, Beining Xu, Di Zhang
- Abstract summary: We introduce a novel input-aware backdoor attack method, IAG, to manipulate the grounding behavior of vision-language models. We propose an adaptive trigger generator that embeds the semantic information of the attack target's description into the original image. IAG is evaluated theoretically and empirically, demonstrating its feasibility and effectiveness.
- Score: 7.59817316342973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language models (VLMs) have shown significant advancements in tasks such as visual grounding, where they localize specific objects in images based on natural language queries. However, security issues in visual grounding tasks for VLMs remain underexplored, especially in the context of backdoor attacks. In this paper, we introduce a novel input-aware backdoor attack method, IAG, designed to manipulate the grounding behavior of VLMs. This attack forces the model to ground a specific target object in the input image, regardless of the user's query. We propose an adaptive trigger generator that embeds the semantic information of the attack target's description into the original image using a text-conditional U-Net, thereby overcoming the open-vocabulary attack challenge. To ensure the attack's stealthiness, we utilize a reconstruction loss to minimize visual discrepancies between poisoned and clean images. Additionally, we introduce a unified method for generating attack data. IAG is evaluated theoretically and empirically, demonstrating its feasibility and effectiveness. Notably, our ASR@0.5 on InternVL-2.5-8B exceeds 65% on various test sets. IAG also shows promising potential for manipulating Ferret-7B and LLaVA-1.5-7B with very little accuracy decrease on clean samples. Extensive experiments, including ablation studies and potential defenses, also indicate the robustness and transferability of our attack.
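The ASR@0.5 figure in the abstract follows the standard grounding convention: an attack counts as a success when the model's predicted box overlaps the attacker's target box with IoU at or above 0.5. A minimal sketch of how this metric is computed (the corner-coordinate box format and function names are illustrative assumptions, not taken from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def asr_at_05(predicted_boxes, target_boxes):
    """Fraction of poisoned samples where the model grounds the
    attacker's target object with IoU >= 0.5 (ASR@0.5)."""
    hits = sum(iou(p, t) >= 0.5 for p, t in zip(predicted_boxes, target_boxes))
    return hits / len(predicted_boxes)
```

An ASR@0.5 above 65% therefore means that on more than 65% of poisoned inputs, the backdoored model's box lands on the attacker-chosen object rather than the queried one.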
Related papers
- Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning [5.0734761482919115]
Open-vocabulary object detectors (OVODs) unify vision and language to detect arbitrary object categories based on text prompts. We conduct the first study of backdoor attacks on OVODs and reveal a new attack surface introduced by prompt tuning.
arXiv Detail & Related papers (2025-11-16T19:05:31Z) - TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models [63.51290426425441]
A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions. We study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning. Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
arXiv Detail & Related papers (2025-10-13T02:45:48Z) - Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [34.286224884047385]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents. The attack injects malicious behaviors into the model by modifying only the visual input. We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation [32.24294112337828]
BadSem is a data poisoning attack that injects backdoors by deliberately misaligning image-text pairs during training. Our experiments show that BadSem achieves over 98% average ASR, generalizes well to out-of-distribution datasets, and can transfer across poisoning modalities. Our findings highlight the urgent need to address semantic vulnerabilities in Vision Language Models for their safer deployment.
arXiv Detail & Related papers (2025-06-08T16:40:40Z) - SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs [57.880467106470775]
Attackers can inject imperceptible perturbations into the training data, causing the model to generate malicious, attacker-controlled captions. We propose Semantic Reward Defense (SRD), a reinforcement learning framework that mitigates backdoor behavior without prior knowledge of triggers. SRD uses a Deep Q-Network to learn policies for applying discrete perturbations to sensitive image regions, aiming to disrupt the activation of malicious pathways.
arXiv Detail & Related papers (2025-06-05T08:22:24Z) - Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR).
We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of Visual Language Models (VLMs) to detect adversarial images among benign ones in the input.
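The single-vector idea behind NEARSIDE can be pictured as learning one direction in the VLM's hidden-state space and thresholding each input's projection onto it. A toy sketch under illustrative assumptions (the mean-difference estimator and function names here are stand-ins, not necessarily the paper's exact procedure):

```python
def fit_direction(benign_states, adv_states):
    """One detection vector: the mean adversarial hidden state minus
    the mean benign hidden state (an illustrative estimator)."""
    dim = len(benign_states[0])
    mean = lambda states, i: sum(s[i] for s in states) / len(states)
    return [mean(adv_states, i) - mean(benign_states, i) for i in range(dim)]

def is_adversarial(hidden_state, direction, threshold):
    """Flag the input as adversarial if its projection onto the
    learned direction exceeds the threshold."""
    return sum(h * d for h, d in zip(hidden_state, direction)) > threshold
```

The appeal of such a scheme is efficiency: detection at inference time reduces to one dot product against a precomputed vector, with no extra model forward passes.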
arXiv Detail & Related papers (2024-10-30T10:33:10Z) - Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion [4.036142985883415]
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Networks (CNNs) across various computer vision tasks.
ViTs are vulnerable to backdoor attacks, where an adversary embeds a backdoor into the victim model.
We propose an Attention Gradient-based Erosion Backdoor (AGEB) targeted at ViTs.
arXiv Detail & Related papers (2024-10-30T04:06:12Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.