Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector
- URL: http://arxiv.org/abs/2508.13739v2
- Date: Wed, 24 Sep 2025 17:02:50 GMT
- Title: Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector
- Authors: Yiming Cao, Yanjie Li, Kaisheng Liang, Bin Xiao
- Abstract summary: Black-box adversarial attacks pose a particularly severe threat to Large Vision-Language Models (VLMs). We propose a novel black-box targeted attack framework that leverages the projector. Specifically, we utilize the widely adopted Querying Transformer (Q-Former), which transforms global image embeddings into fine-grained query outputs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing deployment of Large Vision-Language Models (VLMs) raises safety concerns, as adversaries may exploit model vulnerabilities to induce harmful outputs, with targeted black-box adversarial attacks posing a particularly severe threat. However, existing methods primarily maximize encoder-level global similarity, which lacks the granularity for stealthy and practical fine-grained attacks, where only a specific target should be altered (e.g., modifying a car while preserving its background). Moreover, they largely neglect the projector, a key semantic bridge in VLMs for multimodal alignment. To address these limitations, we propose a novel black-box targeted attack framework that leverages the projector. Specifically, we utilize the widely adopted Querying Transformer (Q-Former), which transforms global image embeddings into fine-grained query outputs, to enhance attack effectiveness and granularity. For standard global targeted attack scenarios, we propose the Intermediate Projector Guided Attack (IPGA), which aligns Q-Former fine-grained query outputs with the target to enhance attack strength and exploits the intermediate pretrained Q-Former that is not fine-tuned for any specific Large Language Model (LLM) to improve attack transferability. For fine-grained attack scenarios, we augment IPGA with the Residual Query Alignment (RQA) module, which preserves unrelated content by constraining non-target query outputs to enhance attack granularity. Extensive experiments demonstrate that IPGA significantly outperforms baselines in global targeted attacks, and IPGA with RQA (IPGA-R) attains superior success rates and unrelated content preservation over baselines in fine-grained attacks. Our method also transfers effectively to commercial VLMs such as Google Gemini and OpenAI GPT.
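To make the objective concrete, here is a minimal PyTorch sketch of how a projector-guided attack of this kind could look, based only on the abstract's description: a PGD loop that aligns the adversarial image's Q-Former query outputs with those of a target image, plus an RQA-style residual term that pins non-target queries to their clean values. The helper names (`vision_encoder`, `qformer`, `target_idx`) and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a projector-guided targeted attack (IPGA-style) with an
# optional residual-alignment term (RQA-style). All names are illustrative.
import torch
import torch.nn.functional as F

def ipga_rqa_attack(x, x_target, vision_encoder, qformer, target_idx=None,
                    eps=8/255, alpha=1/255, steps=300, lambda_rqa=1.0):
    """x, x_target: (1, 3, H, W) clean and target images in [0, 1].
    target_idx: indices of Q-Former queries tied to the attacked object
    (fine-grained case); None means a global targeted attack."""
    with torch.no_grad():
        q_clean = qformer(vision_encoder(x))         # (1, Nq, D) clean queries
        q_tgt = qformer(vision_encoder(x_target))    # (1, Nq, D) target queries

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        q_adv = qformer(vision_encoder((x + delta).clamp(0, 1)))
        if target_idx is None:
            # Global attack: align all query outputs with the target.
            loss = -F.cosine_similarity(q_adv.flatten(1), q_tgt.flatten(1)).mean()
        else:
            # Fine-grained attack: push target queries toward the target while
            # pinning the rest to their clean values (preserves unrelated content).
            mask = torch.zeros(q_adv.shape[1], dtype=torch.bool)
            mask[target_idx] = True
            loss_tgt = -F.cosine_similarity(q_adv[:, mask].flatten(1),
                                            q_tgt[:, mask].flatten(1)).mean()
            loss_keep = F.mse_loss(q_adv[:, ~mask], q_clean[:, ~mask])
            loss = loss_tgt + lambda_rqa * loss_keep
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()       # descend the attack loss
            delta.clamp_(-eps, eps)                  # keep the L-inf budget
            delta.grad = None
    return (x + delta).detach().clamp(0, 1)
```

Because the loss is computed at the intermediate Q-Former stage rather than at any LLM-specific layer, the same perturbation can, per the abstract's transferability argument, be handed to different downstream LLMs.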
Related papers
- PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention [63.63231191403825]
Large Vision-Language Models (LVLMs) are foundational to modern multimodal applications, yet their susceptibility to adversarial attacks remains a critical concern. We introduce PA-Attack (Prototype-Anchored Attentive Attack) to tackle the attribute-restricted issue and limited task generalization of vanilla attacks. Experiments show that PA-Attack achieves an average 75.1% score reduction rate (SRR), demonstrating strong attack effectiveness, efficiency, and task generalization in LVLMs.
arXiv Detail & Related papers (2026-02-23T01:20:43Z)
- MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models [52.37749859972453]
We propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs. Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks. Our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies.
arXiv Detail & Related papers (2025-11-13T09:00:21Z)
- VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models [33.120141513366136]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation. Existing effective attacks always focus on task-specific white-box settings. We propose a simple yet effective Vision Encoder Attack (VEAttack), which targets the vision encoder of LVLMs only.
arXiv Detail & Related papers (2025-05-23T03:46:04Z)
- Transferable Adversarial Attacks on Black-Box Vision-Language Models [63.22532779621001]
Adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information. We discover that universal perturbations (modifications applicable to a wide set of images) can consistently induce these misinterpretations.
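As a rough illustration of the universal-perturbation idea (not this paper's specific method), one shared perturbation can be optimized over many images so that a surrogate encoder maps all of them toward a single attacker-chosen embedding. In the sketch below, `encode_image` (e.g., a CLIP surrogate), `target_emb` of shape (1, D), and every hyperparameter are assumptions.

```python
# Minimal sketch: train one universal delta over a dataset so that perturbed
# images all move toward one target embedding under a surrogate encoder.
import torch
import torch.nn.functional as F

def train_universal_delta(loader, encode_image, target_emb,
                          eps=8/255, lr=1e-2, epochs=5, shape=(3, 224, 224)):
    delta = torch.zeros(1, *shape, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    tgt = F.normalize(target_emb, dim=-1)            # (1, D) target direction
    for _ in range(epochs):
        for x, _ in loader:                          # x: (B, 3, H, W) in [0, 1]
            emb = F.normalize(encode_image((x + delta).clamp(0, 1)), dim=-1)
            loss = -(emb @ tgt.T).mean()             # pull all images to target
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)              # shared L-inf budget
    return delta.detach()
```

The returned `delta` is image-agnostic: the same tensor is added to any input at attack time, which is what makes such perturbations practical against black-box targets.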
arXiv Detail & Related papers (2025-05-02T06:51:11Z)
- AIM: Additional Image Guided Generation of Transferable Adversarial Attacks [72.24101555828256]
Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. In this work, we focus on generative approaches for targeted transferable attacks. We introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability.
arXiv Detail & Related papers (2025-01-02T07:06:49Z)
- Efficient Multi-modal Large Language Models via Visual Token Grouping [55.482198808206284]
High-resolution images and videos pose a barrier to their broader adoption. Compressing vision tokens in MLLMs has emerged as a promising approach to reduce inference costs. We introduce VisToG, a novel grouping mechanism that leverages the capabilities of pre-trained vision encoders to group similar image segments.
arXiv Detail & Related papers (2024-11-26T09:36:02Z)
- Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks [34.40254709148148]
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update.
arXiv Detail & Related papers (2024-11-24T05:28:07Z)
- Locality Alignment Improves Vision-Language Models [55.275235524659905]
Vision language models (VLMs) have seen growing adoption in recent years, but many still struggle with basic spatial reasoning errors. Our goal is to resolve this with a vision backbone that effectively captures both local and global image semantics. We propose a new efficient post-training stage for ViTs called locality alignment and a novel fine-tuning procedure called MaskEmbed.
arXiv Detail & Related papers (2024-10-14T21:01:01Z)
- Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models [15.029014337718849]
Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities.
In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively.
We propose a non-targeted attack method referred to as VT-Attack, which constructs adversarial examples from multiple perspectives.
arXiv Detail & Related papers (2024-10-09T09:06:56Z)
- AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models [39.34959092321762]
Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks.
arXiv Detail & Related papers (2024-10-07T09:45:18Z)
- CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks [52.29186466633699]
Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios.
Single-target generative attacks train a generator for each target class to generate highly transferable perturbations.
We propose a CLIP-guided Generative Network with Cross-attention modules (CGNC) to enhance multi-target attacks.
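To show what "one generator for many targets" can mean in code, here is a deliberately tiny sketch of a text-conditioned perturbation generator: image features cross-attend to a CLIP-style text embedding of the target class, so a single network can serve every target. The architecture, dimensions, and names below are illustrative stand-ins, not the paper's CGNC.

```python
# Hedged sketch of a text-conditioned perturbation generator: one network,
# conditioned on the target-class text embedding via cross-attention.
import torch
import torch.nn as nn

class TextCondPerturbGen(nn.Module):
    def __init__(self, text_dim=512, ch=32, eps=16/255):
        super().__init__()
        self.eps = eps
        self.encode = nn.Conv2d(3, ch, 3, padding=1)
        # Cross-attention: image tokens (queries) attend to the text embedding.
        self.attn = nn.MultiheadAttention(ch, num_heads=4, kdim=text_dim,
                                          vdim=text_dim, batch_first=True)
        self.decode = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x, text_emb):
        # x: (B, 3, H, W) in [0, 1]; text_emb: (B, text_dim), e.g. from CLIP.
        h = torch.relu(self.encode(x))                  # (B, ch, H, W)
        B, C, H, W = h.shape
        tokens = h.flatten(2).transpose(1, 2)           # (B, H*W, ch)
        ctx, _ = self.attn(tokens, text_emb[:, None], text_emb[:, None])
        h = (tokens + ctx).transpose(1, 2).reshape(B, C, H, W)
        return self.eps * torch.tanh(self.decode(h))    # bounded perturbation
```

Such a generator would typically be trained so that a surrogate encoder maps `x + delta` close to the target text embedding; at attack time the same weights handle any target class simply by swapping the text embedding.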
arXiv Detail & Related papers (2024-07-14T12:30:32Z)
- Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation [49.480978190805125]
Transfer attacks generate significant interest for black-box applications.
Existing works essentially optimize the single-level objective directly w.r.t. the surrogate model.
We propose a bilevel optimization paradigm, which explicitly reforms the nested relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker.
arXiv Detail & Related papers (2024-06-04T07:45:27Z)
- Adversarial Robustness for Visual Grounding of Multimodal Large Language Models [49.71757071535619]
Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks.
The adversarial robustness of visual grounding remains unexplored in MLLMs.
We propose three adversarial attack paradigms.
arXiv Detail & Related papers (2024-05-16T10:54:26Z)
- GiT: Towards Generalist Vision Transformer through Universal Language Interface [94.33443158125186]
This paper proposes a simple yet effective framework, called GiT, simultaneously applicable to various vision tasks with only a vanilla ViT.
GiT is a multi-task visual model, jointly trained across five representative benchmarks without task-specific fine-tuning.
arXiv Detail & Related papers (2024-03-14T13:47:41Z)
- InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models [13.21813503235793]
Large vision-language models (LVLMs) have demonstrated their incredible capability in image understanding and response generation.
In this paper, we formulate a novel and practical targeted attack scenario in which the adversary knows only the vision encoder of the victim LVLM.
We propose an instruction-tuned targeted attack (dubbed InstructTA) to deliver targeted adversarial attacks on LVLMs with high transferability.
arXiv Detail & Related papers (2023-12-04T13:40:05Z)
- On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
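The query-based refinement mentioned here is consistent with score-based gradient estimation, where the attacker repeatedly queries the victim and ascends an estimated gradient. Below is a minimal NES-style sketch under that assumption; `score_fn` (a black-box score such as the victim's similarity to the target output) and all hyperparameters are assumed, not the paper's exact procedure.

```python
# Hedged NES-style sketch: estimate the gradient of a black-box score by
# antithetic sampling, then take signed ascent steps within an L-inf ball.
import torch

def nes_refine(x_adv, x_clean, score_fn, eps=8/255,
               sigma=1e-3, n_samples=50, alpha=1/255, steps=20):
    for _ in range(steps):
        grad_est = torch.zeros_like(x_adv)
        for _ in range(n_samples):
            u = torch.randn_like(x_adv)
            # Two queries per direction (antithetic pair).
            s_pos = score_fn((x_adv + sigma * u).clamp(0, 1))
            s_neg = score_fn((x_adv - sigma * u).clamp(0, 1))
            grad_est += (s_pos - s_neg) / (2 * sigma) * u
        grad_est /= n_samples
        x_adv = x_adv + alpha * grad_est.sign()          # ascend the score
        x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```

A transfer-crafted example (e.g., from a CLIP or BLIP surrogate) would serve as the initialization `x_adv`, with queries then closing the gap between the surrogate and the actual victim.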
arXiv Detail & Related papers (2023-05-26T13:49:44Z)