PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention
- URL: http://arxiv.org/abs/2602.19418v1
- Date: Mon, 23 Feb 2026 01:20:43 GMT
- Title: PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention
- Authors: Hefei Mei, Zirui Wang, Chang Xu, Jianyuan Guo, Minjing Dong
- Abstract summary: Large Vision-Language Models (LVLMs) are foundational to modern multimodal applications, yet their susceptibility to adversarial attacks remains a critical concern. We introduce PA-Attack (Prototype-Anchored Attentive Attack) to tackle the attribute-restricted issue and limited task generalization of vanilla attacks. Experiments show that PA-Attack achieves an average 75.1% score reduction rate (SRR), demonstrating strong attack effectiveness, efficiency, and task generalization in LVLMs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Vision-Language Models (LVLMs) are foundational to modern multimodal applications, yet their susceptibility to adversarial attacks remains a critical concern. Prior white-box attacks rarely generalize across tasks, and black-box methods depend on expensive transfer, which limits efficiency. The vision encoder, standardized and often shared across LVLMs, provides a stable gray-box pivot with strong cross-model transfer. Building on this premise, we introduce PA-Attack (Prototype-Anchored Attentive Attack). PA-Attack starts from prototype-anchored guidance, which supplies a stable attack direction toward a general and dissimilar prototype, tackling the attribute-restricted issue and limited task generalization of vanilla attacks. On top of this guidance, we propose a two-stage attention enhancement mechanism: (i) leveraging token-level attention scores to concentrate perturbations on critical visual tokens, and (ii) adaptively recalibrating attention weights to track the evolving attention during the adversarial process. Extensive experiments across diverse downstream tasks and LVLM architectures show that PA-Attack achieves an average 75.1% score reduction rate (SRR), demonstrating strong attack effectiveness, efficiency, and task generalization on LVLMs. Code is available at https://github.com/hefeimei06/PA-Attack.
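For readers who want a concrete picture of the mechanism the abstract sketches, below is a minimal, hypothetical PGD-style loop that combines a prototype-anchored objective with attention-weighted token losses against a vision encoder. It is a sketch under stated assumptions, not the authors' implementation: the encoder interface (`return_attention=True`), the choice of `prototype`, the `score_reduction_rate` helper, and all hyperparameters are illustrative inventions; the actual method is in the linked repository.

```python
# Minimal sketch (NOT the authors' code) of a gray-box, prototype-anchored,
# attention-weighted attack on an LVLM vision encoder, in the spirit of the
# abstract above. The encoder interface and all names are hypothetical.
import torch
import torch.nn.functional as F

def pa_style_attack(encoder, image, prototype, steps=50, eps=8/255, alpha=1/255):
    """PGD-style loop: push patch-token features toward a generic,
    dissimilar `prototype` embedding, weighting each token's loss by the
    attention it currently receives. Recomputing the weights every step
    loosely mirrors the paper's adaptive attention recalibration."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Hypothetical API: patch-token features [B, N, D] plus the mean
        # attention each token receives in the final block [B, N].
        tokens, attn = encoder(image + delta, return_attention=True)
        # Prototype-anchored guidance: raise similarity between every
        # token and the prototype, dragging features off their semantics.
        sim = F.cosine_similarity(tokens, prototype[None, None, :], dim=-1)
        # Attention enhancement: concentrate the objective on the tokens
        # the current forward pass attends to most.
        weights = attn / attn.sum(dim=-1, keepdim=True)
        loss = (weights * sim).sum()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                  # ascent step
            delta.clamp_(-eps, eps)                             # L_inf budget
            delta.copy_((image + delta).clamp(0, 1) - image)    # valid pixels
        delta.grad = None
    return (image + delta).detach()

def score_reduction_rate(clean_score, adv_score):
    """One plausible reading of SRR: the relative drop in a downstream task
    metric. The exact definition is in the paper, not reproduced here."""
    return (clean_score - adv_score) / max(clean_score, 1e-8)
```

In a sketch like this, the prototype might be, for instance, a feature-space centroid over generic images; the paper's own prototype construction and its exact SRR definition should be taken from the source rather than from this illustration.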
Related papers
- On the Adversarial Robustness of Discrete Image Tokenizers
We first formulate attacks that aim to perturb the features extracted by discrete tokenizers, and thus change the extracted tokens. We fine-tune popular tokenizers with unsupervised adversarial training, keeping all other components frozen. Our approach significantly improves robustness to both unsupervised and end-to-end supervised attacks and generalizes well to unseen tasks and data.
arXiv Detail & Related papers (2026-02-20T14:39:17Z) - Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models
We show that intermediate activations in decoder-only large language models (LLMs) form a vulnerable attack surface for behavioral control. We exploit this surface via Sensitivity-Scaled Steering (SSS), a progressive activation-level attack. We show that SSS induces large shifts in evil, hallucination, sycophancy, and sentiment while preserving high coherence and general capabilities.
arXiv Detail & Related papers (2025-11-21T12:19:55Z) - Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
We introduce Multi-Faceted Attack (MFA), a framework that exposes general safety vulnerabilities in leading defense-equipped Vision-Language Models (VLMs). The core component of MFA is the Attention-Transfer Attack (ATA), which hides harmful instructions inside a meta task with competing objectives. MFA achieves a 58.5% success rate and consistently outperforms existing methods.
arXiv Detail & Related papers (2025-11-20T07:12:54Z) - Universal Camouflage Attack on Vision-Language Models for Autonomous Driving
Visual language modeling for automated driving (VLM-AD) is emerging as a promising research direction. However, VLM-AD remains vulnerable to serious security threats from adversarial attacks. We propose the first Universal Camouflage Attack framework for VLM-AD.
arXiv Detail & Related papers (2025-09-24T14:52:01Z) - VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation. Existing effective attacks typically focus on task-specific white-box settings. We propose a simple yet effective Vision Encoder Attack (VEAttack) which targets the vision encoder of LVLMs only.
arXiv Detail & Related papers (2025-05-23T03:46:04Z) - Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z) - Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation
Large Vision-Language Models (VLMs) have demonstrated remarkable performance across multimodal tasks by integrating vision encoders with large language models (LLMs). We introduce a novel universal adversarial perturbation (UAP) specifically designed for VLMs: the Doubly-Universal Adversarial Perturbation (Doubly-UAP).
arXiv Detail & Related papers (2024-12-11T05:23:34Z) - Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update.
arXiv Detail & Related papers (2024-11-24T05:28:07Z) - Adversarial Attacks on LiDAR-Based Tracking Across Road Users: Robustness Evaluation and Target-Aware Black-Box Method
We introduce a unified framework for conducting adversarial attacks within the context of 3D object tracking. In addressing black-box attack scenarios, we introduce a novel transfer-based approach, the Target-aware Perturbation Generation (TAPG) algorithm. Our experimental findings reveal a significant vulnerability in advanced tracking methods when subjected to both black-box and white-box attacks.
arXiv Detail & Related papers (2024-10-28T10:20:38Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z)