AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
- URL: http://arxiv.org/abs/2410.05346v3
- Date: Fri, 28 Mar 2025 02:55:23 GMT
- Title: AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
- Authors: Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, Dit-Yan Yeung
- Abstract summary: Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks.
- Score: 39.34959092321762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks. Traditional targeted adversarial attacks require specific targets and labels, limiting their real-world impact. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks through a novel foundation-model approach. By pre-training on the massive LAION-400M dataset without label supervision, AnyAttack achieves unprecedented flexibility: any image can be transformed into an attack vector targeting any desired output across different VLMs. This approach fundamentally changes the threat landscape, making adversarial capabilities accessible at an unprecedented scale. Our extensive validation across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) demonstrates AnyAttack's effectiveness across diverse multimodal tasks. Most concerning, AnyAttack transfers seamlessly to commercial systems including Google Gemini, Claude Sonnet, Microsoft Copilot, and OpenAI GPT, revealing a systemic vulnerability that requires immediate attention.
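The abstract does not detail AnyAttack's pre-trained generator, so the sketch below only illustrates the general principle it builds on: a targeted attack that pushes a benign image's embedding toward an attacker-chosen target output. The model choice (Hugging Face CLIP), the cosine-distance loss, and all hyperparameters are illustrative assumptions, not the paper's method.

```python
# Minimal PGD-style sketch of a targeted embedding attack on CLIP.
# Illustrative only: this is NOT AnyAttack's pre-trained generator, and the
# hyperparameters (eps, alpha, steps) are assumptions for demonstration.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def targeted_attack(clean_image, target_text, eps=8/255, alpha=1/255, steps=100):
    """Perturb `clean_image` (a PIL image) so its CLIP image embedding
    moves toward the embedding of `target_text` (the attacker's desired output)."""
    pixel_values = processor(images=clean_image, return_tensors="pt")["pixel_values"]
    with torch.no_grad():
        text_inputs = processor(text=[target_text], return_tensors="pt", padding=True)
        target_emb = F.normalize(model.get_text_features(**text_inputs), dim=-1)

    delta = torch.zeros_like(pixel_values, requires_grad=True)
    for _ in range(steps):
        img_emb = F.normalize(
            model.get_image_features(pixel_values=pixel_values + delta), dim=-1)
        loss = 1.0 - (img_emb * target_emb).sum()   # cosine distance to the target
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()      # signed-gradient descent step
            delta.clamp_(-eps, eps)                 # stay inside the L_inf budget
            # (note: CLIPProcessor normalizes pixels, so eps here is only nominal)
            delta.grad.zero_()
    return pixel_values + delta                     # adversarial input in CLIP's space
```

Per the abstract, AnyAttack replaces this kind of per-image optimization with self-supervised pre-training on LAION-400M, so that any image can be turned into an attack vector without label supervision.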
Related papers
- Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails [32.627286570942445]
MultiFaceted Attack is an attack framework designed to bypass Multi-Layered Defenses in Vision Large Language Models.
It exploits the multimodal nature of VLLMs to inject toxic system prompts through images.
It achieves a 61.56% attack success rate, surpassing state-of-the-art methods by at least 42.18%.
arXiv Detail & Related papers (2025-02-09T04:21:27Z) - Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation [15.883062174902093]
Large Vision-Language Models (VLMs) have demonstrated remarkable performance across multimodal tasks by integrating vision encoders with large language models (LLMs).
We introduce a novel universal adversarial perturbation (UAP) specifically designed for VLMs: the Doubly-Universal Adversarial Perturbation (Doubly-UAP).
arXiv Detail & Related papers (2024-12-11T05:23:34Z) - Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR).
We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of Visual Language Models (VLMs) to distinguish adversarial images from benign inputs (see the sketch after this list).
arXiv Detail & Related papers (2024-10-30T10:33:10Z) - Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images [3.537369004801589]
Hiding-in-Plain-Sight (HiPS) attacks subtly modify model predictions by selectively concealing target object(s).
We propose two HiPS attack variants, HiPS-cls and HiPS-cap, and demonstrate their effectiveness in transferring to downstream image captioning models.
arXiv Detail & Related papers (2024-10-16T20:11:32Z) - White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - Adversarial Robustness for Visual Grounding of Multimodal Large Language Models [49.71757071535619]
Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks.
However, the adversarial robustness of visual grounding remains unexplored in MLLMs.
We propose three adversarial attack paradigms to study it.
arXiv Detail & Related papers (2024-05-16T10:54:26Z) - Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective [42.04728834962863]
Pretrained vision-language models (VLMs) like CLIP exhibit exceptional generalization across diverse downstream tasks.
Recent studies reveal their vulnerability to adversarial attacks, with defenses against text-based and multimodal attacks remaining largely unexplored.
This work presents the first comprehensive study on improving the adversarial robustness of VLMs against attacks targeting image, text, and multimodal inputs.
arXiv Detail & Related papers (2024-04-30T06:34:21Z) - VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z) - VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models [46.14455492739906]
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks.
Existing approaches mainly explore adversarial robustness under the white-box setting.
We propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels.
arXiv Detail & Related papers (2023-10-07T02:18:52Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z)
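For the NEARSIDE entry referenced above (detection via a single vector distilled from VLM hidden states), a minimal sketch of the single-vector idea follows. The distillation shown here (a difference of class-mean embeddings) and the `embed()` helper are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of single-vector adversarial-image detection, inspired by
# the NEARSIDE summary above; this is not the released code. The "distillation"
# below, a difference of class means, is an assumption for demonstration.
import torch

def fit_direction(benign_embs: torch.Tensor, adv_embs: torch.Tensor) -> torch.Tensor:
    """Distill one direction from hidden-state embeddings: adversarial mean minus benign mean."""
    direction = adv_embs.mean(dim=0) - benign_embs.mean(dim=0)
    return direction / direction.norm()

def adversarial_score(emb: torch.Tensor, direction: torch.Tensor) -> float:
    """Projection onto the direction; larger values suggest an adversarial input."""
    return float(emb @ direction)

# Usage (embed() is a hypothetical hook returning a VLM's hidden-state embedding):
#   direction = fit_direction(embed(benign_images), embed(adv_images))
#   is_adv = adversarial_score(embed(test_image), direction) > threshold
```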