AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models
- URL: http://arxiv.org/abs/2410.05346v1
- Date: Mon, 7 Oct 2024 09:45:18 GMT
- Title: AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models
- Authors: Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Jitao Sang, Dit-Yan Yeung
- Abstract summary: Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks.
We propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision.
- Score: 41.044385916368455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks, particularly targeted adversarial images that manipulate the model to generate harmful content specified by the adversary. Current attack methods rely on predefined target labels to create targeted adversarial attacks, which limits their scalability and applicability for large-scale robustness evaluations. In this paper, we propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision, allowing any image to serve as a target for the attack. To address the limitation of existing methods that require label supervision, we introduce a contrastive loss that trains a generator on the large-scale unlabeled LAION-400M image dataset to generate targeted adversarial noise. This large-scale pre-training endows our method with powerful transferability across a wide range of VLMs. Extensive experiments on five mainstream open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) across three multimodal tasks (image-text retrieval, multimodal classification, and image captioning) demonstrate the effectiveness of our attack. Additionally, we successfully transfer AnyAttack to multiple commercial VLMs, including Google's Gemini, Anthropic's Claude Sonnet, and Microsoft's Copilot. These results reveal an unprecedented risk to VLMs, highlighting the need for effective countermeasures.
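The abstract describes training a noise generator with a contrastive loss on unlabeled images so that any image can serve as the attack target. The sketch below is a minimal illustration of that idea against a frozen surrogate image encoder; the `NoiseGenerator` architecture, the InfoNCE-style loss, the stand-in encoder, and all hyperparameters (noise budget, temperature, learning rate) are illustrative assumptions, not the authors' released implementation, which would pre-train on LAION-400M with a pretrained surrogate such as a CLIP image encoder.

```python
# Minimal sketch of a self-supervised targeted-noise generator, assuming an
# InfoNCE-style contrastive objective between adversarial and target embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseGenerator(nn.Module):
    """Maps a target image to a bounded adversarial perturbation (assumed design)."""
    def __init__(self, eps: float = 8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, target_images: torch.Tensor) -> torch.Tensor:
        # tanh keeps the noise inside an L_inf ball of radius eps
        return self.eps * torch.tanh(self.net(target_images))

def contrastive_loss(adv_emb: torch.Tensor, tgt_emb: torch.Tensor,
                     tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: each adversarial embedding should match its own
    target embedding and repel the other targets in the batch (assumption)."""
    adv_emb = F.normalize(adv_emb, dim=-1)
    tgt_emb = F.normalize(tgt_emb, dim=-1)
    logits = adv_emb @ tgt_emb.t() / tau       # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Stand-in for a frozen pretrained surrogate encoder (e.g. CLIP's vision tower).
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
for p in encoder.parameters():
    p.requires_grad_(False)

generator = NoiseGenerator()
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

# One illustrative training step; random tensors stand in for (clean image,
# target image) pairs drawn from a large unlabeled dataset.
clean, target = torch.rand(8, 3, 224, 224), torch.rand(8, 3, 224, 224)
adv = (clean + generator(target)).clamp(0, 1)  # targeted adversarial image
loss = contrastive_loss(encoder(adv), encoder(target))
opt.zero_grad(); loss.backward(); opt.step()
```

At attack time, a generator trained this way would take an arbitrary target image, emit bounded noise, and that noise would be added to a benign image before the result is submitted to the victim VLM; the transferability reported in the paper is attributed to the large-scale pre-training rather than per-model optimization.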
Related papers
- Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR).
We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of Visual Language Models (VLMs) to distinguish adversarial images from benign ones in the input.
arXiv Detail & Related papers (2024-10-30T10:33:10Z)
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
- Adversarial Robustness for Visual Grounding of Multimodal Large Language Models [49.71757071535619]
Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks.
However, the adversarial robustness of visual grounding remains unexplored in MLLMs.
We propose three adversarial attack paradigms to evaluate it.
arXiv Detail & Related papers (2024-05-16T10:54:26Z)
- Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective [42.04728834962863]
Pretrained vision-language models (VLMs) like CLIP exhibit exceptional generalization across diverse downstream tasks.
Recent studies reveal their vulnerability to adversarial attacks, with defenses against text-based and multimodal attacks remaining largely unexplored.
This work presents the first comprehensive study on improving the adversarial robustness of VLMs against attacks targeting image, text, and multimodal inputs.
arXiv Detail & Related papers (2024-04-30T06:34:21Z)
- VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z)
- VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models [46.14455492739906]
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks.
Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting.
We propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels.
arXiv Detail & Related papers (2023-10-07T02:18:52Z)
- On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.