On Evaluating Adversarial Robustness of Large Vision-Language Models
- URL: http://arxiv.org/abs/2305.16934v2
- Date: Sun, 29 Oct 2023 12:32:19 GMT
- Title: On Evaluating Adversarial Robustness of Large Vision-Language Models
- Authors: Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man
Cheung, Min Lin
- Abstract summary: We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
- Score: 64.66104342002882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling
more creative and adaptable interaction than large language models such as
ChatGPT. Nonetheless, multimodal generation exacerbates safety concerns, since
adversaries may successfully evade the entire system by subtly manipulating the
most vulnerable modality (e.g., vision). To this end, we propose evaluating the
robustness of open-source large VLMs in the most realistic and high-risk
setting, where adversaries have only black-box system access and seek to
deceive the model into returning the targeted responses. In particular, we
first craft targeted adversarial examples against pretrained models such as
CLIP and BLIP, and then transfer these adversarial examples to other VLMs such
as MiniGPT-4, LLaVA, UniDiffuser, BLIP-2, and Img2Prompt. In addition, we
observe that black-box queries on these VLMs can further improve the
effectiveness of targeted evasion, resulting in a surprisingly high success
rate for generating targeted responses. Our findings provide a quantitative
understanding regarding the adversarial vulnerability of large VLMs and call
for a more thorough examination of their potential security flaws before
deployment in practice. Code is at https://github.com/yunqing-me/AttackVLM.
Related papers
- Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks [34.40254709148148]
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update.
arXiv Detail & Related papers (2024-11-24T05:28:07Z) - AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models [41.044385916368455]
Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks.
We propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision.
arXiv Detail & Related papers (2024-10-07T09:45:18Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [31.383591942592467]
Vision-language models (VLMs) offer innovative ways to combine visual and textual data for enhanced understanding and interaction.
Patch-based adversarial attack is considered the most realistic threat model in physical vision applications.
We introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, to protectVLMs from the threat of patched visual prompt injectors.
arXiv Detail & Related papers (2024-05-17T04:19:19Z) - AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions [52.9787902653558]
Large Vision-Language Models (LVLMs) have shown significant progress in well responding to visual-instructions from users.
Despite the critical importance of LVLMs' robustness against such threats, current research in this area remains limited.
We introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions.
arXiv Detail & Related papers (2024-03-14T12:51:07Z) - VL-Trojan: Multimodal Instruction Backdoor Attacks against
Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.