Related papers: InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models

URL: http://arxiv.org/abs/2312.01886v3
Date: Wed, 26 Jun 2024 03:29:50 GMT
Title: InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Authors: Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang,
Abstract summary: Large vision-language models (LVLMs) have demonstrated their incredible capability in image understanding and response generation. In this paper, we formulate a novel and practical targeted attack scenario that the adversary can only know the vision encoder of the victim LVLM. We propose an instruction-tuned targeted attack (dubbed textscInstructTA) to deliver the targeted adversarial attack on LVLMs with high transferability.
Score: 13.21813503235793
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large vision-language models (LVLMs) have demonstrated their incredible capability in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical targeted attack scenario that the adversary can only know the vision encoder of the victim LVLM, without the knowledge of its prompts (which are often proprietary for service providers and not publicly available) and its underlying large language model (LLM). This practical setting poses challenges to the cross-prompt and cross-model transferability of targeted adversarial attack, which aims to confuse the LVLM to output a response that is semantically similar to the attacker's chosen target text. To this end, we propose an instruction-tuned targeted attack (dubbed \textsc{InstructTA}) to deliver the targeted adversarial attack on LVLMs with high transferability. Initially, we utilize a public text-to-image generative model to "reverse" the target response into a target image, and employ GPT-4 to infer a reasonable instruction $\boldsymbol{p}^\prime$ from the target response. We then form a local surrogate model (sharing the same vision encoder with the victim LVLM) to extract instruction-aware features of an adversarial image example and the target image, and minimize the distance between these two features to optimize the adversarial example. To further improve the transferability with instruction tuning, we augment the instruction $\boldsymbol{p}^\prime$ with instructions paraphrased from GPT-4. Extensive experiments demonstrate the superiority of our proposed method in targeted attack performance and transferability. The code is available at https://github.com/xunguangwang/InstructTA.

Related papers

Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models [27.04420374256226]
Large vision-language models (LVLMs) have demonstrated outstanding performance in many downstream tasks.<n>It is important to detect whether an image is used to train the LVLM.<n>Recent studies have investigated membership inference attacks (MIAs) against LVLMs.
arXiv Detail & Related papers (2025-06-14T04:22:36Z)
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models [33.120141513366136]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation.<n>Existing effective attacks always focus on task-specific white-box settings.<n>We propose a simple yet effective Vision Attack (VEAttack) which targets the vision encoder of LVLMs only.
arXiv Detail & Related papers (2025-05-23T03:46:04Z)
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks [72.24101555828256]
Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. In this work, we focus on generative approaches for targeted transferable attacks. We introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability.
arXiv Detail & Related papers (2025-01-02T07:06:49Z)
Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models [6.649753747542211]
We propose a novel adversarial attack procedure, namely, Replace-then-Perturb and Contrastive-Adv. In Replace-then-Perturb, we first leverage a text-guided segmentation model to find the target object in the image. By doing this, we can generate a target image corresponding to the desired prompt, while maintaining the overall integrity of the original image.
arXiv Detail & Related papers (2024-11-01T04:50:08Z)
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models [15.029014337718849]
Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities. In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively. We propose a non-targeted attack method referred to as VT-Attack, which constructs adversarial examples from multiple perspectives.
arXiv Detail & Related papers (2024-10-09T09:06:56Z)
AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models [41.044385916368455]
Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks. We propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision.
arXiv Detail & Related papers (2024-10-07T09:45:18Z)
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing. We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM. Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z)
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens [28.356269620160937]
We propose a Contextual-Injection Attack (CIA) that employs gradient-based perturbation to inject target tokens into both visual and textual contexts. CIA enhances the cross-prompt transferability of adversarial images.
arXiv Detail & Related papers (2024-06-19T07:32:55Z)
Adversarial Attacks on Multimodal Agents [73.97379283655127]
Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. We show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks due to limited access to and knowledge about the environment.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt [60.54666043358946]
This paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively. In particular, we utilize a large language model to analyze jailbreak failures and employ chain-of-thought reasoning to refine textual prompts.
arXiv Detail & Related papers (2024-06-06T13:00:42Z)
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images. We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z)
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules. Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z)
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks [62.34019142949628]
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP. We introduce two novel and more effective textitSelf-Generated attacks which prompt the LVLM to generate an attack against itself. Using our benchmark, we uncover that Self-Generated attacks pose a significant threat, reducing LVLM(s) classification performance by up to 33%.
arXiv Detail & Related papers (2024-02-01T14:41:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.