Related papers: Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

URL: http://arxiv.org/abs/2410.04884v1
Date: Mon, 7 Oct 2024 10:06:01 GMT
Title: Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models
Authors: Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, Wenqi Ren,
Abstract summary: We propose a novel strategy that exclusively employs image patches for attacks, thus preserving the integrity of the original text. Our method leverages prior knowledge from diffusion models to enhance the authenticity and naturalness of the perturbations. Comprehensive experiments conducted in a white-box setting for image-to-text scenarios reveal that our proposed method significantly outperforms existing techniques, achieving a 100% attack success rate.
Score: 32.23201683108716
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Visual language pre-training (VLP) models have demonstrated significant success across various domains, yet they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multimodal learning. Traditionally, adversarial methods targeting VLP models involve simultaneously perturbing images and text. However, this approach faces notable challenges: first, adversarial perturbations often fail to translate effectively into real-world scenarios; second, direct modifications to the text are conspicuously visible. To overcome these limitations, we propose a novel strategy that exclusively employs image patches for attacks, thus preserving the integrity of the original text. Our method leverages prior knowledge from diffusion models to enhance the authenticity and naturalness of the perturbations. Moreover, to optimize patch placement and improve the efficacy of our attacks, we utilize the cross-attention mechanism, which encapsulates intermodal interactions by generating attention maps to guide strategic patch placements. Comprehensive experiments conducted in a white-box setting for image-to-text scenarios reveal that our proposed method significantly outperforms existing techniques, achieving a 100% attack success rate. Additionally, it demonstrates commendable performance in transfer tasks involving text-to-image configurations.

Related papers

Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection [29.203173410857914]
We propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation.<n>Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected.<n>Our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP.
arXiv Detail & Related papers (2026-02-16T12:08:37Z)
CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization [4.6467356929461925]
Multimodal Large Language Models (MLLMs) have achieved remarkable success in tasks such as image captioning, visual question answering, and cross-modal reasoning.<n>Their multimodal nature exposes them to adversarial threats, where attackers can perturb either modality or both jointly to induce harmful, misleading, or policy violating outputs.<n>Existing defense strategies, such as adversarial training and input purification, face notable limitations.<n>We propose a supervised diffusion based denoising framework that leverages paired adversarial clean image datasets to fine-tune diffusion models.
arXiv Detail & Related papers (2025-10-13T07:44:54Z)
A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement [72.3054292908678]
We propose a unified framework that seamlessly incorporates traditional transferability enhancement strategies into diffusion model-based adversarial example generation via image editing.<n>Our method won first place in the "1st Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media" competition at ACM MM25.
arXiv Detail & Related papers (2025-06-30T09:59:09Z)
MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models [2.5931446496646204]
Referring Expression (RES) enables precise object segmentation in images based on natural language descriptions.<n>Despite its impressive performance, the robustness of RES models against adversarial examples remains largely unexplored.<n>We propose a novel adversarial attack strategy termed textbfMultimodal Bidirectional Attack, tailored for RES models.
arXiv Detail & Related papers (2025-06-19T09:14:04Z)
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense [90.71884758066042]
Large vision-language models (LVLMs) introduce a unique vulnerability: susceptibility to malicious attacks via visual inputs. We propose ESIII (Embedding Security Instructions Into Images), a novel methodology for transforming the visual space from a source of vulnerability into an active defense mechanism.
arXiv Detail & Related papers (2025-03-14T17:39:45Z)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models. Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input. An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [31.383591942592467]
Vision-language models (VLMs) offer innovative ways to combine visual and textual data for enhanced understanding and interaction. Patch-based adversarial attack is considered the most realistic threat model in physical vision applications. We introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, to protectVLMs from the threat of patched visual prompt injectors.
arXiv Detail & Related papers (2024-05-17T04:19:19Z)
Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. We propose a few-shot adversarial prompt framework where adapting input sequences with limited data makes significant adversarial robustness improvement.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme. Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs) We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [34.86098237949214]
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks. DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework.
arXiv Detail & Related papers (2023-06-15T13:33:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.