A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- URL: http://arxiv.org/abs/2303.16378v2
- Date: Mon, 3 Apr 2023 03:00:46 GMT
- Title: A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- Authors: Haomin Zhuang, Yihua Zhang and Sijia Liu
- Abstract summary: We study the problem of adversarial attack generation for Stable Diffusion.
We show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders.
- We show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content.
- Score: 10.985088790765873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the record-breaking performance in Text-to-Image (T2I) generation by
Stable Diffusion, less research attention is paid to its adversarial
robustness. In this work, we study the problem of adversarial attack generation
for Stable Diffusion and ask if an adversarial text prompt can be obtained even
in the absence of end-to-end model queries. We call the resulting problem
'query-free attack generation'. To resolve this problem, we show that the
vulnerability of T2I models is rooted in the lack of robustness of text
encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion.
Based on such insight, we propose both untargeted and targeted query-free
attacks, where the former is built on the most influential dimensions in the
text embedding space, which we call steerable key dimensions. By leveraging the
proposed attacks, we empirically show that a perturbation of only five
characters in the text prompt is able to cause a significant content shift in
images synthesized by Stable Diffusion. Moreover, we show that the proposed
targeted attack can precisely steer the diffusion model to scrub the targeted
image content without causing much change in untargeted image content. Our code
is available at https://github.com/OPTML-Group/QF-Attack.
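The query-free attack described above can be illustrated with a minimal sketch: greedily pick a five-character suffix that maximally shifts the prompt's text embedding, with no queries to the diffusion model itself. Everything here is a hypothetical stand-in, not the paper's method: `toy_text_encoder` is a hashed bag-of-bigrams embedding substituting for the CLIP text encoder, and the greedy search is a simplification of the paper's optimization over steerable key dimensions.

```python
# Sketch of a query-free untargeted attack, under stated assumptions:
# a toy deterministic text encoder stands in for CLIP, and a greedy
# per-character search stands in for the paper's embedding-space attack.
import hashlib
import math
import string

DIM = 32  # toy embedding dimensionality

def toy_text_encoder(text: str) -> list:
    """Stand-in text encoder: sums hashed character-bigram vectors
    into a fixed-size embedding. NOT the CLIP text encoder."""
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        h = hashlib.md5(text[i:i + 2].encode()).digest()  # 16 bytes
        for j in range(DIM):
            vec[j] += h[j % len(h)] / 255.0 - 0.5
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def greedy_five_char_attack(prompt: str) -> str:
    """Greedily choose 5 suffix characters, each minimizing cosine
    similarity between the perturbed and the clean prompt embedding.
    No end-to-end model queries are made: only the text encoder is used."""
    clean = toy_text_encoder(prompt)
    suffix = ""
    for _ in range(5):
        best_c, best_sim = None, float("inf")
        for c in string.ascii_letters + string.punctuation:
            sim = cosine(clean, toy_text_encoder(prompt + " " + suffix + c))
            if sim < best_sim:
                best_c, best_sim = c, sim
        suffix += best_c
    return suffix

prompt = "a photo of a dog on the beach"
suffix = greedy_five_char_attack(prompt)
```

In the paper's setting, appending such a suffix to the real prompt would be fed to Stable Diffusion; here the sketch only demonstrates the query-free search loop over the text encoder.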
Related papers
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z) - Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks [41.531913152661296]
We formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images.
After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
arXiv Detail & Related papers (2024-01-16T12:15:39Z) - Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
arXiv Detail & Related papers (2023-11-27T05:35:49Z) - Semantic Adversarial Attacks via Diffusion Models [30.169827029761702]
Semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features.
We propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models.
Our approaches achieve approximately 100% attack success rate in multiple settings with the best FID as 36.61.
arXiv Detail & Related papers (2023-09-14T02:57:48Z) - Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks [22.651626059348356]
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks.
arXiv Detail & Related papers (2023-06-16T00:43:35Z) - I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text, both untargeted and targeted.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z) - Designing a Better Asymmetric VQGAN for StableDiffusion [73.21783102003398]
A revolutionary text-to-image generator, StableDiffusion, learns a diffusion model in the latent space via a VQGAN.
We propose a new asymmetric VQGAN with two simple designs.
It can be widely used in StableDiffusion-based inpainting and local editing methods.
arXiv Detail & Related papers (2023-06-07T17:56:02Z) - Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search [52.519433040005126]
Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly.
In this work, we aim to study and understand the failure modes of TDMs in more detail.
We propose SAGE, the first adversarial search method on TDMs.
arXiv Detail & Related papers (2023-06-01T17:59:00Z) - Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework [20.652130361862053]
We propose the Adversarial Decoupling Augmentation Framework (ADAF) to enhance the defensive performance of facial privacy protection algorithms.
ADAF introduces multi-level text-related augmentations for defense stability against various attacker prompts.
arXiv Detail & Related papers (2023-05-06T09:00:50Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.