A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- URL: http://arxiv.org/abs/2303.16378v2
- Date: Mon, 3 Apr 2023 03:00:46 GMT
- Title: A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- Authors: Haomin Zhuang, Yihua Zhang and Sijia Liu
- Abstract summary: We study the problem of adversarial attack generation for Stable Diffusion.
We show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders.
We show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content.
- Score: 10.985088790765873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the record-breaking performance of Stable Diffusion in
Text-to-Image (T2I) generation, little research attention has been paid to its
adversarial robustness. In this work, we study the problem of adversarial
attack generation for Stable Diffusion and ask whether an adversarial text
prompt can be obtained even in the absence of end-to-end model queries. We call
the resulting problem 'query-free attack generation'. To tackle this problem,
we show that the vulnerability of T2I models is rooted in the lack of
robustness of their text encoders, e.g., the CLIP text encoder used by Stable
Diffusion. Based on this insight, we propose both untargeted and targeted
query-free attacks, where the former is built on the most influential
dimensions in the text embedding space, which we call steerable key dimensions.
By leveraging the proposed attacks, we empirically show that a perturbation of
only five characters to the text prompt is enough to cause a significant
content shift in images synthesized by Stable Diffusion. Moreover, we show that
the proposed targeted attack can precisely steer the diffusion model to scrub
the targeted image content without much change to non-targeted image content.
Our code is available at https://github.com/OPTML-Group/QF-Attack.
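To make the threat model concrete, below is a minimal, hypothetical sketch of an untargeted query-free attack in the spirit of the abstract: greedily append five characters to the prompt so that its CLIP text embedding drifts as far as possible from the clean prompt, without ever querying the diffusion model. The greedy character search, model checkpoint, and distance measure are illustrative assumptions; the paper's actual attacks (including the steerable-key-dimension construction) live in the linked repository.

```python
# A minimal sketch of an untargeted query-free attack, assuming the
# open-source CLIP text encoder from Hugging Face transformers. The greedy
# five-character search below is an illustration, not the authors' method.
import string
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

@torch.no_grad()
def embed(prompt: str) -> torch.Tensor:
    # Pooled CLIP text embedding for a single prompt.
    tokens = tokenizer(prompt, return_tensors="pt")
    return encoder(**tokens).pooler_output[0]

@torch.no_grad()
def greedy_char_attack(prompt: str, budget: int = 5) -> str:
    # Append `budget` characters, one at a time, each chosen to push the
    # embedding as far as possible from the clean prompt's embedding.
    clean = embed(prompt)
    adv = prompt
    for _ in range(budget):
        candidates = string.ascii_letters + string.punctuation
        adv += max(candidates,
                   key=lambda c: torch.norm(embed(adv + c) - clean).item())
    return adv

print(greedy_char_attack("a photo of a dog on the beach"))
```

Feeding both the clean prompt and the returned five-character-perturbed prompt to Stable Diffusion would then expose how far the synthesized content shifts.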
Related papers
- SteerDiff: Steering towards Safe Text-to-Image Diffusion Models [5.781285400461636]
Text-to-image (T2I) diffusion models can be misused to produce inappropriate content.
We introduce SteerDiff, a lightweight adaptor module designed to act as an intermediary between user input and the diffusion model.
We conduct extensive experiments across various concept unlearning tasks to evaluate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-03T17:34:55Z)
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks; a toy version of the detection idea is sketched below.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
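As a toy illustration of the detection idea above (not ACPT's actual implementation): query-based attacks submit long runs of near-duplicate images, so any two recent query embeddings from a contrastively tuned encoder should be suspiciously similar. The embedding dimensionality and threshold below are assumptions.

```python
# Toy detector in the spirit of ACPT: flag a query sequence whenever any two
# recent query embeddings are more similar than a threshold. The 512-dim
# embeddings and the 0.95 threshold are illustrative assumptions.
import torch
import torch.nn.functional as F

def is_attack_sequence(embeddings: torch.Tensor, threshold: float = 0.95) -> bool:
    # embeddings: (k, d) tensor holding the last k query embeddings.
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t()              # pairwise cosine similarities
    sim.fill_diagonal_(-1.0)     # ignore self-similarity
    return bool((sim > threshold).any())

# Five unrelated queries vs. five near-duplicates of a single query.
benign = torch.randn(5, 512)
attack = torch.randn(1, 512).repeat(5, 1) + 0.01 * torch.randn(5, 512)
print(is_attack_sequence(benign), is_attack_sequence(attack))  # False True
```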
- Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models [10.70975463369742]
We present the Jailbreaking Prompt Attack (JPA).
JPA searches for the target malicious concepts in the text embedding space using a group of antonyms.
A prefix prompt is optimized in the discrete vocabulary space to align malicious concepts semantically in the text embedding space; a loose sketch of this idea follows below.
arXiv Detail & Related papers (2024-04-02T09:49:35Z)
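A loose sketch of the discrete prefix optimization described above, under heavy assumptions: greedily choose prefix words from a small candidate list so that the full prompt's CLIP text embedding aligns with a target concept embedding. The candidate vocabulary, greedy search, and cosine scoring are illustrative stand-ins for JPA's actual procedure.

```python
# Hypothetical greedy prefix search in the spirit of JPA: pick prefix words
# that pull the prompt's CLIP embedding toward a target concept. Candidate
# words, prefix length, and scoring are assumptions for illustration only.
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    out = encoder(**tokenizer(text, return_tensors="pt"))
    return F.normalize(out.pooler_output[0], dim=-1)

@torch.no_grad()
def optimize_prefix(prompt: str, target_concept: str,
                    candidates: list, length: int = 3) -> str:
    target = embed(target_concept)
    prefix = []
    for _ in range(length):
        # Pick the word whose addition best aligns the prompt with the target.
        best = max(candidates,
                   key=lambda w: float(embed(" ".join(prefix + [w, prompt])) @ target))
        prefix.append(best)
    return " ".join(prefix + [prompt])

words = ["oil", "canvas", "impasto", "brushstroke", "gallery"]
print(optimize_prefix("a portrait of a man", "an oil painting", words))
```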
- VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z)
- Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks [41.531913152661296]
We formulate the problem of targeted adversarial attacks on Stable Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide Stable Diffusion to generate specific images.
After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
arXiv Detail & Related papers (2024-01-16T12:15:39Z)
- Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
arXiv Detail & Related papers (2023-11-27T05:35:49Z)
- Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks [22.651626059348356]
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks; a few illustrative text variations are sketched below.
arXiv Detail & Related papers (2023-06-16T00:43:35Z)
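For intuition, the "real-world" text variations such an evaluation might probe can be as simple as typos, transposed characters, or Unicode look-alikes. The three transformations below are illustrative guesses, not the paper's benchmark.

```python
# Hypothetical examples of realistic prompt variations for probing T2I
# robustness (typos, character swaps, homoglyphs). Illustrative only.
import random

def typo(prompt: str) -> str:
    # Drop one random character.
    i = random.randrange(len(prompt))
    return prompt[:i] + prompt[i + 1:]

def swap(prompt: str) -> str:
    # Transpose two adjacent characters.
    i = random.randrange(len(prompt) - 1)
    return prompt[:i] + prompt[i + 1] + prompt[i] + prompt[i + 2:]

def homoglyph(prompt: str) -> str:
    # Replace Latin letters with visually similar Cyrillic look-alikes.
    table = str.maketrans({"a": "а", "e": "е", "o": "о"})
    return prompt.translate(table)

random.seed(0)
base = "a photo of an astronaut riding a horse"
for attack in (typo, swap, homoglyph):
    print(attack.__name__, ":", attack(base))
```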
- Designing a Better Asymmetric VQGAN for StableDiffusion [73.21783102003398]
A revolutionary text-to-image generator, StableDiffusion, learns a diffusion model in the latent space via a VQGAN.
We propose a new asymmetric VQGAN with two simple designs.
It can be widely used in StableDiffusion-based inpainting and local editing methods.
arXiv Detail & Related papers (2023-06-07T17:56:02Z)
- Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search [52.519433040005126]
Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly.
In this work, we aim to study and understand the failure modes of TDMs in more detail.
We propose SAGE, the first adversarial search method on TDMs.
arXiv Detail & Related papers (2023-06-01T17:59:00Z)
- Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework [20.652130361862053]
We propose the Adversarial Decoupling Augmentation Framework (ADAF) to enhance the defensive performance of facial privacy protection algorithms.
ADAF introduces multi-level text-related augmentations for defense stability against various attacker prompts.
arXiv Detail & Related papers (2023-05-06T09:00:50Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely used models collapse, but also achieve good visual quality; a rough sketch of such a generator follows below.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
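As a very rough sketch of the generator idea above (architecture, perturbation bound, and layer sizes are all assumptions, not the paper's SSAE): an encoder-decoder network maps an image to a bounded additive perturbation.

```python
# Hypothetical encoder-decoder perturbation generator in the spirit of SSAE.
# The layer sizes and the 8/255 L-infinity bound are illustrative assumptions.
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    def __init__(self, eps: float = 8 / 255):
        super().__init__()
        self.eps = eps
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh keeps the perturbation within [-eps, eps] before clamping.
        delta = self.eps * torch.tanh(self.decoder(self.encoder(x)))
        return (x + delta).clamp(0.0, 1.0)

x = torch.rand(1, 3, 224, 224)
print(PerturbationGenerator()(x).shape)  # torch.Size([1, 3, 224, 224])
```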