A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- URL: http://arxiv.org/abs/2303.16378v2
- Date: Mon, 3 Apr 2023 03:00:46 GMT
- Title: A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- Authors: Haomin Zhuang, Yihua Zhang and Sijia Liu
- Abstract summary: We study the problem of adversarial attack generation for Stable Diffusion.
We show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders.
- We show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content.
- Score: 10.985088790765873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the record-breaking performance in Text-to-Image (T2I) generation by
Stable Diffusion, less research attention is paid to its adversarial
robustness. In this work, we study the problem of adversarial attack generation
for Stable Diffusion and ask if an adversarial text prompt can be obtained even
in the absence of end-to-end model queries. We call the resulting problem
'query-free attack generation'. To resolve this problem, we show that the
vulnerability of T2I models is rooted in the lack of robustness of text
encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion.
Based on such insight, we propose both untargeted and targeted query-free
attacks, where the former is built on the most influential dimensions in the
text embedding space, which we call steerable key dimensions. By leveraging the
proposed attacks, we empirically show that a perturbation of only five
characters in the text prompt is able to cause a significant content shift in
images synthesized by Stable Diffusion. Moreover, we show that the proposed
targeted attack can precisely steer the diffusion model to scrub the targeted
image content without causing much change in untargeted image content. Our code
is available at https://github.com/OPTML-Group/QF-Attack.
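The query-free attack described above can be illustrated with a minimal sketch: greedily pick a five-character suffix that maximally shifts the prompt's text embedding, with no queries to the diffusion model itself. Everything here is a hypothetical stand-in, not the paper's method: `toy_text_encoder` is a hashed bag-of-bigrams embedding substituting for the CLIP text encoder, and the greedy search is a simplification of the paper's optimization over steerable key dimensions.

```python
# Sketch of a query-free untargeted attack, under stated assumptions:
# a toy deterministic text encoder stands in for CLIP, and a greedy
# per-character search stands in for the paper's embedding-space attack.
import hashlib
import math
import string

DIM = 32  # toy embedding dimensionality

def toy_text_encoder(text: str) -> list:
    """Stand-in text encoder: sums hashed character-bigram vectors
    into a fixed-size embedding. NOT the CLIP text encoder."""
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        h = hashlib.md5(text[i:i + 2].encode()).digest()  # 16 bytes
        for j in range(DIM):
            vec[j] += h[j % len(h)] / 255.0 - 0.5
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def greedy_five_char_attack(prompt: str) -> str:
    """Greedily choose 5 suffix characters, each minimizing cosine
    similarity between the perturbed and the clean prompt embedding.
    No end-to-end model queries are made: only the text encoder is used."""
    clean = toy_text_encoder(prompt)
    suffix = ""
    for _ in range(5):
        best_c, best_sim = None, float("inf")
        for c in string.ascii_letters + string.punctuation:
            sim = cosine(clean, toy_text_encoder(prompt + " " + suffix + c))
            if sim < best_sim:
                best_c, best_sim = c, sim
        suffix += best_c
    return suffix

prompt = "a photo of a dog on the beach"
suffix = greedy_five_char_attack(prompt)
```

In the paper's setting, appending such a suffix to the real prompt would be fed to Stable Diffusion; here the sketch only demonstrates the query-free search loop over the text encoder.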
Related papers
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z) - Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks [41.531913152661296]
We formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images.
After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
arXiv Detail & Related papers (2024-01-16T12:15:39Z) - Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
arXiv Detail & Related papers (2023-11-27T05:35:49Z) - Semantic Adversarial Attacks via Diffusion Models [30.169827029761702]
Semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features.
We propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models.
Our approaches achieve approximately 100% attack success rate in multiple settings with the best FID as 36.61.
arXiv Detail & Related papers (2023-09-14T02:57:48Z) - Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks [22.651626059348356]
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks.
arXiv Detail & Related papers (2023-06-16T00:43:35Z) - I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text, both untargeted and targeted.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z) - Designing a Better Asymmetric VQGAN for StableDiffusion [73.21783102003398]
A revolutionary text-to-image generator, StableDiffusion, learns a diffusion model in the latent space via a VQGAN.
We propose a new asymmetric VQGAN with two simple designs.
It can be widely used in StableDiffusion-based inpainting and local editing methods.
arXiv Detail & Related papers (2023-06-07T17:56:02Z) - Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search [52.519433040005126]
Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly.
In this work, we aim to study and understand the failure modes of TDMs in more detail.
We propose SAGE, the first adversarial search method on TDMs.
arXiv Detail & Related papers (2023-06-01T17:59:00Z) - Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework [20.652130361862053]
We propose the Adversarial Decoupling Augmentation Framework (ADAF) to enhance the defensive performance of facial privacy protection algorithms.
ADAF introduces multi-level text-related augmentations for defense stability against various attacker prompts.
arXiv Detail & Related papers (2023-05-06T09:00:50Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.