Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
- URL: http://arxiv.org/abs/2401.08725v1
- Date: Tue, 16 Jan 2024 12:15:39 GMT
- Title: Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
- Authors: Chenyu Zhang, Lanjun Wang, Anan Liu
- Abstract summary: We formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images.
After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in text-to-image models, particularly Stable Diffusion,
have marked significant achievements in various applications. With these
advancements, there are growing safety concerns about the vulnerability of the
model that malicious entities exploit to generate targeted harmful images.
However, existing methods for probing model vulnerability mainly evaluate
the alignment between the prompt and the generated images, and fall short of
revealing the vulnerability associated with targeted image generation. In this
study, we formulate the problem of targeted adversarial attack on Stable
Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft
reliable adversarial prompts that guide Stable Diffusion to generate specific
images. Furthermore, after obtaining successful adversarial prompts, we reveal
the mechanisms that cause the vulnerability of the model. Extensive experiments
on two targeted attack tasks demonstrate the effectiveness of our method in
targeted attacks. The code is available at
https://github.com/datar001/Revealing-Vulnerabilities-in-Stable-Diffusion-via-Targeted-Attacks.
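The core idea of the gradient-based embedding optimization can be sketched with a toy example. Everything here is a stand-in assumption: a small frozen linear map plays the role of Stable Diffusion's CLIP text encoder, and a fixed vector plays the role of the target concept's embedding; the real method optimizes over the model's actual embedding space and decodes back to discrete tokens.

```python
import random

random.seed(0)

# Toy stand-ins (assumptions): a frozen linear "text encoder" W and a fixed
# target vector replace Stable Diffusion's CLIP text encoder and the
# embedding of the attacker's chosen target concept.
DIN, DOUT = 6, 4
W = [[random.gauss(0, 1) for _ in range(DIN)] for _ in range(DOUT)]
target = [random.gauss(0, 1) for _ in range(DOUT)]

def encode(e):
    """Apply the frozen toy encoder: W @ e."""
    return [sum(W[i][j] * e[j] for j in range(DIN)) for i in range(DOUT)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Gradient descent on the continuous prompt embedding: minimise the squared
# distance between the encoded prompt and the target embedding, so the
# optimized prompt steers the model toward the target.
e = [random.gauss(0, 1) for _ in range(DIN)]
lr = 0.01
for _ in range(5000):
    err = [z - t for z, t in zip(encode(e), target)]  # W e - target
    # gradient of ||W e - target||^2 wrt e is 2 W^T err
    grad = [2 * sum(W[i][j] * err[i] for i in range(DOUT)) for j in range(DIN)]
    e = [ej - lr * gj for ej, gj in zip(e, grad)]

print(round(cosine(encode(e), target), 3))  # close to 1.0 after optimisation
```

In the paper's setting the optimized continuous embedding must additionally be mapped back to valid tokens so the adversarial prompt remains usable as text; this sketch stops at the embedding-space objective.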
Related papers
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
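The regenerate-and-compare idea behind this defense can be illustrated with toy vectors. Everything numeric here is a hypothetical stand-in: the real method runs a T2I model on the VLM's caption and compares image embeddings from a pretrained encoder, while `mirror_check` and its threshold below are illustrative names, not the paper's API.

```python
import random

random.seed(1)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def mirror_check(input_emb, regenerated_emb, threshold=0.5):
    """Flag a sample as adversarial when the input image's embedding and the
    embedding of the image regenerated from the VLM's caption are too
    dissimilar (hypothetical threshold)."""
    return cosine(input_emb, regenerated_emb) < threshold

# Toy embeddings (assumptions): a clean input yields a caption whose
# regeneration stays close to the input image in embedding space.
clean = [random.gauss(0, 1) for _ in range(8)]
clean_regen = [c + random.gauss(0, 0.1) for c in clean]  # small drift

# An adversarial input steers the caption toward an unrelated concept, so the
# regeneration drifts away (modelled here as an orthogonal toy vector).
adv = [random.gauss(0, 1) for _ in range(8)]
raw = [random.gauss(0, 1) for _ in range(8)]
proj = sum(r * a for r, a in zip(raw, adv)) / sum(a * a for a in adv)
adv_regen = [r - proj * a for r, a in zip(raw, adv)]  # orthogonal to adv

print(mirror_check(clean, clean_regen))  # False: clean sample passes
print(mirror_check(adv, adv_regen))      # True: adversarial sample flagged
```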
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Invisible Backdoor Attacks on Diffusion Models [22.08671395877427]
Recent research has brought to light the vulnerability of diffusion models to backdoor attacks.
We present an innovative framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors.
arXiv Detail & Related papers (2024-06-02T17:43:19Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - LFAA: Crafting Transferable Targeted Adversarial Examples with
Low-Frequency Perturbations [25.929492841042666]
We present a novel approach to generate transferable targeted adversarial examples.
We exploit the vulnerability of deep neural networks to perturbations on high-frequency components of images.
Our proposed approach significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-10-31T04:54:55Z) - LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via
Latent Ensemble Attack [11.764601181046496]
Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society.
To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs.
We propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process.
arXiv Detail & Related papers (2023-07-04T07:00:37Z) - Data Forensics in Diffusion Models: A Systematic Analysis of Membership
Privacy [62.16582309504159]
We develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario.
Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios.
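A minimal sketch of a loss-threshold membership-inference attack, one of the simplest "easily obtainable quantities" such attacks build on: training-set members tend to incur a lower model loss than non-members, so thresholding the loss predicts membership. The loss values below are hypothetical illustrative numbers, not results from the paper, and the real attack scores diffusion models via their denoising loss.

```python
# Hypothetical per-sample losses (assumptions, not measured values):
# members (training samples) tend to have lower loss than non-members.
member_losses = [0.10, 0.12, 0.08, 0.11]
nonmember_losses = [0.30, 0.25, 0.40, 0.28]

def predict_member(loss, threshold=0.2):
    """Predict membership by thresholding the loss (hypothetical threshold)."""
    return loss < threshold

def auc(pos, neg):
    """AUCROC via pairwise comparison: the probability that a random member
    scores a lower loss than a random non-member (ties count half)."""
    wins = sum(1 for p in pos for n in neg if p < n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

print(auc(member_losses, nonmember_losses))  # 1.0: perfect separation here
```

On this toy data the separation is perfect by construction; the paper's ">0.9 AUCROC" refers to realistic scenarios where the two loss distributions overlap.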
arXiv Detail & Related papers (2023-02-15T17:37:49Z)
- Threat Model-Agnostic Adversarial Defense using Diffusion Models [14.603209216642034]
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
arXiv Detail & Related papers (2022-07-17T06:50:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.