Stable Diffusion is Unstable
- URL: http://arxiv.org/abs/2306.02583v2
- Date: Tue, 6 Jun 2023 04:28:17 GMT
- Title: Stable Diffusion is Unstable
- Authors: Chengbin Du, Yanxi Li, Zhongwei Qiu, Chang Xu
- Abstract summary: We propose Auto-attack on Text-to-image Models (ATM) to efficiently generate small perturbations.
ATM has achieved a 91.1% success rate in short-text attacks and an 81.2% success rate in long-text attacks.
Further empirical analysis revealed four attack patterns based on: 1) the variability in generation speed, 2) the similarity of coarse-grained characteristics, 3) the polysemy of words, and 4) the positioning of words.
- Score: 21.13934830556678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, text-to-image models have been thriving. Despite their powerful
generative capacity, our research has uncovered a lack of robustness in this
generation process. Specifically, the introduction of small perturbations to
the text prompts can result in the blending of primary subjects with other
categories or their complete disappearance in the generated images. In this
paper, we propose Auto-attack on Text-to-image Models (ATM), a gradient-based
approach, to effectively and efficiently generate such perturbations. By
learning a Gumbel Softmax distribution, we can make the discrete process of
word replacement or extension continuous, thus ensuring the differentiability
of the perturbation generation. Once the distribution is learned, ATM can
sample multiple attack samples simultaneously. These attack samples can prevent
the generative model from generating the desired subjects without compromising
image quality. ATM has achieved a 91.1% success rate in short-text attacks and
an 81.2% success rate in long-text attacks. Further empirical analysis revealed
four attack patterns based on: 1) the variability in generation speed, 2) the
similarity of coarse-grained characteristics, 3) the polysemy of words, and 4)
the positioning of words.
Related papers
- Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum [13.305800254250789]
We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE)
It generates adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label.
Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, improving by 35% over the state-of-the-art method.
arXiv Detail & Related papers (2024-10-17T01:22:11Z) - Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts [38.63253101205306]
Previous studies have successfully demonstrated that manipulated prompts can elicit text-to-image models to generate unsafe images.
We propose two poisoning attacks: a basic attack and a utility-preserving attack.
Our findings underscore the potential risks of adopting text-to-image models in real-world scenarios.
arXiv Detail & Related papers (2023-10-25T13:10:44Z) - Counterfactual Image Generation for adversarially robust and
interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adrial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake"
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal
Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - Discovering Failure Modes of Text-guided Diffusion Models via
Adversarial Search [52.519433040005126]
Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly.
In this work, we aim to study and understand the failure modes of TDMs in more detail.
We propose SAGE, the first adversarial search method on TDMs.
arXiv Detail & Related papers (2023-06-01T17:59:00Z) - Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image
Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness
arXiv Detail & Related papers (2023-01-31T18:10:38Z) - Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text
Generation via Concentrating Attention [85.5379146125199]
Powerful Transformer architectures have proven superior in generating high-quality sentences.
In this work, we find that sparser attention values in Transformer could improve diversity.
We introduce a novel attention regularization loss to control the sharpness of the attention distribution.
arXiv Detail & Related papers (2022-11-14T07:53:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.