Evaluating the Robustness of Text-to-image Diffusion Models against
Real-world Attacks
- URL: http://arxiv.org/abs/2306.13103v1
- Date: Fri, 16 Jun 2023 00:43:35 GMT
- Title: Evaluating the Robustness of Text-to-image Diffusion Models against
Real-world Attacks
- Authors: Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng
- Abstract summary: Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks.
- Score: 22.651626059348356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion models (DMs) have shown promise in generating
high-quality images from textual descriptions. The real-world applications of
these models require particular attention to their safety and fidelity, but
this has not been sufficiently explored. One fundamental question is whether
existing T2I DMs are robust against variations over input texts. To answer it,
this work provides the first robustness evaluation of T2I DMs against
real-world attacks. Unlike prior studies that focus on malicious attacks
involving apocryphal alterations to the input texts, we consider an attack
space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can
make, to ensure semantic consistency. Given the inherent randomness of the
generation process, we develop novel distribution-based attack objectives to
mislead T2I DMs. We perform attacks in a black-box manner without any knowledge
of the model. Extensive experiments demonstrate the effectiveness of our method
for attacking popular T2I DMs and simultaneously reveal their non-trivial
robustness issues. Moreover, we provide an in-depth analysis of our method to
show that it is not designed to solely attack the text encoder in T2I DMs.
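
As a rough, self-contained illustration of the attack space described above (realistic typo, glyph, and phonetic errors), the Python sketch below perturbs a prompt with such errors before it would be fed to a T2I model. The substitution tables, perturbation rate, and function names are illustrative assumptions for this sketch, not the authors' implementation, which additionally optimizes distribution-based objectives over the generated images.

```python
import random

# Illustrative substitution tables (assumptions for this sketch, not the paper's).
GLYPH_MAP = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}  # Latin -> Cyrillic look-alikes
PHONETIC_MAP = {"ph": "f", "ck": "k", "oo": "u"}  # crude sound-alike rewrites


def typo(word, rng):
    # Swap two adjacent characters, mimicking a keyboard slip.
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def glyph(word, rng):
    # Replace one character with a visually similar glyph.
    positions = [i for i, c in enumerate(word) if c in GLYPH_MAP]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + GLYPH_MAP[word[i]] + word[i + 1:]


def phonetic(word, rng):
    # Rewrite one letter group with a sound-alike spelling.
    for src, dst in PHONETIC_MAP.items():
        if src in word:
            return word.replace(src, dst, 1)
    return word


def perturb_prompt(prompt, rate=0.3, seed=0):
    # Apply a randomly chosen realistic error to a fraction of the words.
    rng = random.Random(seed)
    ops = [typo, glyph, phonetic]
    words = prompt.split()
    return " ".join(rng.choice(ops)(w, rng) if rng.random() < rate else w for w in words)


if __name__ == "__main__":
    print(perturb_prompt("a photograph of an astronaut riding a horse on the moon"))
```

Prompts perturbed this way remain readable and semantically consistent for a human, which is why the paper treats them as realistic rather than malicious inputs.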
Related papers
- Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey [22.930713650452894]
Text-to-Image (T2I) Diffusion Models (DMs) have garnered widespread attention for their impressive advancements in image generation.
Their growing popularity has raised ethical and social concerns related to key non-functional properties of trustworthiness.
arXiv Detail & Related papers (2024-09-26T18:46:47Z) - RT-Attack: Jailbreaking Text-to-Image Models via Random Token [24.61198605177661]
We introduce a two-stage query-based black-box attack method utilizing random search.
In the first stage, we establish a preliminary prompt by maximizing the semantic similarity between the adversarial and target harmful prompts.
In the second stage, we use this initial prompt to refine our approach, creating a detailed adversarial prompt aimed at jailbreaking.
arXiv Detail & Related papers (2024-08-25T17:33:40Z) - ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation [18.103478658038846]
Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions.
As is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness.
We introduce a probabilistic notion of T2I DMs' robustness and establish an efficient framework, ProTIP, to evaluate it with statistical guarantees.
arXiv Detail & Related papers (2024-02-23T16:48:56Z) - Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with
Multi-Modal Priors [59.43303903348258]
Diffusion models have been widely deployed in various image generation tasks.
However, they face the risk of being maliciously exploited to generate harmful or sensitive images.
We propose a targeted attack method named MMP-Attack.
arXiv Detail & Related papers (2024-02-02T12:39:49Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion [10.985088790765873]
We study the problem of adversarial attack generation for Stable Diffusion.
We show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders.
We show that the proposed target attack can precisely steer the diffusion model to scrub the targeted image content.
arXiv Detail & Related papers (2023-03-29T01:24:25Z) - Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve a BLEU score of 33.18 on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Explain2Attack: Text Adversarial Attacks via Cross-Domain
Interpretability [18.92690624514601]
Research has shown that downstream models can be easily fooled by adversarial inputs that resemble the training data but are slightly perturbed in ways imperceptible to humans.
In this paper, we propose Explain2Attack, a black-box adversarial attack on the text classification task.
We show that our framework matches or exceeds the attack rates of state-of-the-art models, with lower query cost and higher efficiency.
arXiv Detail & Related papers (2020-10-14T04:56:41Z) - Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z) - Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp
Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
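
As a minimal sketch of the dual-manifold idea from the DMAT entry above, the snippet below generates adversarial examples both in image space (standard L-infinity PGD) and in the latent space of a pretrained generator G (on-manifold perturbations). The toy classifier, generator, and hyperparameters are placeholders, not the paper's actual setup.

```python
import torch
import torch.nn as nn


def pgd_image(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Off-manifold attack: L-infinity PGD directly on pixels.
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x + torch.clamp(x_adv - x, -eps, eps), 0.0, 1.0)
    return x_adv.detach()


def pgd_latent(model, G, z, y, eps=0.1, alpha=0.02, steps=10):
    # On-manifold attack: the same sign-gradient steps, but in G's latent space.
    z_adv = z.clone()
    for _ in range(steps):
        z_adv = z_adv.detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(G(z_adv)), y)
        grad = torch.autograd.grad(loss, z_adv)[0]
        z_adv = z_adv.detach() + alpha * grad.sign()
        z_adv = z + torch.clamp(z_adv - z, -eps, eps)
    return G(z_adv).detach()


if __name__ == "__main__":
    # Toy stand-ins: a 3x8x8-image classifier and a latent-to-image generator.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    G = nn.Sequential(nn.Linear(16, 3 * 8 * 8), nn.Sigmoid(), nn.Unflatten(1, (3, 8, 8)))
    x, y, z = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,)), torch.randn(4, 16)
    print(pgd_image(model, x, y).shape, pgd_latent(model, G, z, y).shape)
```

Training the classifier on clean images together with both kinds of adversarial examples is what the DMAT entry refers to as robustifying the model in both latent and image spaces.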