Safe Latent Diffusion: Mitigating Inappropriate Degeneration in
Diffusion Models
- URL: http://arxiv.org/abs/2211.05105v4
- Date: Wed, 26 Apr 2023 11:47:44 GMT
- Title: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in
Diffusion Models
- Authors: Patrick Schramowski, Manuel Brack, Björn Deiseroth, Kristian
Kersting
- Abstract summary: Text-conditioned image generation models reproduce the degenerated and biased human behavior present in their web-scraped training data.
We present safe latent diffusion (SLD) to help combat these undesired side effects.
We show that SLD removes and suppresses inappropriate image parts during the diffusion process.
- Score: 18.701950647429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-conditioned image generation models have recently achieved astonishing
results in image quality and text alignment and are consequently employed in a
fast-growing number of applications. Since they are highly data-driven, relying
on billion-sized datasets randomly scraped from the internet, they also suffer,
as we demonstrate, from degenerated and biased human behavior. In turn, they
may even reinforce such biases. To help combat these undesired side effects, we
present safe latent diffusion (SLD). Specifically, to measure the inappropriate
degeneration due to unfiltered and imbalanced training sets, we establish a
novel image generation test bed, inappropriate image prompts (I2P), containing
dedicated, real-world text-to-image prompts covering concepts such as nudity
and violence. As our exhaustive empirical evaluation demonstrates, the
introduced SLD removes and suppresses inappropriate image parts during the
diffusion process, with no additional training required and no adverse effect
on overall image quality or text alignment.
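At inference time, SLD augments classifier-free guidance with a safety term that steers the noise estimate away from an embedding of unsafe concepts, which is why no retraining is needed. Below is a minimal sketch of that combination step only, assuming a toy noise predictor; the gating that approximates the paper's element-wise scaling term, and all names here, are illustrative rather than the authors' reference implementation.

```python
import numpy as np

def eps_model(z_t, cond=None):
    """Stand-in for a diffusion model's noise predictor; a real model
    would be a U-Net conditioned on text embeddings."""
    rng = np.random.default_rng(0 if cond is None else hash(cond) % 2**32)
    return rng.standard_normal(z_t.shape)

def sld_guidance(z_t, prompt, unsafe_concept, s_g=7.5, s_s=5.0, lam=0.02):
    """Simplified SLD-style guided noise estimate.

    Combines classifier-free guidance with a safety term that pushes the
    prediction away from the unsafe-concept direction. The element-wise
    gating loosely mirrors the paper's scaling term; the exact form used
    here is an assumption.
    """
    eps_uncond = eps_model(z_t)                  # unconditional estimate
    eps_prompt = eps_model(z_t, prompt)          # prompt-conditioned
    eps_unsafe = eps_model(z_t, unsafe_concept)  # safety-concept-conditioned

    # Safety direction: where the unsafe concept pulls the sample.
    unsafe_dir = eps_unsafe - eps_uncond

    # Gate: act only on elements where the prompt-conditioned estimate is
    # close to (or below) the unsafe-conditioned one, i.e. where the prompt
    # drifts toward the unsafe concept.
    gate = ((eps_prompt - eps_unsafe) < lam).astype(z_t.dtype)
    gamma = s_s * gate * unsafe_dir

    # Classifier-free guidance with the safety term subtracted.
    return eps_uncond + s_g * (eps_prompt - eps_uncond - gamma)

z_t = np.random.default_rng(1).standard_normal((4, 64, 64))
eps = sld_guidance(z_t, prompt="a portrait photo",
                   unsafe_concept="violence, nudity")
print(eps.shape)
```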
Related papers
- Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models [29.083402085790016]
We propose a method that coaxes the sampled trajectories of pretrained diffusion models to land on images that fall outside of a reference set.
We achieve this by adding repellency terms to the diffusion SDE throughout the generation trajectory.
We show that adding our method, SPELL, to popular diffusion models improves their diversity while only marginally affecting FID, and that it compares favorably to other recent training-free diversity methods.
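A minimal sketch of the repellency idea: at each denoising step, add a term that pushes the current iterate away from any reference point it gets too close to, leaving far-away samples untouched (hence "sparse"). The radius, scaling, and function names are assumptions for illustration, not the paper's exact SDE formulation.

```python
import numpy as np

def repellency_term(x, references, radius=2.0):
    """Sum of sparse repellency forces: each reference that the iterate x
    falls within `radius` of (in L2) contributes a push directly away from
    it; references farther than the radius contribute nothing."""
    force = np.zeros_like(x)
    for ref in references:
        diff = x - ref
        dist = np.linalg.norm(diff)
        if dist < radius:
            # Push just hard enough to exit the ball around the reference.
            force += (radius - dist) * diff / (dist + 1e-8)
    return force

# Toy denoising loop with the repellency term added to each update.
rng = np.random.default_rng(0)
refs = [rng.standard_normal(8) for _ in range(3)]
x = rng.standard_normal(8)
for t in range(50):
    denoise_step = -0.1 * x  # placeholder for the learned SDE drift
    x = x + denoise_step + repellency_term(x, refs) \
        + 0.01 * rng.standard_normal(8)
```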
arXiv Detail & Related papers (2024-10-08T13:26:32Z)
- Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts.
The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts.
In response, concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
- SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models [28.23494821842336]
Text-to-image models may be tricked into generating not-safe-for-work (NSFW) content.
We present SafeGen, a framework to mitigate sexual content generation by text-to-image models.
arXiv Detail & Related papers (2024-04-10T00:26:08Z)
- Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment with Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
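A hedged sketch of the latent-space attack pattern the abstract describes: optimize a small perturbation on the latent code so that the decoded face no longer matches its identity embedding, while the pixel change stays small. All modules below are random toy stand-ins; the real system uses a pretrained latent diffusion autoencoder and a face recognition network.

```python
import torch

# Toy stand-ins for a pretrained latent decoder and an identity encoder.
decoder = torch.nn.Linear(16, 64)   # latent -> image (toy)
id_model = torch.nn.Linear(64, 32)  # image -> identity embedding (toy)

def identity_loss(img, target_embed):
    """Cosine similarity to the original identity; lower = stronger attack."""
    emb = id_model(img)
    return torch.nn.functional.cosine_similarity(
        emb, target_embed, dim=-1).mean()

z = torch.randn(1, 16)              # latent code of the source face
clean_img = decoder(z).detach()
target_embed = id_model(clean_img).detach()

delta = torch.zeros_like(z, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for step in range(200):
    adv_img = decoder(z + delta)
    # Attack the identity match while keeping the image near the original.
    loss = identity_loss(adv_img, target_embed) \
        + 10.0 * (adv_img - clean_img).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```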
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI [52.90082445349903]
Diffusion-based image generation models can create artistic images that mimic an artist's style or maliciously edit original images to produce fake content.
Several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations.
In this work, we introduce a purification perturbation platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure.
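The purification step can be sketched as optimizing a candidate image to stay close to the protected input while restoring consistency under the generative model's encode-decode cycle; fragile protective perturbations tend not to survive this. The toy autoencoder and loss weights below are illustrative assumptions, not the platform's actual components.

```python
import torch

# Toy stand-in for a latent diffusion autoencoder's encode/decode cycle.
encode = torch.nn.Linear(64, 16)
decode = torch.nn.Linear(16, 64)

protected = torch.randn(1, 64)  # image carrying an imperceptible protection
purified = protected.clone().requires_grad_(True)
opt = torch.optim.Adam([purified], lr=1e-2)

for step in range(300):
    recon = decode(encode(purified))
    # Consistency: a natural image should survive the encode-decode cycle;
    # adversarial protections typically break this and get optimized away.
    consistency = (recon - purified).pow(2).mean()
    # Fidelity: stay visually close to the protected input.
    fidelity = (purified - protected).pow(2).mean()
    loss = consistency + 0.1 * fidelity
    opt.zero_grad()
    loss.backward()
    opt.step()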
arXiv Detail & Related papers (2023-10-30T03:33:41Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
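The self-distillation can be sketched as fine-tuning the model so that its noise prediction when conditioned on a harmful concept matches a teacher's unconditional prediction, erasing the concept without external data. For brevity this sketch freezes the teacher, whereas the paper uses an EMA-updated one; the toy networks and the single loss term are simplified assumptions.

```python
import copy
import torch

student = torch.nn.Linear(16 + 8, 16)  # toy eps-model: (latent, cond) -> noise
teacher = copy.deepcopy(student)       # frozen copy of the original model
for p in teacher.parameters():
    p.requires_grad_(False)

null_cond = torch.zeros(1, 8)          # unconditional embedding
harm_cond = torch.randn(1, 8)          # embedding of the concept to erase
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(500):
    z_t = torch.randn(1, 16)           # noised latent at a random timestep
    # Student conditioned on the harmful concept should predict what the
    # teacher predicts unconditionally -> the concept is distilled out.
    pred = student(torch.cat([z_t, harm_cond], dim=-1))
    target = teacher(torch.cat([z_t, null_cond], dim=-1))
    loss = (pred - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```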
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting the injected content into the protected dataset.
Specifically, we modify the protected images by adding unique content to them using stealthy image warping functions.
By analyzing whether the model has memorized the injected content, we can detect models that illegally utilized the unauthorized data.
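The coating step can be sketched as a subtle, deterministic warp applied to every protected image before release; a model fine-tuned on the coated data tends to memorize the warp, which a later check over its generations can detect. The warp amplitude and structure below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def stealthy_warp(img, strength=0.02, seed=0):
    """Apply a fixed, low-amplitude coordinate warp to an image tensor of
    shape (1, C, H, W). The warp is deterministic so every coated image
    carries the same injected signature."""
    _, _, h, w = img.shape
    gen = torch.Generator().manual_seed(seed)
    # Identity sampling grid in [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)
    # Fixed small offsets: the hidden signature planted into the data.
    offsets = strength * torch.randn(1, h, w, 2, generator=gen)
    return F.grid_sample(img, grid + offsets, align_corners=True)

coated = stealthy_warp(torch.rand(1, 3, 64, 64))
```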
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
- Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? [18.701950647429]
We demonstrate inappropriate degeneration at scale across various generative text-to-image models.
We use models' representations of the world's ugliness to align them with human preferences.
arXiv Detail & Related papers (2023-05-28T13:35:50Z)
- Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness [15.059419033330126]
We present a novel strategy, called Fair Diffusion, to attenuate biases after the deployment of generative text-to-image models.
Specifically, we demonstrate shifting a bias in any direction based on human instructions, yielding arbitrary new proportions for, e.g., identity groups.
This introduced control enables instructing generative image models on fairness, requiring neither data filtering nor additional training.
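This control can be sketched as sampling a target attribute per generation according to the desired proportions and adding a signed editing-guidance term for that attribute during denoising. The attribute names, the 50/50 target, and the fixed edit direction below are illustrative assumptions; the published method builds on semantic guidance over a pretrained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_model(z_t, cond=None):
    """Toy stand-in for a text-conditioned noise predictor."""
    local = np.random.default_rng(abs(hash(cond)) % 2**32 if cond else 0)
    return local.standard_normal(z_t.shape)

def fair_guided_eps(z_t, prompt, attributes=("female", "male"),
                    proportions=(0.5, 0.5), s_edit=3.0):
    """Pick an attribute for this sample per the target proportions and
    steer the noise estimate toward it (and away from the alternative)."""
    target = rng.choice(len(attributes), p=proportions)
    eps_uncond = eps_model(z_t)
    eps_prompt = eps_model(z_t, prompt)
    # Editing direction between the two attribute conditionings.
    edit = eps_model(z_t, attributes[target]) \
        - eps_model(z_t, attributes[1 - target])
    return eps_uncond + 7.5 * (eps_prompt - eps_uncond) + s_edit * edit

eps = fair_guided_eps(np.zeros((4, 64, 64)), "a photo of a firefighter")
```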
arXiv Detail & Related papers (2023-02-07T18:25:28Z)