Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion
Models
- URL: http://arxiv.org/abs/2307.05977v1
- Date: Wed, 12 Jul 2023 07:48:29 GMT
- Title: Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion
Models
- Authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin,
Juho Lee
- Abstract summary: We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
- Score: 63.20512617502273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale image generation models, with impressive quality made possible by
the vast amount of data available on the Internet, raise social concerns that
these models may generate harmful or copyrighted content. The biases and
harmfulness arise throughout the entire training process and are hard to
completely remove, which have become significant hurdles to the safe deployment
of these models. In this paper, we propose a method called SDD to prevent
problematic content generation in text-to-image diffusion models. We
self-distill the diffusion model to guide the noise estimate conditioned on the
target removal concept to match the unconditional one. Compared to the previous
methods, our method eliminates a much greater proportion of harmful content
from the generated images without degrading the overall image quality.
Furthermore, our method allows the removal of multiple concepts at once,
whereas previous works are limited to removing a single concept at a time.
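The self-distillation objective described in the abstract (pulling the noise estimate conditioned on the concept to be removed toward the unconditional estimate) can be illustrated with a toy sketch. This is not the authors' implementation: a linear stand-in replaces the diffusion network, the teacher is assumed to be a frozen copy of the initial weights, and every name here (`predict_noise`, `distill_loss`, `distill_step`, the example vectors) is hypothetical. Timesteps and latent noising are omitted for brevity.

```python
# Toy sketch only: a linear "noise predictor" stands in for the diffusion network.
# All function names and values are hypothetical, not taken from the paper.

def predict_noise(weights, latent, cond):
    """Toy noise estimate: the latent plus a weighted conditioning vector."""
    return [z + w * c for z, w, c in zip(latent, weights, cond)]

def distill_loss(student_w, teacher_w, latent, concept, null_cond):
    """Mean squared gap between the student's concept-conditioned estimate
    and the teacher's unconditional estimate (used as a fixed target)."""
    cond_est = predict_noise(student_w, latent, concept)
    target = predict_noise(teacher_w, latent, null_cond)
    return sum((a - b) ** 2 for a, b in zip(cond_est, target)) / len(latent)

def distill_step(student_w, teacher_w, latent, concept, null_cond, lr=0.1):
    """One gradient-descent step on the student weights; the teacher is untouched."""
    cond_est = predict_noise(student_w, latent, concept)
    target = predict_noise(teacher_w, latent, null_cond)
    n = len(latent)
    # d(loss)/d(w_i) = 2 * (cond_est_i - target_i) * concept_i / n
    grads = [2 * (a - b) * c / n for a, b, c in zip(cond_est, target, concept)]
    return [w - lr * g for w, g in zip(student_w, grads)]

teacher = [1.0, -0.5, 2.0]       # assumption: frozen copy of the initial weights
student = list(teacher)          # the student starts from the same weights
latent = [0.3, -0.1, 0.7]
concept = [1.0, 2.0, -1.0]       # made-up embedding of the concept to remove
null_cond = [0.0, 0.0, 0.0]      # stand-in for the empty-prompt embedding

before = distill_loss(student, teacher, latent, concept, null_cond)
for _ in range(200):
    student = distill_step(student, teacher, latent, concept, null_cond)
after = distill_loss(student, teacher, latent, concept, null_cond)
print(before > after)
```

In this toy setting the loss shrinks as the student learns to ignore the concept embedding, which mirrors the idea of making the conditioned estimate indistinguishable from the unconditional one. Summing such a loss over several concept embeddings would be one way to target multiple concepts at once, although the abstract does not specify how the method does this.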
Related papers
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
- Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts.
The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts.
Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
- JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits [76.25962336540226]
JIGMARK is a first-of-its-kind watermarking technique that enhances robustness through contrastive learning.
Our evaluation reveals that JIGMARK significantly surpasses existing watermarking solutions in resilience to diffusion-model edits.
arXiv Detail & Related papers (2024-06-06T03:31:41Z)
- Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning.
Our method can erase a concept within 10 s, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
- Removing Undesirable Concepts in Text-to-Image Diffusion Models with Learnable Prompts [23.04942433104886]
We propose a novel method to remove undesirable concepts from text-to-image diffusion models by incorporating a learnable prompt into the cross-attention module.
This learnable prompt acts as additional memory, capturing the knowledge of undesirable concepts.
We demonstrate the effectiveness of our method on the Stable Diffusion model, showcasing its superiority over state-of-the-art erasure methods.
arXiv Detail & Related papers (2024-03-18T23:42:04Z)
- All but One: Surgical Concept Erasing with Model Preservation in
Text-to-Image Diffusion Models [22.60023885544265]
Large-scale datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them.
Fine-tuning algorithms have been developed to tackle concept erasing in diffusion models.
We present a new approach that solves all of these challenges.
arXiv Detail & Related papers (2023-12-20T07:04:33Z)
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster
Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.