Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2307.05977v1
- Date: Wed, 12 Jul 2023 07:48:29 GMT
- Title: Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
- Authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee
- Abstract summary: We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
- Score: 63.20512617502273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale image generation models, whose impressive quality is made possible by the vast amount of data available on the Internet, raise social concerns that they may generate harmful or copyrighted content. The biases and harmfulness arise throughout the entire training process, are hard to remove completely, and have become significant hurdles to the safe deployment of these models. In this paper, we propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. We self-distill the diffusion model so that the noise estimate conditioned on the target concept to be removed matches the unconditional one. Compared to previous methods, our approach eliminates a much greater proportion of harmful content from the generated images without degrading overall image quality. Furthermore, it allows the removal of multiple concepts at once, whereas previous works are limited to removing a single concept at a time.
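The objective described in the abstract lends itself to a short sketch: for each concept slated for removal, the fine-tuned (student) model's noise prediction conditioned on that concept is regressed onto the noise prediction a teacher copy produces for the empty, unconditional prompt, and summing over concepts handles multi-concept removal. The code below is a minimal PyTorch-style illustration, not the authors' released implementation; student_unet, teacher_unet, and encode_prompt are placeholder callables, and the use of a frozen teacher with a plain MSE matching loss is an assumption.

    import torch
    import torch.nn.functional as F

    def sdd_style_loss(student_unet, teacher_unet, encode_prompt,
                       latents, timesteps, concepts_to_remove):
        # Teacher target: the unconditional (empty-prompt) noise estimate.
        with torch.no_grad():
            uncond_emb = encode_prompt("")
            eps_uncond = teacher_unet(latents, timesteps, uncond_emb)

        loss = 0.0
        for concept in concepts_to_remove:      # e.g. ["nudity", "violence"]
            cond_emb = encode_prompt(concept)
            eps_cond = student_unet(latents, timesteps, cond_emb)
            # Pull the concept-conditioned estimate toward the unconditional one.
            loss = loss + F.mse_loss(eps_cond, eps_uncond)
        return loss / len(concepts_to_remove)

In practice, latents and timesteps would be drawn as in ordinary diffusion training, and only the student's parameters are updated against this loss.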
Related papers
- TraSCE: Trajectory Steering for Concept Erasure [16.752023123940674]
Text-to-image diffusion models have been shown to generate harmful content such as not-safe-for-work (NSFW) images.
We propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content.
arXiv Detail & Related papers (2024-12-10T16:45:03Z)
- Continuous Concepts Removal in Text-to-image Diffusion Models [27.262721132177845]
Concerns have been raised about the potential for text-to-image models to create content that infringes on copyrights or depicts disturbing subject matter.
We propose CCRT, a novel approach that incorporates a dedicated knowledge distillation paradigm.
It constrains the text-image alignment behavior during the continuous concept removal process by using a set of text prompts.
arXiv Detail & Related papers (2024-11-30T20:40:10Z)
- Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models [57.16056181201623]
Fine-tuning text-to-image diffusion models can inadvertently undo safety measures, causing models to relearn harmful concepts.
We present a novel but immediate solution called Modular LoRA, which involves training Safety Low-Rank Adaptation modules separately from Fine-Tuning LoRA components.
This method effectively prevents the re-learning of harmful content without compromising the model's performance on new tasks (see the LoRA-composition sketch after this list).
arXiv Detail & Related papers (2024-11-30T04:37:38Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes continuous-valued image tokens as input to avoid information loss.
We show that MMAR achieves far superior performance to other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
- Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images.
Our experimental results demonstrate our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z)
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models [22.60023885544265]
Large-scale datasets may contain sexually explicit, copyrighted, or undesirable content, which the model can then generate directly.
Fine-tuning algorithms have been developed to tackle concept erasing in diffusion models.
We present a new approach that solves all of these challenges.
arXiv Detail & Related papers (2023-12-20T07:04:33Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
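The Modular LoRA entry above describes training safety LoRA modules separately from the LoRA components used for downstream fine-tuning. The sketch below is only a generic illustration of how two independent low-rank adapters can coexist on a single linear layer, with the safety adapter frozen and only the task adapter trainable; it is not that paper's implementation, and all class and attribute names are invented.

    import torch.nn as nn

    class DualLoRALinear(nn.Module):
        # Linear layer with a frozen base weight plus two additive low-rank
        # adapters: a 'safety' adapter (trained once for concept suppression,
        # then frozen) and a 'task' adapter (the only trainable part during
        # downstream fine-tuning).
        def __init__(self, in_features, out_features, rank=4):
            super().__init__()
            self.base = nn.Linear(in_features, out_features)
            self.safety_down = nn.Linear(in_features, rank, bias=False)
            self.safety_up = nn.Linear(rank, out_features, bias=False)
            self.task_down = nn.Linear(in_features, rank, bias=False)
            self.task_up = nn.Linear(rank, out_features, bias=False)
            nn.init.zeros_(self.task_up.weight)  # task adapter starts as a no-op
            # In practice the safety adapter's weights would be loaded from a
            # separately trained safety-LoRA checkpoint before being frozen here.
            for module in (self.base, self.safety_down, self.safety_up):
                for p in module.parameters():
                    p.requires_grad_(False)

        def forward(self, x):
            # Adapters compose additively: W x + B_s A_s x + B_t A_t x
            return (self.base(x)
                    + self.safety_up(self.safety_down(x))
                    + self.task_up(self.task_down(x)))

Because only the task adapter receives gradients, downstream fine-tuning cannot overwrite the suppression behavior stored in the safety adapter, which is the intuition the entry above describes.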
This list is automatically generated from the titles and abstracts of the papers on this site.