All but One: Surgical Concept Erasing with Model Preservation in
Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2312.12807v1
- Date: Wed, 20 Dec 2023 07:04:33 GMT
- Title: All but One: Surgical Concept Erasing with Model Preservation in
Text-to-Image Diffusion Models
- Authors: Seunghoo Hong, Juhun Lee, Simon S. Woo
- Abstract summary: Large-scale datasets may contain sexually explicit, copyrighted, or undesirable content, which allows models trained on them to generate such content directly.
Fine-tuning algorithms have been developed to tackle concept erasing in diffusion models, but they degrade the original utility of the model.
We present a new approach that erases the target concept effectively while preserving the model's generation capability.
- Score: 22.60023885544265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-Image models such as Stable Diffusion have shown impressive image
synthesis, thanks to the utilization of large-scale datasets.
However, these datasets may contain sexually explicit, copyrighted, or
undesirable content, which allows the model to directly generate such content. Given
that retraining these large models on individual concept deletion requests is
infeasible, fine-tuning algorithms have been developed to tackle concept
erasing in diffusion models. While these algorithms yield good concept erasure,
they all present one of the following issues: 1) the corrupted feature space
yields synthesis of disintegrated objects, 2) the initially synthesized content
undergoes a divergence in both spatial structure and semantics in the generated
images, and 3) sub-optimal training updates heighten the model's susceptibility
to utility harm. These issues severely degrade the original utility of
generative models. In this work, we present a new approach that solves all of
these challenges. We take inspiration from the concept of classifier guidance
and propose a surgical update on the classifier guidance term while
constraining the drift of the unconditional score term. Furthermore, our
algorithm empowers the user to select an alternative to the erasing concept,
allowing for more controllability. Our experimental results show that our
algorithm not only erases the target concept effectively but also preserves the
model's generation capability.
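The abstract does not state the training objective, so the following is only an illustrative sketch of the idea it describes, not the authors' method. It assumes the standard classifier-free guidance decomposition, in which the guided noise prediction is the unconditional prediction plus a scaled guidance term (conditional minus unconditional). The function names, the squared-error form of the loss, and the `drift_weight` parameter are all hypothetical.

```python
from typing import List

Vector = List[float]


def cfg_noise(eps_uncond: Vector, eps_cond: Vector, guidance_scale: float) -> Vector:
    """Standard classifier-free guidance: unconditional prediction
    plus a scaled guidance term (conditional minus unconditional)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def mean_sq_diff(a: Vector, b: Vector) -> float:
    """Mean squared difference between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def surgical_erase_loss(
    eps_uncond_new: Vector,   # fine-tuned model, empty prompt
    eps_erase_new: Vector,    # fine-tuned model, prompt with the concept to erase
    eps_uncond_old: Vector,   # frozen original model, empty prompt
    eps_alt_old: Vector,      # frozen original model, user-chosen alternative concept
    drift_weight: float = 1.0,  # hypothetical weight on the drift penalty
) -> float:
    """Hypothetical objective: move the guidance term of the erased concept
    toward that of the alternative concept, while penalizing any drift of
    the unconditional prediction away from the original model."""
    guidance_new = [e - u for e, u in zip(eps_erase_new, eps_uncond_new)]
    guidance_target = [a - u for a, u in zip(eps_alt_old, eps_uncond_old)]
    erase_term = mean_sq_diff(guidance_new, guidance_target)
    drift_term = mean_sq_diff(eps_uncond_new, eps_uncond_old)
    return erase_term + drift_weight * drift_term
```

In this sketch the loss is zero only when the fine-tuned model reproduces the alternative concept's guidance for the erased prompt and leaves the unconditional prediction unchanged, which mirrors the "surgical update" and "constrained drift" described in the abstract.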
Related papers
- Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a "forget distribution".
We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z)
- Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models [13.479224197351673]
We show that fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned".
Our findings underscore the fragility of composing incremental model updates.
arXiv Detail & Related papers (2024-10-10T16:10:27Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
- Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient [20.091446060893638]
This paper proposes a concept domain correction framework for unlearning concepts in diffusion models.
By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results.
arXiv Detail & Related papers (2024-05-24T07:47:36Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on a denoising diffusion probabilistic synthesis model to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.