Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
- URL: http://arxiv.org/abs/2405.15304v1
- Date: Fri, 24 May 2024 07:47:36 GMT
- Title: Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
- Authors: Yongliang Wu, Shiji Zhou, Mingzhuo Yang, Lianzhe Wang, Wenbo Zhu, Heng Chang, Xiao Zhou, Xu Yang,
- Abstract summary: This paper proposes a concept domain correction framework for unlearning concepts in diffusion models.
By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results.
- Score: 20.091446060893638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU) provides an effective way to remove the sensitive concepts captured by the model and has been shown to be a promising approach to addressing these issues. Nonetheless, existing MU methods for concept erasure encounter two primary bottlenecks: 1) generalization issues, where concept erasure is effective only for the data within the unlearn set, and prompts outside the unlearn set often still result in the generation of sensitive concepts; and 2) utility drop, where erasing target concepts significantly degrades the model's performance. To this end, this paper first proposes a concept domain correction framework for unlearning concepts in diffusion models. By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results. Second, we devise a concept-preserving scheme based on gradient surgery. This approach removes the parts of the unlearning gradient that contradict the relearning gradient, ensuring that the process of unlearning minimally disrupts the model's performance. Finally, extensive experiments validate the effectiveness of our model, demonstrating our method's capability to address the challenges of concept unlearning in diffusion models while preserving model utility.
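The concept-preserving scheme is described as gradient surgery: components of the unlearning gradient that conflict with the relearning (retention) gradient are removed before the update. The sketch below illustrates the general PCGrad-style projection on a toy PyTorch model; the stand-in network, losses, and batch shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a gradient-surgery step for concept unlearning (assumed setup).
import torch
import torch.nn as nn

def flatten_grads(params):
    """Concatenate per-parameter gradients into one vector (zeros where absent)."""
    return torch.cat([
        (p.grad if p.grad is not None else torch.zeros_like(p)).reshape(-1)
        for p in params
    ])

def surgery(unlearn_grad, relearn_grad):
    """Project out of the unlearning gradient the component that opposes the
    relearning gradient, so unlearning disturbs retained concepts less."""
    dot = torch.dot(unlearn_grad, relearn_grad)
    if dot < 0:  # gradients conflict
        denom = relearn_grad.pow(2).sum().clamp_min(1e-12)
        unlearn_grad = unlearn_grad - (dot / denom) * relearn_grad
    return unlearn_grad

# Toy stand-in for the diffusion model's noise predictor (assumption, not the paper's UNet).
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
params = list(model.parameters())
opt = torch.optim.SGD(params, lr=1e-3)

x_forget, target_anchor = torch.randn(4, 8), torch.randn(4, 8)  # sensitive-concept batch
x_retain, target_retain = torch.randn(4, 8), torch.randn(4, 8)  # retained-concept batch

# 1) Unlearning gradient: push outputs for the sensitive concept toward the anchor.
opt.zero_grad()
nn.functional.mse_loss(model(x_forget), target_anchor).backward()
g_unlearn = flatten_grads(params)

# 2) Relearning gradient: keep behaviour on retained concepts.
opt.zero_grad()
nn.functional.mse_loss(model(x_retain), target_retain).backward()
g_relearn = flatten_grads(params)

# 3) Apply surgery, write the corrected gradient back, and step.
g = surgery(g_unlearn, g_relearn)
offset = 0
for p in params:
    n = p.numel()
    p.grad = g[offset:offset + n].view_as(p).clone()
    offset += n
opt.step()
```

The projection only fires when the two gradients have negative inner product; otherwise the unlearning update is applied unchanged, which is the standard behaviour of this family of gradient-surgery methods.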
Related papers
- Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them [21.386640828092524]
Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models.
We propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept.
Results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance.
arXiv Detail & Related papers (2025-01-31T08:17:23Z) - Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization [48.20360860166279]
Large-scale diffusion models produce high-quality images but often generate unwanted content, such as sexually explicit or violent content.
We propose a novel approach for targeted concept replacing in diffusion models, enabling specific concepts to be removed without affecting non-target areas.
Our method introduces a dedicated concept localizer for precisely identifying the target concept during the denoising process, trained with few-shot learning to require minimal labeled data.
Within the identified region, we introduce a training-free Dual Prompts Cross-Attention (DPCA) module to substitute the target concept, ensuring minimal disruption to surrounding content.
arXiv Detail & Related papers (2024-12-02T08:05:39Z) - Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images.
Our experimental results demonstrate our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z) - ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts.
Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models.
We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z) - ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning [10.201633236997104]
Large-scale text-to-image diffusion models have demonstrated impressive image-generation capabilities.
We present ConceptPrune, wherein we first identify critical regions within pre-trained models responsible for generating undesirable concepts.
Experiments across a range of concepts including artistic styles, nudity, object erasure, and gender debiasing demonstrate that target concepts can be efficiently erased by pruning a tiny fraction of the model's weights.
arXiv Detail & Related papers (2024-05-29T16:19:37Z) - Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performances.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z) - Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which restricts the learned concept to the limited modalities seen during customization.
We propose Infusion, a T2I customization method that enables target concepts to be learned without being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z) - Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
SepME separates optimizable model weights, making each weight increment correspond to a specific concept erasure.
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z) - Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)