ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
- URL: http://arxiv.org/abs/2405.19237v1
- Date: Wed, 29 May 2024 16:19:37 GMT
- Title: ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
- Authors: Ruchika Chavhan, Da Li, Timothy Hospedales
- Abstract summary: Large-scale text-to-image diffusion models have demonstrated impressive image-generation capabilities.
We present ConceptPrune, wherein we first identify critical regions within pre-trained models responsible for generating undesirable concepts.
Experiments across a range of concepts, including artistic styles, nudity, object erasure, and gender debiasing, demonstrate that target concepts can be efficiently erased by pruning a tiny fraction (roughly 0.12%) of total weights.
- Score: 10.201633236997104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large-scale text-to-image diffusion models have demonstrated impressive image-generation capabilities, there are significant concerns about their potential misuse for generating unsafe content, violating copyright, and perpetuating societal biases. Recently, the text-to-image generation community has begun addressing these concerns by editing or unlearning undesired concepts from pre-trained models. However, these methods often involve data-intensive and inefficient fine-tuning or utilize various forms of token remapping, rendering them susceptible to adversarial jailbreaks. In this paper, we present a simple and effective training-free approach, ConceptPrune, wherein we first identify critical regions within pre-trained models responsible for generating undesirable concepts, thereby facilitating straightforward concept unlearning via weight pruning. Experiments across a range of concepts including artistic styles, nudity, object erasure, and gender debiasing demonstrate that target concepts can be efficiently erased by pruning a tiny fraction, approximately 0.12% of total weights, enabling multi-concept erasure and robustness against various white-box and black-box adversarial attacks.
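For intuition, the core recipe the abstract describes (find the "skilled" neurons most responsible for a concept, then zero their weights) can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the paper's exact procedure: the function names, the mean-activation-gap scoring rule, and the tensor shapes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

def find_skilled_neurons(acts_concept, acts_neutral, prune_ratio=0.0012):
    """Rank neurons by how much more they activate on concept prompts
    than on neutral prompts; return the indices of the top fraction.

    acts_concept / acts_neutral: (num_prompts, num_neurons) tensors of
    pre-recorded feed-forward activations (hypothetical inputs; in the
    paper these come from a pre-trained diffusion model).
    """
    # Per-neuron mean activation gap between concept and neutral prompts.
    gap = acts_concept.mean(dim=0) - acts_neutral.mean(dim=0)
    k = max(1, int(prune_ratio * gap.numel()))  # ~0.12% of neurons
    return torch.topk(gap, k).indices

def prune_neurons(linear_layer, neuron_idx):
    """Training-free erasure: zero the outgoing weight rows (and biases)
    of the selected neurons, a simple in-place weight edit."""
    with torch.no_grad():
        linear_layer.weight[neuron_idx, :] = 0.0
        if linear_layer.bias is not None:
            linear_layer.bias[neuron_idx] = 0.0

# Toy usage with stand-in shapes and random activations:
layer = nn.Linear(1280, 5120)            # stands in for one FFN projection
acts_c = torch.randn(64, 5120).abs()     # activations on concept prompts
acts_n = torch.randn(64, 5120).abs()     # activations on neutral prompts
prune_neurons(layer, find_skilled_neurons(acts_c, acts_n))
```

Because the edit is a sparse, training-free weight modification rather than a learned token remapping, there is no prompt-side mechanism for an adversarial jailbreak to re-trigger, which is the robustness argument the abstract makes.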
Related papers
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z) - Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images.
Our experimental results demonstrate our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z) - Pruning for Robust Concept Erasing in Diffusion Models [27.67237515704348]
We introduce a new pruning-based strategy for concept erasing.
Our method selectively prunes critical parameters associated with the concepts targeted for removal, thereby reducing the sensitivity of concept-related neurons.
Experimental results show a significant enhancement in our model's ability to resist adversarial inputs.
arXiv Detail & Related papers (2024-05-26T11:42:20Z) - Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient [20.091446060893638]
This paper proposes a concept domain correction framework for unlearning concepts in diffusion models.
By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results.
arXiv Detail & Related papers (2024-05-24T07:47:36Z) - Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
SepME decouples optimizable model weights, making each weight increment correspond to the erasure of a specific concept.
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z) - Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers [24.64639078273091]
Concept erasure in text-to-image diffusion models aims to disable pre-trained diffusion models from generating images related to a target concept.
We propose Reliable Concept Erasing via Lightweight Erasers (Receler).
arXiv Detail & Related papers (2023-11-29T15:19:49Z) - Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else [75.6806649860538]
We consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model.
We observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance.
We design a minimal low-cost solution that overcomes the above issues by tweaking the text embeddings for more realistic multi-concept text-to-image generation.
arXiv Detail & Related papers (2023-10-11T12:05:44Z) - Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z) - Circumventing Concept Erasure Methods For Text-to-Image Generative Models [26.804057000265434]
Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts.
These models have numerous drawbacks, including their potential to generate images featuring sexually explicit content.
Various methods have been proposed in order to "erase" sensitive concepts from text-to-image models.
arXiv Detail & Related papers (2023-08-03T02:34:01Z) - Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)