Minimalist Concept Erasure in Generative Models
- URL: http://arxiv.org/abs/2507.13386v1
- Date: Wed, 16 Jul 2025 00:32:14 GMT
- Title: Minimalist Concept Erasure in Generative Models
- Authors: Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, Kenji Kawaguchi
- Abstract summary: We develop a novel minimalist concept erasure objective based *only* on the distributional distance of final generation outputs. To improve the robustness of the erasure, we incorporate neuron masking as an alternative to model fine-tuning. Empirical evaluations on state-of-the-art flow-matching models demonstrate that our method robustly erases concepts without degrading overall model performance.
- Score: 29.548315937553756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in generative models have demonstrated remarkable capabilities in producing high-quality images, but their reliance on large-scale unlabeled data has raised significant safety and copyright concerns. Efforts to address these issues by erasing unwanted concepts have shown promise. However, many existing erasure methods involve excessive modifications that compromise the overall utility of the model. In this work, we address these issues by formulating a novel minimalist concept erasure objective based *only* on the distributional distance of final generation outputs. Building on our formulation, we derive a tractable loss for differentiable optimization that leverages backpropagation through all generation steps in an end-to-end manner. We also conduct extensive analysis to show theoretical connections with other models and methods. To improve the robustness of the erasure, we incorporate neuron masking as an alternative to model fine-tuning. Empirical evaluations on state-of-the-art flow-matching models demonstrate that our method robustly erases concepts without degrading overall model performance, paving the way for safer and more responsible generative models.
Related papers
- TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model. Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models. TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z) - When Are Concepts Erased From Diffusion Models? [44.89615668122767]
Concept erasure is the ability to selectively prevent a model from generating specific concepts. We propose two conceptual models for the erasure mechanism in diffusion models. To thoroughly assess whether a concept has been truly erased from the model, we introduce a suite of independent evaluations.
arXiv Detail & Related papers (2025-05-22T17:59:09Z) - Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models [93.76814568163353]
We propose a novel bilevel optimization framework for pruned diffusion models. This framework consolidates the fine-tuning and unlearning processes into a unified phase. It is compatible with various pruning and concept unlearning methods.
arXiv Detail & Related papers (2024-12-19T19:13:18Z) - Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z) - Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z) - Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
SepME separates optimizable model weights so that each weight increment corresponds to the erasure of a specific concept.
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z) - All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models [22.60023885544265]
Large-scale datasets may contain sexually explicit, copyrighted, or otherwise undesirable content, which the trained model can then generate directly.
Fine-tuning algorithms have been developed to tackle concept erasure in diffusion models.
We present a new approach that addresses all of these challenges.
arXiv Detail & Related papers (2023-12-20T07:04:33Z) - Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks [4.872984658007499]
Implicit models such as Deep Equilibrium Models (DEQs) have garnered significant attention in the community for their ability to train infinite layer models.
We revisit the line of implicit models and trace them back to the original weight-tied models.
Surprisingly, we observe that weight-tied models are more effective, stable, and efficient on vision tasks than the DEQ variants.
arXiv Detail & Related papers (2023-07-16T11:45:35Z) - Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the *PR-divergences* (see the note on $f$-divergences after this list).
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
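As background for the precision-recall entry above: an $f$-divergence between a target distribution $P$ and a model distribution $Q$ (with densities $p$ and $q$) takes the standard form below. The PR-divergences are one such family, indexed by the precision-recall trade-off parameter; their exact generator is specified in that paper, not reproduced here.

```latex
% Standard f-divergence: f convex with f(1) = 0.
% The PR-divergence family corresponds to particular generators
% f_\lambda indexed by the trade-off parameter \lambda (see the paper).
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{x \sim Q}\!\left[ f\!\left(\frac{p(x)}{q(x)}\right) \right]
```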
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.