Related papers: DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

URL: http://arxiv.org/abs/2501.01125v1
Date: Thu, 02 Jan 2025 07:47:34 GMT
Title: DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
Authors: Feng Han, Kai Chen, Chao Gong, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang,
Abstract summary: We propose our Dual encoder Modulation network (DuMo) which achieves precise erasure of inappropriate target concepts with minimum impairment to non-target concepts.<n>Our method achieves state-of-the-art performance on Explicit Content Erasure, Cartoon Concept Removal and Artistic Style Erasure, clearly outperforming alternative methods.
Score: 75.05165577219425
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The exceptional generative capability of text-to-image models has raised substantial safety concerns regarding the generation of Not-Safe-For-Work (NSFW) content and potential copyright infringement. To address these concerns, previous methods safeguard the models by eliminating inappropriate concepts. Nonetheless, these models alter the parameters of the backbone network and exert considerable influences on the structural (low-frequency) components of the image, which undermines the model's ability to retain non-target concepts. In this work, we propose our Dual encoder Modulation network (DuMo), which achieves precise erasure of inappropriate target concepts with minimum impairment to non-target concepts. In contrast to previous methods, DuMo employs the Eraser with PRior Knowledge (EPR) module which modifies the skip connection features of the U-NET and primarily achieves concept erasure on details (high-frequency) components of the image. To minimize the damage to non-target concepts during erasure, the parameters of the backbone U-NET are frozen and the prior knowledge from the original skip connection features is introduced to the erasure process. Meanwhile, the phenomenon is observed that distinct erasing preferences for the image structure and details are demonstrated by the EPR at different timesteps and layers. Therefore, we adopt a novel Time-Layer MOdulation process (TLMO) that adjusts the erasure scale of EPR module's outputs across different layers and timesteps, automatically balancing the erasure effects and model's generative ability. Our method achieves state-of-the-art performance on Explicit Content Erasure, Cartoon Concept Removal and Artistic Style Erasure, clearly outperforming alternative methods. Code is available at https://github.com/Maplebb/DuMo

Related papers

FADE: Adversarial Concept Erasure in Flow Models [4.774890908509861]
We propose a novel textbfconcept erasure method for text-to-image diffusion models.<n>Our method combines a trajectory-aware fine-tuning strategy with an adversarial objective to ensure the concept is reliably removed.<n>We prove a formal guarantee that our approach minimizes the mutual information between the erased concept and the model's outputs.
arXiv Detail & Related papers (2025-07-16T14:31:21Z)
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate [10.996274286143244]
Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion.<n>We propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding emphnonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts.<n>CPE outperforms prior arts by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts.
arXiv Detail & Related papers (2025-06-28T08:17:19Z)
TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model.<n>Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models.<n>TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z)
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z)
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models [24.15603438969762]
Interpret then Deactivate (ItD) is a novel framework to enable precise concept removal in T2I diffusion models. ItD uses a sparse autoencoder to interpret each concept as a combination of multiple features. It can be easily extended to erase multiple concepts without requiring further training.
arXiv Detail & Related papers (2025-03-12T14:46:40Z)
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization. Instant merging methods often cause identity loss and interference of individual merged concepts. We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving individual concepts' identity.
arXiv Detail & Related papers (2025-03-11T16:10:36Z)
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models [41.284399182295026]
We introduce SPEED, a model editing-based concept erasure approach that leverages null-space constraints for scalable, precise, and efficient erasure. SPEED consistently outperforms existing methods in prior preservation while achieving efficient and high-fidelity concept erasure.
arXiv Detail & Related papers (2025-03-10T14:40:01Z)
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models [45.393001061726366]
Recent advances in text-to-image diffusion models enable photorealistic image generation, but they also risk producing malicious content, such as NSFW images. To mitigate risk, concept erasure methods are studied to facilitate the model to unlearn specific concepts. We propose TRCE, using a two-stage concept erasure strategy to achieve an effective trade-off between reliable erasure and knowledge preservation.
arXiv Detail & Related papers (2025-03-10T14:37:53Z)
EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers [33.195628798316754]
EraseAnything is the first method specifically developed to address concept erasure within the latest flow-based T2I framework.<n>We formulate concept erasure as a bi-level optimization problem, employing LoRA-based parameter tuning and an attention map regularizer.<n>We propose a self-contrastive learning strategy to ensure that removing unwanted concepts does not inadvertently harm performance on unrelated ones.
arXiv Detail & Related papers (2024-12-29T09:42:53Z)
Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models [57.16056181201623]
Fine-tuning text-to-image diffusion models can inadvertently undo safety measures, causing models to relearn harmful concepts.<n>We present a novel but immediate solution called Modular LoRA, which involves training Safety Low-Rank Adaptation modules separately from Fine-Tuning LoRA components.<n>This method effectively prevents the re-learning of harmful content without compromising the model's performance on new tasks.
arXiv Detail & Related papers (2024-11-30T04:37:38Z)
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM) It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning. To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models [22.60023885544265]
Large-scale datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them. Fine-tuning algorithms have been developed to tackle concept erasing in diffusion models. We present a new approach that solves all of these challenges.
arXiv Detail & Related papers (2023-12-20T07:04:33Z)
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.