Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
- URL: http://arxiv.org/abs/2512.13039v2
- Date: Tue, 16 Dec 2025 09:24:35 GMT
- Title: Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
- Authors: Hao Chen, Yiwei Wang, Songze Li
- Abstract summary: Concept erasure has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. We propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously.
- Score: 32.35244979539898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy, either suppressing the target concept or reinforcing safe alternatives, making it difficult to achieve a balanced trade-off between concept removal and generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual guidance toward safe alternatives. By jointly optimizing these complementary directions, our approach balances erasure efficacy with generation usability. In addition, we apply mask-based filtering to the image branches to prevent interference from irrelevant content during erasure. Across extensive experimental evaluations, the proposed Bi-Erasing outperforms baseline methods in balancing concept removal effectiveness and visual fidelity.
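The bidirectional objective the abstract describes can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's actual formulation: the guided-target form, the function names (`bi_erasing_target`, `bi_erasing_loss`), and the `eta` weights are hypothetical, and the `mask` argument merely stands in for the paper's mask-based filtering.

```python
import numpy as np

def bi_erasing_target(eps_base, eps_neg, eps_pos, eta_neg=1.0, eta_pos=1.0):
    # Hypothetical guided target: step away from the harmful (negative)
    # branch while stepping toward the safe (positive) branch.
    return eps_base - eta_neg * (eps_neg - eps_base) + eta_pos * (eps_pos - eps_base)

def bi_erasing_loss(eps_pred, eps_base, eps_neg, eps_pos, mask, **kw):
    # Masked MSE to the bidirectional target; only concept-relevant
    # pixels (mask == 1) contribute to the erasure signal.
    target = bi_erasing_target(eps_base, eps_neg, eps_pos, **kw)
    m = mask.astype(eps_pred.dtype)
    return float(np.sum(m * (eps_pred - target) ** 2) / np.maximum(m.sum(), 1.0))
```

When the negative and positive branches coincide with the base prediction, the target reduces to the base prediction and the loss vanishes, so only genuinely concept-bearing directions drive the update.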
Related papers
- ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization [11.472088067393074]
ConceptPrism is a novel framework that automatically disentangles the shared visual concept from image-specific residuals. In experiments, ConceptPrism effectively resolves concept entanglement, achieving a significantly improved trade-off between fidelity and alignment.
arXiv Detail & Related papers (2026-02-23T07:46:19Z) - Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation [17.59828667571619]
Existing concept erasure approaches focus on removing unsafe concepts without providing guidance toward corresponding safe alternatives. We propose a novel framework, PAIRed Erasing, which reframes concept erasure from simple removal to consistency-preserving semantic realignment. Our approach significantly outperforms state-of-the-art baselines, achieving effective concept erasure while preserving structural integrity, semantic coherence, and generation quality.
arXiv Detail & Related papers (2026-02-05T06:05:24Z) - Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models [49.10620605347065]
We propose Differential Vector Erasure (DVE), a training-free concept erasure method specifically designed for flow matching models. Our key insight is that semantic concepts are implicitly encoded in the directional structure of the velocity field governing the generative flow. During inference, DVE selectively removes concept-specific components by projecting the velocity field onto the differential direction, enabling precise concept suppression without affecting irrelevant semantics.
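The projection step DVE describes, removing the component of the velocity field along a concept-specific "differential direction", can be sketched with a plain vector projection. The function name and the `strength` parameter are illustrative assumptions; here `d` would be something like the difference between a concept-conditioned and a neutral velocity prediction:

```python
import numpy as np

def remove_concept_component(v, d, strength=1.0):
    # Normalize the differential direction, then subtract the component
    # of the velocity field v that lies along it.
    d_hat = d / (np.linalg.norm(d) + 1e-8)
    coeff = np.dot(v.ravel(), d_hat.ravel())
    return v - strength * coeff * d_hat
```

With `strength=1.0` the edited velocity is (up to numerical tolerance) orthogonal to the concept direction, which is one way to read "projecting the velocity field onto the differential direction" as a suppression step.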
arXiv Detail & Related papers (2026-02-01T08:05:45Z) - GrOCE:Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models [24.278300091974085]
Concept erasure aims to remove harmful, inappropriate, or copyrighted content from text-to-image diffusion models. We propose Graph-Guided Online Concept Erasure (GrOCE), a training-free framework that performs precise and adaptive concept removal.
arXiv Detail & Related papers (2025-11-17T04:47:16Z) - VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation [57.36681904639463]
Methods to safeguard autoregressive text-to-image models remain underexplored. We propose Visual Contrast Exploitation (VCE), a novel framework that precisely decouples unsafe concepts from their associated content semantics. Our experiments demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts.
arXiv Detail & Related papers (2025-09-21T09:00:27Z) - Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate [10.996274286143244]
Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. We propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding nonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts. CPE outperforms prior art by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts.
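A gated residual of the kind the ResAG summary suggests can be sketched as below. This is a hypothetical toy, not CPE's actual architecture: the weight shapes, the sigmoid gate, and the additive residual are assumptions meant only to illustrate how a learned gate could apply a corrective term selectively.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def res_attention_gate(attn_out, W_gate, W_res):
    # A gate near 1 (target concept detected) lets the learned residual
    # cancel the concept; a gate near 0 leaves other concepts untouched.
    gate = sigmoid(attn_out @ W_gate)   # (tokens, 1), values in (0, 1)
    residual = attn_out @ W_res         # learned corrective term
    return attn_out + gate * residual
```

The selectivity claim in the summary (erase the target while "keeping diverse remaining concepts") then corresponds to training the gate to fire only on target-concept activations.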
arXiv Detail & Related papers (2025-06-28T08:17:19Z) - TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model. Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models. TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z) - One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework [127.07102988701092]
We introduce the first text-image Collaborative Concept Erasing (Co-Erasing) framework. Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by the prompts. We design a text-guided image concept refinement strategy that directs the model to focus on visual features most relevant to the specified text concept.
arXiv Detail & Related papers (2025-05-16T11:25:50Z) - Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z) - TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models [53.937498564603054]
Recent advances in text-to-image diffusion models enable photorealistic image generation, but they also risk producing malicious content, such as NSFW images. To mitigate this risk, concept erasure methods are studied to facilitate the model to unlearn specific concepts. We propose TRCE, using a two-stage concept erasure strategy to achieve an effective trade-off between reliable erasure and knowledge preservation.
arXiv Detail & Related papers (2025-03-10T14:37:53Z) - Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models [35.2881940850787]
Text-to-image diffusion models inadvertently learn inappropriate concepts from vast and unfiltered training data. Our method effectively captures the manifestation of subtle words at the image level, enabling direct and efficient erasure of target concepts.
arXiv Detail & Related papers (2024-08-02T05:17:14Z) - Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.