Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
- URL: http://arxiv.org/abs/2512.13039v2
- Date: Tue, 16 Dec 2025 09:24:35 GMT
- Title: Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
- Authors: Hao Chen, Yiwei Wang, Songze Li
- Abstract summary: Concept erasure has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. We propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously.
- Score: 32.35244979539898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy, either suppressing the target concept or reinforcing safe alternatives, making it difficult to achieve a balanced trade-off between concept removal and generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual guidance toward safe alternatives. By jointly optimizing these complementary directions, our approach balances erasure efficacy with generation usability. In addition, we apply mask-based filtering to the image branches to prevent interference from irrelevant content during erasure. Across extensive experimental evaluations, the proposed Bi-Erasing outperforms baseline methods in balancing concept removal effectiveness and visual fidelity.
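The bidirectional objective the abstract describes can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's actual formulation: the guided-target form, the function names (`bi_erasing_target`, `bi_erasing_loss`), and the `eta` weights are hypothetical, and the `mask` argument merely stands in for the paper's mask-based filtering.

```python
import numpy as np

def bi_erasing_target(eps_base, eps_neg, eps_pos, eta_neg=1.0, eta_pos=1.0):
    # Hypothetical guided target: step away from the harmful (negative)
    # branch while stepping toward the safe (positive) branch.
    return eps_base - eta_neg * (eps_neg - eps_base) + eta_pos * (eps_pos - eps_base)

def bi_erasing_loss(eps_pred, eps_base, eps_neg, eps_pos, mask, **kw):
    # Masked MSE to the bidirectional target; only concept-relevant
    # pixels (mask == 1) contribute to the erasure signal.
    target = bi_erasing_target(eps_base, eps_neg, eps_pos, **kw)
    m = mask.astype(eps_pred.dtype)
    return float(np.sum(m * (eps_pred - target) ** 2) / np.maximum(m.sum(), 1.0))
```

When the negative and positive branches coincide with the base prediction, the target reduces to the base prediction and the loss vanishes, so only genuinely concept-bearing directions drive the update.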
Related papers
- ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization [11.472088067393074]
ConceptPrism is a novel framework that automatically disentangles the shared visual concept from image-specific residuals. In experiments, ConceptPrism effectively resolves concept entanglement, achieving a significantly improved trade-off between fidelity and alignment.
arXiv Detail & Related papers (2026-02-23T07:46:19Z) - Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation [17.59828667571619]
Existing concept erasure approaches focus on removing unsafe concepts without providing guidance toward corresponding safe alternatives. We propose a novel framework, PAIRed Erasing, which reframes concept erasure from simple removal to consistency-preserving semantic realignment. Our approach significantly outperforms state-of-the-art baselines, achieving effective concept erasure while preserving structural integrity, semantic coherence, and generation quality.
arXiv Detail & Related papers (2026-02-05T06:05:24Z) - Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models [49.10620605347065]
We propose Differential Vector Erasure (DVE), a training-free concept erasure method specifically designed for flow matching models. Our key insight is that semantic concepts are implicitly encoded in the directional structure of the velocity field governing the generative flow. During inference, DVE selectively removes concept-specific components by projecting the velocity field onto the differential direction, enabling precise concept suppression without affecting irrelevant semantics.
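The projection step DVE describes, removing the component of the velocity field along a concept-specific "differential direction", can be sketched with a plain vector projection. The function name and the `strength` parameter are illustrative assumptions; here `d` would be something like the difference between a concept-conditioned and a neutral velocity prediction:

```python
import numpy as np

def remove_concept_component(v, d, strength=1.0):
    # Normalize the differential direction, then subtract the component
    # of the velocity field v that lies along it.
    d_hat = d / (np.linalg.norm(d) + 1e-8)
    coeff = np.dot(v.ravel(), d_hat.ravel())
    return v - strength * coeff * d_hat
```

With `strength=1.0` the edited velocity is (up to numerical tolerance) orthogonal to the concept direction, which is one way to read "projecting the velocity field onto the differential direction" as a suppression step.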
arXiv Detail & Related papers (2026-02-01T08:05:45Z) - GrOCE:Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models [24.278300091974085]
Concept erasure aims to remove harmful, inappropriate, or copyrighted content from text-to-image diffusion models. We propose Graph-Guided Online Concept Erasure (GrOCE), a training-free framework that performs precise and adaptive concept removal.
arXiv Detail & Related papers (2025-11-17T04:47:16Z) - VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation [57.36681904639463]
Methods to safeguard autoregressive text-to-image models remain underexplored. We propose Visual Contrast Exploitation (VCE), a novel framework that precisely decouples unsafe concepts from their associated content semantics. Our experiments demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts.
arXiv Detail & Related papers (2025-09-21T09:00:27Z) - Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate [10.996274286143244]
Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. We propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding nonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts. CPE outperforms prior art by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts.
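A gated residual of the kind the ResAG summary suggests can be sketched as below. This is a hypothetical toy, not CPE's actual architecture: the weight shapes, the sigmoid gate, and the additive residual are assumptions meant only to illustrate how a learned gate could apply a corrective term selectively.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def res_attention_gate(attn_out, W_gate, W_res):
    # A gate near 1 (target concept detected) lets the learned residual
    # cancel the concept; a gate near 0 leaves other concepts untouched.
    gate = sigmoid(attn_out @ W_gate)   # (tokens, 1), values in (0, 1)
    residual = attn_out @ W_res         # learned corrective term
    return attn_out + gate * residual
```

The selectivity claim in the summary (erase the target while "keeping diverse remaining concepts") then corresponds to training the gate to fire only on target-concept activations.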
arXiv Detail & Related papers (2025-06-28T08:17:19Z) - TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model. Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models. TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z) - One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework [127.07102988701092]
We introduce the first text-image Collaborative Concept Erasing (Co-Erasing) framework. Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by the prompts. We design a text-guided image concept refinement strategy that directs the model to focus on visual features most relevant to the specified text concept.
arXiv Detail & Related papers (2025-05-16T11:25:50Z) - Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z) - TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models [53.937498564603054]
Recent advances in text-to-image diffusion models enable photorealistic image generation, but they also risk producing malicious content, such as NSFW images. To mitigate this risk, concept erasure methods are studied to facilitate the model to unlearn specific concepts. We propose TRCE, using a two-stage concept erasure strategy to achieve an effective trade-off between reliable erasure and knowledge preservation.
arXiv Detail & Related papers (2025-03-10T14:37:53Z) - Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models [35.2881940850787]
Text-to-image diffusion models inadvertently learn inappropriate concepts from vast and unfiltered training data. Our method effectively captures the manifestation of subtle words at the image level, enabling direct and efficient erasure of target concepts.
arXiv Detail & Related papers (2024-08-02T05:17:14Z) - Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.