Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation
- URL: http://arxiv.org/abs/2511.18684v1
- Date: Mon, 24 Nov 2025 01:48:44 GMT
- Title: Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation
- Authors: Shristi Das Biswas, Arani Roy, Kaushik Roy
- Abstract summary: We introduce Instant Concept Erasure (ICE), a training-free, modality-agnostic, one-shot weight modification approach that achieves precise, persistent unlearning with zero overhead. ICE defines erase and preserve subspaces using anisotropic energy-weighted scaling, then explicitly regularises against their intersection using a unique, closed-form overlap projector. It efficiently achieves strong erasure with improved robustness to red-teaming, all while causing only minimal degradation of original generative abilities in both T2I and T2V models.
- Score: 7.68494752148263
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Robust concept removal for text-to-image (T2I) and text-to-video (T2V) models is essential for their safe deployment. Existing methods, however, suffer from costly retraining, inference overhead, or vulnerability to adversarial attacks. Crucially, they rarely model the latent semantic overlap between the target erase concept and surrounding content -- causing collateral damage post-erasure -- and even fewer methods work reliably across both T2I and T2V domains. We introduce Instant Concept Erasure (ICE), a training-free, modality-agnostic, one-shot weight modification approach that achieves precise, persistent unlearning with zero overhead. ICE defines erase and preserve subspaces using anisotropic energy-weighted scaling, then explicitly regularises against their intersection using a unique, closed-form overlap projector. We pose a convex and Lipschitz-bounded Spectral Unlearning Objective, balancing erasure fidelity and intersection preservation, that admits a stable and unique analytical solution. This solution defines a dissociation operator that is translated to the model's text-conditioning layers, making the edit permanent and runtime-free. Across targeted removals of artistic styles, objects, identities, and explicit content, ICE efficiently achieves strong erasure with improved robustness to red-teaming, all while causing only minimal degradation of original generative abilities in both T2I and T2V models.
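The abstract describes a one-shot, closed-form edit that is folded into the text-conditioning weights. The sketch below illustrates only that general idea with a plain orthogonal projector; it is not the ICE Spectral Unlearning Objective, and it omits the anisotropic energy-weighted scaling and the overlap projector. All names (`orthonormal_basis`, `dissociation_operator`, `W`, the random embeddings) are hypothetical and chosen for illustration.

```python
import numpy as np

def orthonormal_basis(X, tol=1e-10):
    """Return an orthonormal basis for the column space of X (via SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, s > tol * s.max()]

def dissociation_operator(erase, preserve):
    """Projector that removes erase directions not shared with preserve.

    erase, preserve: arrays of shape (d, n) whose columns are text embeddings.
    Assumption: a simple orthogonal projector stands in for the paper's
    closed-form solution, which is not reproduced here.
    """
    d = erase.shape[0]
    U_p = orthonormal_basis(preserve)
    # Part of each erase embedding that lies outside the preserve subspace.
    residual = erase - U_p @ (U_p.T @ erase)
    Q = orthonormal_basis(residual)
    # Identity minus the projector onto the erase-only directions.
    return np.eye(d) - Q @ Q.T

rng = np.random.default_rng(0)
d = 16
erase = rng.normal(size=(d, 3))      # stand-in embeddings of the concept to erase
preserve = rng.normal(size=(d, 4))   # stand-in embeddings of concepts to keep
W = rng.normal(size=(8, d))          # stand-in text-conditioning projection weight

D = dissociation_operator(erase, preserve)
W_edited = W @ D                     # edit is baked into the weights once
```

Because `D` is multiplied into `W` a single time, inference uses `W_edited` exactly as it used `W`, so there is no extra runtime cost. In this toy construction the outputs for preserve embeddings are unchanged, while the erase-only component of each erase embedding maps to zero.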
Related papers
- EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization [18.80236205171204]
EraseAnything++ is a unified framework for concept erasure in both image and video diffusion models. Our method anchors erasure on key visual representations and propagates it consistently across spatial and temporal dimensions. In the video setting, we further enhance consistency through an anchor-and-propagate mechanism that initializes erasure on reference frames and enforces it throughout subsequent transformer layers.
(2026-03-01T08:13:05Z)
- AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models [36.91937453334139]
Concept erasure helps stop diffusion models (DMs) from generating harmful content, but current methods face a retention trade-off. This paper introduces Adversarial Erasure with Gradient Informed Synergy (AEGIS), a retention-data-free framework that advances both robustness and retention.
(2026-02-06T15:27:42Z)
- CGCE: Classifier-Guided Concept Erasure in Generative Models [53.7410000675294]
Concept erasure has been developed to remove undesirable concepts from pre-trained models. Existing methods remain vulnerable to adversarial attacks that can regenerate the erased content. We introduce an efficient plug-and-play framework that provides robust concept erasure for diverse generative models.
(2025-11-08T05:38:18Z)
- Zero-Residual Concept Erasure via Progressive Alignment in Text-to-Image Model [15.636542463543066]
Concept erasure aims to prevent pretrained text-to-image models from generating content associated with semantically harmful concepts. Existing methods often result in incomplete erasure due to a "non-zero alignment residual". We propose ErasePro, a novel closed-form method designed for more complete concept erasure and better preservation of overall generative quality.
(2025-08-06T14:19:32Z)
- CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models [7.68494752148263]
CURE is a training-free concept unlearning framework that operates directly in the weight space of pre-trained diffusion models. The Spectral Eraser identifies and isolates features unique to the undesired concept while preserving safe attributes. CURE achieves a more efficient and thorough removal for targeted artistic styles, objects, identities, or explicit content.
(2025-05-19T03:53:06Z)
- Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
(2025-03-25T15:49:48Z)
- Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models [17.694838443796026]
Interpret then Deactivate (ItD) is a novel framework to enable precise concept removal in T2I diffusion models. ItD uses a sparse autoencoder to interpret each concept as a combination of multiple features. It can be easily extended to erase multiple concepts without requiring further training.
(2025-03-12T14:46:40Z)
- SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models [56.83154571623655]
We introduce SPEED, an efficient concept erasure approach that directly edits model parameters. SPEED searches for a null space, a model editing space where parameter updates do not affect non-target concepts. We successfully erase 100 concepts within only 5 seconds.
(2025-03-10T14:40:01Z)
- DuMo: Dual Encoder Modulation Network for Precise Concept Erasure [75.05165577219425]
We propose the Dual encoder Modulation network (DuMo), which achieves precise erasure of inappropriate target concepts with minimum impairment to non-target concepts. Our method achieves state-of-the-art performance on Explicit Content Erasure, Cartoon Concept Removal and Artistic Style Erasure, clearly outperforming alternative methods.
(2025-01-02T07:47:34Z)
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges. We propose SAFREE, a training-free approach for safe T2I and T2V. We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
(2024-10-16T17:32:23Z)
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
(2024-07-17T08:04:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.