Related papers: Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

URL: http://arxiv.org/abs/2510.22851v1
Date: Sun, 26 Oct 2025 22:04:17 GMT
Title: Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models
Authors: Lexiang Xiong, Chengyu Liu, Jingwen Ye, Yan Liu, Yuecong Xu,
Abstract summary: We introduce a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process.<n>We achieve superior completeness and robustness while preserving locality and image quality.<n>This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.
Score: 27.672305802461377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.

Related papers

SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection.<n>We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z)
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models [48.34555526275907]
We propose a novel framework VARE that enables stable concept erasure in visual autoregressive models.<n>We then introduce S-VARE, a novel and effective concept erasure method designed for VAR.<n>Our approach achieves surgical concept erasure while preserving generation quality, thereby closing the safety gap in autoregressive text-to-image generation.
arXiv Detail & Related papers (2025-09-26T14:26:52Z)
VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation [57.36681904639463]
Methods to safeguard autoregressive text-to-image models remain underexplored.<n>We propose Visual Contrast Exploitation (VCE), a novel framework that precisely decouples unsafe concepts from their associated content semantics.<n>Our experiments demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts.
arXiv Detail & Related papers (2025-09-21T09:00:27Z)
Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness [4.23067546195708]
textbfSCORE (Secure and Concept-Oriented Robust Erasure) is a novel framework for robust concept removal in diffusion models.<n>SCORE sets a new standard for secure and robust concept erasure in diffusion models.
arXiv Detail & Related papers (2025-09-15T15:05:50Z)
TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model.<n>Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models.<n>TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z)
CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models [7.68494752148263]
CURE is a training-free concept unlearning framework that operates directly in the weight space of pre-trained diffusion models.<n>The Spectral Eraser identifies and isolates features unique to the undesired concept while preserving safe attributes.<n>CURE achieves a more efficient and thorough removal for targeted artistic styles, objects, identities, or explicit content.
arXiv Detail & Related papers (2025-05-19T03:53:06Z)
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework [127.07102988701092]
We introduce the first text-image Collaborative Concept Erasing (Co-Erasing) framework.<n>Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by the prompts.<n>We design a text-guided image concept refinement strategy that directs the model to focus on visual features most relevant to the specified text concept.
arXiv Detail & Related papers (2025-05-16T11:25:50Z)
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models [53.937498564603054]
Recent advances in text-to-image diffusion models enable photorealistic image generation, but they also risk producing malicious content, such as NSFW images.<n>To mitigate risk, concept erasure methods are studied to facilitate the model to unlearn specific concepts.<n>We propose TRCE, using a two-stage concept erasure strategy to achieve an effective trade-off between reliable erasure and knowledge preservation.
arXiv Detail & Related papers (2025-03-10T14:37:53Z)
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning. To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient [20.698305103879232]
We propose a novel concept domain correction framework named textbfDoCo (textbfDomaintextbfCorrection)<n>By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures comprehensive unlearning of target concepts.<n>We also introduce a concept-preserving gradient surgery technique that mitigates conflicting gradient components, thereby preserving the model's utility while unlearning specific concepts.
arXiv Detail & Related papers (2024-05-24T07:47:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.