Related papers: Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

URL: http://arxiv.org/abs/2501.18950v2
Date: Thu, 27 Feb 2025 23:36:38 GMT
Title: Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Authors: Anh Bui, Trang Vu, Long Vuong, Trung Le, Paul Montague, Tamas Abraham, Junae Kim, Dinh Phung
Abstract summary: Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models. We propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept. Results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance.
Score: 21.386640828092524
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.

Related papers

SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing [65.82241040239452]
Concept erasing finetunes weights to unlearn undesirable concepts. Existing methods treat an unsafe concept as a fixed word and repeatedly erase it. We introduce semantic-augment erasing, which transforms concept word erasure into concept domain erasure.
arXiv Detail & Related papers (2025-06-11T03:21:24Z)
Erased or Dormant? Rethinking Concept Erasure Through Reversibility [8.454050090398713]
We evaluate two representative concept erasure methods, Unified Concept Editing and Erased Stable Diffusion. We show that erased concepts often reemerge with substantial visual fidelity after minimal adaptation. Our findings reveal critical limitations in existing concept erasure approaches.
arXiv Detail & Related papers (2025-05-22T03:26:46Z)
ACE: Attentional Concept Erasure in Diffusion Models [0.0]
Attentional Concept Erasure integrates a closed-form attention manipulation with lightweight fine-tuning. We show that ACE achieves state-of-the-art concept removal efficacy and robustness. Compared to prior methods, ACE better balances generality (erasing concept and related terms) and specificity (preserving unrelated content).
arXiv Detail & Related papers (2025-04-16T08:16:28Z)
Fundamental Limits of Perfect Concept Erasure [41.82150352631872]
Concept erasure is useful in several applications, such as removing sensitive concepts to achieve fairness and interpreting the impact of specific concepts on a model's performance. Previous concept erasure techniques have prioritized robustly erasing concepts over retaining the utility of the resultant representations. We show that our approach outperforms existing methods on a range of synthetic and real-world datasets using GPT-4 representations.
arXiv Detail & Related papers (2025-03-25T22:36:10Z)
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z)
Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion [56.35484513848296]
This research introduces 'continual unlearning', a novel paradigm that enables the targeted removal of multiple specific concepts from foundational generative models. We propose the Decremental Unlearning without Generalization Erosion (DUGE) algorithm, which selectively unlearns the generation of undesired concepts.
arXiv Detail & Related papers (2025-03-17T23:17:16Z)
Concept Corrector: Erase concepts on the fly for text-to-image diffusion models [13.065682925662237]
Concept erasure aims to erase any undesired concepts that the models can generate. We propose Concept Corrector, which checks target concepts based on visual features provided by final generated images predicted at certain time steps. In the whole pipeline, our method changes no model parameters and only requires a given target concept as well as the corresponding replacement content.
arXiv Detail & Related papers (2025-02-22T21:53:43Z)
EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques [20.2544260436998]
Concept erasure techniques can remove unwanted concepts from text-to-image models. We systematically investigate the failure modes of current concept erasure techniques. We introduce EraseBENCH, a benchmark designed to assess concept erasure methods with greater depth. Our findings reveal that even state-of-the-art techniques struggle with maintaining quality post-erasure, indicating that these approaches are not yet ready for real-world deployment.
arXiv Detail & Related papers (2025-01-16T20:42:17Z)
ACE: Anti-Editing Concept Erasure in Text-to-Image Models [73.00930293474009]
Existing concept erasure methods achieve superior results in preventing the production of erased concepts from prompts. We propose an Anti-Editing Concept Erasure (ACE) method, which not only erases the target concept during generation but also filters it out during editing.
arXiv Detail & Related papers (2025-01-03T04:57:27Z)
RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining [25.769144703607214]
Concept erasure has been proposed to remove the model's knowledge about protected and inappropriate concepts. We propose RealEra to address this "concept residue" issue. We show that RealEra outperforms previous concept erasing methods in terms of superior erasing efficacy, specificity, and generality.
arXiv Detail & Related papers (2024-10-11T17:55:30Z)
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning. To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient [20.698305103879232]
We propose a novel concept domain correction framework named DoCo (Domain Correction). By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures comprehensive unlearning of target concepts. We also introduce a concept-preserving gradient surgery technique that mitigates conflicting gradient components, thereby preserving the model's utility while unlearning specific concepts.
arXiv Detail & Related papers (2024-05-24T07:47:36Z)
Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models. The latter separates optimizable model weights, making each weight increment correspond to a specific concept erasure. Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z)
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers [24.64639078273091]
Concept erasure in text-to-image diffusion models aims to disable pre-trained diffusion models from generating images related to a target concept. We propose Reliable Concept Erasing via Lightweight Erasers (Receler).
arXiv Detail & Related papers (2023-11-29T15:19:49Z)
Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.