Related papers: A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models

URL: http://arxiv.org/abs/2502.14896v1
Date: Mon, 17 Feb 2025 20:51:20 GMT
Title: A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models
Authors: Changhoon Kim, Yanjun Qi
Abstract summary: Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. Their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content.
Score: 14.325284311928492
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. However, their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content. In this survey, we provide a structured overview of concept erasure, categorizing existing methods based on their optimization strategies and the architectural components they modify. We categorize concept erasure methods into fine-tuning for parameter updates, closed-form solutions for efficient edits, and inference-time interventions for content restriction without weight modification. Additionally, we explore adversarial attacks that bypass erasure techniques and discuss emerging defenses. To support further research, we consolidate key datasets, evaluation metrics, and benchmarks for assessing erasure effectiveness and model robustness. This survey serves as a comprehensive resource, offering insights into the evolving landscape of concept erasure, its challenges, and future directions.

Related papers

Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression [10.950528923845955]
The uncontrolled reproduction of sensitive, copyrighted, or harmful imagery poses serious ethical, legal, and safety challenges. The concept erasure paradigm has emerged as a promising direction, enabling the selective removal of specific semantic concepts from generative models. This survey aims to guide researchers toward safer, more ethically aligned generative models.
arXiv (2025-05-26T01:24:34Z)
Erased or Dormant? Rethinking Concept Erasure Through Reversibility [8.454050090398713]
We evaluate two representative concept erasure methods, Unified Concept Editing and Erased Stable Diffusion. We show that erased concepts often re-emerge with substantial visual fidelity after minimal adaptation. Our findings reveal critical limitations in existing concept erasure approaches.
arXiv (2025-05-22T03:26:46Z)
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv (2025-03-25T15:49:48Z)
EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques [20.2544260436998]
Concept erasure techniques can remove unwanted concepts from text-to-image models. We systematically investigate the failure modes of current concept erasure techniques. We introduce EraseBENCH, a benchmark designed to assess concept erasure methods in greater depth. Our findings reveal that even state-of-the-art techniques struggle to maintain quality after erasure, indicating that these approaches are not yet ready for real-world deployment.
arXiv (2025-01-16T20:42:17Z)
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse. We propose an evaluation framework designed to assess model reliability through the models' responses to perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and retrieving bias provenance.
arXiv (2024-11-21T09:46:55Z)
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), in which human feedback on model-generated images is condensed into textual tokens that guide the mitigation or removal of problematic images. Our experimental results demonstrate that our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv (2024-07-17T05:21:41Z)
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, these models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv (2024-06-21T03:58:44Z)
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines customization to limited modalities. We propose Infusion, a T2I customization method that allows target concepts to be learned without being constrained by limited training modalities.
arXiv (2024-04-22T09:16:25Z)
Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models. SepME separates optimizable model weights, making each weight increment correspond to a specific concept erasure. Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv (2024-02-03T11:10:57Z)
Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv (2023-10-09T17:13:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.