MACE: Mass Concept Erasure in Diffusion Models
- URL: http://arxiv.org/abs/2403.06135v1
- Date: Sun, 10 Mar 2024 08:50:56 GMT
- Title: MACE: Mass Concept Erasure in Diffusion Models
- Authors: Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, Adams Wai-Kin Kong
- Abstract summary: We introduce MACE, a finetuning framework for the task of mass concept erasure.
This task aims to prevent models from generating images that embody unwanted concepts when prompted.
We conduct extensive evaluations of MACE against prior methods across four different tasks.
- Score: 11.12833789743765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid expansion of large-scale text-to-image diffusion models has raised
growing concerns regarding their potential misuse in creating harmful or
misleading content. In this paper, we introduce MACE, a finetuning framework
for the task of mass concept erasure. This task aims to prevent models from
generating images that embody unwanted concepts when prompted. Existing concept
erasure methods are typically restricted to handling fewer than five concepts
simultaneously and struggle to find a balance between erasing concept synonyms
(generality) and maintaining unrelated concepts (specificity). In contrast,
MACE differs by successfully scaling the erasure scope up to 100 concepts and
by achieving an effective balance between generality and specificity. This is
achieved by leveraging closed-form cross-attention refinement along with LoRA
finetuning, collectively eliminating the information of undesirable concepts.
Furthermore, MACE integrates multiple LoRAs without mutual interference. We
conduct extensive evaluations of MACE against prior methods across four
different tasks: object erasure, celebrity erasure, explicit content erasure,
and artistic style erasure. Our results reveal that MACE surpasses prior
methods in all evaluated tasks. Code is available at
https://github.com/Shilin-LU/MACE.
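For readers unfamiliar with closed-form cross-attention refinement, the sketch below shows the generic ridge-regression-style update that this family of methods builds on: a cross-attention projection matrix is re-solved in closed form so that embeddings of unwanted phrases map to neutral targets while other embeddings keep their original outputs. The function name, the choice of targets, and the regularization weight are illustrative assumptions, not MACE's exact objective.

```python
import torch

def closed_form_attention_edit(W_old, K_erase, V_target, K_preserve, lam=0.1):
    """Generic closed-form edit of a cross-attention projection (illustrative).

    Minimizes  ||W K_e - V_t||_F^2 + lam * ||W K_p - W_old K_p||_F^2  over W,
    so prompt embeddings tied to unwanted concepts (columns of K_erase) are
    remapped to neutral targets (V_target), while preserved embeddings
    (K_preserve) keep their original projections.

    Shapes: W_old (d_out, d_in), K_erase (d_in, n_e),
            V_target (d_out, n_e), K_preserve (d_in, n_p).
    """
    d_in = W_old.shape[1]
    A = V_target @ K_erase.T + lam * (W_old @ K_preserve) @ K_preserve.T
    B = K_erase @ K_erase.T + lam * K_preserve @ K_preserve.T
    B = B + 1e-4 * torch.eye(d_in, dtype=W_old.dtype, device=W_old.device)  # keep B invertible
    # Solve W B = A  =>  W = A B^{-1}; B is symmetric, so solve against A^T and transpose back.
    return torch.linalg.solve(B, A.T).T
```

In practice such an update would be applied to the key/value projections of each cross-attention layer; MACE additionally pairs this refinement with per-concept LoRA modules and an interference-free fusion step, which the paper describes in detail.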
Related papers
- Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them [21.386640828092524]
Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models.
We propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept.
Results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance.
arXiv Detail & Related papers (2025-01-31T08:17:23Z)
- EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques [20.2544260436998]
Concept erasure techniques can remove unwanted concepts from text-to-image models.
We systematically investigate the failure modes of current concept erasure techniques.
We introduce EraseBENCH, a benchmark designed to assess concept erasure methods with greater depth.
Our findings reveal that even state-of-the-art techniques struggle with maintaining quality post-erasure, indicating that these approaches are not yet ready for real-world deployment.
arXiv Detail & Related papers (2025-01-16T20:42:17Z)
- ACE: Anti-Editing Concept Erasure in Text-to-Image Models [73.00930293474009]
Existing concept erasure methods achieve superior results in preventing the generation of erased concepts from prompts.
We propose an Anti-Editing Concept Erasure (ACE) method, which not only erases the target concept during generation but also filters it out during editing.
arXiv Detail & Related papers (2025-01-03T04:57:27Z)
- How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM).
It resolves catastrophic forgetting and concept neglect, enabling new customization tasks to be learned in a concept-incremental manner.
Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
- RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining [25.769144703607214]
Concept erasure has been proposed to remove the model's knowledge about protected and inappropriate concepts.
We propose RealEra to address the "concept residue" issue.
We show that RealEra outperforms previous concept erasing methods in terms of superior erasing efficacy, specificity, and generality.
arXiv Detail & Related papers (2024-10-11T17:55:30Z)
- STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models [18.64776777593743]
We propose an approach called STEREO that involves two distinct stages.
The first stage searches thoroughly for strong and diverse adversarial prompts that can regenerate an erased concept from a CEM.
In the second stage, robustly erase once, we introduce an anchor-concept-based compositional objective to erase the target concept in one go.
arXiv Detail & Related papers (2024-08-29T17:29:26Z)
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
- Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
SepME separates optimizable model weights so that each weight increment corresponds to the erasure of a specific concept (sketched below).
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z)
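To make the "separable increments" idea above concrete, here is a minimal sketch, assuming a single linear layer and additive per-concept weight deltas; the class, its names, and its forward pass are illustrative assumptions, not SepME's actual parameterization or training objective.

```python
import torch

class SeparableEraser(torch.nn.Module):
    """Sketch: keep one weight increment per erased concept so individual
    concepts can be erased or restored independently (illustrative only)."""

    def __init__(self, base_linear: torch.nn.Linear):
        super().__init__()
        self.base = base_linear
        self.deltas = torch.nn.ParameterDict()   # concept name -> weight increment
        self.active = set()                      # concepts currently erased

    def add_concept(self, name: str):
        # New increment, trained elsewhere to suppress this one concept.
        self.deltas[name] = torch.nn.Parameter(torch.zeros_like(self.base.weight))
        self.active.add(name)

    def restore_concept(self, name: str):
        self.active.discard(name)                # drop only this concept's increment

    def forward(self, x):
        w = self.base.weight
        for name in self.active:
            w = w + self.deltas[name]            # increments compose additively
        return torch.nn.functional.linear(x, w, self.base.bias)
```

Because the increments compose additively, erasing another concept just adds another delta, and restoring a concept removes only its own delta, leaving the base weights and the other erasures untouched.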
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- LEACE: Perfect linear concept erasure in closed form [103.61624393221447]
Concept erasure aims to remove specified features from a representation.
We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible.
We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network.
arXiv Detail & Related papers (2023-06-06T16:07:24Z)
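As a rough illustration of what a closed-form linear eraser does, the NumPy sketch below whitens the features, projects out the subspace that is linearly predictive of the concept labels, and maps back. It is a simplified reading of the LEACE idea under stated assumptions (finite-sample covariances, eigenvalue clipping for invertibility), not the authors' implementation or their package API.

```python
import numpy as np

def linear_concept_erase(X, Z, eps=1e-8):
    """Simplified closed-form linear concept erasure (LEACE-style sketch).

    X: (n, d) features; Z: (n, k) concept labels (e.g. one-hot classes).
    Returns features whose cross-covariance with Z is zero, while moving
    the whitened representation as little as possible.
    """
    mu_x = X.mean(axis=0, keepdims=True)
    Xc = X - mu_x
    Zc = Z - Z.mean(axis=0, keepdims=True)

    # Eigendecompose the feature covariance to build (un)whitening maps.
    cov = Xc.T @ Xc / len(X)
    evals, evecs = np.linalg.eigh(cov)
    evals = np.clip(evals, eps, None)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T      # ~ Sigma^{-1/2}
    unwhiten = evecs @ np.diag(evals ** 0.5) @ evecs.T     # ~ Sigma^{+1/2}

    # Directions (in whitened space) carrying linear information about Z.
    Xw = Xc @ whiten
    cross_cov = Xw.T @ Zc / len(X)                         # (d, k)
    U, _, _ = np.linalg.svd(cross_cov, full_matrices=False)
    P = U @ U.T                                            # projector onto them

    # Delete those directions, then undo the whitening and re-add the mean.
    return Xw @ (np.eye(X.shape[1]) - P) @ unwhiten + mu_x
```

After this transform the returned features have zero cross-covariance with Z, so a least-squares linear predictor of the concept collapses to a constant; LEACE formalizes and proves the exact optimality of this kind of closed-form eraser.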