Circumventing Concept Erasure Methods For Text-to-Image Generative
Models
- URL: http://arxiv.org/abs/2308.01508v2
- Date: Sun, 8 Oct 2023 21:51:46 GMT
- Title: Circumventing Concept Erasure Methods For Text-to-Image Generative
Models
- Authors: Minh Pham, Kelly O. Marshall, Niv Cohen, Govind Mittal, Chinmay Hegde
- Score: 26.804057000265434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generative models can produce photo-realistic images for an
extremely broad range of concepts, and their usage has proliferated widely
among the general public. On the flip side, these models have numerous
drawbacks, including their potential to generate images featuring sexually
explicit content, mirror artistic styles without permission, or even
hallucinate (or deepfake) the likenesses of celebrities. Consequently, various
methods have been proposed in order to "erase" sensitive concepts from
text-to-image models. In this work, we examine five recently proposed concept
erasure methods, and show that targeted concepts are not fully excised by any
of them. Specifically, we leverage the existence of special learned
word embeddings that can retrieve "erased" concepts from the sanitized models
with no alterations to their weights. Our results highlight the brittleness of
post hoc concept erasure methods, and call into question their use in the
algorithmic toolkit for AI safety.
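The attack the abstract describes keeps the sanitized model's weights frozen and learns only a new word embedding whose output reproduces the supposedly erased concept. The toy sketch below illustrates that optimization with a stand-in linear "generator" in place of a real diffusion model; the names (`generate`, `W`, the dimensions, the learning-rate choice) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, FEAT_DIM = 8, 16

# Stand-in for the frozen, "sanitized" generator: its weights W never change.
W = rng.normal(size=(FEAT_DIM, EMB_DIM))

def generate(v):
    """Frozen model: maps a word embedding to image features."""
    return W @ v

# Features extracted from example images of the supposedly erased concept
# (chosen to lie in the model's range so the toy problem is solvable).
target = W @ rng.normal(size=EMB_DIM)

v = np.zeros(EMB_DIM)                   # learnable embedding for a new token
lr = 0.45 / np.linalg.norm(W, 2) ** 2   # step size safe for this quadratic loss
losses = []
for _ in range(5000):
    residual = generate(v) - target
    losses.append(float(residual @ residual))
    v -= lr * (2 * W.T @ residual)      # gradient step on the embedding only

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

The point of the sketch is that nothing in the "model" is modified: only the input embedding `v` is optimized, yet the loss against the target concept's features drops essentially to zero, mirroring how concept inversion recovers erased concepts without touching the sanitized weights.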
Related papers
- OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes.
We propose OmniPrism, a visual concept disentangling approach for creative image generation.
Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv Detail & Related papers (2024-12-16T18:59:52Z)
- Memories of Forgotten Concepts [16.53173953073833]
We show that erased concept images can be generated using the right latent.
We extend this to demonstrate that, for every image from the erased concept set, we can find many seeds that generate the erased concept.
Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable.
arXiv Detail & Related papers (2024-12-01T12:12:24Z)
- Continuous Concepts Removal in Text-to-image Diffusion Models [27.262721132177845]
Concerns have been raised about the potential for text-to-image models to create content that infringes on copyrights or depicts disturbing subject matter.
We propose a novel approach called CCRT, which is based on a knowledge distillation paradigm.
It constrains the text-image alignment behavior during the continuous concept removal process using a set of text prompts.
arXiv Detail & Related papers (2024-11-30T20:40:10Z)
- Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens that guide the mitigation or removal of problematic images.
Our experimental results demonstrate that our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z)
- Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning.
Our method can erase a concept within 10 seconds, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome "catastrophic forgetting" of previously encountered concepts.
To mitigate this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.