Circumventing Concept Erasure Methods For Text-to-Image Generative
Models
- URL: http://arxiv.org/abs/2308.01508v2
- Date: Sun, 8 Oct 2023 21:51:46 GMT
- Title: Circumventing Concept Erasure Methods For Text-to-Image Generative
Models
- Authors: Minh Pham, Kelly O. Marshall, Niv Cohen, Govind Mittal, Chinmay Hegde
- Score: 26.804057000265434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generative models can produce photo-realistic images for an
extremely broad range of concepts, and their usage has proliferated widely
among the general public. On the flip side, these models have numerous
drawbacks, including their potential to generate images featuring sexually
explicit content, mirror artistic styles without permission, or even
hallucinate (or deepfake) the likenesses of celebrities. Consequently, various
methods have been proposed in order to "erase" sensitive concepts from
text-to-image models. In this work, we examine five recently proposed concept
erasure methods, and show that targeted concepts are not fully excised by any
of them. Specifically, we leverage the existence of special learned
word embeddings that can retrieve "erased" concepts from the sanitized models
with no alterations to their weights. Our results highlight the brittleness of
post hoc concept erasure methods, and call into question their use in the
algorithmic toolkit for AI safety.
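The attack the abstract describes keeps the sanitized model's weights frozen and learns only a new word embedding whose output reproduces the supposedly erased concept. The toy sketch below illustrates that optimization with a stand-in linear "generator" in place of a real diffusion model; the names (`generate`, `W`, the dimensions, the learning-rate choice) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, FEAT_DIM = 8, 16

# Stand-in for the frozen, "sanitized" generator: its weights W never change.
W = rng.normal(size=(FEAT_DIM, EMB_DIM))

def generate(v):
    """Frozen model: maps a word embedding to image features."""
    return W @ v

# Features extracted from example images of the supposedly erased concept
# (chosen to lie in the model's range so the toy problem is solvable).
target = W @ rng.normal(size=EMB_DIM)

v = np.zeros(EMB_DIM)                   # learnable embedding for a new token
lr = 0.45 / np.linalg.norm(W, 2) ** 2   # step size safe for this quadratic loss
losses = []
for _ in range(5000):
    residual = generate(v) - target
    losses.append(float(residual @ residual))
    v -= lr * (2 * W.T @ residual)      # gradient step on the embedding only

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

The point of the sketch is that nothing in the "model" is modified: only the input embedding `v` is optimized, yet the loss against the target concept's features drops essentially to zero, mirroring how concept inversion recovers erased concepts without touching the sanitized weights.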
Related papers
- OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes.
We propose OmniPrism, a visual concept disentangling approach for creative image generation.
Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv Detail & Related papers (2024-12-16T18:59:52Z)
- Memories of Forgotten Concepts [16.53173953073833]
We show that erased concept images can be generated using the right latent.
We extend this to demonstrate that, for every image from the erased concept set, we can find many seeds that generate the erased concept.
Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable.
arXiv Detail & Related papers (2024-12-01T12:12:24Z)
- Continuous Concepts Removal in Text-to-image Diffusion Models [27.262721132177845]
Concerns have been raised about the potential for text-to-image models to create content that infringes on copyrights or depicts disturbing subject matter.
We propose a novel approach called CCRT, which is based on a knowledge distillation paradigm.
It constrains the text-image alignment behavior during the continuous concept removal process using a set of text prompts.
arXiv Detail & Related papers (2024-11-30T20:40:10Z)
- Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens that guide the mitigation or removal of problematic images.
Our experimental results demonstrate that our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z)
- Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning [0.0]
We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning.
Our method can erase a concept within 10 seconds, making concept erasure more accessible than ever before.
arXiv Detail & Related papers (2024-05-12T14:01:05Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome "catastrophic forgetting" of previously encountered concepts.
To mitigate this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.