Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
- URL: http://arxiv.org/abs/2410.08074v2
- Date: Mon, 10 Feb 2025 23:21:18 GMT
- Title: Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
- Authors: Vinith M. Suriyakumar, Rohan Alur, Ayush Sekhari, Manish Raghavan, Ashia C. Wilson
- Abstract summary: We show that fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned."
Our findings underscore the fragility of composing incremental model updates, and raise serious new concerns about current approaches to ensuring the safety and alignment of text-to-image diffusion models.
- Score: 13.479224197351673
- Abstract: Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps (to learn new concepts or improve model performance) with "unlearning" steps (to "forget" existing concepts, such as copyrighted works or explicit content). In this work, we demonstrate a critical and previously unknown vulnerability that arises in this paradigm: even under benign, non-adversarial conditions, fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned." We comprehensively investigate the causes and scope of this phenomenon, which we term concept resurgence, by performing a series of experiments which compose "concept unlearning" with subsequent fine-tuning of Stable Diffusion v1.4 and Stable Diffusion v2.1. Our findings underscore the fragility of composing incremental model updates, and raise serious new concerns about current approaches to ensuring the safety and alignment of text-to-image diffusion models.
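To make the failure mode concrete, below is a minimal, self-contained PyTorch sketch of the compose-and-probe pipeline the abstract describes: unlearn a concept, fine-tune on seemingly unrelated conditionings, and measure whether behavior on the erased concept drifts back toward the original model. The tiny denoiser, the remapping-style unlearning objective, and the random "data" are toy stand-ins for Stable Diffusion and real training sets, not the authors' code.

```python
# Toy unlearn -> fine-tune -> probe pipeline (illustrative assumptions only).
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyDenoiser(nn.Module):
    """Toy conditional denoiser standing in for a diffusion UNet."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def unlearn(model, erased_cond, neutral_cond, steps=200, lr=1e-2):
    """Remap the erased concept's conditioning onto a frozen copy's neutral
    output (one common style of unlearning objective; an assumption here)."""
    frozen = copy.deepcopy(model).eval()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(32, 16)
        with torch.no_grad():
            target = frozen(x, neutral_cond.expand(32, -1))
        loss = nn.functional.mse_loss(model(x, erased_cond.expand(32, -1)), target)
        opt.zero_grad(); loss.backward(); opt.step()

def finetune(model, conds, steps=200, lr=1e-2):
    """Benign fine-tuning on seemingly unrelated conditionings (toy targets)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(32, 16)
        cond = conds[torch.randint(len(conds), (32,))]
        loss = nn.functional.mse_loss(model(x, cond), 0.5 * x)
        opt.zero_grad(); loss.backward(); opt.step()

def probe(model, reference, cond):
    """Distance from the original model's behaviour on the erased concept;
    a drop after benign fine-tuning would signal concept resurgence."""
    x = torch.randn(256, 16)
    with torch.no_grad():
        return nn.functional.mse_loss(model(x, cond.expand(256, -1)),
                                      reference(x, cond.expand(256, -1))).item()

model = TinyDenoiser()
original = copy.deepcopy(model).eval()          # pre-unlearning reference
erased, neutral = torch.randn(1, 16), torch.zeros(1, 16)
unrelated = torch.randn(8, 16)                  # "seemingly unrelated" concepts

unlearn(model, erased, neutral)
print("after unlearning :", probe(model, original, erased))
finetune(model, unrelated)
print("after fine-tuning:", probe(model, original, erased))
```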
Related papers
- Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive amounts of data.
This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.
We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives; a minimal sketch of one such update rule follows this entry.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
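The "monotonic improvement on both objectives" claim suggests an update that never trades unlearning against alignment. A minimal sketch of one classic construction for this, PCGrad-style gradient projection, is below; the paper's actual "optimal update" may differ, and the toy objectives are assumptions.

```python
# Hedged sketch: combine an unlearning gradient and an alignment/retention
# gradient, projecting out the component of the former that conflicts with
# the latter. A classic construction, not necessarily the paper's method.
import torch

def combined_update(params, grads_unlearn, grads_retain, lr=1e-3):
    for p, gu, gr in zip(params, grads_unlearn, grads_retain):
        dot = torch.sum(gu * gr)
        if dot < 0:  # conflict: the unlearning step would hurt retention
            gu = gu - (dot / (gr.norm() ** 2 + 1e-12)) * gr
        p.data -= lr * (gu + gr)

# Toy usage: two quadratic objectives over one parameter vector.
w = torch.nn.Parameter(torch.tensor([2.0, -1.0]))
for _ in range(200):
    loss_unlearn = (w ** 2).sum()                               # "forget": pull w to 0
    loss_retain = ((w - torch.tensor([1.0, 0.0])) ** 2).sum()   # "align": pull w to (1, 0)
    gu = torch.autograd.grad(loss_unlearn, w)[0]
    gr = torch.autograd.grad(loss_retain, w)[0]
    combined_update([w], [gu], [gr])
print(w.data)  # settles between the two optima without fighting either objective
```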
- Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion [50.26583654615212]
Lifelong few-shot customization for text-to-image diffusion aims to continually generalize existing models for new tasks with minimal data.
In this study, we identify and categorize the catastrophic forgetting problems into two types: relevant concept forgetting and previous concept forgetting.
Unlike existing methods that rely on additional real data or offline replay of original concept data, our approach enables on-the-fly knowledge distillation to retain previous concepts while learning new ones; a sketch follows this entry.
arXiv Detail & Related papers (2024-11-08T12:58:48Z)
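A hedged sketch of what "on-the-fly knowledge distillation" could look like: while fitting the new concept, the student is also regularized toward a frozen copy of itself (the teacher) on previous-concept conditionings, with no replayed real data. The toy denoiser, targets, and loss weighting are assumptions, not the paper's implementation.

```python
# Toy lifelong customization with on-the-fly distillation (assumptions only).
import copy
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def learn_new_concept(student, new_cond, old_conds, steps=200, lam=1.0, lr=1e-3):
    teacher = copy.deepcopy(student).eval()  # frozen pre-update model
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(32, 16)
        # (1) few-shot objective on the new concept (toy regression target)
        loss_new = nn.functional.mse_loss(student(x, new_cond.expand(32, -1)), 0.5 * x)
        # (2) distillation on old concepts, sampled on the fly -- no real data
        cond = old_conds[torch.randint(len(old_conds), (32,))]
        with torch.no_grad():
            target = teacher(x, cond)
        loss_old = nn.functional.mse_loss(student(x, cond), target)
        opt.zero_grad()
        (loss_new + lam * loss_old).backward()
        opt.step()

student = TinyDenoiser()
old_concepts = torch.randn(4, 16)   # embeddings of previously learned concepts
learn_new_concept(student, torch.randn(1, 16), old_concepts)
```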
- Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts.
The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts.
Concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
- Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective [20.263233740360022]
Unlearning methods have been developed to erase concepts from diffusion models.
This paper leverages the transferability of adversarial attacks to probe unlearning robustness in a black-box setting.
Specifically, we employ an adversarial search strategy to find an adversarial embedding that transfers across different unlearned models; a sketch follows this entry.
arXiv Detail & Related papers (2024-04-30T09:14:54Z)
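A hedged sketch of a transferable adversarial embedding search: optimize one conditioning vector against an ensemble of surrogate unlearned models, then test it on a held-out "black-box" model. The linear concept detector and toy networks are illustrative stand-ins, not the paper's setup.

```python
# Toy transferable adversarial embedding search (assumptions only).
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM = 16
concept_dir = torch.randn(DIM)            # stand-in for an erased-concept detector
concept_dir = concept_dir / concept_dir.norm()

def make_model():
    return nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))

surrogates = [make_model() for _ in range(3)]  # unlearned models we can query with gradients
blackbox = make_model()                        # unseen target model

emb = torch.zeros(DIM, requires_grad=True)     # adversarial embedding being searched
opt = torch.optim.Adam([emb], lr=0.05)
for _ in range(300):
    # maximize the erased-concept score averaged over the surrogate ensemble,
    # so the embedding does not overfit to any single model
    score = torch.stack([(m(emb) * concept_dir).sum() for m in surrogates]).mean()
    opt.zero_grad()
    (-score).backward()
    opt.step()
    with torch.no_grad():
        emb.clamp_(-1.0, 1.0)  # keep the embedding in a plausible range

with torch.no_grad():  # transfer test on the held-out model
    print("black-box concept score:", (blackbox(emb) * concept_dir).sum().item())
```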
- Semantic Guidance Tuning for Text-To-Image Diffusion Models [3.3881449308956726]
We propose a training-free approach that modulates the guidance direction of diffusion models during inference.
We first decompose the prompt semantics into a set of concepts, and monitor the guidance trajectory in relation to each concept.
Based on this observation, we devise a technique to steer the guidance direction towards any concept from which the model diverges; a sketch follows this entry.
arXiv Detail & Related papers (2023-12-26T09:02:17Z)
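A hedged sketch of training-free guidance modulation in this spirit: decompose the prompt into concept embeddings, compare each concept's guidance direction with the full-prompt guidance, and up-weight concepts the trajectory is drifting away from. The toy denoiser and the specific weighting rule are assumptions, not the paper's exact method.

```python
# Toy per-concept guidance steering at inference time (assumptions only).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def guided_eps(model, x, prompt_emb, concept_embs, null_emb, scale=7.5):
    """Classifier-free-style guidance with per-concept correction terms."""
    eps_null = model(x, null_emb)
    g_prompt = model(x, prompt_emb) - eps_null      # full-prompt guidance direction
    g_total = g_prompt.clone()
    for c in concept_embs:
        g_c = model(x, c) - eps_null                # this concept's own guidance
        cos = nn.functional.cosine_similarity(g_prompt, g_c, dim=-1)
        w = torch.clamp(-cos, min=0.0).unsqueeze(-1)  # nonzero only when diverging
        g_total = g_total + w * g_c                 # steer back toward that concept
    return eps_null + scale * g_total

model = TinyDenoiser()
x = torch.randn(4, 16)                              # noised latents (toy)
prompt = torch.randn(1, 16).expand(4, -1)
concepts = [torch.randn(1, 16).expand(4, -1) for _ in range(2)]
eps = guided_eps(model, x, prompt, concepts, torch.zeros(4, 16))
print(eps.shape)
```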
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models [22.60023885544265]
Large-scale datasets may contain sexually explicit, copyrighted, or otherwise undesirable content, which the model can then generate directly.
Fine-tuning algorithms have been developed to tackle concept erasing in diffusion models.
We present a new approach that solves all of these challenges.
arXiv Detail & Related papers (2023-12-20T07:04:33Z)
- Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods for image restoration (IR).
We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR.
We propose five potential and challenging directions for future research on diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading overall image quality; a sketch follows this entry.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
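A hedged sketch of safe self-distillation in this spirit: the student's noise estimate for a problematic prompt is pulled toward its own EMA teacher's unconditional estimate, so no external "safe" model is needed. The tiny denoiser and EMA details are illustrative assumptions, not the paper's code.

```python
# Toy SDD-style safe self-distillation (assumptions only).
import copy
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def self_distill(student, harmful_cond, null_cond, steps=200, ema=0.999, lr=1e-3):
    teacher = copy.deepcopy(student).eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(32, 16)
        with torch.no_grad():   # teacher's unconditional estimate is the target
            target = teacher(x, null_cond.expand(32, -1))
        loss = nn.functional.mse_loss(student(x, harmful_cond.expand(32, -1)), target)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():   # EMA: teacher slowly tracks the student
            for pt, ps in zip(teacher.parameters(), student.parameters()):
                pt.mul_(ema).add_(ps, alpha=1.0 - ema)

student = TinyDenoiser()
self_distill(student, harmful_cond=torch.randn(1, 16), null_cond=torch.zeros(1, 16))
```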
- Understanding and Mitigating Copying in Diffusion Models [53.03978584040557]
Images generated by diffusion models like Stable Diffusion are increasingly widespread.
Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user.
arXiv Detail & Related papers (2023-05-31T17:58:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.