Towards Irreversible Machine Unlearning for Diffusion Models
- URL: http://arxiv.org/abs/2512.03564v1
- Date: Wed, 03 Dec 2025 08:37:33 GMT
- Title: Towards Irreversible Machine Unlearning for Diffusion Models
- Authors: Xun Yuan, Zilong Zhao, Jiayu Li, Aryan Pasikhani, Prosanta Gope, Biplab Sikdar
- Abstract summary: Diffusion Model Relearning Attack (DiMRA) can reverse finetuning-based machine unlearning methods. DiMUM memorizes alternative data or features to replace targeted unlearning data or features. We extensively evaluate DiMUM, demonstrating its superior ability to preserve the generative performance of diffusion models.
- Score: 9.662143775645532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machine unlearning, which can make diffusion models forget specific training data and prevent the generation of sensitive or unwanted content. Current machine unlearning methods for diffusion models are primarily designed for conditional diffusion models and focus on unlearning specific data classes or features. Among these methods, finetuning-based machine unlearning methods are recognized for their efficiency and effectiveness; they update the parameters of pre-trained diffusion models by minimizing carefully designed loss functions. However, in this paper, we propose a novel attack named Diffusion Model Relearning Attack (DiMRA), which can reverse finetuning-based machine unlearning methods, exposing a significant vulnerability in this class of techniques. Without prior knowledge of the unlearned elements, DiMRA optimizes the unlearned diffusion model on an auxiliary dataset to reverse the unlearning, enabling the model to regenerate previously unlearned elements. To mitigate this vulnerability, we propose a novel machine unlearning method for diffusion models, termed Diffusion Model Unlearning by Memorization (DiMUM). Unlike traditional methods that focus on forgetting, DiMUM memorizes alternative data or features to replace the targeted unlearning data or features, in order to prevent generating such elements. In our experiments, we demonstrate the effectiveness of DiMRA in reversing state-of-the-art finetuning-based machine unlearning methods for diffusion models, highlighting the need for more robust solutions. We extensively evaluate DiMUM, demonstrating its superior ability to preserve the generative performance of diffusion models while enhancing robustness against DiMRA.
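The relearning attack described in the abstract amounts to ordinary finetuning of the unlearned model on an auxiliary dataset, with no knowledge of what was unlearned. The following is a toy scalar sketch of that loop's structure; the squared-error stand-in for the denoising objective and all names are illustrative, not taken from the paper:

```python
import random

def relearn(theta_unlearned, aux_data, lr=0.1, steps=200, batch_size=8):
    """Toy analogue of a relearning attack: finetune the 'unlearned'
    parameter on an auxiliary dataset with plain SGD."""
    theta = theta_unlearned
    rng = random.Random(0)
    for _ in range(steps):
        batch = rng.sample(aux_data, batch_size)
        # Gradient of the toy loss mean((theta - x)^2) over the batch.
        grad = sum(2 * (theta - x) for x in batch) / len(batch)
        theta -= lr * grad
    return theta

# The toy "pretrained" optimum is the mean of the auxiliary data (2.0);
# unlearning is modeled as having pushed theta far away, to -5.0.
aux = [1.0] * 50 + [3.0] * 50
theta_restored = relearn(theta_unlearned=-5.0, aux_data=aux)
```

In this toy, the attack succeeds because nothing anchors the unlearned parameter away from the pretrained optimum; DiMUM's memorization of replacement content is aimed at removing exactly that reversibility.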
Related papers
- Sparsity-Aware Unlearning for Large Language Models [20.699929903336113]
Large Language Models (LLMs) inevitably memorize sensitive information during training, posing significant privacy risks. Machine unlearning has emerged as a promising solution to selectively remove such information without full retraining. We find that unlearning effectiveness degrades substantially on sparse models. We propose Sparsity-Aware Unlearning (SAU), which decouples unlearning from sparsification objectives through gradient masking.
arXiv Detail & Related papers (2026-01-31T07:45:30Z) - DiffEM: Learning from Corrupted Data with Diffusion Models via Expectation Maximization [42.67146690499833]
We propose a new method for training diffusion models with Expectation-Maximization (EM) from corrupted data. Our proposed method, DiffEM, utilizes conditional diffusion models to reconstruct clean data from observations in the E-step, and then uses the reconstructed data to refine the conditional diffusion model in the M-step.
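The E-step/M-step alternation described above can be mimicked in a toy scalar setting, where the "model" is just a mean parameter and reconstruction is posterior-mean shrinkage of each corrupted observation; all numbers below are illustrative, not DiffEM's actual training setup:

```python
import random

rng = random.Random(1)
PRIOR_VAR, NOISE_VAR = 0.25, 1.0
clean = [rng.gauss(4.0, PRIOR_VAR ** 0.5) for _ in range(500)]
corrupted = [x + rng.gauss(0.0, NOISE_VAR ** 0.5) for x in clean]

mu = 0.0  # current model of the clean-data mean
w = PRIOR_VAR / (PRIOR_VAR + NOISE_VAR)  # posterior weight on the observation
for _ in range(20):
    # E-step: reconstruct clean data under the current model by
    # shrinking each corrupted observation toward mu.
    recon = [w * y + (1 - w) * mu for y in corrupted]
    # M-step: refit the model on the reconstructions.
    mu = sum(recon) / len(recon)
```

Each iteration moves `mu` a fraction `w` of the way toward the mean of the corrupted data, so it converges to roughly 4.0, the clean-data mean, since the corruption noise is zero-mean.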
arXiv Detail & Related papers (2025-10-14T16:25:02Z) - Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models under Data Constraints [9.885531514020437]
We propose a machine unlearning methodology that can prevent the generation of outputs containing undesired features from a pre-trained diffusion model. Our approach is inspired by the variational inference framework, with the objective of minimizing a loss function consisting of two terms: a plasticity inducer and a stability regularizer. We validate the effectiveness of our method through comprehensive experiments for both class unlearning and feature unlearning.
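The two-term objective named in the summary (a plasticity inducer on forget data plus a stability regularizer on retained data) can be sketched with squared-error stand-ins; the concrete terms and names below are illustrative, not the paper's actual losses:

```python
def unlearning_loss(theta, forget_batch, retain_batch, lam=1.0):
    # Plasticity inducer: drive the fit on forget data to degrade
    # (negated squared error, so minimizing it raises forget error).
    plasticity = -sum((theta - x) ** 2 for x in forget_batch) / len(forget_batch)
    # Stability regularizer: keep the fit on retained data intact.
    stability = sum((theta - x) ** 2 for x in retain_batch) / len(retain_batch)
    return plasticity + lam * stability

# Minimizing this trades off forgetting (first term) against
# preserving performance on retained data (second term, weighted by lam).
loss = unlearning_loss(0.0, forget_batch=[1.0], retain_batch=[0.5])
```

The weight `lam` balances the two terms: a larger `lam` prioritizes preserving the retained data's fit over forgetting.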
arXiv Detail & Related papers (2025-10-05T06:39:30Z) - A Novel Cloud-Based Diffusion-Guided Hybrid Model for High-Accuracy Accident Detection in Intelligent Transportation Systems [21.082870778158313]
We present a novel hybrid model integrating guidance classification with diffusion techniques. Our implementation is cloud-based, enabling scalable and efficient processing. The proposed diffusion model performs best in image-based accident detection with an accuracy of 97.32%.
arXiv Detail & Related papers (2025-10-04T05:02:15Z) - Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a "forget distribution".
We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z) - Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models [63.43422118066493]
Machine unlearning (MU) is a crucial foundation for developing safe, secure, and trustworthy GenAI models. Traditional MU methods often rely on stringent assumptions and require access to real data. This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models.
arXiv Detail & Related papers (2024-09-17T14:12:50Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Erasing Undesirable Influence in Diffusion Models [51.225365010401006]
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content.
In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten.
arXiv Detail & Related papers (2024-01-11T09:30:36Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.