Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
- URL: http://arxiv.org/abs/2410.12777v1
- Date: Wed, 16 Oct 2024 17:51:25 GMT
- Title: Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
- Authors: Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin
- Abstract summary: We propose a framework for meta-unlearning pretrained diffusion models (DMs).
Our framework is compatible with most existing unlearning methods, requiring only the addition of an easy-to-implement meta objective.
- Abstract: With the rapid progress of diffusion-based content generation, significant efforts are being made to unlearn harmful or copyrighted concepts from pretrained diffusion models (DMs) to prevent potential model misuse. However, it is observed that even when DMs are properly unlearned before release, malicious finetuning can compromise this process, causing DMs to relearn the unlearned concepts. This occurs partly because certain benign concepts (e.g., "skin") retained in DMs are related to the unlearned ones (e.g., "nudity"), facilitating their relearning via finetuning. To address this, we propose meta-unlearning on DMs. Intuitively, a meta-unlearned DM should behave like an unlearned DM when used as is; moreover, if the meta-unlearned DM undergoes malicious finetuning on unlearned concepts, the related benign concepts retained within it will be triggered to self-destruct, hindering the relearning of unlearned concepts. Our meta-unlearning framework is compatible with most existing unlearning methods, requiring only the addition of an easy-to-implement meta objective. We validate our approach through empirical experiments on meta-unlearning concepts from Stable Diffusion models (SD-v1-4 and SDXL), supported by extensive ablation studies. Our code is available at https://github.com/sail-sg/Meta-Unlearning.
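The abstract's core idea, unlearning plus a meta objective that anticipates malicious finetuning, can be illustrated numerically. The sketch below is a toy with hypothetical quadratic losses (`harm_loss`, `benign_loss` and all hyperparameters are illustrative, not the paper's actual objective): a standard unlearning loss pushes the harmful capability away while preserving benign behavior, and a MAML-style meta term simulates one attacker finetune step on the harmful objective and rewards the model for still failing at the harmful task afterwards.

```python
import numpy as np

# Toy stand-in for a model: theta is a 2-d parameter vector.
theta_harm = np.array([1.0, 0.0])    # params where the harmful concept works
theta_benign = np.array([0.0, 1.0])  # params where benign behavior works

def harm_loss(th):    # low value => harmful concept has been (re)learned
    return 0.5 * np.sum((th - theta_harm) ** 2)

def benign_loss(th):  # low value => benign behavior preserved
    return 0.5 * np.sum((th - theta_benign) ** 2)

def num_grad(f, th, eps=1e-5):  # central-difference gradient (exact on quadratics)
    g = np.zeros_like(th)
    for i in range(th.size):
        d = np.zeros_like(th); d[i] = eps
        g[i] = (f(th + d) - f(th - d)) / (2 * eps)
    return g

def unlearn_loss(th, alpha=0.2):
    # standard unlearning: keep benign behavior, push harm_loss up
    return benign_loss(th) - alpha * harm_loss(th)

def meta_unlearn_loss(th, alpha=0.2, lam=0.5, inner_lr=0.5):
    # meta term: simulate one attacker finetune step on harm_loss, then
    # penalize how well the *attacked* parameters perform the harmful task
    th_attacked = th - inner_lr * num_grad(harm_loss, th)
    return unlearn_loss(th, alpha) - lam * harm_loss(th_attacked)

def minimize(loss, th, lr=0.2, steps=400):
    for _ in range(steps):
        th = th - lr * num_grad(loss, th)
    return th

theta0 = np.array([0.5, 0.5])
plain = minimize(unlearn_loss, theta0)       # plain unlearned model
meta = minimize(meta_unlearn_loss, theta0)   # meta-unlearned model

def attack(th, lr=0.5, steps=3):  # malicious finetuning directly on harm_loss
    for _ in range(steps):
        th = th - lr * num_grad(harm_loss, th)
    return th

harm_plain = harm_loss(attack(plain))
harm_meta = harm_loss(attack(meta))
print(f"harm_loss after attack: plain={harm_plain:.4f}, meta={harm_meta:.4f}")
```

In this toy the meta term simply places the parameters further from the harmful optimum, so the same number of attacker finetuning steps recovers less of the harmful capability (the printed `meta` value stays above `plain`). The actual paper's mechanism on DMs is richer, triggering self-destruction of related benign concepts, but the bilevel structure (outer unlearning objective wrapped around a simulated inner finetune) is the same shape.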
Related papers
- Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models [7.9993879763024065]
We show that the objective functions used for unlearning in the existing methods lead to decoupling of the targeted concepts from the corresponding prompts.
The ineffectiveness of current methods stems primarily from their narrow focus on reducing generation probabilities for specific prompt sets.
We introduce two new evaluation metrics: Concept Retrieval Score (CRS) and Concept Confidence Score (CCS)
arXiv Detail & Related papers (2024-09-09T14:38:31Z) - UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs)
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z) - Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models [42.734578139757886]
Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks.
The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks.
This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning.
arXiv Detail & Related papers (2024-05-24T05:47:23Z) - UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models [31.48739583108113]
Diffusion models (DMs) have demonstrated unprecedented capabilities in text-to-image generation and are widely used in diverse applications.
They have also raised significant societal concerns, such as the generation of harmful content and copyright disputes.
Machine unlearning (MU) has emerged as a promising solution, capable of removing undesired generative capabilities from DMs.
arXiv Detail & Related papers (2024-02-19T05:25:53Z) - One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications [65.66700972754118]
Existing concept-erasing methods in academia are all based on full-parameter or specification-based fine-tuning.
Previous model-specific erasure impedes the flexible combination of concepts and the training-free transfer towards other models.
We ground our erasing framework on one-dimensional adapters to erase multiple concepts from most DMs at once across versatile erasing applications.
arXiv Detail & Related papers (2023-12-26T18:08:48Z) - To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now [22.75295925610285]
Diffusion models (DMs) have revolutionized the generation of realistic and complex images.
DMs also introduce potential safety hazards, such as producing harmful content and infringing data copyrights.
Despite the development of safety-driven unlearning techniques, doubts about their efficacy persist.
arXiv Detail & Related papers (2023-10-18T10:36:34Z) - VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models [69.20464255450788]
Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
arXiv Detail & Related papers (2023-06-12T05:14:13Z) - CMVAE: Causal Meta VAE for Unsupervised Meta-Learning [3.0839245814393728]
Unsupervised meta-learning aims to learn the meta knowledge from unlabeled data and rapidly adapt to novel tasks.
Existing approaches may be misled by the context-bias from the training data.
We propose Causal Meta VAE (CMVAE) that encodes the priors into latent codes in the causal space and learns their relationships simultaneously to achieve the downstream few-shot image classification task.
arXiv Detail & Related papers (2023-02-20T02:49:35Z) - Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z) - On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning [100.14809391594109]
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning.
Despite the generalization power of the meta-model, it remains elusive how adversarial robustness can be maintained by MAML in few-shot learning.
We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning.
arXiv Detail & Related papers (2021-02-20T22:03:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.