FuguReport

Unlearning in Diffusion Models: A Unified Framework with KL Divergence and Likelihood Constraints

Authors: Shervin Khalafi, Alejandro Ribeiro, Dongsheng Ding
Affiliations: University of Pennsylvania / The University of Tennessee
Categories: Method / Model Unlearning / Unlearning in diffusion models; Theory / Statistical Constraints / KL divergence constraint formulation; Evaluation / Performance Comparison / Effectiveness of likelihood-based unlearning
License: CC BY-SA 4.0

Abstract Overview

This paper formulates diffusion-model unlearning as a constrained optimization problem that explicitly balances fidelity to a pretrained model against separation from the data or concepts to be forgotten. It studies three formulations: reverse KL-constrained unlearning for concept removal, forward KL-constrained unlearning for data removal, and a likelihood-constrained formulation that directly limits the likelihood the model assigns to the unlearning distributions. The authors prove strong duality for all three problems, including the nonconvex KL-constrained cases, and derive explicit optimal target distributions together with primal-dual algorithms for diffusion models. Experiments on Gaussian mixtures, Stable Diffusion concept unlearning, and DDPM-based data unlearning evaluate the resulting retention-unlearning tradeoffs.
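To make the shape of such a problem concrete, the following is a rough sketch of a reverse-KL-style instance. The notation is assumed for illustration and is not taken from the paper: $q$ denotes the pretrained distribution, $u_1, \dots, u_m$ the unwanted distributions, $b_i \ge 0$ the separation thresholds, and $\lambda_i$ the dual variables.

```latex
% Primal problem (assumed notation): stay close to q, stay far from each u_i
P^\star \;=\; \min_{p}\; D_{\mathrm{KL}}(p \,\|\, q)
\quad \text{s.t.} \quad D_{\mathrm{KL}}(p \,\|\, u_i) \;\ge\; b_i,
\qquad i = 1, \dots, m

% Lagrangian with multipliers \lambda_i \ge 0
L(p, \lambda) \;=\; D_{\mathrm{KL}}(p \,\|\, q)
\;+\; \sum_{i=1}^{m} \lambda_i \bigl( b_i - D_{\mathrm{KL}}(p \,\|\, u_i) \bigr)

% Dual problem; strong duality is the statement P^\star = D^\star
D^\star \;=\; \max_{\lambda \ge 0}\; \min_{p}\; L(p, \lambda)
```

The constraint set is nonconvex in $p$ because it lower-bounds a convex function, which is why a strong-duality result for this kind of problem is not automatic.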

Novelty

The main novelty is a unified constrained optimization framework for diffusion-model unlearning that covers both concept and data unlearning through reverse KL, forward KL, and likelihood constraints. The paper also claims a novel likelihood-based formulation and proves strong duality for these unlearning problems, enabling explicit target characterizations and principled primal-dual optimization.

Results

Across the concept and data unlearning experiments, the constrained methods achieve better retention-unlearning tradeoffs than unconstrained or equal-weight baselines. In particular, the KL-constrained methods reach a similar level of unlearning with less deviation from the pretrained model, while the likelihood-constrained method achieves comparable unlearning effectiveness and better preserves retained concepts.

Key Points

  1. The framework defines unlearning as minimizing deviation from a pretrained diffusion model subject to explicit separation constraints from unwanted concept or data distributions.
  2. The paper derives closed-form target distributions for reverse-KL, forward-KL, and likelihood-constrained objectives and uses strong duality to justify primal-dual training algorithms.
  3. Empirical studies on Gaussian mixtures, Stable Diffusion concept unlearning, and CelebA-HQ sample removal show improved retention relative to baselines at comparable levels of unlearning.
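The primal-dual idea in the points above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it replaces the diffusion model with a softmax-parameterized categorical distribution, and the function names (`primal_dual_unlearn`, `log_softmax`, `kl`), step sizes, and example distributions are all assumptions made for illustration. It alternates a gradient-descent step on the Lagrangian in the model parameters with a projected gradient-ascent step on the dual variable.

```python
import numpy as np

def log_softmax(theta):
    # Numerically stable log of softmax(theta).
    z = theta - theta.max()
    return z - np.log(np.exp(z).sum())

def kl(log_p, log_r):
    # KL(p || r) for categorical distributions given as log-probabilities.
    p = np.exp(log_p)
    return float(np.sum(p * (log_p - log_r)))

def primal_dual_unlearn(q, u, b, steps=5000, lr_primal=0.5, lr_dual=0.05):
    """Toy primal-dual loop (illustrative assumption, not the paper's method).

    Minimizes KL(p || q) subject to KL(p || u) >= b, with p = softmax(theta).
    q: pretrained distribution, u: distribution to unlearn, b: threshold.
    """
    log_q, log_u = np.log(q), np.log(u)
    theta = log_q.copy()   # start at the pretrained model
    lam = 0.0              # dual variable for the separation constraint
    for _ in range(steps):
        log_p = log_softmax(theta)
        p = np.exp(log_p)
        # Lagrangian: KL(p||q) + lam * (b - KL(p||u)); h is its per-bin integrand.
        h = (log_p - log_q) - lam * (log_p - log_u)
        # Exact gradient of sum_i p_i * h_i with respect to theta.
        grad = p * (h - np.dot(p, h))
        theta = theta - lr_primal * grad                         # primal descent
        lam = max(0.0, lam + lr_dual * (b - kl(log_p, log_u)))   # dual ascent
    return np.exp(log_softmax(theta)), lam

# Hypothetical example: the unwanted distribution u concentrates on the last bin.
q = np.array([0.5, 0.3, 0.2])
u = np.array([0.1, 0.2, 0.7])
b = 1.0
p, lam = primal_dual_unlearn(q, u, b)
print("p =", p, "lambda =", lam)
print("KL(p||q) =", kl(np.log(p), np.log(q)))
print("KL(p||u) =", kl(np.log(p), np.log(u)))
```

In this sketch the dual variable grows while the separation constraint is violated and shrinks toward zero once it is satisfied, so the weight on unlearning is set by the constraint threshold rather than chosen by hand, which is the contrast with equal-weight baselines that the summary draws.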
