Unlearning in Diffusion Models: A Unified Framework with KL Divergence and Likelihood Constraints
Abstract Overview
This paper formulates diffusion-model unlearning as a constrained optimization problem that explicitly balances fidelity to a pretrained model against separation from the data or concepts to be forgotten. It studies three formulations: reverse KL-constrained unlearning for concept removal, forward KL-constrained unlearning for data removal, and a likelihood-constrained formulation that directly limits the likelihood of the unlearning distributions. The authors show strong duality for all three problems, including the nonconvex KL-constrained cases, and derive explicit optimal target distributions together with primal-dual algorithms for diffusion models. Experiments on Gaussian mixtures, Stable Diffusion concept unlearning, and DDPM-based data unlearning evaluate the resulting retention-unlearning tradeoffs.
Novelty
The main novelty is a unified constrained optimization framework for diffusion-model unlearning that covers both concept and data unlearning through reverse KL, forward KL, and likelihood constraints. The paper also claims a novel likelihood-based formulation and proves strong duality for these unlearning problems, enabling explicit target characterizations and principled primal-dual optimization.
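As a rough illustration of what such a constrained formulation can look like, the template below writes the KL-type problems in generic form. The notation is an assumption made here for exposition, not the paper's own: $p$ is the unlearned model distribution, $p_{\mathrm{pre}}$ the pretrained one, $q_i$ the distributions to be forgotten, $b_i \ge 0$ the required separation levels, and $D$ stands in for whichever divergence (reverse or forward KL) a given formulation uses.

```latex
% Generic constrained unlearning template (illustrative notation only)
\begin{aligned}
  \min_{p}\quad & D\bigl(p,\, p_{\mathrm{pre}}\bigr) \\
  \text{s.t.}\quad & D\bigl(p,\, q_i\bigr) \;\ge\; b_i, \qquad i = 1,\dots,m .
\end{aligned}

% Lagrangian with multipliers \lambda_i \ge 0
\mathcal{L}(p, \lambda)
  \;=\; D\bigl(p,\, p_{\mathrm{pre}}\bigr)
  \;-\; \sum_{i=1}^{m} \lambda_i \Bigl( D\bigl(p,\, q_i\bigr) - b_i \Bigr).

% Strong duality: the min-max and max-min values coincide
\min_{p}\; \max_{\lambda \ge 0}\; \mathcal{L}(p, \lambda)
  \;=\;
\max_{\lambda \ge 0}\; \min_{p}\; \mathcal{L}(p, \lambda).
```

Under strong duality, the inner minimization over $p$ for a fixed $\lambda$ defines a target distribution, and the outer maximization over $\lambda$ tunes how strongly each separation constraint is enforced. This is the structure a primal-dual method exploits.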
Results
Across the concept and data unlearning experiments, the constrained methods achieve better retention-unlearning tradeoffs than unconstrained or equal-weight baselines. In particular, the KL-constrained methods reach a level of unlearning comparable to the baselines with less deviation from the pretrained model, while the likelihood-constrained method matches the baselines' unlearning effectiveness and better preserves retained concepts.
Key Points
- The framework defines unlearning as minimizing deviation from a pretrained diffusion model subject to explicit separation constraints from unwanted concept or data distributions.
- The paper derives closed-form target distributions for reverse-KL, forward-KL, and likelihood-constrained objectives and uses strong duality to justify primal-dual training algorithms.
- Empirical studies on Gaussian mixtures, Stable Diffusion concept unlearning, and CelebA-HQ sample removal show improved retention relative to baselines at comparable levels of unlearning.
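
The primal-dual idea in the points above can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the "model" is a unit-variance Gaussian N(m, 1), the pretrained model is N(0, 1), the unwanted distribution is N(mu_u, 1), and the function names, step sizes, and separation level b are all hypothetical. For unit-variance Gaussians the KL divergence reduces to half the squared mean difference, so the problem becomes: minimize m^2/2 subject to (m - mu_u)^2/2 >= b.

```python
def kl_unit_gauss(m, mu):
    """KL divergence between N(m, 1) and N(mu, 1); symmetric in this special case."""
    return 0.5 * (m - mu) ** 2


def primal_dual_unlearn(mu_u=1.0, b=2.0, outer_steps=200, inner_steps=50,
                        lr_primal=0.1, lr_dual=0.05):
    """Toy primal-dual loop (illustrative only).

    Lagrangian: L(m, lam) = KL(p_m || p_pre) - lam * (KL(p_m || q_u) - b),
    with p_pre = N(0, 1) and q_u = N(mu_u, 1).
    """
    m, lam = 0.0, 0.0  # start at the pretrained model with a zero multiplier
    for _ in range(outer_steps):
        # Primal phase: gradient descent on L with respect to m, lam held fixed.
        for _ in range(inner_steps):
            grad_m = m - lam * (m - mu_u)
            m -= lr_primal * grad_m
        # Dual phase: projected gradient ascent on lam.
        # The multiplier grows while the separation constraint is violated.
        slack = kl_unit_gauss(m, mu_u) - b
        lam = max(0.0, lam - lr_dual * slack)
    return m, lam


if __name__ == "__main__":
    m, lam = primal_dual_unlearn()
    print("mean:", m, "multiplier:", lam)
    print("retention cost:", kl_unit_gauss(m, 0.0))
    print("separation:", kl_unit_gauss(m, 1.0))
```

With mu_u = 1 and b = 2, the analytic optimum is m = -1 with multiplier 0.5, which the loop should approach. The feasible set (m <= -1 or m >= 3) is nonconvex, yet the dual value at the optimal multiplier equals the primal optimum of 0.5, a small instance of the strong-duality behaviour the summary describes. In a real diffusion model the mean update would be replaced by gradient steps on network parameters, which this sketch does not attempt to model.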