Data Unlearning Beyond Uniform Forgetting via Diffusion Time and Frequency Selection
- URL: http://arxiv.org/abs/2510.17917v1
- Date: Mon, 20 Oct 2025 02:00:12 GMT
- Title: Data Unlearning Beyond Uniform Forgetting via Diffusion Time and Frequency Selection
- Authors: Jinseong Park, Mijung Park
- Abstract summary: Data unlearning aims to remove the influence of specific training samples from a trained model without requiring full retraining. We argue that forgetting occurs disproportionately across time and frequency, depending on the model and scenario. By selectively focusing on specific time-frequency ranges during training, we obtain samples with higher aesthetic quality and lower noise.
- Score: 5.350009804371616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data unlearning aims to remove the influence of specific training samples from a trained model without requiring full retraining. Unlike concept unlearning, data unlearning in diffusion models remains underexplored and often suffers from quality degradation or incomplete forgetting. To address this, we first observe that most existing methods attempt to unlearn samples equally at all diffusion time steps, leading to poor-quality generation. We argue that forgetting occurs disproportionately across time and frequency, depending on the model and scenario. By selectively focusing on specific time-frequency ranges during training, we obtain samples with higher aesthetic quality and lower noise. We validate this improvement by applying our time-frequency selective approach to diverse settings, including gradient-based and preference optimization objectives, as well as both image-level and text-to-image tasks. Finally, to evaluate both the deletion and the quality of unlearned data samples, we propose a simple normalized version of SSCD. Together, our analysis and methods establish a clearer understanding of the unique challenges in data unlearning for diffusion models, providing practical strategies to improve both evaluation and unlearning performance.
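As a rough illustration of the time-frequency selective idea described in the abstract, the sketch below restricts a forgetting objective to a chosen timestep window and frequency band. The function names, the gradient-ascent form of the loss, and the specific window/band values are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: time/frequency-selective forgetting for a
# DDPM-style noise-prediction model. eps_model and alphas_cumprod are assumed
# to be provided by the surrounding training code.
import torch
import torch.fft as fft


def freq_band_mask(h, w, lo=0.0, hi=0.25, device="cpu"):
    """Binary radial mask over 2D FFT frequencies with radius in [lo, hi)."""
    fy = fft.fftfreq(h, device=device).view(-1, 1)
    fx = fft.fftfreq(w, device=device).view(1, -1)
    r = (fx ** 2 + fy ** 2).sqrt()
    return ((r >= lo) & (r < hi)).float()


def selective_unlearn_loss(eps_model, x0_forget, alphas_cumprod,
                           t_window=(200, 600), band=(0.0, 0.25)):
    """Forgetting loss restricted to a timestep window and a frequency band,
    instead of penalizing all timesteps and all frequencies equally."""
    b, c, h, w = x0_forget.shape
    device = x0_forget.device
    t = torch.randint(t_window[0], t_window[1], (b,), device=device)
    noise = torch.randn_like(x0_forget)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0_forget + (1 - a_bar).sqrt() * noise
    eps_pred = eps_model(x_t, t)
    # Keep only the selected frequency band of the prediction error.
    mask = freq_band_mask(h, w, *band, device=device)
    err = fft.ifft2(fft.fft2(eps_pred - noise) * mask).real
    # Negative MSE: increasing prediction error on the forget set encourages
    # forgetting; in practice this is combined with a retain/quality term.
    return -(err ** 2).mean()
```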
Related papers
- Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning [43.678382510171986]
Diffusion models generate samples through an iterative denoising process, guided by a neural network. We introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We provide experimental evidence that this approach can improve the quality of samples generated by pretrained diffusion models.
arXiv Detail & Related papers (2026-02-09T14:10:44Z)
- Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning [35.359482937263145]
We propose a novel Differential-Informed Sample Selection (DISSect) method, which accurately and efficiently identifies noisy correspondences to accelerate training. Specifically, we rethink the impact of noisy correspondence on contrastive learning and propose that the differential between the predicted correlation of the current model and that of a historical model is more informative for characterizing sample quality.
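A minimal sketch of this differential scoring idea, assuming per-pair similarities from the current and a historical (e.g., EMA) model are available; the top-k selection rule and names are illustrative, not the paper's code.

```python
import torch


def differential_select(current_sim: torch.Tensor,
                        hist_sim: torch.Tensor,
                        keep_ratio: float = 0.5) -> torch.Tensor:
    """current_sim / hist_sim: per-pair similarities from the current and a
    historical model. Pairs whose similarity is improving fastest are kept;
    the rest are dropped from the batch."""
    differential = current_sim - hist_sim
    k = max(1, int(keep_ratio * differential.numel()))
    return torch.topk(differential, k).indices
```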
arXiv Detail & Related papers (2025-07-17T11:13:44Z)
- OASIS: Online Sample Selection for Continual Visual Instruction Tuning [55.92362550389058]
In continual instruction tuning (CIT) scenarios, new instruction tuning data continuously arrive in an online streaming manner. Data selection can mitigate this overhead, but existing strategies often rely on pretrained reference models. Recent reference model-free online sample selection methods address this, but typically select a fixed number of samples per batch.
arXiv Detail & Related papers (2025-05-27T20:32:43Z)
- Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training [4.760537994346813]
As data distributions grow more complex, training diffusion models to convergence becomes increasingly intensive.
We introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps.
Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
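A small sketch of what non-uniform timestep sampling can look like, assuming per-timestep importance is tracked with an EMA of the observed training loss; this weighting rule is an illustrative stand-in, not the paper's exact criterion.

```python
import torch


class TimestepSampler:
    """Draw diffusion timesteps from a categorical distribution whose weights
    track a running per-timestep importance estimate."""

    def __init__(self, num_timesteps: int, ema: float = 0.9):
        self.importance = torch.ones(num_timesteps)
        self.ema = ema

    def sample(self, batch_size: int) -> torch.Tensor:
        probs = self.importance / self.importance.sum()
        return torch.multinomial(probs, batch_size, replacement=True)

    def update(self, t: torch.Tensor, losses: torch.Tensor) -> None:
        # EMA update of per-timestep importance from the observed losses.
        for ti, li in zip(t.tolist(), losses.detach().tolist()):
            self.importance[ti] = self.ema * self.importance[ti] + (1 - self.ema) * li
```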
arXiv Detail & Related papers (2024-11-15T07:12:18Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods independently score and select data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
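A hedged sketch of instance-reweighted training in the DRO spirit: with a KL-regularized inner maximization, the per-sample weights reduce to a temperature-controlled softmax over losses. The temperature value and batch-level normalization below are assumptions, not the paper's exact formulation.

```python
import torch


def irdro_loss(per_sample_loss: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Upweight hard samples: w_i is proportional to exp(loss_i / temperature),
    the closed-form solution of a KL-regularized distributionally robust
    objective over the batch."""
    weights = torch.softmax(per_sample_loss.detach() / temperature, dim=0)
    return (weights * per_sample_loss).sum()
```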
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships assumed in earlier attribution settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms depend heavily on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
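A rough sketch of a TracIn-style influence score for diffusion training, with a simple re-normalization of the training-sample gradient to reduce the timestep-induced bias in gradient norms; the exact normalization used by Diffusion-ReTrac may differ, so treat this as illustrative.

```python
import torch


def influence(train_grads, test_grads, lr=1.0, renormalize=True, eps=1e-12):
    """train_grads / test_grads: lists of flattened per-checkpoint gradients
    for one training sample and the test sample of interest. The score sums
    (optionally re-normalized) gradient dot products across checkpoints."""
    score = 0.0
    for g_tr, g_te in zip(train_grads, test_grads):
        if renormalize:
            # Normalize the training gradient so large-norm timesteps do not dominate.
            g_tr = g_tr / (g_tr.norm() + eps)
        score += lr * torch.dot(g_tr, g_te).item()
    return score
```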
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- Conditional Variational Diffusion Models [1.8657053208839998]
Inverse problems aim to determine parameters from observations, a crucial task in engineering and science.
We propose a novel approach for learning the variance schedule as part of the training process.
Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, proving able to adapt to different applications with minimum overhead.
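A minimal sketch of a learnable noise schedule, assuming a tiny monotone network that maps t in [0, 1] to a log-SNR value and is trained jointly with the denoiser; the architecture and parameterization are assumptions, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedLogSNR(nn.Module):
    """Maps a normalized time t in [0, 1] to a log-SNR value; positive
    (softplus-transformed) weights keep the mapping monotone, so the log-SNR
    decreases (noise increases) as t grows."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.l1 = nn.Linear(1, hidden)
        self.l2 = nn.Linear(hidden, 1)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        w1 = F.softplus(self.l1.weight)
        h = torch.sigmoid(F.linear(t.unsqueeze(-1), w1, self.l1.bias))
        w2 = F.softplus(self.l2.weight)
        return -F.linear(h, w2, self.l2.bias).squeeze(-1)
```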
arXiv Detail & Related papers (2023-12-04T14:45:56Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
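A small sketch of the two-view scoring idea, assuming softmax predictions from two augmented views of each sample; the agreement and disagreement measures below are illustrative choices, not Jo-SRC's exact formulation.

```python
import torch
import torch.nn.functional as F


def view_consistency_scores(logits_v1, logits_v2, labels):
    """Score each sample from predictions on two augmented views."""
    p1 = F.softmax(logits_v1, dim=-1)
    p2 = F.softmax(logits_v2, dim=-1)
    p_mean = 0.5 * (p1 + p2)
    # Agreement with the given label: high probability suggests a clean sample.
    clean_score = p_mean.gather(1, labels.view(-1, 1)).squeeze(1)
    # Disagreement between the two views: large values suggest out-of-distribution samples.
    disagreement = 0.5 * (p1 - p2).abs().sum(dim=-1)
    return clean_score, disagreement
```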
arXiv Detail & Related papers (2021-03-24T07:26:07Z)