Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization
- URL: http://arxiv.org/abs/2502.03435v2
- Date: Tue, 06 May 2025 13:17:30 GMT
- Title: Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization
- Authors: Yu-Han Wu, Pierre Marion, Gérard Biau, Claire Boyer
- Abstract summary: We show that when trained by gradient descent with a large enough learning rate, neural networks cannot converge to a local minimum with small excess risk. Experiments validate the crucial role of the learning rate in preventing memorization, even beyond the one-dimensional setting.
- Score: 11.088273093231324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score--the exact minimizer of the denoising score matching objective--leads to memorization, where generated samples replicate the training data. Yet, in practice, only a moderate degree of memorization is observed, even without explicit regularization. In this paper, we investigate this phenomenon by uncovering an implicit regularization mechanism driven by large learning rates. Specifically, we show that in the small-noise regime, the empirical optimal score exhibits high irregularity. We then prove that, when trained by stochastic gradient descent with a large enough learning rate, neural networks cannot stably converge to a local minimum with arbitrarily small excess risk. Consequently, the learned score cannot be arbitrarily close to the empirical optimal score, thereby mitigating memorization. To make the analysis tractable, we consider one-dimensional data and two-layer neural networks. Experiments validate the crucial role of the learning rate in preventing memorization, even beyond the one-dimensional setting.
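The memorization mechanism described in the abstract can be made concrete with a small numerical sketch (illustrative only, not the authors' code). In one dimension, the exact minimizer of the denoising objective at noise level sigma is the score of the Gaussian-smoothed empirical distribution; following it with a single Tweedie denoising step, x + sigma^2 * s(x), maps a query almost exactly onto the nearest training point when sigma is small, which is precisely the memorization the paper analyzes.

```python
import numpy as np

def empirical_optimal_score(x, data, sigma):
    """Score of the Gaussian-smoothed empirical distribution,
    grad_x log (1/n) sum_i N(x; x_i, sigma^2).
    This is the exact minimizer of the denoising score matching
    objective for the training set `data` (1-D case)."""
    log_w = -(x - data) ** 2 / (2.0 * sigma ** 2)
    w = np.exp(log_w - log_w.max())          # numerically stable weights
    posterior_mean = (w * data).sum() / w.sum()
    return (posterior_mean - x) / sigma ** 2

data = np.array([-1.0, 0.5, 2.0])            # toy training set
sigma = 0.05                                 # small-noise regime
x = 0.4                                      # noisy query point
# One Tweedie denoising step lands almost exactly on the nearest
# training point: the generated sample is a memorized copy.
denoised = x + sigma ** 2 * empirical_optimal_score(x, data, sigma)
```

With a larger sigma the posterior mean blends several training points and the collapse is milder, consistent with the paper's focus on the small-noise regime as the source of the score's irregularity.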
Related papers
- Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities [11.882528379148141]
We formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem. We derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. We validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets.
arXiv Detail & Related papers (2026-02-24T11:29:12Z) - Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability [9.133729396364952]
Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization. Recent memorization detection methods are primarily based on the norm of the score difference as an indicator of memorization. We develop a memorization detection metric by integrating the isotropic norm and anisotropic alignment.
arXiv Detail & Related papers (2026-01-28T14:29:42Z) - U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model [12.192022160630165]
This paper explores the outcome of training state-of-the-art dereverberation models with supervision settings ranging from weakly supervised to fully unsupervised. Most existing deep learning approaches typically require paired dry and reverberant data, which are difficult to obtain in practice. We instead develop a sequential learning strategy motivated by a Bayesian formulation of the dereverberation problem, wherein acoustic parameters and dry signals are estimated from reverberant inputs using deep neural networks.
arXiv Detail & Related papers (2025-07-17T12:26:18Z) - Memorization and Regularization in Generative Diffusion Models [5.128303432235475]
Diffusion models have emerged as a powerful framework for generative modeling. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer. Experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
arXiv Detail & Related papers (2025-01-27T05:17:06Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
Randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations. Diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples. We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
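The spectral effect described above can be illustrated with a toy diagonal quadratic (an illustrative sketch, not the paper's full Hilbert-space setting): under gradient descent, the error in the eigendirection with eigenvalue lambda_k shrinks by a factor (1 - eta * lambda_k) per step, so at a fixed early-stopping time a larger learning rate eta has resolved more of the small-eigenvalue directions.

```python
import numpy as np

# Minimize f(w) = 0.5 * sum_k lam_k * (w_k - 1)^2 by gradient descent.
# In this diagonal (eigen-)basis each coordinate evolves independently:
# error_k(t) = (1 - eta * lam_k)^t * error_k(0).
lams = np.array([1.0, 0.1, 0.01])   # spectrum of the quadratic
target = np.ones_like(lams)         # minimizer

def gd(eta, steps):
    w = np.zeros_like(lams)
    for _ in range(steps):
        w -= eta * lams * (w - target)   # gradient step
    return w

w_small = gd(eta=0.1, steps=50)  # small learning rate
w_large = gd(eta=1.0, steps=50)  # large but stable rate (eta * lam <= 1)
# At the same early-stopping time, the larger rate has fit the
# small-eigenvalue directions far more closely.
```

Stopping both runs at 50 steps, the small-rate solution has barely moved along the lambda = 0.01 direction while the large-rate solution is close to the minimizer there, which is the sense in which the learning rate shapes the spectral decomposition of the early-stopped solution.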
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.