Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
- URL: http://arxiv.org/abs/2505.17638v1
- Date: Fri, 23 May 2025 08:58:47 GMT
- Title: Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
- Authors: Tony Bonnaire, Raphaël Urfin, Giulio Biroli, Marc Mézard,
- Abstract summary: We investigate the role of the training dynamics in the transition from generalization to memorization.<n>We find that $tau_mathrmmem$ increases linearly with the training set size $n$, while $tau_mathrmgen$ remains constant.<n>It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times.
- Score: 8.824077990271503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data and allow generalization. In this work, we investigate the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time $\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $\tau_\mathrm{mem}$ increases linearly with the training set size $n$, while $\tau_\mathrm{gen}$ remains constant. This creates a growing window of training times with $n$ where models generalize effectively, despite showing strong memorization if training continues beyond it. It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics, which allow to avoid memorization even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.
Related papers
- Dynamics is what you need for time-series forecasting! [6.775534755081169]
We develop an original $textttPRO-DYN$ nomenclature to analyze existing models through the lens of dynamics.<n>We conduct extensive experiments to confirm our observations on a set of performance-varying models with diverse backbones.
arXiv Detail & Related papers (2025-07-21T16:29:29Z) - Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today.<n>Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL.<n>Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data.<n>Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Data-Augmented Predictive Deep Neural Network: Enhancing the extrapolation capabilities of non-intrusive surrogate models [0.5735035463793009]
We propose a new deep learning framework, where kernel dynamic mode decomposition (KDMD) is employed to evolve the dynamics of the latent space generated by the encoder part of a convolutional autoencoder (CAE)
After adding the KDMD-decoder-extrapolated data into the original data set, we train the CAE along with a feed-forward deep neural network using the augmented data.
The trained network can predict future states outside the training time interval at any out-of-training parameter samples.
arXiv Detail & Related papers (2024-10-17T09:26:14Z) - A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose $$-second-$split$ $forgetting$ $time$ (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that $mislabeled$ examples are forgotten quickly, and seemingly $rare$ examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
arXiv Detail & Related papers (2022-10-26T21:03:46Z) - Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z) - Analysis of feature learning in weight-tied autoencoders via the mean
field lens [3.553493344868413]
We analyze a class of two-layer weight-tied nonlinear autoencoders in the mean field framework.
Models trained with gradient descent are shown to admit a mean field limiting dynamics.
Experiments on real-life data demonstrate an interesting match with the theory.
arXiv Detail & Related papers (2021-02-16T18:58:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.