When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
- URL: http://arxiv.org/abs/2506.19031v1
- Date: Mon, 23 Jun 2025 18:38:55 GMT
- Title: When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
- Authors: Chen Zeno, Hila Manor, Greg Ongie, Nir Weinberger, Tomer Michaeli, Daniel Soudry
- Abstract summary: Key question is when probability flow converges to training samples or more general points on the data manifold. We analyze this by studying the probability flow of shallow ReLU neural network denoisers trained with minimal $\ell^2$ norm.
- Score: 47.818753335400714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold. We analyze this by studying the probability flow of shallow ReLU neural network denoisers trained with minimal $\ell^2$ norm. For intuition, we introduce a simpler score flow and show that for orthogonal datasets, both flows follow similar trajectories, converging to a training point or a sum of training points. However, early stopping by the diffusion time scheduler allows probability flow to reach more general manifold points. This reflects the tendency of diffusion models to both memorize training samples and generate novel points that combine aspects of multiple samples, motivating our study of such behavior in simplified settings. We extend these results to obtuse simplex data and, through simulations in the orthogonal case, confirm that probability flow converges to a training point, a sum of training points, or a manifold point. Moreover, memorization decreases when the number of training samples grows, as fewer samples accumulate near training points.
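The memorization behavior described in the abstract can be illustrated with a small numerical sketch. It is not the paper's setup: it swaps in the closed-form empirical denoiser (the posterior mean for a uniform distribution over the training points under Gaussian noise) in place of the minimum-norm shallow ReLU denoiser, and it assumes that "score flow" means following the score at a fixed noise level `sigma`. The names `empirical_denoiser` and `score_flow`, the step size, and the three-point orthogonal dataset are all illustrative choices.

```python
import numpy as np

def empirical_denoiser(y, data, sigma):
    """Posterior mean E[x | y] for x uniform over the rows of `data`, y = x + sigma * noise."""
    sq_dists = np.sum((data - y) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ data

def score_flow(y0, data, sigma, step=0.1, n_steps=500):
    """Euler steps along the score (D(y) - y) / sigma^2 at a fixed noise level (assumed form)."""
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        score = (empirical_denoiser(y, data, sigma) - y) / sigma ** 2
        y = y + step * sigma ** 2 * score  # step rescaled by sigma^2 for stability
    return y

data = np.eye(3)  # three orthogonal training points (toy dataset)
y_final = score_flow([0.9, 0.2, -0.1], data, sigma=0.2)
nearest = int(np.argmin(np.sum((data - y_final) ** 2, axis=1)))
```

In this toy run the trajectory ends at the training point closest to the initialization, which is the "converges to a training point" case. The sums of training points and the more general manifold points mentioned in the abstract depend on the ReLU denoiser and the diffusion time scheduler, neither of which is modeled here.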
Related papers
- Generative Modeling with Continuous Flows: Sample Complexity of Flow Matching [60.37045080890305]
We provide the first analysis of the sample complexity for flow-matching based generative models. We decompose the velocity field estimation error into neural-network approximation error, statistical error due to the finite sample size, and optimization error due to the finite number of optimization steps for estimating the velocity field.
arXiv Detail & Related papers (2025-12-01T05:14:25Z) - The Principles of Diffusion Models [81.12042238390075]
Diffusion modeling starts by defining a forward process that gradually corrupts data into noise. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data.
arXiv Detail & Related papers (2025-10-24T02:29:02Z) - Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents [55.43139356528315]
Consistency Models (CMs) are trained to be consistent on flow ordinary differential equation trajectories. CMs typically require prolonged training with large batch sizes to obtain competitive sample quality. We propose a new loss function, called the manifold feature distance (MFD), which provides manifold-aligned tangents that point toward the data manifold.
arXiv Detail & Related papers (2025-10-01T08:35:18Z) - Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities [85.83359661628575]
We propose Progressive Inference-Time Annealing (PITA) to learn diffusion-based samplers. PITA combines two complementary techniques: Annealing of the Boltzmann distribution and Diffusion smoothing. It enables equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates.
arXiv Detail & Related papers (2025-06-19T17:14:22Z) - Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional Spaces [5.716752583983991]
When the data distribution consists of n points, empirical diffusion models tend to reproduce existing data points. This work shows that the memorization issue can be solved simply by applying an inertia update at the end of the empirical diffusion simulation. We demonstrate that the distribution of samples from this model approximates the true data distribution on a $C^2$ manifold of dimension $d$, within a Wasserstein-1 distance of order $O(n^{-\frac{2}{d+4}})$.
arXiv Detail & Related papers (2025-05-05T09:40:41Z) - Neural Flow Samplers with Shortcut Models [19.81513273510523]
Continuous flow-based neural samplers offer a promising approach to generate samples from unnormalized densities. We introduce an improved estimator for these challenging quantities, employing a velocity-driven Sequential Monte Carlo method. Our proposed Neural Flow Shortcut Sampler empirically outperforms existing flow-based neural samplers on both synthetic datasets and complex n-body system targets.
arXiv Detail & Related papers (2025-02-11T07:55:41Z) - No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers [41.867855070932706]
We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling to approximate a high-dimensional data distribution have sparked significant interest in developing neural network-based methods for this challenging problem. We propose an elegant modification to previous methods, which allows simulation-free training with the help of a time-dependent normalizing flow.
arXiv Detail & Related papers (2025-02-10T17:13:11Z) - A solvable generative model with a linear, one-step denoiser [0.0]
We develop an analytically tractable single-step diffusion model based on a linear denoiser. We show that the monotonic fall phase of Kullback-Leibler divergence begins when the training dataset size reaches the dimension of the data points.
arXiv Detail & Related papers (2024-11-26T19:00:01Z) - Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$. We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from
arXiv Detail & Related papers (2024-05-31T16:18:46Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts.
We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z) - Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.