Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching
- URL: http://arxiv.org/abs/2509.00336v1
- Date: Sat, 30 Aug 2025 03:30:22 GMT
- Title: Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching
- Authors: An B. Vuong, Michael T. McCann, Javier E. Santos, Yen Ting Lin
- Abstract summary: We present evidence that trained diffusion networks violate both the integral and differential constraints required of true score functions. We argue that diffusion training is better understood as flow matching to the velocity field of a Wasserstein Gradient Flow (WGF). Our results advocate for adopting the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.
- Score: 6.821102133726069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models are commonly interpreted as learning the score function, i.e., the gradient of the log-density of noisy data. However, this assumption implies that the target of learning is a conservative vector field, which is not enforced by the neural network architectures used in practice. We present numerical evidence that trained diffusion networks violate both integral and differential constraints required of true score functions, demonstrating that the learned vector fields are not conservative. Despite this, the models perform remarkably well as generative mechanisms. To explain this apparent paradox, we advocate a new theoretical perspective: diffusion training is better understood as flow matching to the velocity field of a Wasserstein Gradient Flow (WGF), rather than as score learning for a reverse-time stochastic differential equation. Under this view, the "probability flow" arises naturally from the WGF framework, eliminating the need to invoke reverse-time SDE theory and clarifying why generative sampling remains successful even when the neural vector field is not a true score. We further show that non-conservative errors from neural approximation do not necessarily harm density transport. Our results advocate for adopting the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.
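The differential test the abstract alludes to rests on a basic fact: a vector field can be the gradient of a log-density only if its Jacobian is symmetric everywhere, since that Jacobian would be a Hessian. Below is a minimal sketch of such a symmetry check in PyTorch; the toy `vector_field` is a hypothetical stand-in for a trained network evaluated at a fixed noise level, not the paper's actual test harness.

```python
import torch
from torch.autograd.functional import jacobian

# Hypothetical stand-in for a trained network's vector field at one noise
# level; a true score grad log p(x) would have a symmetric Jacobian.
def vector_field(x):
    x1, x2 = x[0], x[1]
    return torch.stack([x1 - x2, x1 + x2])   # toy field with a rotational part

x0 = torch.randn(2)
J = jacobian(vector_field, x0)               # d x d Jacobian at x0
asym = 0.5 * (J - J.T)                       # antisymmetric part
# Zero for conservative fields; strictly positive here (non-conservative).
print((asym.norm() / J.norm()).item())
```

The integral test is analogous: the circulation of a conservative field around any closed path is zero, so a nonzero loop integral flags the same failure.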
Related papers
- Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation [58.19676004192321]
Diffusion models (DMs), which enable both image generation from noise and inversion from data, have inspired powerful unpaired image-to-image (I2I) translation algorithms.
We tackle this problem with Schrodinger Bridges (SBs), stochastic differential equations (SDEs) between distributions with minimal transport cost.
Inspired by this observation, we propose Latent Schrodinger Bridges (LSBs) that approximate the SB ODE via pre-trained Stable Diffusion.
We demonstrate that our algorithm successfully conducts competitive I2I translation in the unsupervised setting at only a fraction of the cost required by previous DM-based methods.
arXiv Detail & Related papers (2024-11-22T11:24:14Z)
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\rm post}(\mathbf{x}) \propto p(\mathbf{x})\, r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$. We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
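For intuition, the sampling target in log space is just the sum of the prior log-density and the log of the constraint. A hedged sketch with hypothetical stand-ins for $p(\mathbf{x})$ and $r(\mathbf{x})$:

```python
import torch

# log p_post(x) = log p(x) + log r(x) + const; the stand-ins below are
# hypothetical (a Gaussian prior and a soft constraint favoring x near 1).
def log_prior(x):
    return -0.5 * (x ** 2).sum(-1)

def log_r(x):
    return -((x - 1.0) ** 2).sum(-1)

def log_post_unnorm(x):
    return log_prior(x) + log_r(x)   # normalizer is intractable in general

x = torch.randn(4, 2)
print(log_post_unnorm(x))            # the density amortized samplers target
```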
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
- Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory [87.00653989457834]
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
Despite the empirical success, theory of conditional diffusion models is largely missing.
This paper bridges the gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models.
arXiv Detail & Related papers (2024-03-18T17:08:24Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
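A common concrete instance of such guidance (used here purely for illustration, not as this paper's specific construction) is classifier-free guidance, which steers the score by blending conditional and unconditional estimates:

```python
import torch

def guided_score(score_cond, score_uncond, w):
    # w = 0 recovers the unconditional score, w = 1 the conditional one,
    # and w > 1 extrapolates to sharpen samples toward the condition.
    return score_uncond + w * (score_cond - score_uncond)

# Hypothetical per-sample score estimates at one noise level.
s_c, s_u = torch.randn(8, 3), torch.randn(8, 3)
print(guided_score(s_c, s_u, w=2.0).shape)
```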
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- On gauge freedom, conservativity and intrinsic dimensionality estimation in diffusion models [13.597551064547503]
Diffusion models are generative models that have recently demonstrated impressive performances in terms of sampling quality and density estimation in high dimensions.
In the original formulation of the diffusion model, the learned vector field is assumed to be the score function.
We show that exact density estimation and exact sampling are achieved when the conservative component of the vector field exactly equals the true score.
arXiv Detail & Related papers (2024-02-06T09:41:43Z)
- Neural Sinkhorn Gradient Flow [11.4522103360875]
We introduce the Neural Sinkhorn Gradient Flow (NSGF) model, which parametrizes the time-varying velocity field of the Wasserstein gradient flow.
Our theoretical analyses show that as the sample size increases to infinity, the mean-field limit of the empirical approximation converges to the true underlying velocity field.
To further enhance model efficiency on high-dimensional tasks, a two-phase NSGF++ model is devised.
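Sampling from a model of this kind amounts to transporting particles along the learned velocity field. A minimal sketch with forward Euler, where `velocity` is a hypothetical stand-in for the trained network:

```python
import torch

# Hypothetical stand-in for a trained time-varying velocity field v(x, t).
def velocity(x, t):
    return -(1.0 - t) * x            # toy field contracting toward the origin

x = torch.randn(1024, 2)             # particles drawn from the source
n_steps = 100
dt = 1.0 / n_steps
for k in range(n_steps):
    x = x + dt * velocity(x, k * dt) # forward Euler step along the flow
print(x.std().item())                # std shrinks by roughly exp(-0.5)
```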
arXiv Detail & Related papers (2024-01-25T10:44:50Z)
- Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis of the approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z)
- MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows [34.795115757545915]
We introduce a unified generative modeling framework - MonoFlow.
Under our framework, adversarial training can be viewed as a procedure that first obtains MonoFlow's vector field.
We also reveal the fundamental difference between variational divergence minimization and adversarial training.
arXiv Detail & Related papers (2023-02-02T13:05:27Z)
- Negative Flux Aggregation to Estimate Feature Attributions [15.411534490483495]
There is increasing demand for understanding the behavior of deep neural networks (DNNs), spurred by growing security and transparency concerns.
To enhance the explainability of DNNs, we estimate input features' attributions to the prediction task using divergence and flux.
Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate the attribution map.
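Divergence is the local quantity behind such flux computations. In high dimensions the trace of the Jacobian is typically estimated stochastically; a hedged sketch using Hutchinson's estimator (illustrative, not necessarily NeFLAG's exact procedure):

```python
import torch

# Estimate div F(x) = tr(dF/dx) with Hutchinson's trace estimator,
# avoiding materializing the full Jacobian.
def divergence(F, x, n_probes=16):
    x = x.detach().requires_grad_(True)
    Fx = F(x)
    est = torch.zeros(())
    for _ in range(n_probes):
        v = torch.randn_like(x)
        (vjp,) = torch.autograd.grad(Fx, x, grad_outputs=v, retain_graph=True)
        est = est + (v * vjp).sum()   # v^T J v, unbiased for tr(J)
    return est / n_probes

F = lambda x: x ** 2                  # toy field with div = 2 * sum(x)
x0 = torch.ones(3)
print(divergence(F, x0).item())       # ~ 6.0 up to estimator noise
```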
arXiv Detail & Related papers (2023-01-17T16:19:41Z)
- Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- A Variational Perspective on Diffusion-Based Generative Models and Score Matching [8.93483643820767]
We derive a variational framework for likelihood estimation in continuous-time generative diffusion.
We show that minimizing the score-matching loss is equivalent to maximizing a lower bound of the likelihood of the plug-in reverse SDE.
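In practice, the score-matching loss is usually realized as denoising score matching: regress the network onto the score of the Gaussian perturbation kernel. A minimal sketch, with `model` a hypothetical stand-in for the score network:

```python
import torch

# Denoising score matching: for x_t = x_0 + sigma * eps, the regression
# target is grad_x log N(x_t; x_0, sigma^2 I) = -eps / sigma.
def dsm_loss(model, x0, sigma):
    eps = torch.randn_like(x0)
    xt = x0 + sigma * eps
    target = -eps / sigma
    return ((model(xt) - target) ** 2).sum(-1).mean()

model = torch.nn.Linear(2, 2)        # toy stand-in for a score network
x0 = torch.randn(64, 2)
print(dsm_loss(model, x0, sigma=0.5).item())
```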
arXiv Detail & Related papers (2021-06-05T05:50:36Z)