Convergence of flow-based generative models via proximal gradient descent in Wasserstein space
- URL: http://arxiv.org/abs/2310.17582v3
- Date: Wed, 3 Jul 2024 20:05:43 GMT
- Title: Convergence of flow-based generative models via proximal gradient descent in Wasserstein space
- Authors: Xiuyuan Cheng, Jianfeng Lu, Yixin Tan, Yao Xie
- Abstract summary: Flow-based generative models enjoy certain advantages in data generation and likelihood computation.
We provide a theoretical guarantee that a progressive flow model generates the data distribution.
- Score: 20.771897445580723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow) where $\varepsilon$ is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process, where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
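For reference, the JKO scheme named in the abstract advances the density by a proximal step in Wasserstein space; a standard restatement (with target distribution $\pi$ and step size $h$) is

```latex
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
\left\{ \mathrm{KL}(\rho \,\|\, \pi) \;+\; \frac{1}{2h}\, W_2^2(\rho, \rho_k) \right\}.
```

Iterating $N \lesssim \log(1/\varepsilon)$ such steps, each solved to first-order accuracy $\varepsilon$, yields the $O(\varepsilon^2)$ KL guarantee stated above.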
Related papers
- Generative Modeling by Minimizing the Wasserstein-2 Loss [1.2277343096128712]
This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE).
A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution.
An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach.
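The exponential KL decay behind convergence results of this kind can be seen in a minimal toy sketch (illustrative only, not the paper's algorithm): restrict the Wasserstein-2 gradient flow of $\mathrm{KL}(\rho\,\|\,\pi)$ with $\pi = N(0,1)$ to Gaussians $\rho = N(m, s^2)$, where the flow reduces to the ODEs $dm/dt = -m$, $ds/dt = 1/s - s$, and discretize with explicit Euler. All names here (`kl_gauss`, `m`, `s`, `h`) are our own.

```python
import math

def kl_gauss(m, s):
    # KL( N(m, s^2) || N(0, 1) ) in closed form
    return 0.5 * (m**2 + s**2 - 1.0) - math.log(s)

# Explicit Euler discretization of the Wasserstein-2 gradient flow of
# KL(rho || pi), restricted to Gaussians rho = N(m, s^2):
#   dm/dt = -m,   ds/dt = 1/s - s
m, s, h = 2.0, 3.0, 0.1
kls = [kl_gauss(m, s)]
for _ in range(100):
    m = m + h * (-m)
    s = s + h * (1.0 / s - s)
    kls.append(kl_gauss(m, s))
# kls decreases geometrically toward 0, mirroring exponential convergence
```

The iterates contract toward the fixed point $(m, s) = (0, 1)$, i.e. the target $N(0,1)$, with the KL divergence shrinking by a roughly constant factor per step.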
arXiv Detail & Related papers (2024-06-19T15:15:00Z)
- Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z)
- Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport [8.880526853373357]
We introduce a scalable WGF-based generative model, called Semi-dual JKO (S-JKO).
Our model is based on the semi-dual form of the JKO step, derived from the equivalence between the JKO step and the Unbalanced Optimal Transport.
We demonstrate that our model significantly outperforms existing WGF-based generative models.
arXiv Detail & Related papers (2024-02-08T06:45:03Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions [9.953088581242845]
We provide convergence guarantees with complexity bounds for any data distribution with finite second moment.
Our result does not rely on any log-concavity or functional inequality assumption.
Our theoretical analysis provides comparison between different discrete approximations and may guide the choice of discretization points in practice.
arXiv Detail & Related papers (2022-11-03T15:51:00Z)
- Wasserstein Distributional Learning [5.830831796910439]
Wasserstein Distributional Learning (WDL) is a flexible density-on-scalar regression modeling framework.
We show that WDL better characterizes and uncovers the nonlinear dependence of the conditional densities.
We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.
arXiv Detail & Related papers (2022-09-12T02:32:17Z)
- Discrete Denoising Flows [87.44537620217673]
We introduce a new discrete flow-based model for categorical random variables: Discrete Denoising Flows (DDFs).
In contrast with other discrete flow-based models, our model can be locally trained without introducing gradient bias.
We show that DDFs outperform Discrete Flows on modeling a toy example, binary MNIST and Cityscapes segmentation maps, measured in log-likelihood.
arXiv Detail & Related papers (2021-07-24T14:47:22Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow yields a smoother, near-optimal numerical scheme for approximating real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.