On Kinetic Optimal Probability Paths for Generative Models
- URL: http://arxiv.org/abs/2306.06626v1
- Date: Sun, 11 Jun 2023 08:54:12 GMT
- Title: On Kinetic Optimal Probability Paths for Generative Models
- Authors: Neta Shaul, Ricky T. Q. Chen, Maximilian Nickel, Matt Le, Yaron Lipman
- Abstract summary: Recent successful generative models are trained by fitting a neural network to an a-priori defined tractable probability density path taking noise to training examples.
In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense.
- Score: 42.12806492124782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent successful generative models are trained by fitting a neural network
to an a-priori defined tractable probability density path taking noise to
training examples. In this paper we investigate the space of Gaussian
probability paths, which includes diffusion paths as an instance, and look for
an optimal member in some useful sense. In particular, minimizing the Kinetic
Energy (KE) of a path is known to make particles' trajectories simple, hence
easier to sample, and empirically improve performance in terms of likelihood of
unseen data and sample generation quality. We investigate Kinetic Optimal (KO)
Gaussian paths and offer the following observations: (i) We show the KE takes a
simplified form on the space of Gaussian paths, where the data is incorporated
only through a single, one dimensional scalar function, called the \emph{data
separation function}. (ii) We characterize the KO solutions with a one
dimensional ODE. (iii) We approximate data-dependent KO paths by approximating
the data separation function and minimizing the KE. (iv) We prove that the data
separation function converges to $1$ in the general case of an arbitrary
normalized dataset consisting of $n$ samples in $d$ dimensions as
$n/\sqrt{d}\rightarrow 0$. A consequence of this result is that the Conditional
Optimal Transport (Cond-OT) path becomes \emph{kinetic optimal} as
$n/\sqrt{d}\rightarrow 0$. We further support this theory with empirical
experiments on ImageNet.
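
As a concrete illustration of the objects in the abstract (not code from the paper), the following NumPy sketch builds the conditional Gaussian path $x_t = \alpha_t x_1 + \sigma_t x_0$ with the Cond-OT schedulers $\alpha_t = t$, $\sigma_t = 1 - t$, and Monte Carlo estimates its conditional kinetic energy; the kinetic energy studied in the paper is defined on the marginal path and additionally involves the data separation function, which this toy estimate does not reproduce.

```python
import numpy as np

# Gaussian probability path x_t = alpha_t * x1 + sigma_t * x0, where x0 ~ N(0, I)
# is noise and x1 is a data sample.  The schedulers below are the Cond-OT choice
# (alpha_t = t, sigma_t = 1 - t); diffusion paths correspond to other pairs.
rng = np.random.default_rng(0)
n, d = 4096, 64

x1 = rng.standard_normal((n, d)) / np.sqrt(d)   # stand-in "normalized" data samples
x0 = rng.standard_normal((n, d))                # noise samples

def alpha(t):  return t                         # Cond-OT scaling of the data sample
def sigma(t):  return 1.0 - t                   # Cond-OT noise scale
def dalpha(t): return np.ones_like(t)
def dsigma(t): return -np.ones_like(t)

t = rng.uniform(size=(n, 1))
x_t = alpha(t) * x1 + sigma(t) * x0             # point on the conditional path
v_t = dalpha(t) * x1 + dsigma(t) * x0           # conditional velocity (= x1 - x0 here)

kinetic_energy = np.mean(np.sum(v_t**2, axis=1))  # Monte Carlo E[ ||dx_t/dt||^2 ]
print(f"conditional kinetic energy estimate: {kinetic_energy:.3f}")
```

With the Cond-OT schedulers the conditional velocity equals $x_1 - x_0$ and is constant along each trajectory, so conditional paths are straight lines; result (iv) above states that this choice also becomes kinetic optimal for the marginal path as $n/\sqrt{d}\rightarrow 0$.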
Related papers
- Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence [54.580605276017096]
Rectified Flow (RF) aims to learn straight flow trajectories from noise to data using a sequence of convex optimization problems.
RF theoretically straightens the trajectory through successive rectifications, reducing the number of function evaluations (NFEs) required for sampling.
We provide the first theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution.
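
For intuition, here is a minimal NumPy sketch of one rectification ("reflow") step under assumptions not stated in the abstract: a pretrained velocity field is available, and a closed-form field stands in for the learned network.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

def v_pretrained(x, t):
    # Hypothetical stand-in for a flow-matching-trained velocity field.
    return 2.0 - x

# 1) Simulate dx/dt = v(x, t) from noise to its endpoint to obtain a coupling (x0, x1).
def integrate(x0, n_steps=100):
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v_pretrained(x, k * dt)   # explicit Euler step
    return x

x0 = rng.standard_normal((1024, d))            # noise samples
x1 = integrate(x0)                             # paired ODE endpoints

# 2) Rectification: a new velocity field is regressed onto the straight-line
#    target (x1 - x0) at points x_t = (1 - t) x0 + t x1; repeating this step
#    straightens trajectories and reduces the NFEs needed at sampling time.
t = rng.uniform(size=(1024, 1))
x_t = (1 - t) * x0 + t * x1
target = x1 - x0                               # regression target for the new field
```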
arXiv Detail & Related papers (2024-10-19T02:36:11Z)
- A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/\varepsilon$ iterations are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
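
For readers unfamiliar with the likelihood transformation such ODE models are trained through, here is a minimal sketch of the instantaneous change-of-variables formula; the linear velocity field $v(x) = Ax$ is a hypothetical stand-in chosen so the divergence is available in closed form (a neural-network field would need a trace estimator).

```python
import numpy as np

# log p_1(x) = log p_0(x_0) - \int_0^1 div v(x_t, t) dt, where dx/dt = v(x, t)
# transports the base density p_0 = N(0, I) to the model density p_1.
d = 2
A = np.array([[0.3, 0.1],
              [0.0, 0.2]])              # hypothetical velocity-field parameters

def log_prob(x, n_steps=200):
    """Evaluate the model log-density at x by integrating the ODE backwards in time."""
    dt = 1.0 / n_steps
    z, logdet = x.copy(), 0.0
    for _ in range(n_steps):
        z = z - dt * (A @ z)            # explicit Euler step of dx/dt = A x, reversed in time
        logdet += dt * np.trace(A)      # accumulate \int div v dt (trace of the Jacobian)
    log_p0 = -0.5 * (z @ z) - 0.5 * d * np.log(2 * np.pi)   # N(0, I) base log-density
    return log_p0 - logdet

print(log_prob(np.array([0.5, -1.0])))
```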
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration.
Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively.
We show that the distributionally robust optimization (DRO)-based approach can also address these challenges and is asymptotically minimax optimal.
arXiv Detail & Related papers (2023-05-22T17:50:18Z)
- Multisample Flow Matching: Straightening Flows with Minibatch Couplings [38.82598694134521]
Simulation-free methods for training continuous-time generative models construct probability paths that go between noise distributions and individual data samples.
We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples.
We show that our proposed methods improve sample consistency on downsampled ImageNet data sets, and lead to better low-cost sample generation.
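
A minimal sketch of the kind of non-trivial minibatch coupling such methods build on, assuming squared-Euclidean cost and SciPy's Hungarian assignment as stand-ins for the couplings used in the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Instead of pairing each data sample with an independent noise sample, pair them
# through an optimal assignment within the minibatch; this tends to straighten the
# resulting marginal paths.
rng = np.random.default_rng(0)
b, d = 64, 16
x0 = rng.standard_normal((b, d))                 # noise minibatch
x1 = rng.standard_normal((b, d)) + 3.0           # stand-in data minibatch

cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
rows, cols = linear_sum_assignment(cost)                   # minibatch OT (assignment)
x0, x1 = x0[rows], x1[cols]                                # coupled (noise, data) pairs

# Flow matching targets built on the coupled pairs (Cond-OT interpolation).
t = rng.uniform(size=(b, 1))
x_t, target = (1 - t) * x0 + t * x1, x1 - x0
```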
arXiv Detail & Related papers (2023-04-28T11:33:08Z)
- Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective.
The goal is to predict the unknown values the underlying random field takes at any other location with minimum quadratic risk.
We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
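
For context, a minimal sketch of the simple Kriging predictor that the plug-in rule mimics, with a squared-exponential covariance assumed known (the paper studies the plug-in variant where the covariance is estimated from data):

```python
import numpy as np

# Simple Kriging predictor: given observations y at sites S of a zero-mean random
# field with known covariance, predict at a new location x via k(x)^T K^{-1} y,
# the minimizer of the quadratic risk among linear predictors.
rng = np.random.default_rng(0)

def cov(a, b, length=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length**2))            # squared-exponential covariance

S = rng.uniform(size=(50, 2))                        # observed spatial locations
y = np.sin(4 * S[:, 0]) + 0.1 * rng.standard_normal(50)   # observed field values
x = np.array([[0.3, 0.7]])                           # new location to predict

K = cov(S, S) + 1e-8 * np.eye(len(S))                # jitter for numerical stability
prediction = cov(x, S) @ np.linalg.solve(K, y)
print(prediction)
```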
arXiv Detail & Related papers (2022-02-15T12:46:43Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
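
A minimal sketch of clipped gradient descent in the streaming setting, using online mean estimation of heavy-tailed samples as a stand-in task (the step size and clipping threshold below are illustrative, not those analyzed in the paper):

```python
import numpy as np

# Each incoming sample contributes a gradient of the squared loss, which is
# clipped in norm before the update; clipping tames heavy-tailed noise.
rng = np.random.default_rng(0)
d, clip = 8, 5.0
theta = np.zeros(d)

for t in range(1, 10_001):
    x = rng.standard_t(df=2.5, size=d)   # heavy-tailed streaming sample (mean 0)
    g = theta - x                        # gradient of 0.5 * ||theta - x||^2
    norm = np.linalg.norm(g)
    if norm > clip:
        g *= clip / norm                 # norm clipping of the stochastic gradient
    theta -= g / t                       # decaying step size 1/t

print(theta)                             # should be close to the true mean (zero vector)
```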
arXiv Detail & Related papers (2021-08-25T21:30:27Z)