On Kinetic Optimal Probability Paths for Generative Models
- URL: http://arxiv.org/abs/2306.06626v1
- Date: Sun, 11 Jun 2023 08:54:12 GMT
- Title: On Kinetic Optimal Probability Paths for Generative Models
- Authors: Neta Shaul, Ricky T. Q. Chen, Maximilian Nickel, Matt Le, Yaron Lipman
- Abstract summary: Recent successful generative models are trained by fitting a neural network to an a-priori defined tractable probability density path taking noise to training examples.
In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense.
- Score: 42.12806492124782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent successful generative models are trained by fitting a neural network
to an a-priori defined tractable probability density path taking noise to
training examples. In this paper we investigate the space of Gaussian
probability paths, which includes diffusion paths as an instance, and look for
an optimal member in some useful sense. In particular, minimizing the Kinetic
Energy (KE) of a path is known to make particles' trajectories simple, hence
easier to sample, and empirically improve performance in terms of likelihood of
unseen data and sample generation quality. We investigate Kinetic Optimal (KO)
Gaussian paths and offer the following observations: (i) We show the KE takes a
simplified form on the space of Gaussian paths, where the data is incorporated
only through a single, one dimensional scalar function, called the \emph{data
separation function}. (ii) We characterize the KO solutions with a one
dimensional ODE. (iii) We approximate data-dependent KO paths by approximating
the data separation function and minimizing the KE. (iv) We prove that the data
separation function converges to $1$ in the general case of arbitrary
normalized dataset consisting of $n$ samples in $d$ dimension as
$n/\sqrt{d}\rightarrow 0$. A consequence of this result is that the Conditional
Optimal Transport (Cond-OT) path becomes \emph{kinetic optimal} as
$n/\sqrt{d}\rightarrow 0$. We further support this theory with empirical
experiments on ImageNet.
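As an illustrative sketch (not the paper's experiment), two of the abstract's claims can be simulated in a few lines of NumPy: the Cond-OT conditional path is a straight line with constant velocity, so its conditional kinetic energy is just $\mathbb{E}\|x_1 - x_0\|^2$, and in the $n/\sqrt{d}\rightarrow 0$ regime normalized samples are nearly orthogonal. All variable names and constants below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: n normalized data points in dimension d, with
# n/sqrt(d) small, as in the regime studied by the paper.
d, n = 1024, 8
x1 = rng.standard_normal((n, d))
x1 /= np.linalg.norm(x1, axis=1, keepdims=True)   # normalized dataset, ||x1|| = 1

# Near-orthogonality: pairwise inner products of normalized random points
# are O(1/sqrt(d)) -- the concentration behind the data separation
# function converging to 1 as n/sqrt(d) -> 0.
gram = x1 @ x1.T
off_diag = gram[~np.eye(n, dtype=bool)]
print("max |<x_i, x_j>|, i != j:", np.abs(off_diag).max())  # small, ~1/sqrt(d) scale

# The Cond-OT path moves noise x0 ~ N(0, I) to data x1 along the straight
# line x_t = (1 - t) x0 + t x1, so each conditional trajectory has the
# constant velocity x1 - x0 and kinetic energy E[||x1 - x0||^2].
x0 = rng.standard_normal((4096, d))               # noise samples
v = x1[rng.integers(0, n, size=4096)] - x0        # constant velocity per pair
ke = (v ** 2).sum(axis=1).mean()
print("empirical conditional KE:", ke)            # close to d + 1 for this setup
```

Here $\mathbb{E}\|x_1 - x_0\|^2 = \mathbb{E}\|x_0\|^2 + \|x_1\|^2 = d + 1$ because the noise and data samples are independent, which the empirical average reproduces.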
Related papers
- 2-Rectifications are Enough for Straight Flows: A Theoretical Insight into Wasserstein Convergence [54.580605276017096]
We provide the first theoretical analysis of the Wasserstein distance between the sampling distribution of Rectified Flow and the target distribution.
We show that for a rectified flow from a Gaussian to any general target distribution with finite first moment, two rectifications are sufficient to achieve a straight flow.
arXiv Detail & Related papers (2024-10-19T02:36:11Z) - A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/\varepsilon$ iterations are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration.
Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively.
We show that the distributionally robust optimization (DRO) based approach can also address these challenges and is asymptotically minimax optimal
arXiv Detail & Related papers (2023-05-22T17:50:18Z) - Multisample Flow Matching: Straightening Flows with Minibatch Couplings [38.82598694134521]
Simulation-free methods for training continuous-time generative models construct probability paths that go between noise distributions and individual data samples.
We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples.
We show that our proposed methods improve sample consistency on downsampled ImageNet data sets, and lead to better low-cost sample generation.
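A minimal sketch of the minibatch-coupling idea (a toy under stated assumptions, not the paper's implementation): instead of pairing noise and data samples independently, choose the permutation of a minibatch that minimizes the total squared transport cost. For a tiny batch the optimal permutation can be found by brute force; in practice one would use a Hungarian or Sinkhorn solver. All names and constants below are hypothetical:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
b, d = 6, 2
x0 = rng.standard_normal((b, d))          # noise minibatch
x1 = rng.standard_normal((b, d)) + 3.0    # toy "data" minibatch

# Pairwise squared distances between noise and data samples.
cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)

# Brute-force optimal assignment over all b! pairings of the minibatch.
best_perm = min(itertools.permutations(range(b)),
                key=lambda p: cost[np.arange(b), list(p)].sum())

independent_cost = cost[np.arange(b), np.arange(b)].sum()   # naive i -> i pairing
coupled_cost = cost[np.arange(b), list(best_perm)].sum()
print(coupled_cost <= independent_cost)   # the coupling never costs more
```

The coupled pairing can only lower the total transport cost, which is the mechanism by which non-trivial couplings straighten the learned flows.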
arXiv Detail & Related papers (2023-04-28T11:33:08Z) - A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective.
The goal is to predict the unknown values it takes at any other location with minimum quadratic risk.
We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
arXiv Detail & Related papers (2022-02-15T12:46:43Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.