Distribution learning via neural differential equations: minimal energy regularization and approximation theory
- URL: http://arxiv.org/abs/2502.03795v1
- Date: Thu, 06 Feb 2025 05:50:21 GMT
- Title: Distribution learning via neural differential equations: minimal energy regularization and approximation theory
- Authors: Youssef Marzouk, Zhi Ren, Jakob Zech
- Abstract summary: Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions.
We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map.
We show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization.
- Abstract: Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions, e.g., for generative modeling, density estimation, and Bayesian inference. We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map. Moreover, we show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization. We then derive explicit upper bounds for the $C^k$ norm of the velocity field that are polynomial in the $C^k$ norm of the corresponding transport map $T$; in the case of triangular (Knothe--Rosenblatt) maps, we also show that these bounds are polynomial in the $C^k$ norms of the associated source and target densities. Combining these results with stability arguments for distribution approximation via ODEs, we show that Wasserstein or Kullback--Leibler approximation of the target distribution to any desired accuracy $\epsilon > 0$ can be achieved by a deep neural network representation of the velocity field whose size is bounded explicitly in terms of $\epsilon$, the dimension, and the smoothness of the source and target densities. The same neural network ansatz yields guarantees on the value of the regularized training objective.
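The core construction in the abstract can be sketched numerically: along the straight-line interpolation $x_t = (1-t)x + tT(x)$, each trajectory moves with constant velocity $T(x) - x$, so integrating the induced ODE from $t=0$ to $t=1$ transports $x$ to $T(x)$. A minimal sketch, with a toy affine map `T` chosen purely for illustration (the paper treats a far more general class of maps, and in practice the velocity must be expressed as a field $v(t, y)$ of the current state $y = x_t$ rather than of the initial point):

```python
import numpy as np

def T(x):
    # toy transport map (an assumption for illustration):
    # affinely pushes N(0, 1) samples toward N(2, 0.25)
    return 0.5 * x + 2.0

def velocity(x0):
    # along the straight-line path x_t = (1 - t) x0 + t T(x0),
    # the velocity is constant in t: dx_t/dt = T(x0) - x0
    return T(x0) - x0

def transport(x0, n_steps=100):
    # Euler integration of the ODE from t = 0 to t = 1
    x, dt = x0, 1.0 / n_steps
    for _ in range(n_steps):
        x = x + dt * velocity(x0)
    return x

x0 = np.array([-1.0, 0.0, 1.0])
print(transport(x0))  # ≈ T(x0)
```

Since the per-trajectory velocity is constant, the Euler scheme reproduces $T(x_0)$ up to floating-point error; the minimum-energy regularization studied in the paper singles out exactly these straight-line (hence minimal kinetic energy) velocity fields among all fields realizing the same transport.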
Related papers
- Relative-Translation Invariant Wasserstein Distance [82.6068808353647]
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$)
We show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ and are invariant to distribution translations.
arXiv Detail & Related papers (2024-09-04T03:41:44Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV)
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
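The forward process discussed above can be illustrated with a short simulation: under the Ornstein-Uhlenbeck dynamics $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, a multi-modal initial law with modes at distance $R$ from the origin contracts toward the standard Gaussian, which is the convergence the paper's non-asymptotic TV bounds quantify. A minimal Euler-Maruyama sketch (the bimodal initial law and all parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_forward(x0, t_final=5.0, n_steps=500):
    # Euler-Maruyama for dX_t = -X_t dt + sqrt(2) dW_t
    dt = t_final / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        x += -x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    return x

# bimodal start: modes at +/- R, mirroring the paper's parametrisation
# of multi-modal data by the distance R to the furthest modes
R = 4.0
x0 = np.concatenate([rng.normal(-R, 0.3, 5000), rng.normal(R, 0.3, 5000)])
xT = ou_forward(x0)
print(xT.mean(), xT.std())  # close to 0 and 1
```

By time $t$ the residual influence of the initial modes scales like $e^{-t}$, so for $t \gg \log R$ the empirical moments match the standard Gaussian to within sampling error.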
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space in which to model functions represented by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Normalizing flows as approximations of optimal transport maps via linear-control neural ODEs [49.1574468325115]
We consider the problem of recovering the $W_2$-optimal transport map $T$ between absolutely continuous measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^n)$ as the flow of a linear-control neural ODE.
arXiv Detail & Related papers (2023-11-02T17:17:03Z) - Distribution learning via neural differential equations: a nonparametric
statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - Matching Normalizing Flows and Probability Paths on Manifolds [57.95251557443005]
Continuous Normalizing Flows (CNFs) are generative models that transform a prior distribution to a model distribution by solving an ordinary differential equation (ODE)
We propose to train CNFs by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path.
We show that CNFs learned by minimizing PPD achieve state-of-the-art results in likelihoods and sample quality on existing low-dimensional manifold benchmarks.
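The CNF mechanics underlying the entry above can be sketched in a few lines: prior samples are pushed through the ODE $dx/dt = v(x, t)$, while the log-density evolves by the instantaneous change-of-variables formula $\frac{d}{dt}\log p(x_t) = -\operatorname{div} v(x_t, t)$. A minimal sketch with a toy linear field (an assumption for illustration, chosen so the divergence is available in closed form rather than estimated):

```python
import numpy as np

def v(x, t):
    # toy contracting linear velocity field
    return -0.5 * x

def div_v(x, t):
    # divergence = trace of the Jacobian of v; here -0.5 per dimension
    return -0.5 * x.shape[-1]

def cnf_push(x, log_p, t_final=1.0, n_steps=200):
    # joint Euler integration of the state and its log-density
    dt = t_final / n_steps
    t = 0.0
    for _ in range(n_steps):
        log_p = log_p - dt * div_v(x, t)
        x = x + dt * v(x, t)
        t += dt
    return x, log_p

# standard normal prior in 2-D
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 2))
log_p = -0.5 * (x ** 2).sum(-1) - np.log(2.0 * np.pi)
xT, log_pT = cnf_push(x, log_p)
```

For this field the flow contracts each sample by $e^{-1/2}$ and raises every log-density by exactly $1$ (minus the divergence, $+1$ per unit time, integrated over $[0,1]$), which is the bookkeeping that training objectives such as the PPD above operate on.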
arXiv Detail & Related papers (2022-07-11T08:50:19Z) - Near-optimal estimation of smooth transport maps with kernel
sums-of-squares [81.02564078640275]
Under smoothness conditions, the squared Wasserstein distance between two distributions can be efficiently computed with appealing statistical error upper bounds.
The object of interest for applications such as generative modeling is the underlying optimal transport map.
We propose the first tractable algorithm for which the statistical $L^2$ error on the maps nearly matches the existing minimax lower bounds for smooth map estimation.
arXiv Detail & Related papers (2021-12-03T13:45:36Z) - Finite speed of quantum information in models of interacting bosons at
finite density [0.22843885788439797]
We prove that quantum information propagates with a finite velocity in any model of interacting bosons whose Hamiltonian contains spatially local single-boson hopping terms.
Our bounds are relevant for physically realistic initial conditions in experimentally realized models of interacting bosons.
arXiv Detail & Related papers (2021-06-17T18:00:00Z) - Large-time asymptotics in deep learning [0.0]
We consider the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
For the classical $L^2$-regularized empirical risk minimization problem, we show that the training error is at most of the order $\mathcal{O}\left(\frac{1}{T}\right)$.
In the setting of $\ell^p$-distance losses, we prove that both the training error and the optimal parameters are at most of the order $\mathcal{O}\left(e^{-\mu T}\right)$.
arXiv Detail & Related papers (2020-08-06T07:33:17Z) - A Universal Approximation Theorem of Deep Neural Networks for Expressing
Probability Distributions [12.100913944042972]
We prove that there exists a deep neural network $g:\mathbb{R}^d\rightarrow \mathbb{R}$ with ReLU activation.
The size of neural network can grow exponentially in $d$ when $1$-Wasserstein distance is used as the discrepancy.
arXiv Detail & Related papers (2020-04-19T14:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.