Generative Modeling by Minimizing the Wasserstein-2 Loss
- URL: http://arxiv.org/abs/2406.13619v2
- Date: Sun, 14 Jul 2024 05:54:39 GMT
- Title: Generative Modeling by Minimizing the Wasserstein-2 Loss
- Authors: Yu-Jui Huang, Zachariah Malik
- Abstract summary: This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE).
A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution.
An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach.
- Score: 1.2277343096128712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential associated with the true data distribution and a current estimate of it. A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution. An Euler scheme for the ODE is proposed and it is shown to recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach. In both low- and high-dimensional experiments, our algorithm outperforms Wasserstein generative adversarial networks by increasing the level of persistent training appropriately.
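A rough, hedged illustration of the gradient-flow idea (a sketch only, not the authors' neural-network algorithm): reading the abstract, the ODE plausibly takes the form $\dot{X}_t = -\nabla\varphi_t(X_t)$, where $\varphi_t$ is the Kantorovich potential between the current estimate $\mathrm{Law}(X_t)$ and the true data distribution. For empirical point clouds of equal size, $-\nabla\varphi(x) = T(x) - x$ with $T$ the discrete optimal transport matching, so an Euler step simply moves each particle part of the way toward its matched data point. The step size, sample sizes, and Gaussian toy target below are arbitrary assumptions.

```python
# A minimal particle sketch of a W_2 gradient flow toward a toy data set
# (an illustration of the idea only; NOT the paper's neural-network algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=0.5, size=(256, 2))        # toy "true" distribution (assumed)
particles = rng.normal(loc=0.0, scale=1.0, size=(256, 2))   # current estimate of it

eta, n_steps = 0.2, 50          # Euler step size and number of steps (assumed values)
for _ in range(n_steps):
    # Discrete optimal transport under quadratic cost: a one-to-one matching.
    cost = cdist(particles, data, metric="sqeuclidean")
    _, cols = linear_sum_assignment(cost)
    matched = data[cols]        # T(x_i): the data point each particle is transported to
    # Euler step of dX_t = -grad(phi_t)(X_t) dt, using -grad(phi)(x) = T(x) - x.
    particles = particles + eta * (matched - particles)

print("particle mean after the flow:", particles.mean(axis=0))  # approaches the data mean
```

In the paper's actual algorithm the Kantorovich potential is parameterized by a neural network and refined with persistent training; the discrete matching above is only a finite-sample stand-in for the potential's gradient.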
Related papers
- Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence [54.580605276017096]
Rectified Flow (RF) aims to learn straight flow trajectories from noise to data using a sequence of convex optimization problems.
RF theoretically straightens the trajectory through successive rectifications, reducing the number of function evaluations (NFEs) during sampling.
We provide the first theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution.
arXiv Detail & Related papers (2024-10-19T02:36:11Z)
- Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows [6.799748192975493]
We formulate well-posed continuous-time generative flows for learning distributions supported on low-dimensional manifolds.
We show that the Wasserstein-1 proximal operator regularizes $f$-divergences so that singular distributions can be compared.
We also show that the Wasserstein-2 proximal operator regularizes the paths of the generative flows by adding an optimal transport cost.
arXiv Detail & Related papers (2024-07-16T16:34:31Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations [0.6040014326756179]
We propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs).
The deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels.
arXiv Detail & Related papers (2024-04-12T13:05:35Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1-\frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Convergence of flow-based generative models via proximal gradient descent in Wasserstein space [20.771897445580723]
Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood.
We provide a theoretical guarantee of generating data distribution by a progressive flow model.
arXiv Detail & Related papers (2023-10-26T17:06:23Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Second-order flows for computing the ground states of rotating Bose-Einstein condensates [5.252966797394752]
Artificial evolutionary differential equations involving second-order time derivatives are considered as an alternative to first-order gradient flows.
The proposed artificial dynamics are novel second-order hyperbolic partial differential equations with dissipation.
The new algorithms are shown to be superior to state-of-the-art numerical methods based on the gradient flow.
arXiv Detail & Related papers (2022-05-02T10:45:49Z)
- Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample-based bidirectional learning algorithm for training.
arXiv Detail & Related papers (2021-02-05T04:25:28Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.