Variational Wasserstein gradient flow
- URL: http://arxiv.org/abs/2112.02424v1
- Date: Sat, 4 Dec 2021 20:27:31 GMT
- Title: Variational Wasserstein gradient flow
- Authors: Jiaojiao Fan, Amirhossein Taghvaei, Yongxin Chen
- Abstract summary: We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
- Score: 9.901677207027806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The gradient flow of a function over the space of probability densities with
respect to the Wasserstein metric often exhibits nice properties and has been
utilized in several machine learning applications. The standard approach to
compute the Wasserstein gradient flow is the finite difference method, which
discretizes the underlying space over a grid and is not scalable. In this
work, we propose a scalable proximal gradient type algorithm for Wasserstein
gradient flow. The key to our method is a variational formulation of the
objective function, which makes it possible to realize the JKO proximal map
through a primal-dual optimization. This primal-dual problem can be efficiently
solved by alternately updating the parameters in the inner and outer loops.
Our framework covers all the classical Wasserstein gradient flows including the
heat equation and the porous medium equation. We demonstrate the performance
and scalability of our algorithm with several numerical examples.
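For concreteness, the sketch below illustrates the kind of JKO proximal step and alternating inner/outer (primal-dual) updates the abstract describes. It is a schematic reconstruction, not the authors' implementation: the energy is specialized to $F(\rho)=\mathrm{KL}(\rho\,\|\,q)$ written in its Donsker-Varadhan variational form, and the pushforward map `T`, the dual potential `g`, and all hyperparameters are illustrative choices.

```python
# One JKO proximal step,
#     rho_{k+1} = argmin_rho  F(rho) + (1 / (2 * tau)) * W2(rho, rho_k)^2,
# solved by alternating primal-dual updates (schematic sketch only).
# Here F(rho) = KL(rho || q) in its Donsker-Varadhan variational form,
#     KL(rho || q) = sup_g  E_rho[g] - log E_q[exp(g)],
# and rho is parametrized as the pushforward of rho_k under a map T.
import torch
import torch.nn as nn

d, tau, n = 2, 0.5, 512
T = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))  # transport map
g = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, 1))  # dual potential
opt_T = torch.optim.Adam(T.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

x_k = torch.randn(n, d) + 3.0  # samples from the current measure rho_k
q = torch.distributions.MultivariateNormal(torch.zeros(d), torch.eye(d))
log_n = torch.log(torch.tensor(float(n)))

def dv_estimate(y, x_q):
    """Donsker-Varadhan estimate of KL(law(y) || q) from samples."""
    return g(y).mean() - (torch.logsumexp(g(x_q).squeeze(-1), dim=0) - log_n)

for outer in range(200):  # outer loop: gradient descent on the map T
    for inner in range(5):  # inner loop: gradient ascent on the dual potential g
        loss_g = -dv_estimate(T(x_k), q.sample((n,)))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    y = T(x_k)
    jko_loss = dv_estimate(y, q.sample((n,))) + ((y - x_k) ** 2).sum(dim=1).mean() / (2 * tau)
    opt_T.zero_grad(); jko_loss.backward(); opt_T.step()

x_next = T(x_k).detach()  # approximate samples from rho_{k+1}
```

In the paper, a variational formulation of the chosen objective functional plays the role that the Donsker-Varadhan term plays here; the alternating inner/outer update structure is what the abstract refers to.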
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
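For context, the (finite-dimensional) gradient descent-ascent iteration that the mean-field analysis lifts to the space of measures is, schematically,

$$
x_{t+1} = x_t - \eta_x \nabla_x f(x_t, y_t), \qquad y_{t+1} = y_t + \eta_y \nabla_y f(x_t, y_t),
$$

where the objective $f$ and step sizes $\eta_x, \eta_y$ are generic placeholders; in the paper's mean-field setting the iterates become distributions over the parameters of the two-layer networks.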
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows [6.452626686361619]
We bridge the gap between variational inference and Wasserstein gradient flows.
Under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow.
We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
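For reference, the Bures-Wasserstein geometry mentioned above is the restriction of the 2-Wasserstein distance to Gaussian measures, where it has the closed form

$$
W_2^2\big(\mathcal{N}(m_0,\Sigma_0),\,\mathcal{N}(m_1,\Sigma_1)\big)
= \lVert m_0 - m_1 \rVert^2
+ \operatorname{tr}\Big(\Sigma_0 + \Sigma_1 - 2\big(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\big)^{1/2}\Big),
$$

so the Bures-Wasserstein gradient flow is a flow on the mean and covariance parameters.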
arXiv Detail & Related papers (2023-10-31T00:10:19Z)
- Regularized Stein Variational Gradient Flow [22.69908798297709]
The Stein Variational Gradient Descent (SVGD) algorithm is a deterministic particle method for sampling.
We propose the Regularized Stein Variational Gradient Flow, which interpolates between the Stein Variational Gradient Flow and the Wasserstein Gradient Flow.
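As background, the standard SVGD update that both flows build on moves particles $x_1,\dots,x_n$ by

$$
x_i \leftarrow x_i + \epsilon\,\hat{\phi}(x_i), \qquad
\hat{\phi}(x) = \frac{1}{n}\sum_{j=1}^{n}\Big[k(x_j,x)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j,x)\Big],
$$

with kernel $k$, target density $p$, and step size $\epsilon$; the regularization proposed here interpolates this kernelized flow toward the Wasserstein gradient flow.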
arXiv Detail & Related papers (2022-11-15T02:56:46Z)
- Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization [43.6961980403682]
The computation of Wasserstein gradient direction is essential for posterior sampling problems and scientific computing.
We study the variational problem in the family of two-layer networks with squared-ReLU activations, for which we derive a semi-definite programming (SDP) relaxation.
This SDP can be viewed as an approximation of the Wasserstein gradient in a broader function family including two-layer networks.
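Informally, the Wasserstein gradient direction of a functional $F(\rho)$ is the velocity field obtained from its first variation,

$$
v \;=\; \nabla \frac{\delta F}{\delta \rho}(\rho),
$$

which this paper approximates within the two-layer squared-ReLU family; the SDP relaxation itself is not reproduced here.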
arXiv Detail & Related papers (2022-05-26T00:51:12Z)
- Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of the full gradient in the SARAH algorithm.
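For background, the classical SARAH recursion maintains the estimator

$$
v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(w_{t-1}) + v_{t-1}, \qquad w_{t+1} = w_t - \eta\, v_t,
$$

with $v_0 = \nabla f(w_0)$ computed as a full gradient at the start of each inner loop; this is the full-gradient computation that the paper replaces with aggregated stochastic gradients.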
arXiv Detail & Related papers (2021-11-26T06:00:44Z)
- Sliced-Wasserstein Gradient Flows [15.048733056992855]
Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows.
This work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein distance.
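For reference, the sliced-Wasserstein distance averages one-dimensional Wasserstein distances over projections,

$$
SW_2^2(\mu,\nu) = \int_{\mathbb{S}^{d-1}} W_2^2\big(\theta^{*}_{\#}\mu,\;\theta^{*}_{\#}\nu\big)\, d\sigma(\theta),
\qquad \theta^{*}(x) = \langle \theta, x\rangle,
$$

where $\sigma$ is the uniform measure on the sphere; each one-dimensional $W_2$ has a closed form via quantile functions, which is what makes these flows computationally attractive.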
arXiv Detail & Related papers (2021-10-21T08:34:26Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
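A common way to realize such an ICNN-based JKO step (a hedged sketch; the exact parametrization used in the paper may differ in detail) is to write the map as the gradient of a convex potential $\psi_\theta$ and solve

$$
\min_{\theta}\; F\big((\nabla\psi_\theta)_{\#}\rho_k\big)
+ \frac{1}{2\tau}\,\mathbb{E}_{x\sim\rho_k}\big\lVert \nabla\psi_\theta(x) - x \big\rVert^2,
\qquad \rho_{k+1} = (\nabla\psi_\theta)_{\#}\rho_k,
$$

relying on Brenier's theorem that optimal transport maps are gradients of convex functions.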
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
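The dynamic formulation referred to above is the Benamou-Brenier problem,

$$
W_2^2(\mu_0,\mu_1) = \min_{\rho, v}\; \int_0^1\!\!\int \lVert v_t(x)\rVert^2 \rho_t(x)\, dx\, dt
\quad\text{s.t.}\quad \partial_t\rho_t + \nabla\cdot(\rho_t v_t) = 0,\;\; \rho_0 = \mu_0,\;\; \rho_1 = \mu_1,
$$

and dualizing the continuity-equation constraint with a Lagrange multiplier yields the minimax problem whose saddle point gives the geodesic.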
arXiv Detail & Related papers (2021-02-05T04:25:28Z)
- Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
Distributional optimization problems arise widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and a smoothness condition, variational transport converges linearly.
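For orientation, the Euclidean PL inequality reads

$$
\tfrac{1}{2}\,\lVert \nabla f(x)\rVert^2 \;\ge\; \mu\,\big(f(x) - f^{*}\big),
$$

and the functional version used here replaces the Euclidean gradient with the Wasserstein gradient of the objective functional; the precise statement is in the paper.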
arXiv Detail & Related papers (2020-12-21T18:33:13Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
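For context, the classical JKO result behind this construction is that the Fokker-Planck equation

$$
\partial_t \rho = \nabla\cdot(\rho\,\nabla V) + \Delta \rho
$$

is the 2-Wasserstein gradient flow of the relative entropy $\mathrm{KL}(\rho\,\|\,e^{-V})$ (up to normalization of $e^{-V}$); the second-order flow above is derived in this setting.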
arXiv Detail & Related papers (2019-10-31T02:26:20Z)