Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
- URL: http://arxiv.org/abs/2307.16421v1
- Date: Mon, 31 Jul 2023 06:11:47 GMT
- Title: Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
- Authors: Nabarun Deb, Young-Heon Kim, Soumik Pal, and Geoffrey Schiebinger
- Abstract summary: We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm converges to an absolutely continuous curve on the $2$-Wasserstein space.
This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow.
- Score: 0.15749416770494706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that the sequence of marginals obtained from the iterations of the
Sinkhorn algorithm or the iterative proportional fitting procedure (IPFP) on
joint densities converges to an absolutely continuous curve on the
$2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to
zero and the number of iterations is scaled as $1/\varepsilon$ (and other
technical assumptions). This limit, which we call the Sinkhorn flow, is an
example of a Wasserstein mirror gradient flow, a concept we introduce here
inspired by the well-known Euclidean mirror gradient flows. In the case of
Sinkhorn, the gradient is that of the relative entropy functional with respect
to one of the marginals and the mirror is half of the squared Wasserstein
distance functional from the other marginal. Interestingly, the norm of the
velocity field of this flow can be interpreted as the metric derivative with
respect to the linearized optimal transport (LOT) distance. An equivalent
description of this flow is provided by the parabolic Monge-Ampère PDE
whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We
derive conditions for exponential convergence for this limiting flow. We also
construct a McKean-Vlasov diffusion whose marginal distributions follow the
Sinkhorn flow.
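As a purely illustrative aside (not the paper's construction, which works with joint densities on continuous spaces), the object under study can be sketched in a few lines of NumPy: run Sinkhorn/IPFP on discrete measures with a small regularization $\varepsilon$, scale the number of iterations like $1/\varepsilon$, and record the sequence of "free" marginals. This discrete-time curve is an analogue of the curve whose $\varepsilon \to 0$ limit the paper identifies as the Sinkhorn flow; all names below are ours.
```python
# Minimal Sinkhorn / IPFP sketch on discrete measures (illustration only; the
# paper's setting is joint densities on continuous space, not this toy setup).
import numpy as np

def sinkhorn_marginal_curve(mu, nu, C, eps, n_iters):
    """Record the 'free' marginal after each half-step that fits mu.

    With n_iters scaled like 1/eps, the returned sequence is a discrete-time
    analogue of the curve whose eps -> 0 limit is the Sinkhorn flow.
    """
    K = np.exp(-C / eps)                   # Gibbs kernel of the entropic OT problem
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    curve = []
    for _ in range(n_iters):
        u = mu / (K @ v)                   # half-step: match the first marginal to mu
        pi = u[:, None] * K * v[None, :]   # current coupling
        curve.append(pi.sum(axis=0))       # its second marginal = one point of the curve
        v = nu / (K.T @ u)                 # half-step: match the second marginal to nu
    return curve

# Toy usage: two discrete measures on a 1-D grid, squared-distance cost.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
mu = rng.random(50); mu /= mu.sum()
nu = rng.random(50); nu /= nu.sum()
C = (x[:, None] - x[None, :]) ** 2
eps = 0.01
curve = sinkhorn_marginal_curve(mu, nu, C, eps, n_iters=int(1 / eps))
```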
Related papers
- Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows [1.5561923713703105]
We introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schrödinger bridges with the same marginals.
The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to particle-based approximations using the Sinkhorn algorithm.
arXiv Detail & Related papers (2024-06-16T07:23:26Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
- Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach [10.568851068989972]
We focus on a relaxation of the original OT problem, the entropic OT problem, which allows one to implement efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm (the entropic objective is recalled below).
arXiv Detail & Related papers (2023-04-13T13:58:25Z)
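For context (a standard recap, not a claim about the specific paper above): the entropic OT problem relaxes $\min_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi$ to $\min_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi + \varepsilon \, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu)$, and the Sinkhorn algorithm solves it by alternately rescaling a candidate coupling so that its two marginals match $\mu$ and $\nu$; as $\varepsilon \to 0$ the entropic solution recovers the unregularized optimal coupling under mild conditions.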
- Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback–Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent (the generic update is recalled below).
arXiv Detail & Related papers (2022-06-17T16:19:47Z)
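For context (again a standard recap rather than the paper's statement): Euclidean mirror descent on an objective $f$ with mirror map $\phi$ iterates $x_{k+1} \in \arg\min_x \{ \eta \langle \nabla f(x_k), x \rangle + D_\phi(x, x_k) \}$, where $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), x - y \rangle$ is the Bregman divergence of the mirror map. The entry above casts Sinkhorn's primal iterations as an instance of this scheme over joint distributions with Kullback–Leibler geometry, and the Wasserstein mirror gradient flow of the main paper is presented as a continuous-time, Wasserstein-space analogue of the same idea.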
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient and Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Fast Margin Maximization via Dual Acceleration [52.62944011696364]
We present and analyze a momentum-based method for training linear classifiers with an exponentially-tailed loss.
This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual.
arXiv Detail & Related papers (2021-07-01T16:36:39Z)
- On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution (the $\mathrm{MMD}$ functional is recalled below).
arXiv Detail & Related papers (2020-11-04T16:55:00Z)
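For reference (standard background, not taken from the paper above): the squared maximum mean discrepancy with kernel $k$ is $\mathrm{MMD}_k^2(P, Q) = \mathbb{E}[k(X, X')] - 2\,\mathbb{E}[k(X, Y)] + \mathbb{E}[k(Y, Y')]$ for independent draws $X, X' \sim P$ and $Y, Y' \sim Q$; the entry above connects gradient-regularized $\mathrm{MMD}$ GAN training to a gradient flow of this type of functional.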
- Convergence of Langevin Monte Carlo in Chi-Squared and Rényi Divergence [8.873449722727026]
For convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\varepsilon^{-1})$, which improves the previously known rates in both of these metrics (the LMC update is recalled below).
arXiv Detail & Related papers (2020-07-22T18:18:28Z)
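For reference (standard background, not taken from the papers above): the unadjusted Langevin Monte Carlo update with step size $\eta$ for a potential $V$ is $x_{k+1} = x_k - \eta \nabla V(x_k) + \sqrt{2\eta}\, \xi_k$ with $\xi_k \sim \mathcal{N}(0, I_d)$, an Euler–Maruyama discretization of the Langevin diffusion targeting the density proportional to $e^{-V}$; the dimension $d$ and accuracy $\varepsilon$ in the rate $\widetilde{\mathcal{O}}(d\varepsilon^{-1})$ quoted above refer to this scheme.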
- Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has driven much of the recent interest in Wasserstein metrics and barycenters in machine learning.
We show how the smoothing bias introduced by entropy regularization is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
arXiv Detail & Related papers (2020-06-03T23:06:02Z)