Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
- URL: http://arxiv.org/abs/2006.08172v2
- Date: Thu, 29 Oct 2020 15:15:37 GMT
- Title: Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
- Authors: Lénaïc Chizat (LMO), Pierre Roussillon (DMA), Flavien Léger (DMA),
François-Xavier Vialard (Univ Gustave Eiffel), Gabriel Peyré (DMA)
- Abstract summary: The squared Wasserstein distance is a quantity to compare probability distributions in a non-parametric setting.
In this work, we propose instead to estimate it with the Sinkhorn divergence.
We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The squared Wasserstein distance is a natural quantity to compare probability
distributions in a non-parametric setting. This quantity is usually estimated
with the plug-in estimator, defined via a discrete optimal transport problem
which can be solved to $\epsilon$-accuracy by adding an entropic regularization
of order $\epsilon$ and using for instance Sinkhorn's algorithm. In this work,
we propose instead to estimate it with the Sinkhorn divergence, which is also
built on entropic regularization but includes debiasing terms. We show that,
for smooth densities, this estimator has a comparable sample complexity but
allows higher regularization levels, of order $\epsilon^{1/2}$, which leads to
improved computational complexity bounds and a strong speedup in practice. Our
theoretical analysis covers the case of both randomly sampled densities and
deterministic discretizations on uniform grids. We also propose and analyze an
estimator based on Richardson extrapolation of the Sinkhorn divergence which
enjoys improved statistical and computational efficiency guarantees, under a
condition on the regularity of the approximation error, which is in particular
satisfied for Gaussian densities. We finally demonstrate the efficiency of the
proposed estimators with numerical experiments.
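
As a concrete illustration of the quantities in the abstract, the sketch below computes the entropic cost $OT_\epsilon$ with log-domain Sinkhorn iterations, forms the debiased Sinkhorn divergence $S_\epsilon(\alpha,\beta) = OT_\epsilon(\alpha,\beta) - \frac{1}{2}OT_\epsilon(\alpha,\alpha) - \frac{1}{2}OT_\epsilon(\beta,\beta)$, and applies one Richardson extrapolation step under the assumption of a leading bias term quadratic in $\epsilon$. This is a minimal NumPy/SciPy sketch, not the authors' implementation; the function names, iteration count, and extrapolation weights are illustrative assumptions.

import numpy as np
from scipy.special import logsumexp

def entropic_ot(x, y, eps, n_iter=500):
    """OT_eps between uniform empirical measures on samples x, y
    (squared Euclidean cost), via log-domain Sinkhorn iterations."""
    n, m = x.shape[0], y.shape[0]
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    loga, logb = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + logb[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + loga[:, None], axis=0)
    # at optimality the entropic dual value reduces to <f, a> + <g, b>
    return f.mean() + g.mean()

def sinkhorn_divergence(x, y, eps, n_iter=500):
    """Debiased estimator S_eps = OT_eps(a,b) - (OT_eps(a,a) + OT_eps(b,b)) / 2."""
    return (entropic_ot(x, y, eps, n_iter)
            - 0.5 * entropic_ot(x, x, eps, n_iter)
            - 0.5 * entropic_ot(y, y, eps, n_iter))

def richardson_sinkhorn(x, y, eps, n_iter=500):
    """One Richardson extrapolation step, assuming S_eps ~ S_0 + c * eps**2,
    so that 2 * S_{eps/sqrt(2)} - S_eps cancels the quadratic bias term.
    (Illustrative weights; see the paper for the analyzed estimator.)"""
    return (2.0 * sinkhorn_divergence(x, y, eps / np.sqrt(2), n_iter)
            - sinkhorn_divergence(x, y, eps, n_iter))

# usage: two Gaussian samples in R^2
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))
y = 0.5 + rng.normal(size=(300, 2))
print(sinkhorn_divergence(x, y, eps=0.1))
print(richardson_sinkhorn(x, y, eps=0.1))

Under the quadratic-bias assumption, reaching accuracy $\epsilon$ only requires regularization of order $\epsilon^{1/2}$ rather than $\epsilon$, which is the source of the speedup claimed in the abstract.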
Related papers
- Noise-Free Sampling Algorithms via Regularized Wasserstein Proximals [3.4240632942024685]
We consider the problem of sampling from a distribution governed by a potential function.
This work proposes an explicit score-based MCMC method whose particle evolution is fully deterministic.
arXiv Detail & Related papers (2023-08-28T23:51:33Z)
- Statistical, Robustness, and Computational Guarantees for Sliced Wasserstein Distances [18.9717974398864]
Sliced Wasserstein distances preserve properties of classic Wasserstein distances while being more scalable for computation and estimation in high dimensions.
We quantify this scalability from three key aspects: (i) empirical convergence rates; (ii) robustness to data contamination; and (iii) efficient computational methods.
arXiv Detail & Related papers (2022-10-17T15:04:51Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning [0.0]
We propose a Wasserstein distributionally robust estimation framework to estimate an unknown parameter from noisy linear measurements.
We focus on the task of analyzing the squared error performance of such estimators.
We show that the squared error can be recovered as the solution of a convex-concave optimization problem.
arXiv Detail & Related papers (2022-06-27T13:02:59Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections [19.987683989865708]
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications.
We propose a new perspective to approximate SW by making use of the concentration of measure phenomenon.
Our method does not require sampling many random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation (a sketch of that baseline appears after this list).
arXiv Detail & Related papers (2021-06-29T13:56:19Z)
- Finite sample approximations of exact and entropic Wasserstein distances between covariance operators and Gaussian processes [0.0]
We show that the Sinkhorn divergence between two centered Gaussian processes can be consistently and efficiently estimated.
For a fixed regularization parameter, the convergence rates are dimension-independent and of the same order as those for the Hilbert-Schmidt distance.
If at least one of the RKHSs is finite-dimensional, we obtain a dimension-dependent sample complexity for the exact Wasserstein distance between the Gaussian processes.
arXiv Detail & Related papers (2021-04-26T06:57:14Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
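
Both sliced-Wasserstein entries above compare against the usual Monte Carlo approximation. For reference, here is a minimal NumPy sketch of that baseline (my own illustration under stated assumptions, not the method of either cited paper): it averages 1D p-Wasserstein distances over random projection directions, each computed by sorting; the function name and the equal-sample-size restriction are simplifying assumptions.

import numpy as np

def sliced_wasserstein_mc(x, y, n_proj=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance:
    average the 1D W_p^p over random directions, each 1D distance
    computed by matching sorted (order-statistic) projections."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert y.shape[0] == n, "this sketch assumes equal sample sizes"
    theta = rng.normal(size=(n_proj, d))                    # random directions
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # on the unit sphere
    xp = np.sort(x @ theta.T, axis=0)   # sorted projections, shape (n, n_proj)
    yp = np.sort(y @ theta.T, axis=0)
    # 1D W_p^p between equal-size empiricals = mean of |sorted gaps|^p;
    # averaging over projections too gives the Monte Carlo estimate of SW_p^p
    return float(np.mean(np.abs(xp - yp) ** p) ** (1.0 / p))

# usage: two Gaussian samples in R^3
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
y = 1.0 + rng.normal(size=(500, 3))
print(sliced_wasserstein_mc(x, y, n_proj=200))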
This list is automatically generated from the titles and abstracts of the papers in this site.