Doubly Regularized Entropic Wasserstein Barycenters
- URL: http://arxiv.org/abs/2303.11844v1
- Date: Tue, 21 Mar 2023 13:42:43 GMT
- Title: Doubly Regularized Entropic Wasserstein Barycenters
- Authors: Lénaïc Chizat
- Abstract summary: We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a general formulation of regularized Wasserstein barycenters that
enjoys favorable regularity, approximation, stability and (grid-free)
optimization properties. This barycenter is defined as the unique probability
measure that minimizes the sum of entropic optimal transport (EOT) costs with
respect to a family of given probability measures, plus an entropy term. We
denote it $(\lambda,\tau)$-barycenter, where $\lambda$ is the inner
regularization strength and $\tau$ the outer one. This formulation recovers
several previously proposed EOT barycenters for various choices of
$\lambda,\tau \geq 0$ and generalizes them. First, in spite of -- and in fact
owing to -- being \emph{doubly} regularized, we show that our formulation is
debiased for $\tau=\lambda/2$: the suboptimality in the (unregularized)
Wasserstein barycenter objective is, for smooth densities, of the order of the
strength $\lambda^2$ of entropic regularization, instead of
$\max\{\lambda,\tau\}$ in general. We discuss this phenomenon for isotropic
Gaussians where all $(\lambda,\tau)$-barycenters have closed form. Second, we
show that for $\lambda,\tau>0$, this barycenter has a smooth density and is
strongly stable under perturbation of the marginals. In particular, it can be
estimated efficiently: given $n$ samples from each of the probability measures,
it converges in relative entropy to the population barycenter at a rate
$n^{-1/2}$. And finally, this formulation lends itself naturally to a grid-free
optimization algorithm: we propose a simple \emph{noisy particle gradient
descent} which, in the mean-field limit, converges globally at an exponential
rate to the barycenter.
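To make the definition concrete (our notation; the paper's may differ), the $(\lambda,\tau)$-barycenter of measures $\mu_1,\dots,\mu_K$ solves
$$\min_{\mu}\ \sum_{k=1}^{K} w_k\, T_\lambda(\mu_k,\mu) \;+\; \tau H(\mu),$$
where $T_\lambda$ is the entropic OT cost, $H$ is the negative entropy, and $w_k$ are barycentric weights (the abstract speaks simply of a sum). Below is a minimal sketch of the noisy-particle idea: particles descend the gradients of the barycenter-side Sinkhorn potentials, and the outer $\tau$-entropy term shows up as Langevin noise. Every implementation detail here (step size, sweep counts, uniform weights, squared-Euclidean cost) is our assumption, not the paper's.

```python
# A minimal sketch of noisy particle gradient descent for the
# (lambda, tau)-barycenter; an illustrative reading of the abstract,
# not the paper's reference implementation.
import numpy as np
from scipy.special import logsumexp

def barycenter_potential_grad(X, Y, lam, sweeps=100):
    """Log-domain Sinkhorn between uniform measures on X (barycenter
    particles) and Y (samples of one input measure), with cost
    c(x, y) = ||x - y||^2 / 2. Returns grad g(x_i) = sum_j p_ij (x_i - y_j),
    the gradient of the barycenter-side potential at each particle."""
    C = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (m, n) costs
    m, n = C.shape
    f = np.zeros(n)  # potential on the Y side
    for _ in range(sweeps):
        g = -lam * logsumexp((f[None, :] - C) / lam - np.log(n), axis=1)
        f = -lam * logsumexp((g[:, None] - C) / lam - np.log(m), axis=0)
    # conditional coupling p(j | i): a softmax over each row of the plan
    logp = (f[None, :] - C) / lam - np.log(n)
    p = np.exp(logp - logsumexp(logp, axis=1, keepdims=True))
    return X - p @ Y

def noisy_particle_gd(measures, m=200, lam=0.1, tau=0.05, eta=0.05,
                      iters=500, seed=0):
    """Evolve m particles toward the (lambda, tau)-barycenter of the given
    empirical measures (a list of (n_k, d) sample arrays, uniform weights)."""
    rng = np.random.default_rng(seed)
    d = measures[0].shape[1]
    X = rng.standard_normal((m, d))
    w = 1.0 / len(measures)
    for _ in range(iters):
        grad = sum(w * barycenter_potential_grad(X, Y, lam) for Y in measures)
        # gradient step on the EOT terms + Langevin noise for the tau-entropy
        X = X - eta * grad + np.sqrt(2 * eta * tau) * rng.standard_normal(X.shape)
    return X
```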
Related papers
- Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent [14.890609936348277]
We provide finite-particle convergence rates for the Stein Variational Gradient Descent algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics.
Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure splits into a dominant "negative part" proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller "positive part".
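For context, here is a minimal SVGD update with an RBF kernel (our sketch; the bandwidth and step size are placeholders, and this is not the paper's code): each particle follows a kernel-smoothed score plus a repulsive term that keeps the particles spread out.

```python
# Illustrative SVGD step; h and eps are placeholder hyperparameters.
import numpy as np

def svgd_step(X, grad_log_p, eps=0.1, h=1.0):
    """One SVGD update for particles X of shape (N, d); grad_log_p maps
    (N, d) positions to the (N, d) score of the target density."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    K = np.exp(-sq / (2 * h))                            # symmetric RBF kernel
    attract = K @ grad_log_p(X)                          # kernel-smoothed scores
    repulse = (K.sum(1)[:, None] * X - K @ X) / h        # sum_j grad_xj k(x_j, x_i)
    return X + eps * (attract + repulse) / len(X)
```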
arXiv Detail & Related papers (2024-09-13T01:49:19Z)
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x^k)\|_1\right] \le O\!\left(\frac{\sqrt{d}}{T^{1/4}}\right)$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x^k)\|_2^2\right] \le O\!\left(\frac{1}{\sqrt{T}}\right)$ rate of SGD measured by the $\ell_2$ norm.
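The iteration under analysis is standard RMSProp; a minimal sketch (hyperparameter values are illustrative only):

```python
import numpy as np

def rmsprop_step(x, grad, v, lr=1e-3, beta=0.99, eps=1e-8):
    """One RMSProp step: v is an exponential moving average of squared
    gradients used to precondition the update coordinate-wise."""
    v = beta * v + (1 - beta) * grad ** 2
    return x - lr * grad / (np.sqrt(v) + eps), v
```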
arXiv Detail & Related papers (2024-02-01T07:21:32Z)
- Projected Langevin dynamics and a gradient flow for entropic optimal transport [0.8057006406834466]
We introduce analogous diffusion dynamics that sample from an entropy-regularized optimal transport coupling.
By studying the induced Wasserstein geometry of the submanifold $\Pi(\mu,\nu)$, we argue that the SDE can be viewed as a Wasserstein gradient flow on this space of couplings.
arXiv Detail & Related papers (2023-09-15T17:55:56Z)
- Properties of Discrete Sliced Wasserstein Losses [11.280151521887076]
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures.
Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW.
We investigate the regularity and optimisation properties of this energy (the SW distance as a function of the support points of one of the measures), as well as its Monte-Carlo approximation $\mathcal{E}_p$.
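A minimal Monte-Carlo estimator of the squared SW distance between equal-size point clouds (our sketch; here $p$ is the number of projection directions, mirroring the role of $p$ in $\mathcal{E}_p$):

```python
# Illustrative Monte-Carlo estimator of SW_2^2 between equal-size clouds.
import numpy as np

def sliced_w2_sq(X, Y, p=50, seed=0):
    """Estimate SW_2^2 between uniform empirical measures on X and Y,
    both of shape (n, d), using p random directions on the sphere."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((p, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions
    xs = np.sort(X @ theta.T, axis=0)  # sorted 1D projections, shape (n, p)
    ys = np.sort(Y @ theta.T, axis=0)
    # 1D W_2^2 between sorted samples is the mean squared gap; averaging
    # over the p directions estimates the expectation over projections
    return ((xs - ys) ** 2).mean()
```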
arXiv Detail & Related papers (2023-07-19T21:21:18Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
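For reference, the plain extragradient step that such accelerated schemes build on (a textbook sketch, not the AG-EG algorithm itself):

```python
# Generic extragradient step for min_x max_y L(x, y).
def extragradient_step(x, y, grad_x, grad_y, eta=0.1):
    """Extrapolate to a midpoint, then update using gradients evaluated
    there; this damps the rotation typical of simultaneous gradient play."""
    x_mid = x - eta * grad_x(x, y)
    y_mid = y + eta * grad_y(x, y)
    return x - eta * grad_x(x_mid, y_mid), y + eta * grad_y(x_mid, y_mid)
```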
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Estimating 2-Sinkhorn Divergence between Gaussian Processes from Finite-Dimensional Marginals [4.416484585765028]
We study the convergence of estimating the 2-Sinkhorn divergence between \emph{Gaussian processes} (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
arXiv Detail & Related papers (2021-02-05T16:17:55Z)
- Convergence of Langevin Monte Carlo in Chi-Squared and Rényi Divergence [8.873449722727026]
We show that, for convex and first-order smooth potentials, the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
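The algorithm in question is the unadjusted Langevin iteration; a one-step sketch (the step size is the user's choice):

```python
import numpy as np

def lmc_step(x, grad_U, eta, rng):
    """One unadjusted Langevin Monte Carlo step targeting pi ~ exp(-U):
    gradient descent on U plus Gaussian noise of variance 2 * eta."""
    return x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
```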
arXiv Detail & Related papers (2020-07-22T18:18:28Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
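The payoff of this factored cost is that (taking the regularization $\varepsilon = 1$ for simplicity) the Gibbs kernel is $K_{ij} = \langle\varphi(x_i),\varphi(y_j)\rangle$, i.e. $K = AB^\top$ for feature matrices $A, B$ with $r$ columns, so each Sinkhorn matrix-vector product costs $O((n+m)r)$ instead of $O(n^2)$. A minimal sketch (ours, not the paper's code):

```python
# Sinkhorn with a factored Gibbs kernel K = A @ B.T (nonnegative features).
import numpy as np

def sinkhorn_low_rank(A, B, a, b, n_iter=200):
    """Sinkhorn scalings (u, v) for K = A @ B.T, where A is (n, r) and
    B is (m, r) with nonnegative entries and r << n."""
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / (A @ (B.T @ v))  # K   v in O((n + m) r)
        v = b / (B @ (A.T @ u))  # K^T u in O((n + m) r)
    return u, v
```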
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has been the driver of many recent interests for Wasserstein metrics and barycenters in machine learning.
We show how this bias is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
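For context, the standard debiasing device in this line of work is the Sinkhorn divergence, which cancels the entropic bias by subtracting self-transport terms (the paper's barycenter correction is in the same spirit, though its exact formula may differ):
$$S_\varepsilon(\alpha,\beta) \;=\; \mathrm{OT}_\varepsilon(\alpha,\beta) \;-\; \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\alpha,\alpha) \;-\; \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\beta,\beta).$$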
arXiv Detail & Related papers (2020-06-03T23:06:02Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)