Sinkhorn Barycenter via Functional Gradient Descent
- URL: http://arxiv.org/abs/2007.10449v1
- Date: Mon, 20 Jul 2020 20:16:47 GMT
- Title: Sinkhorn Barycenter via Functional Gradient Descent
- Authors: Zebang Shen, Zhenfu Wang, Alejandro Ribeiro, Hamed Hassani
- Abstract summary: We consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence.
This problem has recently found applications across various domains, including graphics, learning, and vision.
We develop a novel functional gradient descent method named Sinkhorn Descent.
- Score: 125.89871274202439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the problem of computing the barycenter of a set
of probability distributions under the Sinkhorn divergence.
This problem has recently found applications across various domains,
including graphics, learning, and vision, as it provides a meaningful mechanism
to aggregate knowledge.
Unlike previous approaches which directly operate in the space of probability
measures, we recast the Sinkhorn barycenter problem as an instance of
unconstrained functional optimization and develop a novel functional gradient
descent method named Sinkhorn Descent (SD).
We prove that SD converges to a stationary point at a sublinear rate, and
under reasonable assumptions, we further show that it asymptotically finds a
global minimizer of the Sinkhorn barycenter problem. Moreover, by providing a
mean-field analysis, we show that SD preserves the weak convergence of
empirical measures.
Importantly, the computational complexity of SD scales linearly in the
dimension $d$ and we demonstrate its scalability by solving a $100$-dimensional
Sinkhorn barycenter problem.
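To make the objective concrete: writing $S_\varepsilon$ for the Sinkhorn divergence with regularization $\varepsilon$, the barycenter of measures $\nu_1, \dots, \nu_K$ with weights $\omega_k$ is a minimizer of $\mu \mapsto \sum_k \omega_k S_\varepsilon(\mu, \nu_k)$. Below is a minimal sketch (not the authors' implementation) of a particle-based gradient descent on this objective: the candidate barycenter is a cloud of movable particles, each Sinkhorn divergence is evaluated with a fixed number of log-domain Sinkhorn iterations, and the particles follow the negative gradient. The squared-Euclidean cost, the value of `eps`, the truncation at `n_iters`, and all function names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def ot_eps(x, y, eps=0.1, n_iters=50):
    """Entropic OT cost between uniform empirical measures on point clouds x and y."""
    C = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)   # squared-Euclidean cost
    log_a = jnp.full(x.shape[0], -jnp.log(x.shape[0]))           # uniform weights (log domain)
    log_b = jnp.full(y.shape[0], -jnp.log(y.shape[0]))
    f = jnp.zeros(x.shape[0])
    g = jnp.zeros(y.shape[0])
    for _ in range(n_iters):                                     # log-domain Sinkhorn updates
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return jnp.exp(log_a) @ f + jnp.exp(log_b) @ g               # OT_eps value from the dual potentials

def sinkhorn_div(x, y, eps=0.1):
    """Sinkhorn divergence S_eps(x, y) = OT_eps(x, y) - (OT_eps(x, x) + OT_eps(y, y)) / 2."""
    return ot_eps(x, y, eps) - 0.5 * (ot_eps(x, x, eps) + ot_eps(y, y, eps))

def barycenter_loss(x, refs, weights, eps=0.1):
    """Weighted sum of Sinkhorn divergences from the candidate particles x to the references."""
    return sum(w * sinkhorn_div(x, y, eps) for w, y in zip(weights, refs))

# Toy usage: two 2-D reference clouds; descend the particle positions of the candidate.
k0, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 3)
refs = [jax.random.normal(k0, (50, 2)) - 2.0, jax.random.normal(k1, (50, 2)) + 2.0]
weights = [0.5, 0.5]
x = jax.random.normal(k2, (50, 2))                               # candidate barycenter particles
step = jax.jit(jax.grad(barycenter_loss))
for _ in range(200):
    x = x - 0.5 * step(x, refs, weights)                         # gradient step on particle positions
```

Differentiating through a truncated Sinkhorn loop is a common practical surrogate for the exact functional gradient; the paper's convergence guarantees concern the exact Sinkhorn divergence rather than this truncation.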
Related papers
- Implicit Bias and Fast Convergence Rates for Self-attention [30.08303212679308]
Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance.
We investigate the implicit bias of gradient descent (GD) in training a self-attention layer with a fixed linear decoder in binary classification.
We provide the first finite-time convergence rate for $W_t$ to $W_{mm}$, along with the rate of sparsification in the attention map.
arXiv Detail & Related papers (2024-02-08T15:15:09Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Faster high-accuracy log-concave sampling via algorithmic warm starts [6.117084972237769]
In practice, high-accuracy samplers such as the classical Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard.
We improve the dimension dependence of this sampling problem to $\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was $\tilde{O}(d)$.
Our main technical contribution settles this problem by establishing the first $\tilde{O}(d^{1/2})$ Rényi mixing rates for the discretized underdamped Langevin diffusion.
arXiv Detail & Related papers (2023-02-20T19:27:21Z)
- Estimating 2-Sinkhorn Divergence between Gaussian Processes from Finite-Dimensional Marginals [4.416484585765028]
We study the convergence of estimating the 2-Sinkhorn divergence between Gaussian processes (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
arXiv Detail & Related papers (2021-02-05T16:17:55Z)
- Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimal predictor fits the inputs and outputs perfectly, i.e. $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z)
- Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically in the desired accuracy.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- Scalable Computations of Wasserstein Barycenter via Input Convex Neural Networks [15.171726731041055]
The Wasserstein barycenter is a principled approach to representing the weighted mean of a given set of probability distributions.
We present a novel scalable algorithm to approximate Wasserstein barycenters, aimed at high-dimensional applications in machine learning.
arXiv Detail & Related papers (2020-07-08T22:41:18Z)
- On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD) in non-convex problems.
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)
- Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has driven much of the recent interest in Wasserstein metrics and barycenters in machine learning.
We show how this bias is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Sinkhorn barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
arXiv Detail & Related papers (2020-06-03T23:06:02Z)