Non-asymptotic convergence bounds for Sinkhorn iterates and their
gradients: a coupling approach
- URL: http://arxiv.org/abs/2304.06549v2
- Date: Mon, 26 Jun 2023 18:18:18 GMT
- Title: Non-asymptotic convergence bounds for Sinkhorn iterates and their
gradients: a coupling approach
- Authors: Giacomo Greco, Maxence Noble, Giovanni Conforti, Alain Durmus
- Abstract summary: We focus on a relaxation of the original OT problem, the entropic OT problem, which allows us to implement efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
- Score: 10.568851068989972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational optimal transport (OT) has recently emerged as a powerful
framework with applications in various fields. In this paper we focus on a
relaxation of the original OT problem, the entropic OT problem, which allows us to
implement efficient and practical algorithmic solutions, even in high
dimensional settings. This formulation, also known as the Schrödinger Bridge
problem, notably connects with Stochastic Optimal Control (SOC) and can be
solved with the popular Sinkhorn algorithm. In the case of discrete-state
spaces, this algorithm is known to have exponential convergence; however,
achieving a similar rate of convergence in a more general setting is still an
active area of research. In this work, we analyze the convergence of the
Sinkhorn algorithm for probability measures defined on the $d$-dimensional
torus $\mathbb{T}_L^d$, that admit densities with respect to the Haar measure
of $\mathbb{T}_L^d$. In particular, we prove pointwise exponential convergence
of Sinkhorn iterates and their gradient. Our proof relies on the connection
between these iterates and the evolution along the Hamilton-Jacobi-Bellman
equations of value functions obtained from SOC-problems. Our approach is novel
in that it is purely probabilistic and relies on coupling by reflection
techniques for controlled diffusions on the torus.
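As a point of reference for the discrete-state case mentioned above, the Sinkhorn algorithm alternately rescales a Gibbs kernel so that the resulting plan matches the two prescribed marginals. A minimal illustrative sketch in NumPy follows; the function name, iteration count, and regularization value are our own choices, not taken from the paper:

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iters=200):
    """Illustrative Sinkhorn iterations for discrete entropic OT.

    mu, nu : source/target probability vectors
    C      : cost matrix, shape (len(mu), len(nu))
    eps    : entropic regularization strength
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu, dtype=float)
    for _ in range(n_iters):
        v = nu / (K.T @ u)               # scale to match the second marginal
        u = mu / (K @ v)                 # scale to match the first marginal
    return u[:, None] * K * v[None, :]   # approximate transport plan
```

In the continuous setting of the paper, the analogous updates act on the Schrödinger potentials, and the result above gives pointwise exponential convergence of these iterates and their gradients.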
Related papers
- Accelerating Sinkhorn Algorithm with Sparse Newton Iterations [14.094908995798757]
We present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm.
SNS converges orders of magnitude faster across a wide range of practical cases.
arXiv Detail & Related papers (2024-01-20T21:23:09Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) - G(\mathbf{y}) + H(\mathbf{x},\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient...
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted Hessian averaging scheme that transitions to local superlinear convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Proximal denoiser for convergent plug-and-play optimization with
nonconvex regularization [7.0226402509856225]
Plug-and-Play (PnP) methods solve ill-posed inverse problems through proximal algorithms by replacing a proximal operator with a denoising operator, such as a neural network.
We show that this denoiser actually corresponds to a gradient function.
arXiv Detail & Related papers (2022-01-31T14:05:20Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of gradient descent such as, among others, the heavy-ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied by viewing the procedure as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [5.269633789700637]
We show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of ROOT-SGD converges to a limit with the Cramér-Rao optimal covariance, for a broad range of step-size choices.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will eventually converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z) - S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs [16.96562173221624]
Decentralized convex optimization is considered to minimize a sum of smooth and strongly convex cost functions distributed over the nodes of a directed network.
In particular, we propose the S-ADDOPT algorithm, which assumes a first-order stochastic oracle at each node.
For decaying step-sizes $\mathcal{O}(1/k)$, we show that S-ADDOPT reaches the exact solution sublinearly at $\mathcal{O}(1/k)$ and that its convergence is network-independent.
arXiv Detail & Related papers (2020-05-15T21:14:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.