Non-asymptotic convergence bounds for Sinkhorn iterates and their
gradients: a coupling approach
- URL: http://arxiv.org/abs/2304.06549v2
- Date: Mon, 26 Jun 2023 18:18:18 GMT
- Title: Non-asymptotic convergence bounds for Sinkhorn iterates and their
gradients: a coupling approach
- Authors: Giacomo Greco, Maxence Noble, Giovanni Conforti, Alain Durmus
- Abstract summary: We focus on a relaxation of the original OT problem, the entropic OT problem, which allows to implement efficient and practical algorithmic solutions.
This formulation, also known as the Schr"odinger Bridge problem, notably connects with Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
- Score: 10.568851068989972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational optimal transport (OT) has recently emerged as a powerful
framework with applications in various fields. In this paper we focus on a
relaxation of the original OT problem, the entropic OT problem, which allows to
implement efficient and practical algorithmic solutions, even in high
dimensional settings. This formulation, also known as the Schr\"odinger Bridge
problem, notably connects with Stochastic Optimal Control (SOC) and can be
solved with the popular Sinkhorn algorithm. In the case of discrete-state
spaces, this algorithm is known to have exponential convergence; however,
achieving a similar rate of convergence in a more general setting is still an
active area of research. In this work, we analyze the convergence of the
Sinkhorn algorithm for probability measures defined on the $d$-dimensional
torus $\mathbb{T}_L^d$, that admit densities with respect to the Haar measure
of $\mathbb{T}_L^d$. In particular, we prove pointwise exponential convergence
of Sinkhorn iterates and their gradient. Our proof relies on the connection
between these iterates and the evolution along the Hamilton-Jacobi-Bellman
equations of value functions obtained from SOC-problems. Our approach is novel
in that it is purely probabilistic and relies on coupling by reflection
techniques for controlled diffusions on the torus.
Related papers
- Accelerating Sinkhorn Algorithm with Sparse Newton Iterations [14.094908995798757]
We present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm.
SNS converges orders magnitude faster across a wide range of practical cases.
arXiv Detail & Related papers (2024-01-20T21:23:09Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Proximal denoiser for convergent plug-and-play optimization with
nonconvex regularization [7.0226402509856225]
Plug-and-Play () methods solve ill proximal-posed inverse problems through algorithms by replacing a neural network operator by a denoising operator.
We show that this denoiser actually correspond to a gradient function.
arXiv Detail & Related papers (2022-01-31T14:05:20Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of gradient descent (SGD)
We show that SGD avoids saddle points/manifolds with $1$ for strict step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z) - S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs [16.96562173221624]
Decentralized convex optimization is proposed to minimize a sum of smooth and strongly cost functions when the functions are distributed over a directed network nodes.
In particular, we propose thetextbftextttS-ADDOPT algorithm that assumes a first-order oracle at each node.
For decaying step-sizes$mathcalO (1/k)$, we show thattextbfttS-ADDOPT reaches the exact solution subly at$mathcalO (1/k)$ and its convergence is networkally-independent
arXiv Detail & Related papers (2020-05-15T21:14:22Z) - Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning [0.0]
We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.