Accelerating Sinkhorn Algorithm with Sparse Newton Iterations
- URL: http://arxiv.org/abs/2401.12253v1
- Date: Sat, 20 Jan 2024 21:23:09 GMT
- Title: Accelerating Sinkhorn Algorithm with Sparse Newton Iterations
- Authors: Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran
Koshy Thekumparampil, Tesi Xiao, Lexing Ying
- Abstract summary: We present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm.
SNS converges orders of magnitude faster across a wide range of practical cases.
- Score: 14.094908995798757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing the optimal transport distance between statistical distributions is
a fundamental task in machine learning. One remarkable recent advancement is
entropic regularization and the Sinkhorn algorithm, which utilizes only matrix
scaling and guarantees an approximated solution with near-linear runtime.
Despite the success of the Sinkhorn algorithm, its runtime may still be slow
due to the potentially large number of iterations needed for convergence. To
achieve possibly super-exponential convergence, we present
Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by
introducing early stopping for the matrix scaling steps and a second stage
featuring a Newton-type subroutine. Adopting the variational viewpoint that the
Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight
that the Hessian matrix of the potential function is approximately sparse.
Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration
complexity, the same as the Sinkhorn algorithm. In terms of total iteration
count, we observe that the SNS algorithm converges orders of magnitude faster
across a wide range of practical cases, including optimal transportation
between empirical distributions and calculating the Wasserstein $W_1, W_2$
distance of discretized densities. The empirical performance is corroborated by
a rigorous bound on the approximate sparsity of the Hessian matrix.
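To connect the abstract to concrete objects, here is one common parametrization of entropic optimal transport and its dual; the notation is illustrative and may differ from the paper's exact conventions. With cost matrix $C \in \mathbb{R}^{n \times n}$, marginals $r, c$, and regularization parameter $\eta > 0$, the Sinkhorn algorithm can be viewed as block coordinate ascent on the concave dual (Lyapunov) potential

$$ f(x, y) = \langle x, r \rangle + \langle y, c \rangle - \frac{1}{\eta} \sum_{i,j} \exp\big(\eta (x_i + y_j - C_{ij})\big), $$

whose gradient and Hessian are determined by the current transport plan $P_{ij} = \exp(\eta(x_i + y_j - C_{ij}))$:

$$ \nabla f = \begin{pmatrix} r - P\mathbf{1} \\ c - P^\top \mathbf{1} \end{pmatrix}, \qquad \nabla^2 f = -\eta \begin{pmatrix} \operatorname{diag}(P\mathbf{1}) & P \\ P^\top & \operatorname{diag}(P^\top \mathbf{1}) \end{pmatrix}. $$

Since the off-diagonal blocks are the plan $P$ itself, an approximately sparse plan makes the Hessian approximately sparse, which is what the Newton stage exploits.

The sketch below is a minimal, illustrative implementation of this two-stage idea (early-stopped Sinkhorn scaling followed by Newton steps with a thresholded, sparsified Hessian). It is not the authors' code; the function name sns_sketch, the sparsification threshold rho, the ridge term, and the backtracking step-size heuristic are assumptions made for the example.

import numpy as np
from scipy.sparse import csr_matrix, diags, bmat, identity
from scipy.sparse.linalg import cg

def sns_sketch(C, r, c, eta=50.0, n_sinkhorn=50, n_newton=20, rho=1e-4):
    """Two-stage solver sketch: Sinkhorn scaling, then sparsified Newton."""
    n, m = C.shape
    x, y = np.zeros(n), np.zeros(m)          # dual potentials

    def plan(x, y):
        # Entropic transport plan induced by the dual potentials.
        return np.exp(eta * (x[:, None] + y[None, :] - C))

    def dual_obj(x, y):
        # Concave Lyapunov potential maximized by Sinkhorn.
        return r @ x + c @ y - plan(x, y).sum() / eta

    # Stage 1: early-stopped Sinkhorn (matrix scaling) iterations.
    for _ in range(n_sinkhorn):
        x += (np.log(r) - np.log(plan(x, y).sum(axis=1))) / eta
        y += (np.log(c) - np.log(plan(x, y).sum(axis=0))) / eta

    # Stage 2: Newton iterations with a sparsified Hessian.
    for _ in range(n_newton):
        P = plan(x, y)
        grad = np.concatenate([r - P.sum(axis=1), c - P.sum(axis=0)])

        # Drop small plan entries; the off-diagonal Hessian blocks become sparse.
        P_sp = csr_matrix(np.where(P >= rho, P, 0.0))
        neg_hess = eta * bmat(
            [[diags(P.sum(axis=1)), P_sp],
             [P_sp.T, diags(P.sum(axis=0))]], format="csr")
        # Small ridge: the dual is invariant to (x + t, y - t), so -H is only PSD.
        neg_hess = neg_hess + 1e-10 * identity(n + m)

        # Newton ascent direction via conjugate gradient on the sparse system.
        d, _ = cg(neg_hess, grad)

        # Backtracking step to keep the concave dual objective non-decreasing.
        alpha, f0 = 1.0, dual_obj(x, y)
        while alpha > 1e-8 and dual_obj(x + alpha * d[:n], y + alpha * d[n:]) < f0:
            alpha *= 0.5
        x += alpha * d[:n]
        y += alpha * d[n:]

    return plan(x, y), (x, y)

As a usage note, a call like sns_sketch(C, r, c) on a moderately sized cost matrix returns an approximate transport plan and the dual potentials. The per-iteration cost of the Newton stage stays at $O(n^2)$, dominated by forming the plan, provided the sparsified Hessian is sparse enough for the linear solve to be cheap, which is the regime addressed by the paper's sparsity bound.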
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for a computationally expensive spectral method, using a simple gradient step combined with outlier removal.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Compressed online Sinkhorn [3.2534959204741085]
We revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyré, 2020].
First, we improve the convergence analysis for the online Sinkhorn algorithm; the new rate that we obtain is faster than the previous rate under certain parameter choices.
Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm.
arXiv Detail & Related papers (2023-10-08T05:33:32Z)
- Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach [10.568851068989972]
We focus on a relaxation of the original OT problem, the entropic OT problem, which allows for efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
arXiv Detail & Related papers (2023-04-13T13:58:25Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Efficient Optimal Transport Algorithm by Accelerated Gradient descent [20.614477547939845]
We propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique.
The proposed method achieves faster convergence and better accuracy with the same parameter.
arXiv Detail & Related papers (2021-04-12T20:23:29Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point at each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.