Rethinking Initialization of the Sinkhorn Algorithm
- URL: http://arxiv.org/abs/2206.07630v2
- Date: Wed, 5 Apr 2023 08:32:20 GMT
- Title: Rethinking Initialization of the Sinkhorn Algorithm
- Authors: James Thornton, Marco Cuturi
- Abstract summary: We show that data-dependent initializers result in dramatic speed-ups, with no effect on differentiability as long as implicit differentiation is used.
Our initializers rely on closed forms for exact or approximate OT solutions known in the 1D, Gaussian, or GMM settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the optimal transport (OT) problem was originally formulated as a
linear program, the addition of entropic regularization has proven beneficial
both computationally and statistically, for many applications. The Sinkhorn
fixed-point algorithm is the most popular approach to solve this regularized
problem, and, as a result, multiple attempts have been made to reduce its
runtime using, e.g., annealing in the regularization parameter, momentum or
acceleration. The premise of this work is that initialization of the Sinkhorn
algorithm has received comparatively little attention, possibly due to two
preconceptions: since the regularized OT problem is convex, it may not be worth
crafting a good initialization, since any is guaranteed to work; secondly,
because the outputs of the Sinkhorn algorithm are often unrolled in end-to-end
pipelines, a data-dependent initialization would bias Jacobian computations. We
challenge this conventional wisdom, and show that data-dependent initializers
result in dramatic speed-ups, with no effect on differentiability as long as
implicit differentiation is used. Our initializations rely on closed-forms for
exact or approximate OT solutions that are known in the 1D, Gaussian or GMM
settings. They can be used with minimal tuning, and result in consistent
speed-ups for a wide variety of OT problems.
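The idea in the abstract can be sketched in a few lines: run Sinkhorn in the log domain, but seed the dual potential with a closed-form Kantorovich potential computed from Gaussian fits of the marginals instead of the usual zero initialization. The sketch below is illustrative and not the authors' implementation; the function names (`sinkhorn`, `lse`), the 1D quadratic-cost setting, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def lse(S, axis):
    """Numerically stable log-sum-exp along `axis`."""
    m = S.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(S - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn(C, a, b, eps, f_init=None, tol=1e-6, max_iter=20_000):
    """Log-domain Sinkhorn; `f_init` optionally warm-starts the dual potential f."""
    loga, logb = np.log(a), np.log(b)
    f = np.zeros_like(a) if f_init is None else np.asarray(f_init, dtype=float).copy()
    g = np.zeros_like(b)
    for it in range(1, max_iter + 1):
        g = -eps * lse((f[:, None] - C) / eps + loga[:, None], axis=0)
        f = -eps * lse((g[None, :] - C) / eps + logb[None, :], axis=1)
        # After the f-update the row marginals are exact; measure the column error.
        P = np.exp((f[:, None] + g[None, :] - C) / eps) * a[:, None] * b[None, :]
        if np.abs(P.sum(axis=0) - b).sum() < tol:
            break
    return f, g, it

# Toy 1D problem with quadratic cost and uniform weights.
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.normal(0.0, 1.0, n))
y = np.sort(rng.normal(1.0, 0.5, n))
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
C = 0.5 * (x[:, None] - y[None, :]) ** 2
eps = 0.1

# Illustrative Gaussian warm start: the exact Kantorovich potential between
# Gaussian fits N(ma, sa^2) -> N(mb, sb^2) under cost |x - y|^2 / 2, i.e.
# f0(x) = x^2/2 - phi(x) where the Brenier map is T = phi'.
ma, sa = x.mean(), x.std()
mb, sb = y.mean(), y.std()
f0 = 0.5 * x**2 - (mb * x + (sb / (2.0 * sa)) * (x - ma) ** 2)

f_cold, g_cold, it_cold = sinkhorn(C, a, b, eps)            # default zero init
f_warm, g_warm, it_warm = sinkhorn(C, a, b, eps, f_init=f0) # data-dependent init
```

Because the warm start only changes the first iterate and the entropic problem has a unique fixed point (up to a constant shift of the potentials), both runs converge to the same solution; the initializer affects only the iteration count, which is the speed-up the paper exploits.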
Related papers
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic routine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - SCORE: Approximating Curvature Information under Self-Concordant Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization speed each time it receives a new input.
The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead.
arXiv Detail & Related papers (2021-12-14T13:03:04Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization [111.24899593052851]
Conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding-type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Zeroth and First Order Stochastic Frank-Wolfe Algorithms for Constrained Optimization [13.170519806372075]
Problems of convex optimization with two sets of constraints arise frequently in the context of semidefinite programming.
Since projection onto the first set of constraints is difficult, it becomes necessary to explore projection-free algorithms.
The efficacy of the proposed algorithms is tested on relevant applications of sparse matrix estimation, clustering via semidefinite relaxation, and uniform sparsest cut problem.
arXiv Detail & Related papers (2021-07-14T08:01:30Z) - Warm-starting quantum optimization [6.832341432995627]
We discuss how to warm-start quantum optimization with an initial state corresponding to the solution of a relaxation of an optimization problem.
This allows the quantum algorithm to inherit the performance guarantees of the classical algorithm.
It is straightforward to apply the same ideas to other randomized-rounding schemes and optimization problems.
arXiv Detail & Related papers (2020-09-21T18:00:09Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.