Differentiable Top-k Operator with Optimal Transport
- URL: http://arxiv.org/abs/2002.06504v2
- Date: Tue, 18 Feb 2020 18:56:09 GMT
- Title: Differentiable Top-k Operator with Optimal Transport
- Authors: Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha,
Wei Wei, Tomas Pfister
- Abstract summary: The SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem.
We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.
- Score: 135.36099648554054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The top-k operation, i.e., finding the k largest or smallest elements from a
collection of scores, is an important model component, which is widely used in
information retrieval, machine learning, and data mining. However, if the top-k
operation is implemented in an algorithmic way, e.g., using the bubble sort
algorithm, the resulting model cannot be trained in an end-to-end way using
prevalent gradient descent algorithms. This is because these implementations
typically involve swapping indices, whose gradient cannot be computed. Moreover,
the corresponding mapping from the input scores to the indicator vector marking
whether each element belongs to the top-k set is essentially discontinuous. To address
the issue, we propose a smoothed approximation, namely the SOFT (Scalable
Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT
top-k operator approximates the output of the top-k operation as the solution
of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT
operator can then be efficiently approximated based on the optimality
conditions of the EOT problem. We apply the proposed operator to the k-nearest
neighbors and beam search algorithms, and demonstrate improved performance.
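To make the mechanism above concrete, below is a minimal PyTorch sketch of a SOFT-style top-k operator, written from the abstract alone rather than from the authors' code: each score ships its mass toward two anchor points under an entropic-OT objective, and the mass a score sends to the "high" anchor serves as a smoothed, differentiable top-k indicator. The anchor placement, the marginals, the name soft_top_k, and the use of unrolled Sinkhorn iterations (instead of the paper's gradient via the EOT optimality conditions) are all assumptions of this sketch.

```python
import torch

def soft_top_k(x, k, eps=0.1, n_iters=200):
    """Smoothed top-k indicator via entropic optimal transport (Sinkhorn).

    A sketch built from the abstract alone: each score ships 1/n mass
    toward two anchor points, and the mass sent to the "high" anchor is a
    differentiable stand-in for top-k membership.
    """
    n = x.shape[0]
    # Anchors bracketing the scores (this placement is an assumption).
    y = torch.stack([x.min().detach(), x.max().detach()])
    C = (x[:, None] - y[None, :]) ** 2        # n x 2 squared-distance cost
    C = C / C.max().detach()                  # rescale for numerical stability
    mu = torch.full((n,), 1.0 / n)            # uniform source marginal
    nu = torch.tensor([(n - k) / n, k / n])   # target marginal: n-k low, k high
    K = torch.exp(-C / eps)                   # Gibbs kernel
    u = torch.ones(n)
    # Unrolled Sinkhorn iterations; autograd differentiates through the loop.
    # (The paper instead derives the gradient from the EOT optimality
    # conditions, which is cheaper than backpropagating through iterates.)
    for _ in range(n_iters):
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u[:, None] * K * v[None, :]        # entropic transport plan, n x 2
    return n * plan[:, 1]                     # entries in [0, 1], summing to k

scores = torch.tensor([0.3, 2.0, -1.0, 1.5, 0.1], requires_grad=True)
a = soft_top_k(scores, k=2)                   # mass concentrates on 2.0 and 1.5
(a * scores).sum().backward()                 # a soft "sum of top-2"; grads flow
```

As eps shrinks and the iteration count grows, the soft indicator approaches the discontinuous hard indicator described in the abstract (e.g., torch.zeros_like(scores).scatter(0, torch.topk(scores, 2).indices, 1.0)); for a k-nearest-neighbors application, the same function can be applied to negated distances so that neighbor selection stays differentiable.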
Related papers
- Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching [35.77769905072651]
We propose an iterative algorithm that efficiently approximates the MAP estimator for a variety of linear inverse problems.
Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ "local MAP" objectives.
We validate our approach on various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing.
arXiv Detail & Related papers (2024-05-29T06:56:12Z) - Practical First-Order Bayesian Optimization Algorithms [0.0]
We propose a class of practical FOBO algorithms that efficiently utilize the information from the gradient GP to identify potential query points with zero gradients.
We validate the performance of our proposed algorithms on several test functions and show that our algorithms outperform state-of-the-art FOBO algorithms.
arXiv Detail & Related papers (2023-06-19T10:05:41Z) - Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective [21.6146047576295]
The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input.
We view the top-k operator as a linear program over the permutahedron, the convex hull of permutations.
Our framework is significantly more general than the existing one and allows, for example, expressing top-k operators that select values by magnitude.
arXiv Detail & Related papers (2023-02-02T21:32:13Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions is a core subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Entropic Neural Optimal Transport via Diffusion Processes [105.34822201378763]
We propose a novel neural algorithm for the fundamental problem of computing the entropic optimal transport (EOT) plan between continuous probability distributions.
Our algorithm is based on the saddle-point reformulation of the dynamic version of EOT, known as the Schrödinger Bridge problem.
In contrast to the prior methods for large-scale EOT, our algorithm is end-to-end and consists of a single learning step.
arXiv Detail & Related papers (2022-11-02T14:35:13Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm
for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding-type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform those of previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Bayesian Algorithm Execution: Estimating Computable Properties of
Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z) - Efficient Optimal Transport Algorithm by Accelerated Gradient descent [20.614477547939845]
We propose a novel algorithm to further improve efficiency and accuracy, based on Nesterov's smoothing technique.
The proposed method achieves faster convergence and better accuracy under the same parameter settings.
arXiv Detail & Related papers (2021-04-12T20:23:29Z) - Efficient Nonnegative Tensor Factorization via Saturating Coordinate
Descent [16.466065626950424]
We propose a novel, fast, and efficient NTF algorithm using an element selection approach.
Empirical analysis reveals that the proposed algorithm is scalable in terms of tensor size, density, and rank.
arXiv Detail & Related papers (2020-03-07T12:51:52Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator, the random-top-k operator (a hedged sketch of one possible such operator appears after this list).
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
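A footnote on the last entry above: the summary names a random-top-k operator without defining it. Purely to make the term concrete, here is one plausible construction, a guess rather than the cited paper's definition, that mixes deterministic top-k selection with random coordinate sampling:

```python
import torch

def random_top_k(g, k_top, k_rand, generator=None):
    """One possible random-top-k sparsifier (an illustrative guess, not
    necessarily the construction in the cited paper): keep the k_top
    largest-magnitude entries of a gradient g, plus k_rand coordinates
    sampled uniformly from the remainder."""
    mask = torch.zeros(g.numel(), dtype=torch.bool)
    mask[torch.topk(g.abs(), k_top).indices] = True           # deterministic part
    rest = (~mask).nonzero(as_tuple=True)[0]                  # unselected coords
    pick = rest[torch.randperm(rest.numel(), generator=generator)[:k_rand]]
    mask[pick] = True                                         # random part
    return torch.where(mask, g, torch.zeros_like(g))

g = torch.randn(10)
print(random_top_k(g, k_top=2, k_rand=1))   # 7 of the 10 entries zeroed out
```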
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.