An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute
the Word Mover's Distance
- URL: http://arxiv.org/abs/2005.06727v3
- Date: Mon, 22 Mar 2021 20:35:08 GMT
- Title: An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute
the Word Mover's Distance
- Authors: Jesmin Jahan Tithi and Fabrizio Petrini
- Abstract summary: The Word Mover's Distance (WMD) is a metric that measures the semantic dissimilarity between two text documents.
We present a shared-memory parallel algorithm to compute WMD of one document against many other documents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Word Mover's Distance (WMD) is a metric that measures the semantic
dissimilarity between two text documents by computing the cost of moving all
words of a source/query document to the most similar words of a target document
optimally. Computing WMD between two documents is costly because it requires
solving an optimization problem that costs \(O(V^3 \log V)\), where \(V\) is the
number of unique words in the document. Fortunately, the WMD can be framed as
the Earth Mover's Distance (EMD) (also known as the Optimal Transportation
Distance) for which it has been shown that the algorithmic complexity can be
reduced to \(O(V^2)\) by adding an entropy penalty to the optimization problem
and a similar idea can be adapted to compute WMD efficiently. Additionally, the
computation can be made highly parallel by computing WMD of a single query
document against multiple target documents at once (e.g., finding whether a
given tweet is similar to any other tweet posted that day). In this paper,
we present a shared-memory parallel Sinkhorn-Knopp Algorithm to compute the WMD
of one document against many other documents by adopting the \(O(V^2)\) EMD
algorithm. We used algorithmic transformations to change the original dense
compute-heavy kernel into a sparse compute kernel, obtaining a \(67\times\)
speedup over the sequential run using \(96\) cores on a state-of-the-art
4-socket Intel\textregistered{} Cascade Lake machine. Our parallel
algorithm is over \(700\times\) faster than naive parallel Python code that
internally uses optimized matrix-library calls.
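The entropy-regularized computation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of Cuturi-style Sinkhorn-Knopp scaling for one query histogram against many target documents at once, not the paper's optimized sparse implementation; the function name, regularization value, and iteration count are illustrative assumptions.

```python
import numpy as np

def sinkhorn_wmd_batch(r, C, M, lam=2.0, n_iter=500):
    """Entropy-regularized EMD (Sinkhorn-Knopp) of one query histogram
    against many targets.

    r   : (V,)   normalized bag-of-words of the query document (positive)
    C   : (V, N) normalized bags-of-words of N target documents (columns)
    M   : (V, V) ground-cost matrix of pairwise word-embedding distances
    lam : entropy-regularization strength; larger values approximate the
          exact EMD more closely but need more iterations to converge
    """
    K = np.exp(-lam * M)          # Gibbs kernel
    KM = K * M                    # element-wise product, used for the cost
    V, N = C.shape
    v = np.ones((V, N)) / V       # one column-scaling vector per target
    for _ in range(n_iter):
        u = r[:, None] / (K @ v)  # enforce the query's row marginals
        v = C / (K.T @ u)         # enforce each target's column marginals
    # d_j = u_j^T (K * M) v_j = <P_j, M>: regularized transport cost per target
    return np.sum(u * (KM @ v), axis=0)
```

Because every target column shares the same kernel `K`, the whole batch reduces to dense matrix-matrix products, which is what makes the one-against-many formulation parallelize well; the paper's algorithmic transformation further replaces these dense kernels with sparse ones, presumably because a query document touches only a small fraction of the vocabulary.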
Related papers
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method to solve large-scale multimedia retrieval issue.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by $0.25$ times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- A Push-Relabel Based Additive Approximation for Optimal Transport [5.111364864495785]
Exact algorithms for computing Optimal Transport can be slow.
We introduce a new and very simple approach to find an $\varepsilon$-approximation of the OT distance.
Our algorithm achieves a near-optimal execution time of $O(n^2/\varepsilon^2)$ for computing the OT distance.
arXiv Detail & Related papers (2022-03-07T21:40:14Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses a Matrix Taylor Polynomial (MTP), and the other uses Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
- A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU [0.3655021726150367]
The Word Movers Distance (WMD) measures the semantic dissimilarity between two text documents.
We present a shared-memory parallel Sinkhorn-Knopp algorithm to compute the WMD of one document against many other documents.
The parallel implementation achieves 67x speedup on 96 cores across 4 NUMA sockets of an Intel Cascade Lake system.
arXiv Detail & Related papers (2021-07-14T00:29:18Z)
- Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering [6.4805900740861]
We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$*$.
We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time).
Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude.
arXiv Detail & Related papers (2021-04-02T16:05:00Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory [0.0]
We present a divide and conquer algorithm that computes the exact globally optimal DTW alignment using O(M+N) memory.
Our algorithm can be parallelized up to a factor of min(M, N) with the same memory constraints, so it can still run more efficiently than the textbook version with an adequate GPU.
arXiv Detail & Related papers (2020-08-04T15:00:33Z)
- A Study of Performance of Optimal Transport [16.847501106437534]
We show that network simplex and augmenting path based algorithms can consistently outperform numerical matrix-scaling based methods.
We present a new algorithm that improves upon the classical Kuhn-Munkres algorithm.
arXiv Detail & Related papers (2020-05-03T20:37:05Z)