Sublinear Time Approximation of Text Similarity Matrices
- URL: http://arxiv.org/abs/2112.09631v1
- Date: Fri, 17 Dec 2021 17:04:34 GMT
- Title: Sublinear Time Approximation of Text Similarity Matrices
- Authors: Archan Ray, Nicholas Monath, Andrew McCallum, Cameron Musco
- Abstract summary: We introduce a generalization of the popular Nystr"om method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
- Score: 50.73398637380375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study algorithms for approximating pairwise similarity matrices that arise
in natural language processing. Generally, computing a similarity matrix for
$n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic
scaling is a significant bottleneck, especially when similarities are computed
via expensive functions, e.g., via transformer models. Approximation methods
reduce this quadratic complexity, often by using a small subset of exactly
computed similarities to approximate the remainder of the complete pairwise
similarity matrix.
Significant work focuses on the efficient approximation of positive
semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods.
However, much less is understood about indefinite (non-PSD) similarity
matrices, which often arise in NLP. Motivated by the observation that many of
these matrices are still somewhat close to PSD, we introduce a generalization
of the popular Nystr\"{o}m method to the indefinite setting. Our algorithm can
be applied to any similarity matrix and runs in sublinear time in the size of
the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity
computations.
We show that our method, along with a simple variant of CUR decomposition,
performs very well in approximating a variety of similarity matrices arising in
NLP tasks. We demonstrate high accuracy of the approximated similarity matrices
in the downstream tasks of document classification, sentence similarity, and
cross-document coreference.
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Efficient distributed representations with linear-time attention scores normalization [3.8673630752805437]
We propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms.
The accuracy of our estimation formula surpasses competing kernel methods by even orders of magnitude.
The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem.
arXiv Detail & Related papers (2023-03-30T15:48:26Z) - High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
On several simulations, our method scales better than existing approaches in computation time and memory.
arXiv Detail & Related papers (2022-02-25T16:35:26Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Pad'e Approximants (MPA)
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Fast Projected Newton-like Method for Precision Matrix Estimation under
Total Positivity [15.023842222803058]
Current algorithms are designed using the block coordinate descent method or the proximal point algorithm.
We propose a novel algorithm based on the two-metric projection method, incorporating a carefully designed search direction and variable partitioning scheme.
Experimental results on synthetic and real-world datasets demonstrate that our proposed algorithm provides a significant improvement in computational efficiency compared to the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-03T14:39:10Z) - Non-PSD Matrix Sketching with Applications to Regression and
Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and square-roots" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Householder Dice: A Matrix-Free Algorithm for Simulating Dynamics on
Gaussian and Random Orthogonal Ensembles [12.005731086591139]
Householder Dice (HD) is an algorithm for simulating dynamics on dense random matrix ensembles with translation-invariant properties.
The memory and costs of the HD algorithm are $mathcalO(nT)$ and $mathcalO(nT2)$, respectively.
Numerical results demonstrate the promise of the HD algorithm as a new computational tool in the study of high-dimensional random systems.
arXiv Detail & Related papers (2021-01-19T04:50:53Z) - Estimating Multiple Precision Matrices with Cluster Fusion
Regularization [0.90238471756546]
We propose a penalized likelihood estimating multiple precision matrices from different classes.
Most existing methods either incorporate no information on relationships between the precision matrices, or require this information be a priori.
arXiv Detail & Related papers (2020-03-01T01:03:22Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rate for Haar and randomized Hadamard matrices are identical, andally improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.