Sketching Transformed Matrices with Applications to Natural Language
Processing
- URL: http://arxiv.org/abs/2002.09812v1
- Date: Sun, 23 Feb 2020 03:07:31 GMT
- Title: Sketching Transformed Matrices with Applications to Natural Language
Processing
- Authors: Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
- Abstract summary: We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
- Score: 76.6222695417524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Suppose we are given a large matrix $A=(a_{i,j})$ that cannot be stored in
memory but resides on disk or is presented as a data stream. However, we need to
compute a matrix decomposition of the entrywise-transformed matrix
$f(A):=(f(a_{i,j}))$ for some function $f$. Is it possible to do this in a
space-efficient way? Many machine learning applications indeed need to deal with
such large transformed matrices; for example, word embedding methods in NLP work
with the pointwise mutual information (PMI) matrix, and the entrywise
transformation makes it difficult to apply known linear-algebraic tools.
Existing approaches to this problem either store the whole matrix and perform
the entrywise transformation afterwards, which is space-consuming or infeasible,
or redesign the learning method, which is application-specific and requires
substantial remodeling.
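As a concrete illustration of such an entrywise transformation, the positive PMI map used in word-embedding pipelines can be sketched in a few lines of Python (a toy in-memory example with assumed variable names, purely for exposition; in the setting above the co-occurrence matrix would be far too large to materialize):

```python
import numpy as np

# Toy word-word co-occurrence counts C (in practice far too large to hold in memory).
C = np.array([[10.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 8.0]])

total = C.sum()
p_w = C.sum(axis=1) / total          # marginal probability of each row word
p_c = C.sum(axis=0) / total          # marginal probability of each column word
p_wc = C / total                     # joint probabilities

# Entrywise transformation f: positive PMI, with 0 where the count is 0.
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_wc / np.outer(p_w, p_c))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

print(ppmi)
```

The transformation acts on every single entry, which is exactly what makes it hard to apply linear sketching tools to the raw counts directly.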
In this paper, we first propose a space-efficient sketching algorithm for
computing the product of a given small matrix with the transformed matrix. It
works for a general family of transformations with provably small error bounds
and thus can be used as a primitive in downstream learning tasks. We then apply
this primitive to a concrete application: low-rank approximation. We show that
our approach obtains small error and is efficient in both space and time. We
complement our theoretical results with experiments on synthetic and real data.
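For intuition about the primitive (multiplying a given small matrix by the transformed matrix in a single pass over the entries of $A$), here is a naive streaming baseline in Python. It is a minimal sketch under assumed names, illustrating only the problem setup; it is not the paper's space-efficient sketching algorithm.

```python
import numpy as np

def stream_small_times_fA(entry_stream, S, f, n_cols):
    """Accumulate S @ f(A) in one pass over entries (i, j, a_ij) of A.

    S has shape (d, n_rows); the result has shape (d, n_cols).
    The transformed matrix f(A) is never materialized.
    """
    d = S.shape[0]
    out = np.zeros((d, n_cols))
    for i, j, a in entry_stream:           # entries may arrive in any order
        out[:, j] += S[:, i] * f(a)        # rank-one update from a single entry
    return out

# Tiny usage example with an assumed transformation f(x) = log(1 + x).
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(6, 4)).astype(float)
S = rng.standard_normal((2, 6))
stream = ((i, j, A[i, j]) for i in range(6) for j in range(4))
approx = stream_small_times_fA(stream, S, np.log1p, n_cols=4)
print(np.allclose(approx, S @ np.log1p(A)))   # True
```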
Related papers
- Optimal Quantization for Matrix Multiplication [35.007966885532724]
We present a universal quantizer based on nested lattices with an explicit guarantee of approximation error for any (non-random) pair of matrices $A$, $B$ in terms of only the Frobenius norms $\|A\|_F$, $\|B\|_F$ and $\|A^\top B\|_F$.
arXiv Detail & Related papers (2024-10-17T17:19:48Z)
- Kissing to Find a Match: Efficient Low-Rank Permutation Representation [33.880247068298374]
We propose to tackle the curse of dimensionality of large permutation matrices by approximating them using low-rank matrix factorization, followed by a nonlinearity.
We demonstrate the applicability and merits of the proposed approach through a series of experiments on a range of problems.
arXiv Detail & Related papers (2023-08-25T08:59:03Z)
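To illustrate the low-rank-plus-nonlinearity idea in the entry above, here is one possible toy construction in Python (an illustration under assumed names, not the authors' method): each index gets a random unit code vector, the permutation is stored as two rank-$d$ factors, and a row-wise argmax nonlinearity recovers the full permutation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 64                    # n x n permutation, rank-d representation

E = rng.standard_normal((n, d))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # one (nearly orthogonal) code per index

perm = rng.permutation(n)
U = E[perm]                        # low-rank factors: store U and E (2*n*d numbers, not n*n)

# Nonlinearity: row-wise argmax of U @ E.T recovers the permutation.
recovered = np.argmax(U @ E.T, axis=1)
print(np.array_equal(recovered, perm))          # True with high probability
```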
- Multiresolution kernel matrix algebra [0.0]
We show that the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format.
The inverse of a kernel matrix (if it exists) is compressible in the S-format as well.
The matrix algebra is justified mathematically by pseudo differential calculus.
arXiv Detail & Related papers (2022-11-21T17:50:22Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Padé Approximants (MPA).
A series of numerical tests shows that both methods yield a considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
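As context for the MTP variant mentioned in the entry above, a truncated Matrix Taylor Polynomial for the square root of a symmetric positive-definite matrix can be sketched as follows (a minimal NumPy illustration based on the standard binomial series, not the authors' exact formulation or their differentiable backward pass):

```python
import numpy as np

def mtp_sqrt(A, degree=12):
    """Truncated Matrix Taylor Polynomial for A^{1/2}, A symmetric positive definite.

    Uses (I - Z)^{1/2} = sum_k binom(1/2, k) (-Z)^k with Z = I - A / ||A||_F,
    which converges because the normalized matrix has spectrum in (0, 1].
    """
    norm = np.linalg.norm(A, "fro")
    Z = np.eye(A.shape[0]) - A / norm
    term = np.eye(A.shape[0])          # (-Z)^0
    coeff = 1.0                        # binom(1/2, 0)
    acc = coeff * term
    for k in range(1, degree + 1):
        coeff *= (0.5 - (k - 1)) / k   # binom(1/2, k) from binom(1/2, k-1)
        term = term @ (-Z)
        acc += coeff * term
    return np.sqrt(norm) * acc

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)            # well-conditioned SPD test matrix
S = mtp_sqrt(A)
print(np.linalg.norm(S @ S - A) / np.linalg.norm(A))   # small relative error
```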
- Fast Differentiable Matrix Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP).
The other method is to use Matrix Padé Approximants (MPA).
arXiv Detail & Related papers (2022-01-21T12:18:06Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
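For reference, the classical (PSD) Nyström approximation that the entry above generalizes to indefinite similarity matrices looks roughly like this in Python (the landmark choice, kernel, and names are assumptions for illustration):

```python
import numpy as np

def nystrom(K, landmarks):
    """Standard Nystrom approximation K ~ C W^+ C^T of a PSD similarity matrix.

    Only the columns K[:, landmarks] and the small landmark block are needed,
    which is what makes the method cheap for large matrices.
    """
    C = K[:, landmarks]                      # n x m sampled columns
    W = C[landmarks, :]                      # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T

# Toy PSD similarity matrix from random points and an RBF kernel.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 10.0)

landmarks = rng.choice(200, size=40, replace=False)
K_hat = nystrom(K, landmarks)
print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))  # modest relative error
```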
- Sparse Factorization of Large Square Matrices [10.94053598642913]
In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices.
In the approximation, our method needs only $N(\log N)^2$ non-zero numbers for an $N \times N$ full matrix.
We show that our method gives a better approximation when the approximated matrix is sparse and high-rank.
arXiv Detail & Related papers (2021-09-16T18:42:21Z)
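To put the $N(\log N)^2$ storage claim of the previous entry in perspective, here is a quick back-of-the-envelope comparison against dense storage (assuming a base-2 logarithm; purely illustrative):

```python
import math

for N in (1_024, 65_536, 1_048_576):
    dense = N * N                          # entries of the full N x N matrix
    sparse = N * math.log2(N) ** 2         # non-zeros in the sparse factorization
    print(f"N={N:>9,}  dense={dense:.2e}  N(log N)^2={sparse:.2e}  ratio={dense / sparse:,.0f}x")
```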
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-root" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.