Sparse Factorization of Large Square Matrices
- URL: http://arxiv.org/abs/2109.08184v1
- Date: Thu, 16 Sep 2021 18:42:21 GMT
- Title: Sparse Factorization of Large Square Matrices
- Authors: Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang
- Abstract summary: In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices.
In the approximation, our method needs only $N(\log N)^2$ non-zero numbers for an $N\times N$ full matrix.
We show that our method gives a better approximation when the approximated matrix is sparse and high-rank.
- Score: 10.94053598642913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Square matrices appear in many machine learning problems and models.
Optimization over a large square matrix is expensive in memory and in time.
Therefore, an economical approximation is needed. Conventional approximation
approaches factorize the square matrix into a number of matrices of much lower
rank. However, the low-rank constraint is a performance bottleneck if the
approximated matrix is intrinsically high-rank or close to full rank. In this
paper, we propose to approximate a large square matrix with a product of sparse
full-rank matrices. In the approximation, our method needs only $N(\log N)^2$
non-zero numbers for an $N\times N$ full matrix. We present both non-parametric
and parametric ways to find the factorization. In the former, we learn the
factorizing matrices directly, and in the latter, we train neural networks to
map input data to the non-zero matrix entries. The sparse factorization method
is tested for a variety of synthetic and real-world square matrices. The
experimental results demonstrate that our method gives a better approximation
when the approximated matrix is sparse and high-rank. Based on this finding, we
use our parametric method as a scalable attention architecture that performs
strongly in learning tasks for long sequential data, outperforming the Transformer
and several of its variants.
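As a rough sketch of the non-parametric variant (an illustration under assumptions, not the authors' implementation), one can fix a butterfly-like sparsity pattern of $\log_2 N$ factors and learn their non-zero entries by gradient descent. The pattern, optimizer, and sizes below are illustrative choices; this simple pattern uses only about $2N\log_2 N$ non-zeros, well within the $N(\log N)^2$ budget stated above.

```python
import torch

def butterfly_masks(n):
    """Binary masks for log2(n) sparse factors; factor k couples each index i
    with the single index that differs from i in bit k (2 non-zeros per row)."""
    num_factors = n.bit_length() - 1          # log2(n); n must be a power of two
    masks = []
    for k in range(num_factors):
        m = torch.zeros(n, n)
        for i in range(n):
            m[i, i] = 1.0
            m[i, i ^ (1 << k)] = 1.0          # partner index differing in bit k
        masks.append(m)
    return masks

def fit_sparse_factors(target, steps=2000, lr=0.05):
    """Non-parametric variant (sketched): learn the non-zero entries of the
    masked factors so that their product approximates `target`."""
    n = target.shape[0]
    masks = butterfly_masks(n)
    factors = [(0.5 * torch.randn(n, n)).requires_grad_() for _ in masks]
    opt = torch.optim.Adam(factors, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        approx = torch.eye(n)
        for f, m in zip(factors, masks):
            approx = approx @ (f * m)         # keep only the allowed entries
        loss = ((approx - target) ** 2).sum()
        loss.backward()
        opt.step()
    return [(f * m).detach() for f, m in zip(factors, masks)]

# Example: approximate a random 64 x 64 matrix with 6 sparse factors.
W = torch.randn(64, 64)
factors = fit_sparse_factors(W)
```

The parametric variant described in the abstract would instead predict these non-zero entries with a neural network applied to the input data, which is what allows the same factorization to act as a scalable attention module for long sequences.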
Related papers
- Multiresolution kernel matrix algebra [0.0]
We show that the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format.
The inverse of a kernel matrix (if it exists) is compressible in the S-format as well.
The matrix algebra is justified mathematically by pseudodifferential calculus.
arXiv Detail & Related papers (2022-11-21T17:50:22Z)
- A Structured Sparse Neural Network and Its Matrix Calculations Algorithm [0.0]
We introduce a nonsymmetric, tridiagonal matrix with off-diagonal sparse entries and offset sub- and super-diagonals.
For the cases where the matrix inverse does not exist, a least square type pseudoinverse is provided.
Results show a significant improvement in computational cost, especially as the matrix size increases.
arXiv Detail & Related papers (2022-07-02T19:38:48Z)
- Low-Rank Updates of Matrix Square Roots [7.832944895330117]
We consider the matrix square root and inverse square root operations.
Given a low-rank perturbation to a matrix, we argue that a low-rank approximate correction to the (inverse) square root exists.
We then discuss how such a low-rank correction can be computed.
arXiv Detail & Related papers (2022-01-31T12:05:33Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses the Matrix Taylor Polynomial (MTP) and the other uses Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
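Both the entry above and the one below use a Matrix Taylor Polynomial (MTP) in the forward pass. A minimal NumPy sketch of that idea follows; the Frobenius-norm normalization and truncation order are assumptions here, and the papers' matching backward passes are omitted.

```python
import numpy as np
from scipy.special import binom

def sqrtm_taylor(A, order=30):
    """Truncated binomial (Taylor) series for the square root of an SPD matrix:
    A^{1/2} = sqrt(s) * (I - Z)^{1/2} with Z = I - A/s, s = ||A||_F.
    The series converges because the spectral radius of Z is below 1."""
    n = A.shape[0]
    s = np.linalg.norm(A, "fro")
    Z = np.eye(n) - A / s
    term = np.eye(n)                     # holds (-Z)^k
    total = np.eye(n)                    # k = 0 term of the series
    for k in range(1, order + 1):
        term = term @ (-Z)
        total += binom(0.5, k) * term    # generalized binomial coefficient
    return np.sqrt(s) * total

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = B @ B.T + 20 * np.eye(8)             # well-conditioned SPD test matrix
X = sqrtm_taylor(A)
print(np.linalg.norm(X @ X - A) / np.linalg.norm(A))
```

The MPA alternative mentioned above would replace this truncated polynomial with a matrix Padé approximant, i.e. a rational function of Z evaluated through a linear solve.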
- Fast Differentiable Matrix Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root.
For the forward propagation, one method uses the Matrix Taylor Polynomial (MTP).
The other uses Matrix Padé Approximants (MPA).
arXiv Detail & Related papers (2022-01-21T12:18:06Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
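For orientation, the classical (PSD) Nyström approximation that the entry above generalizes to indefinite similarity matrices can be sketched as follows; the RBF kernel, sample size, and landmark count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def nystrom(K, landmarks, rng):
    """Classical Nyström approximation K ~ C W^+ C^T built from uniformly
    sampled landmark columns (the PSD recipe, shown only to fix ideas)."""
    n = K.shape[0]
    idx = rng.choice(n, size=landmarks, replace=False)
    C = K[:, idx]                        # sampled columns
    W = K[np.ix_(idx, idx)]              # intersection block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # RBF kernel
K_hat = nystrom(K, landmarks=50, rng=rng)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```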
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-root" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Sketching Transformed Matrices with Applications to Natural Language
Processing [76.6222695417524]
We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
arXiv Detail & Related papers (2020-02-23T03:07:31Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical and improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
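The setting of the last entry can be illustrated with a generic iterative Hessian sketch driven by an SRHT-style sketching matrix; this is a standard construction used only to fix ideas, not the paper's specific scheme or analysis, and the problem sizes and sketch dimension m = 8d are arbitrary choices.

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sketch(A, m, rng):
    """Subsampled randomized Hadamard transform applied to the rows of A
    (requires the row count n to be a power of two)."""
    n = A.shape[0]
    D = rng.choice([-1.0, 1.0], size=n)           # random sign flips
    H = hadamard(n) / np.sqrt(n)                  # orthonormal Hadamard matrix
    rows = rng.choice(n, size=m, replace=False)   # uniform row subsample
    return np.sqrt(n / m) * (H @ (D[:, None] * A))[rows]

def iterative_hessian_sketch(A, b, m, iters=20, seed=0):
    """Iterative sketching for least squares: each step solves a small
    sketched Newton system instead of the full normal equations."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        SA = srht_sketch(A, m, rng)               # refreshed sketch each step
        g = A.T @ (b - A @ x)                     # exact gradient direction
        x = x + np.linalg.solve(SA.T @ SA, g)     # sketched Hessian solve
    return x

n, d = 1024, 20
rng = np.random.default_rng(1)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x_sketch = iterative_hessian_sketch(A, b, m=8 * d)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))
```

Because the gradient is exact and only the Hessian is sketched, the fixed point of this iteration is the exact least-squares solution; refreshing the sketch each step gives the linear convergence that the iterative-sketching literature analyzes.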