Batch-efficient EigenDecomposition for Small and Medium Matrices
- URL: http://arxiv.org/abs/2207.04228v1
- Date: Sat, 9 Jul 2022 09:14:12 GMT
- Title: Batch-efficient EigenDecomposition for Small and Medium Matrices
- Authors: Yue Song, Nicu Sebe, Wei Wang
- Abstract summary: EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
- Score: 65.67315418971688
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: EigenDecomposition (ED) is at the heart of many computer vision algorithms
and applications. One crucial bottleneck limiting its usage is the expensive
computation cost, particularly for a mini-batch of matrices in the deep neural
networks. In this paper, we propose a QR-based ED method dedicated to the
application scenarios of computer vision. Our proposed method performs the ED
entirely by batched matrix/vector multiplication, which processes all the
matrices simultaneously and thus fully utilizes the power of GPUs. Our
technique is based on the explicit QR iterations by Givens rotation with double
Wilkinson shifts. With several acceleration techniques, the time complexity of
QR iterations is reduced from $O(n^5)$ to $O(n^3)$. The numerical test
shows that for small and medium batched matrices (e.g., $\mathrm{dim}<32$) our
method can be much faster than the PyTorch SVD function. Experimental results
on visual recognition and image generation demonstrate that our method also
achieves competitive performance.
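To make the batching idea concrete, the sketch below (not the authors' released code; the helper name is illustrative) applies one explicit Givens rotation to an entire batch of symmetric matrices with a single batched matrix multiplication in PyTorch.

```python
import torch

def batched_givens_step(A, i, j):
    """Apply one Givens rotation in the (i, j) plane to every matrix in the
    batch at once. G is chosen so that left-multiplication zeroes A[:, j, i];
    the similarity transform G A G^T leaves the spectrum unchanged.
    A: (B, n, n) batch of symmetric matrices."""
    bsz, n, _ = A.shape
    a, b = A[:, i, i], A[:, j, i]
    r = torch.sqrt(a * a + b * b).clamp_min(1e-12)
    c, s = a / r, b / r
    # One rotation matrix per batch element; identity except rows/columns i, j.
    G = torch.eye(n, dtype=A.dtype, device=A.device).repeat(bsz, 1, 1)
    G[:, i, i] = c
    G[:, j, j] = c
    G[:, i, j] = s
    G[:, j, i] = -s
    # A single batched matmul advances all matrices in one GPU launch.
    return G @ A @ G.transpose(-1, -2)
```

A full shifted QR sweep chains such rotations down the subdiagonal; because every step is a batched matmul, all matrices in the mini-batch progress in lockstep on the GPU.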
Related papers
- Giga-scale Kernel Matrix Vector Multiplication on GPU [9.106412307976067]
Kernel matrix vector multiplication (KMVM) is a ubiquitous operation in machine learning and scientific computing, spanning from the kernel literature to signal processing.
We propose a novel approximation procedure coined the Faster-Fast and Free Memory Method (F3M) to address these scaling issues for KMVM.
We show that F3M can compute a full KMVM for a billion points in under one minute on a high-end GPU, leading to a significant speed-up compared with existing CPU methods.
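For reference, the operation being approximated is simply K(X, Y) @ v; a naive chunked exact version (illustrative code, not F3M) can be written with standard GPU-friendly ops:

```python
import torch

def kmvm_gaussian(X, Y, v, lengthscale=1.0, chunk=4096):
    """Naive chunked kernel matrix-vector product out = K(X, Y) @ v with a
    Gaussian kernel. This exact version costs O(N*M) and is what fast
    approximations like F3M avoid materializing; names here are illustrative."""
    out = torch.empty(X.shape[0], device=X.device, dtype=X.dtype)
    for start in range(0, X.shape[0], chunk):
        d2 = torch.cdist(X[start:start + chunk], Y).pow(2)   # pairwise squared distances
        out[start:start + chunk] = torch.exp(-d2 / (2 * lengthscale ** 2)) @ v
    return out
```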
arXiv Detail & Related papers (2022-02-02T15:28:15Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use the Matrix Taylor Polynomial (MTP), and the other is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
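As a rough illustration of the MTP idea (a sketch under our own Frobenius-norm pre-normalization, not the paper's exact routine; all names are illustrative), the square root of an SPD batch can be approximated with batched matmuls only:

```python
import torch

def sqrtm_taylor(A, num_terms=8, eps=1e-6):
    """Approximate A^{1/2} for a batch of SPD matrices via a truncated Taylor
    series of (I - Z)^{1/2}, where Z = I - A / ||A||_F so that ||Z|| < 1.
    A: (B, n, n)."""
    B, n, _ = A.shape
    I = torch.eye(n, device=A.device, dtype=A.dtype).expand_as(A)
    norm = A.flatten(1).norm(dim=1).clamp_min(eps).view(B, 1, 1)
    Z = I - A / norm
    S = I.clone()          # partial sum of the series
    P = I.clone()          # current power Z^k
    coef = 1.0             # binomial coefficient C(1/2, k)
    for k in range(1, num_terms):
        coef = coef * (0.5 - (k - 1)) / k
        P = P @ Z
        S = S + coef * ((-1.0) ** k) * P
    return norm.sqrt() * S   # A^{1/2} = ||A||_F^{1/2} (I - Z)^{1/2}
```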
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
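A heavily reduced sketch of the core lookup (one level only, no interpolation; table size, hash primes, and feature width are illustrative and not the paper's configuration):

```python
import torch

class HashedGridLevel(torch.nn.Module):
    """One level of a hashed feature grid: integer cell coordinates are hashed
    into a small trainable table. The real encoding uses many resolutions and
    trilinear interpolation of the looked-up features."""
    def __init__(self, table_size=2**14, feat_dim=2, resolution=64):
        super().__init__()
        self.table = torch.nn.Parameter(
            torch.empty(table_size, feat_dim).uniform_(-1e-4, 1e-4))
        self.resolution = resolution

    def forward(self, x):                          # x: (N, 3) points in [0, 1)^3
        cell = (x * self.resolution).long()        # integer grid coordinates
        h = (cell[:, 0]
             ^ (cell[:, 1] * 2654435761)
             ^ (cell[:, 2] * 805459861)) % self.table.shape[0]
        return self.table[h]                       # (N, feat_dim) trainable features
```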
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
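As one illustrative non-uniform scheme (power-of-two levels; the paper's actual quantizer may differ, and the helper below is only a sketch), weights can be snapped to signed powers of two:

```python
import torch

def power_of_two_quantize(w, n_bits=4):
    """Snap each weight to sign(w) * 2^k, restricting k to a small range of
    exponents below the largest magnitude. Power-of-two levels make the later
    multiplications shift-friendly in hardware."""
    sign = torch.sign(w)
    mag = w.abs().clamp_min(1e-12)
    max_exp = torch.floor(torch.log2(mag.max())).item()
    min_exp = max_exp - (2 ** (n_bits - 1) - 1)            # allowed exponent range
    exp = torch.round(torch.log2(mag)).clamp(min_exp, max_exp)
    return sign * torch.pow(2.0, exp)
```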
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have a complexity that significantly increases their computational cost and running time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
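A compact sketch of the underlying randomized SVD recipe (Halko-style range finding; illustrative code, not the paper's implementation):

```python
import torch

def randomized_svd(A, k, oversample=10, n_iter=2):
    """Rank-k randomized SVD: sample the range of A with a Gaussian test matrix,
    optionally sharpen it with power iterations, then do a small exact SVD."""
    m, n = A.shape
    p = k + oversample
    Omega = torch.randn(n, p, device=A.device, dtype=A.dtype)
    Y = A @ Omega                        # sample the range of A
    for _ in range(n_iter):              # power iterations improve the subspace
        Y = A @ (A.T @ Y)
    Q, _ = torch.linalg.qr(Y)            # orthonormal basis, (m, p)
    B = Q.T @ A                          # small (p, n) projection
    U_hat, S, Vh = torch.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], S[:k], Vh[:k]
```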
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average 3712$\times$ speedup with 1301.25$\times$ energy reduction on CPU, and 35.4$\times$ speedup with 17.66$\times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
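For orientation, the dense building block is the SVD-based pseudoinverse below (a sketch only; FastPI itself works incrementally on sparse matrices):

```python
import torch

def pinv_via_svd(A, rcond=1e-10):
    """Moore-Penrose pseudoinverse from the SVD: A+ = V S+ U^T, where singular
    values below a relative cutoff are treated as zero."""
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)
    cutoff = rcond * S.max()
    S_inv = torch.where(S > cutoff, 1.0 / S, torch.zeros_like(S))
    return Vh.T @ torch.diag(S_inv) @ U.T
```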
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- What if Neural Networks had SVDs? [66.91160214071088]
Various Neural Networks employ time-consuming matrix operations like matrix inversion.
We present an algorithm that is fast enough to speed up several matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z)
- QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation [0.0]
This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m,n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$.
We derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition and for the LQ decomposition of deep input matrices.
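A quick way to see QR matrix backpropagation in action is to differentiate through PyTorch's built-in QR, which covers the square and deep cases (this is not the article's derivation, which also handles wide and pivoted inputs):

```python
import torch

# Gradients flow through the QR factors of a deep (m > n) matrix.
A = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)
Q, R = torch.linalg.qr(A, mode="reduced")
loss = (R.diagonal() ** 2).sum()       # any scalar function of Q and/or R
loss.backward()
print(A.grad.shape)                    # torch.Size([5, 3]): gradient w.r.t. the input
```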
arXiv Detail & Related papers (2020-09-19T21:03:37Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
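A toy Nyström kernel ridge regression sketch shows the kind of structure such solvers exploit (random centers plus a reduced linear system; the actual solver replaces the direct solve with preconditioned iterations on the GPU, and all names here are illustrative):

```python
import torch

def nystrom_krr(X, y, m_centers=500, lengthscale=1.0, lam=1e-3):
    """Pick random centers C and solve the reduced Nystrom system
    (Knm^T Knm + lam*n*Kmm) alpha = Knm^T y for a Gaussian kernel."""
    n = X.shape[0]
    C = X[torch.randperm(n)[:m_centers]]
    Knm = torch.exp(-torch.cdist(X, C).pow(2) / (2 * lengthscale ** 2))   # (n, m)
    Kmm = torch.exp(-torch.cdist(C, C).pow(2) / (2 * lengthscale ** 2))   # (m, m)
    Kmm = Kmm + 1e-6 * torch.eye(C.shape[0], device=X.device, dtype=X.dtype)  # jitter
    alpha = torch.linalg.solve(Knm.T @ Knm + lam * n * Kmm, Knm.T @ y)
    return C, alpha      # predict on new points with k(x_new, C) @ alpha
```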
arXiv Detail & Related papers (2020-06-18T08:16:25Z)