Batch-efficient EigenDecomposition for Small and Medium Matrices
- URL: http://arxiv.org/abs/2207.04228v1
- Date: Sat, 9 Jul 2022 09:14:12 GMT
- Title: Batch-efficient EigenDecomposition for Small and Medium Matrices
- Authors: Yue Song, Nicu Sebe, Wei Wang
- Abstract summary: EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
- Score: 65.67315418971688
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: EigenDecomposition (ED) is at the heart of many computer vision algorithms
and applications. One crucial bottleneck limiting its usage is the expensive
computation cost, particularly for a mini-batch of matrices in the deep neural
networks. In this paper, we propose a QR-based ED method dedicated to the
application scenarios of computer vision. Our proposed method performs the ED
entirely by batched matrix/vector multiplication, which processes all the
matrices simultaneously and thus fully utilizes the power of GPUs. Our
technique is based on the explicit QR iterations by Givens rotation with double
Wilkinson shifts. With several acceleration techniques, the time complexity of
QR iterations is reduced from $O(n^5)$ to $O(n^3)$. The numerical test
shows that for small and medium batched matrices (e.g., $\mathrm{dim}<32$) our
method can be much faster than the PyTorch SVD function. Experimental results
on visual recognition and image generation demonstrate that our method also
achieves competitive performance.
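To make the batching idea concrete, the sketch below (not the authors' released code; the helper name is illustrative) applies one explicit Givens rotation to an entire batch of symmetric matrices with a single batched matrix multiplication in PyTorch.

```python
import torch

def batched_givens_step(A, i, j):
    """Apply one Givens rotation in the (i, j) plane to every matrix in the
    batch at once. G is chosen so that left-multiplication zeroes A[:, j, i];
    the similarity transform G A G^T leaves the spectrum unchanged.
    A: (B, n, n) batch of symmetric matrices."""
    bsz, n, _ = A.shape
    a, b = A[:, i, i], A[:, j, i]
    r = torch.sqrt(a * a + b * b).clamp_min(1e-12)
    c, s = a / r, b / r
    # One rotation matrix per batch element; identity except rows/columns i, j.
    G = torch.eye(n, dtype=A.dtype, device=A.device).repeat(bsz, 1, 1)
    G[:, i, i] = c
    G[:, j, j] = c
    G[:, i, j] = s
    G[:, j, i] = -s
    # A single batched matmul advances all matrices in one GPU launch.
    return G @ A @ G.transpose(-1, -2)
```

A full shifted QR sweep chains such rotations down the subdiagonal; because every step is a batched matmul, all matrices in the mini-batch progress in lockstep on the GPU.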
Related papers
- Giga-scale Kernel Matrix Vector Multiplication on GPU [9.106412307976067]
Kernel matrix vector multiplication (KMVM) is a ubiquitous operation in machine learning and scientific computing, spanning from the kernel literature to signal processing.
We propose a novel approximation procedure coined the Faster-Fast and Free Memory Method (F3M) to address these scaling issues for KMVM.
We show that F3M can compute a full KMVM for a billion points in under one minute on a high-end GPU, leading to a significant speed-up compared with existing CPU methods.
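For reference, the operation being approximated is simply K(X, Y) @ v; a naive chunked exact version (illustrative code, not F3M) can be written with standard GPU-friendly ops:

```python
import torch

def kmvm_gaussian(X, Y, v, lengthscale=1.0, chunk=4096):
    """Naive chunked kernel matrix-vector product out = K(X, Y) @ v with a
    Gaussian kernel. This exact version costs O(N*M) and is what fast
    approximations like F3M avoid materializing; names here are illustrative."""
    out = torch.empty(X.shape[0], device=X.device, dtype=X.dtype)
    for start in range(0, X.shape[0], chunk):
        d2 = torch.cdist(X[start:start + chunk], Y).pow(2)   # pairwise squared distances
        out[start:start + chunk] = torch.exp(-d2 / (2 * lengthscale ** 2)) @ v
    return out
```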
arXiv Detail & Related papers (2022-02-02T15:28:15Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use the Matrix Taylor Polynomial (MTP), and the other is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
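As a rough illustration of the MTP idea (a sketch under our own Frobenius-norm pre-normalization, not the paper's exact routine; all names are illustrative), the square root of an SPD batch can be approximated with batched matmuls only:

```python
import torch

def sqrtm_taylor(A, num_terms=8, eps=1e-6):
    """Approximate A^{1/2} for a batch of SPD matrices via a truncated Taylor
    series of (I - Z)^{1/2}, where Z = I - A / ||A||_F so that ||Z|| < 1.
    A: (B, n, n)."""
    B, n, _ = A.shape
    I = torch.eye(n, device=A.device, dtype=A.dtype).expand_as(A)
    norm = A.flatten(1).norm(dim=1).clamp_min(eps).view(B, 1, 1)
    Z = I - A / norm
    S = I.clone()          # partial sum of the series
    P = I.clone()          # current power Z^k
    coef = 1.0             # binomial coefficient C(1/2, k)
    for k in range(1, num_terms):
        coef = coef * (0.5 - (k - 1)) / k
        P = P @ Z
        S = S + coef * ((-1.0) ** k) * P
    return norm.sqrt() * S   # A^{1/2} = ||A||_F^{1/2} (I - Z)^{1/2}
```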
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
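A heavily reduced sketch of the core lookup (one level only, no interpolation; table size, hash primes, and feature width are illustrative and not the paper's configuration):

```python
import torch

class HashedGridLevel(torch.nn.Module):
    """One level of a hashed feature grid: integer cell coordinates are hashed
    into a small trainable table. The real encoding uses many resolutions and
    trilinear interpolation of the looked-up features."""
    def __init__(self, table_size=2**14, feat_dim=2, resolution=64):
        super().__init__()
        self.table = torch.nn.Parameter(
            torch.empty(table_size, feat_dim).uniform_(-1e-4, 1e-4))
        self.resolution = resolution

    def forward(self, x):                          # x: (N, 3) points in [0, 1)^3
        cell = (x * self.resolution).long()        # integer grid coordinates
        h = (cell[:, 0]
             ^ (cell[:, 1] * 2654435761)
             ^ (cell[:, 2] * 805459861)) % self.table.shape[0]
        return self.table[h]                       # (N, feat_dim) trainable features
```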
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
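As one illustrative non-uniform scheme (power-of-two levels; the paper's actual quantizer may differ, and the helper below is only a sketch), weights can be snapped to signed powers of two:

```python
import torch

def power_of_two_quantize(w, n_bits=4):
    """Snap each weight to sign(w) * 2^k, restricting k to a small range of
    exponents below the largest magnitude. Power-of-two levels make the later
    multiplications shift-friendly in hardware."""
    sign = torch.sign(w)
    mag = w.abs().clamp_min(1e-12)
    max_exp = torch.floor(torch.log2(mag.max())).item()
    min_exp = max_exp - (2 ** (n_bits - 1) - 1)            # allowed exponent range
    exp = torch.round(torch.log2(mag)).clamp(min_exp, max_exp)
    return sign * torch.pow(2.0, exp)
```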
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have a complexity that significantly increases their computational cost and running time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
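A compact sketch of the underlying randomized SVD recipe (Halko-style range finding; illustrative code, not the paper's implementation):

```python
import torch

def randomized_svd(A, k, oversample=10, n_iter=2):
    """Rank-k randomized SVD: sample the range of A with a Gaussian test matrix,
    optionally sharpen it with power iterations, then do a small exact SVD."""
    m, n = A.shape
    p = k + oversample
    Omega = torch.randn(n, p, device=A.device, dtype=A.dtype)
    Y = A @ Omega                        # sample the range of A
    for _ in range(n_iter):              # power iterations improve the subspace
        Y = A @ (A.T @ Y)
    Q, _ = torch.linalg.qr(Y)            # orthonormal basis, (m, p)
    B = Q.T @ A                          # small (p, n) projection
    U_hat, S, Vh = torch.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], S[:k], Vh[:k]
```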
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average 3712$\times$ speedup with 1301.25$\times$ energy reduction on CPU, and 35.4$\times$ speedup with 17.66$\times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
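For orientation, the dense building block is the SVD-based pseudoinverse below (a sketch only; FastPI itself works incrementally on sparse matrices):

```python
import torch

def pinv_via_svd(A, rcond=1e-10):
    """Moore-Penrose pseudoinverse from the SVD: A+ = V S+ U^T, where singular
    values below a relative cutoff are treated as zero."""
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)
    cutoff = rcond * S.max()
    S_inv = torch.where(S > cutoff, 1.0 / S, torch.zeros_like(S))
    return Vh.T @ torch.diag(S_inv) @ U.T
```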
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- What if Neural Networks had SVDs? [66.91160214071088]
Various Neural Networks employ time-consuming matrix operations like matrix inversion.
We present an algorithm that is fast enough to speed up several matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z)
- QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation [0.0]
This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m,n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$.
We derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition and for the LQ decomposition of deep input matrices.
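A quick way to see QR matrix backpropagation in action is to differentiate through PyTorch's built-in QR, which covers the square and deep cases (this is not the article's derivation, which also handles wide and pivoted inputs):

```python
import torch

# Gradients flow through the QR factors of a deep (m > n) matrix.
A = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)
Q, R = torch.linalg.qr(A, mode="reduced")
loss = (R.diagonal() ** 2).sum()       # any scalar function of Q and/or R
loss.backward()
print(A.grad.shape)                    # torch.Size([5, 3]): gradient w.r.t. the input
```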
arXiv Detail & Related papers (2020-09-19T21:03:37Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
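A toy Nyström kernel ridge regression sketch shows the kind of structure such solvers exploit (random centers plus a reduced linear system; the actual solver replaces the direct solve with preconditioned iterations on the GPU, and all names here are illustrative):

```python
import torch

def nystrom_krr(X, y, m_centers=500, lengthscale=1.0, lam=1e-3):
    """Pick random centers C and solve the reduced Nystrom system
    (Knm^T Knm + lam*n*Kmm) alpha = Knm^T y for a Gaussian kernel."""
    n = X.shape[0]
    C = X[torch.randperm(n)[:m_centers]]
    Knm = torch.exp(-torch.cdist(X, C).pow(2) / (2 * lengthscale ** 2))   # (n, m)
    Kmm = torch.exp(-torch.cdist(C, C).pow(2) / (2 * lengthscale ** 2))   # (m, m)
    Kmm = Kmm + 1e-6 * torch.eye(C.shape[0], device=X.device, dtype=X.dtype)  # jitter
    alpha = torch.linalg.solve(Knm.T @ Knm + lam * n * Kmm, Knm.T @ y)
    return C, alpha      # predict on new points with k(x_new, C) @ alpha
```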
arXiv Detail & Related papers (2020-06-18T08:16:25Z)