Efficient GPU implementation of randomized SVD and its applications
- URL: http://arxiv.org/abs/2110.03423v2
- Date: Tue, 12 Mar 2024 12:12:29 GMT
- Title: Efficient GPU implementation of randomized SVD and its applications
- Authors: Łukasz Struski, Paweł Morkisz, Przemysław Spurek, Samuel Rodriguez Bernabeu, Tomasz Trzciński
- Abstract summary: Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
- Score: 17.71779625877989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matrix decompositions are ubiquitous in machine learning, including
applications in dimensionality reduction, data compression and deep learning
algorithms. Typical solutions for matrix decompositions have polynomial
complexity which significantly increases their computational cost and time. In
this work, we leverage efficient processing operations that can be run in
parallel on modern Graphical Processing Units (GPUs), the predominant computing
architecture used, e.g., in deep learning, to reduce the computational burden of
computing matrix decompositions. More specifically, we reformulate the
randomized decomposition problem to incorporate fast matrix multiplication
operations (BLAS-3) as building blocks. We show that this formulation, combined
with fast random number generators, allows us to fully exploit the potential of
parallel processing implemented in GPUs. Our extensive evaluation confirms the
superiority of this approach over the competing methods and we release the
results of this research as a part of the official CUDA implementation
(https://docs.nvidia.com/cuda/cusolver/index.html).
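As a reference point, the sketch below shows the standard randomized SVD template (a Halko-style range finder with power iterations) that the BLAS-3 reformulation described above builds on. It is a minimal NumPy illustration with assumed default parameters, not the paper's cuSOLVER implementation (cuSOLVER exposes randomized SVD through its gesvdr routine); note that every heavy step is a matrix product, i.e., a GEMM that maps directly onto GPU hardware.

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=None):
    """Approximate rank-`rank` SVD of A. Every expensive step below is a
    matrix product (BLAS-3 GEMM); on a GPU these become cuBLAS calls and
    the random projection uses a fast generator such as cuRAND."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + n_oversample, min(m, n))
    Omega = rng.standard_normal((n, k))          # random test matrix
    Y = A @ Omega                                # GEMM: sample the range of A
    for _ in range(n_power_iter):                # power iterations sharpen the range
        Y = A @ (A.T @ Y)                        # two GEMMs per iteration
    Q, _ = np.linalg.qr(Y)                       # thin QR of a tall-skinny matrix
    B = Q.T @ A                                  # GEMM: project A down to k x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # SVD of the small matrix
    U = Q @ Ub                                   # GEMM: lift back to m x k
    return U[:, :rank], s[:rank], Vt[:rank]

# Example: U, s, Vt = randomized_svd(np.random.randn(4000, 600), rank=50)
```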
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the explicit computation of the model's covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA.
By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms.
We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
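A toy sketch of the compositional idea (hypothetical names, not CoLA's actual API): represent matrices as lazy linear operators and dispatch on structure, so a product of operators is applied factor by factor instead of ever being materialized as a dense matrix.

```python
import numpy as np

class LinOp:
    """Minimal lazy linear operator; illustrative only, not CoLA's API."""
    def __init__(self, shape, matvec):
        self.shape, self.matvec = shape, matvec

    def __matmul__(self, other):
        if isinstance(other, LinOp):       # dispatch: compose lazily
            assert self.shape[1] == other.shape[0]
            return LinOp((self.shape[0], other.shape[1]),
                         lambda v: self.matvec(other.matvec(v)))
        return self.matvec(other)          # base case: apply to a vector

def dense(M):
    return LinOp(M.shape, lambda v: M @ v)

def diagonal(d):                            # O(n) storage and O(n) apply
    return LinOp((d.size, d.size), lambda v: d * v)

A = dense(np.random.randn(500, 500))
D = diagonal(np.random.rand(500))
x = np.random.randn(500)
y = (A @ D @ A) @ x   # three cheap matvecs; no 500x500 product is ever formed
```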
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
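The polar-code construction itself is more involved, but the underlying coded-computation idea can be shown with a toy parity code (an illustrative stand-in, not the paper's scheme): split the operand across workers, add a redundant parity block, and recover the full product even if one worker straggles.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

A1, A2 = A[:3], A[3:]
tasks = {"w1": A1, "w2": A2, "w3": A1 + A2}   # third worker holds the parity block

# Suppose worker w2 is a straggler and never responds.
results = {w: M @ x for w, M in tasks.items() if w != "w2"}

y_top = results["w1"]
y_bottom = results["w3"] - results["w1"]      # recover A2 @ x from the parity
y = np.concatenate([y_top, y_bottom])
assert np.allclose(y, A @ x)                  # exact result despite the straggler
```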
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
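For intuition, the sketch below runs the classical unshifted QR iteration, the building block behind QR-based eigendecomposition, over a whole batch of small symmetric matrices in one vectorized call. It is a pedagogical sketch assuming well-separated eigenvalues; production methods add shifts and deflation.

```python
import numpy as np

def batched_qr_eig(A, iters=200):
    """Unshifted QR iteration on a batch A of symmetric matrices, shape (b, n, n).
    Pedagogical only: real QR-based ED adds shifts and deflation."""
    V = np.broadcast_to(np.eye(A.shape[-1]), A.shape).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(A)   # batched QR in a single call
        A = R @ Q                # similarity transform: eigenvalues preserved
        V = V @ Q                # accumulate eigenvectors
    return np.einsum("bii->bi", A), V

batch = np.random.randn(32, 8, 8)
batch = (batch + batch.transpose(0, 2, 1)) / 2   # symmetrize
eigvals, eigvecs = batched_qr_eig(batch)
```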
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
The pseudoinverse is a generalization of the matrix inverse that has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
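For context, below is the textbook SVD route to the Moore-Penrose pseudoinverse, i.e., the dense baseline that FastPI's incremental approach improves on (this sketch is the baseline, not FastPI itself):

```python
import numpy as np

def pinv_via_svd(A, rcond=1e-15):
    """Moore-Penrose pseudoinverse A+ = V diag(1/s) U^T with small singular
    values truncated; the dense baseline, not FastPI's incremental method."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cutoff = rcond * s.max()
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)   # drop numerically-zero directions
    return (Vt.T * s_inv) @ U.T

A = np.random.randn(200, 50)
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```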
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions [2.3513645401551333]
We introduce the Concurrent ALS algorithm and library, which offers an interface to Matlab.
We show how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity.
Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
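The fusion idea can be shown in miniature (hypothetical shapes, not the paper's CP-ALS kernels): many small matrix products that share one large operand become a single larger GEMM, so the shared operand is read from memory once instead of once per decomposition, which is what raises arithmetic intensity.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5000, 64))                      # shared operand
Bs = [rng.standard_normal((64, 8)) for _ in range(16)]   # 16 independent factors

# Unfused: 16 small GEMMs; T is streamed from memory 16 times.
outs_loop = [T @ B for B in Bs]

# Fused: one GEMM against the concatenated factors; T is read once,
# so flops per byte moved (arithmetic intensity) goes up markedly.
fused = T @ np.concatenate(Bs, axis=1)                   # (5000, 128)
outs_fused = np.split(fused, 16, axis=1)

assert all(np.allclose(a, b) for a, b in zip(outs_loop, outs_fused))
```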
arXiv Detail & Related papers (2020-10-09T16:55:46Z)
- QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation [0.0]
This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m,n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$.
We derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition and for the LQ decomposition of deep input matrices.
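For reference, one standard form of the square-case QR backward pass (the result already used in mainstream autodiff frameworks; the pivoted and LQ results this paper derives are more involved and not reproduced here): for $A = QR$ with incoming gradients $\bar{Q}$ and $\bar{R}$, $\bar{A} = (\bar{Q} + Q\,\mathrm{copyltu}(M))\,R^{-T}$ with $M = R\bar{R}^T - \bar{Q}^T Q$, where $\mathrm{copyltu}(M)$ copies the lower triangle of $M$ (diagonal included) onto the upper triangle.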
arXiv Detail & Related papers (2020-09-19T21:03:37Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be applied to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
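Large-scale kernel solvers of this kind typically combine a Nyström approximation with preconditioned iterative methods; the sketch below shows only the Nyström piece, in its generic textbook form (not the paper's optimized GPU solver), to make the memory saving concrete.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
centers = X[rng.choice(len(X), size=100, replace=False)]  # Nystrom landmarks

K_nm = gaussian_kernel(X, centers)        # (n, m): only n*m kernel entries
K_mm = gaussian_kernel(centers, centers)  # (m, m): small
# The full n x n kernel matrix is approximated as K_nm @ pinv(K_mm) @ K_nm.T,
# so downstream computations touch O(n*m) memory instead of O(n^2).
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T
```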
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content and is not responsible for any consequences arising from its use.