Giga-scale Kernel Matrix Vector Multiplication on GPU
- URL: http://arxiv.org/abs/2202.01085v1
- Date: Wed, 2 Feb 2022 15:28:15 GMT
- Title: Giga-scale Kernel Matrix Vector Multiplication on GPU
- Authors: Robert Hu, Dino Sejdinovic, Joan Alexis Glaun\`es
- Abstract summary: Kernel matrix vector multiplication (KMVM) is a ubiquitous operation in machine learning and scientific computing, spanning from the kernel literature to signal processing.
We propose a novel approximation procedure coined Faster-Fast and Free Memory Method ($textF3$M) to address these scaling issues for KMVM.
We show that $textF3$M can compute a full KMVM for a billion points emphin under one minute on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods.
- Score: 9.106412307976067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel matrix vector multiplication (KMVM) is a ubiquitous operation in
machine learning and scientific computing, spanning from the kernel literature
to signal processing. As kernel matrix vector multiplication tends to scale
quadratically in both memory and time, applications are often limited by these
computational scaling constraints. We propose a novel approximation procedure
coined Faster-Fast and Free Memory Method ($\text{F}^3$M) to address these
scaling issues for KMVM. Extensive experiments demonstrate that $\text{F}^3$M
has empirical \emph{linear time and memory} complexity with a relative error of
order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under
one minute} on a high-end GPU, leading to a significant speed-up in comparison
to existing CPU methods. We further demonstrate the utility of our procedure by
applying it as a drop-in for the state-of-the-art GPU-based linear solver
FALKON, \emph{improving speed 3-5 times} at the cost of $<$1\% drop in
accuracy.
Related papers
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives [0.5735035463793009]
We present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis.
We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated.
We illustrate the performance of the additive kernel scheme with fast matrix vector products on a number of data sets.
arXiv Detail & Related papers (2024-04-26T11:50:16Z) - Snacks: a fast large-scale kernel SVM solver [0.8602553195689513]
Snacks is a new large-scale solver for Kernel Support Vector Machines.
Snacks relies on a Nystr"om approximation of the kernel matrix and an accelerated variant of the subgradient method.
arXiv Detail & Related papers (2023-04-17T04:19:20Z) - Sub-quadratic Algorithms for Kernel Matrices via Kernel Density
Estimation [24.166833799353476]
We develop efficient reductions from $textitweighted edge sampling$ on kernel graphs, $textitsimulating random walks$ on kernel graphs, and $textitweighted sampling$ on matrices to Kernel Density Estimation.
Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest.
arXiv Detail & Related papers (2022-12-01T16:42:56Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for an LVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z) - Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
kernel is especially important as other kernels can often be approximated by the kernel via a Taylor series expansion.
Recent techniques in sketching reduce the dependence in the running time on the degree oblivious $q$ of the kernel.
We give a new sketch which greatly improves upon this running time, by removing the dependence on $q$ in the leading order term.
arXiv Detail & Related papers (2021-08-21T02:14:55Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose textitVersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
textitVersaGNN achieves on average 3712$times$ speedup with 1301.25$times$ energy reduction on CPU, and 35.4$times$ speedup with 17.66$times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - GPU-Accelerated Primal Learning for Extremely Fast Large-Scale
Classification [10.66048003460524]
One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON.
We show that using judicious GPU-optimization principles, TRON training time for different losses and feature representations may be drastically reduced.
arXiv Detail & Related papers (2020-08-08T03:40:27Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.