Sub-quadratic Algorithms for Kernel Matrices via Kernel Density
Estimation
- URL: http://arxiv.org/abs/2212.00642v1
- Date: Thu, 1 Dec 2022 16:42:56 GMT
- Title: Sub-quadratic Algorithms for Kernel Matrices via Kernel Density
Estimation
- Authors: Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal and Samson
Zhou
- Abstract summary: We develop efficient reductions from $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{weighted sampling}$ on matrices to Kernel Density Estimation.
Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest.
- Score: 24.166833799353476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel matrices, as well as weighted graphs represented by them, are
ubiquitous objects in machine learning, statistics and other related fields.
The main drawback of using kernel methods (learning and inference using kernel
matrices) is efficiency -- given $n$ input points, most kernel-based algorithms
need to materialize the full $n \times n$ kernel matrix before performing any
subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this
quadratic barrier for various problems has therefore been the subject of
extensive research efforts.
We break the quadratic barrier and obtain $\textit{subquadratic}$ time
algorithms for several fundamental linear-algebraic and graph processing
primitives, including approximating the top eigenvalue and eigenvector,
spectral sparsification, solving linear systems, local clustering, low-rank
approximation, arboricity estimation and counting weighted triangles. We build
on the recent Kernel Density Estimation framework, which (after preprocessing
in time subquadratic in $n$) can return estimates of row/column sums of the
kernel matrix. In particular, we develop efficient reductions from
$\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel
graphs, $\textit{simulating random walks}$ on kernel graphs, and
$\textit{importance sampling}$ on matrices to Kernel Density Estimation and
show that we can generate samples from these distributions in
$\textit{sublinear}$ (in the support of the distribution) time. Our reductions
are the central ingredient in each of our applications and we believe they may
be of independent interest. We empirically demonstrate the efficacy of our
algorithms on low-rank approximation (LRA) and spectral sparsification, where
we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over
baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral
sparsification.
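To make the sampling reductions concrete, here is a minimal Python/NumPy sketch. The exact `kde_oracle` below is a naive stand-in for the fast KDE data structures the paper builds on (which answer each row-sum query in time sublinear in $n$ after subquadratic preprocessing); the Gaussian kernel and the two-stage vertex-then-neighbor sampler are illustrative assumptions, not the paper's actual algorithms.

```python
import numpy as np

def kde_oracle(q, X, bandwidth=1.0):
    """Naive KDE 'oracle': returns the row sum sum_j k(q, x_j) exactly in
    O(n d) time. The paper instead assumes a fast KDE data structure whose
    queries run in sublinear time after subquadratic preprocessing."""
    return np.exp(-np.sum((X - q) ** 2, axis=1) / bandwidth ** 2).sum()

def weighted_vertex_sample(X, rng, bandwidth=1.0):
    """Sample vertex i with probability proportional to its kernel degree
    d_i = sum_j k(x_i, x_j), using one KDE (row-sum) query per point."""
    degrees = np.array([kde_oracle(x, X, bandwidth) for x in X])
    return rng.choice(len(X), p=degrees / degrees.sum())

def random_walk_step(i, X, rng, bandwidth=1.0):
    """One kernel-graph random-walk step from vertex i: pick j with
    probability k(x_i, x_j) / d_i by materializing row i of the kernel
    matrix."""
    row = np.exp(-np.sum((X - X[i]) ** 2, axis=1) / bandwidth ** 2)
    return rng.choice(len(X), p=row / row.sum())

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))   # 500 points in R^5
i = weighted_vertex_sample(X, rng)  # vertex drawn ~ kernel degree
j = random_walk_step(i, X, rng)     # one walk step yields a weighted edge (i, j)
print(f"sampled edge ({i}, {j})")
```

Note that even this crude walk step touches only one row of the kernel matrix, i.e. $O(n)$ work, which is already sublinear in the $n^2$-sized support of the edge distribution; the paper's reductions obtain such samples through KDE queries instead.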
Related papers
- Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach [0.9453554184019107]
A kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$ where $\kappa(x, y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
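For concreteness, a minimal NumPy sketch of this setup; the Gaussian kernel is an assumed illustrative choice of $\kappa$, and the truncated SVD stands in as the textbook baseline rather than the cited paper's geometric method.

```python
import numpy as np

def kernel_matrix(X, Y, bandwidth=1.0):
    """K[i, j] = kappa(x_i, y_j) for a Gaussian kernel, one illustrative
    choice of kappa; the definition applies to any kernel function."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / bandwidth ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = rng.standard_normal((300, 3))
K = kernel_matrix(X, Y)  # 200 x 300 kernel matrix

# Rank-r truncated SVD: the standard (cubic-cost) baseline that the cited
# paper's geometric approach aims to replace at linear complexity.
r = 10
U, s, Vt = np.linalg.svd(K, full_matrices=False)
K_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
print(np.linalg.norm(K - K_r) / np.linalg.norm(K))  # relative error
```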
arXiv Detail & Related papers (2022-12-24T07:15:00Z)
- Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation [4.061135251278187]
We show that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2 d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2 d^2)$ for Hessian observations.
Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels.
arXiv Detail & Related papers (2022-06-16T17:59:48Z)
- Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input-sparsity time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z)
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering [43.50783459690612]
The method consists in randomly "puncturing" both the data matrix $X \in \mathbb{C}^{p \times n}$ and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks.
We empirically confirm on GAN-generated image databases that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains.
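A minimal sketch of this two-way puncturing, under assumptions not in the summary: real-valued data in place of $\mathbb{C}^{p \times n}$, and an illustrative $1/\varepsilon$ rescaling convention.

```python
import numpy as np

def puncture(M, eps, rng):
    """Zero out each entry of M independently with probability 1 - eps
    (a Bernoulli mask), rescaling kept entries by 1/eps; the rescaling
    convention here is illustrative, not the paper's."""
    return M * (rng.random(M.shape) < eps) / eps

def puncture_symmetric(K, eps, rng):
    """Same, but with a symmetric mask so the punctured Gram matrix
    stays symmetric."""
    mask = rng.random(K.shape) < eps
    mask = np.triu(mask) | np.triu(mask, 1).T
    return K * mask / eps

rng = np.random.default_rng(0)
p, n, eps = 50, 400, 0.2
X = rng.standard_normal((p, n))            # real-valued stand-in for X in C^{p x n}
X_punct = puncture(X, eps, rng)            # (i) puncture the data matrix ...
K = X_punct.T @ X_punct / p                # ... and form its Gram matrix
K_punct = puncture_symmetric(K, eps, rng)  # (ii) puncture K itself

# Downstream PCA / spectral clustering would use the eigenvectors of K_punct.
print(np.linalg.eigvalsh(K_punct)[-5:])    # dominant eigenvalues
```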
arXiv Detail & Related papers (2021-02-24T14:01:58Z)
- Faster Kernel Matrix Algebra via Density Estimation [46.253698241653254]
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points.
arXiv Detail & Related papers (2021-02-16T18:25:47Z)
- Quantum algorithms for spectral sums [50.045011844765185]
We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices.
We show how the algorithms and techniques used in this work can be applied to three problems in spectral graph theory.
arXiv Detail & Related papers (2020-11-12T16:29:45Z)
- Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection [58.5350491065936]
We consider a structural assumption on the graph Laplacian matrix $L$.
The first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria.
We design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L^* \in H_u^+$ given $\bar{C}$.
arXiv Detail & Related papers (2020-10-25T18:12:50Z)
- Fourier Sparse Leverage Scores and Approximate Kernel Learning [29.104055676527913]
We prove new explicit upper bounds on the leverage scores of Fourier functions under both the Gaussian and Laplace measures.
Our bounds are motivated by two important applications in machine learning.
arXiv Detail & Related papers (2020-06-12T17:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.