Data-Driven Linear Complexity Low-Rank Approximation of General Kernel
Matrices: A Geometric Approach
- URL: http://arxiv.org/abs/2212.12674v2
- Date: Wed, 28 Jun 2023 19:03:48 GMT
- Title: Data-Driven Linear Complexity Low-Rank Approximation of General Kernel
Matrices: A Geometric Approach
- Authors: Difeng Cai, Edmond Chow, Yuanzhe Xi
- Abstract summary: A kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
- Score: 0.9453554184019107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A general, {\em rectangular} kernel matrix may be defined as $K_{ij} =
\kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where
$X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this
paper, we seek a low-rank approximation to a kernel matrix where the sets of
points $X$ and $Y$ are large and are arbitrarily distributed, such as away from
each other, ``intermingled'', identical, etc. Such rectangular kernel matrices
may arise, for example, in Gaussian process regression where $X$ corresponds to
the training data and $Y$ corresponds to the test data. In this case, the
points are often high-dimensional. Since the point sets are large, we must
exploit the fact that the matrix arises from a kernel function and avoid
forming the matrix explicitly, which rules out most algebraic techniques. In
particular, we seek methods that scale linearly, or nearly linearly, with the
size of the data for a fixed approximation rank. The main idea in this paper is
to {\em geometrically} select appropriate subsets of points to construct a
low-rank approximation. An analysis in this paper guides how this selection
should be performed.
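The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the idea can be realized: a skeleton (Nyström/CUR-style) factorization whose landmark points are chosen by farthest-point sampling. The kernel, the selection rule, and all function names here are illustrative assumptions, not the paper's method; what the sketch shares with the abstract is that the kernel is evaluated only on geometrically chosen subsets, so the full $m \times n$ matrix is never formed.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); X is (m, d), Y is (n, d).
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def farthest_point_sampling(P, k, seed=0):
    # Greedily pick k spread-out points: a simple stand-in for a geometric
    # selection rule, not the selection analyzed in the paper.
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(P)))]
    dist = np.linalg.norm(P - P[idx[0]], axis=1)
    for _ in range(k - 1):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.linalg.norm(P - P[idx[-1]], axis=1))
    return np.array(idx)

def skeleton_lowrank(X, Y, kernel, k):
    # Rank-<=k factorization K ~= U @ R built from k landmarks in X and in Y;
    # only O((m + n) k) kernel entries are ever evaluated.
    ix = farthest_point_sampling(X, k)
    iy = farthest_point_sampling(Y, k)
    C = kernel(X, Y[iy])                 # m x k block of selected columns
    M = kernel(X[ix], Y[iy])             # k x k intersection block
    R = kernel(X[ix], Y)                 # k x n block of selected rows
    U = C @ np.linalg.pinv(M)            # m x k factor
    return U, R                          # K ~= U @ R

# Example: rectangular case with training points X and test points Y.
X = np.random.randn(2000, 10)
Y = np.random.randn(1500, 10) + 0.5
U, R = skeleton_lowrank(X, Y, gaussian_kernel, k=50)
v = np.random.randn(1500)
Kv_approx = U @ (R @ v)                  # apply K ~= U R in O((m + n) k) time
```

For a fixed rank $k$, both the construction and each subsequent matrix-vector product cost $O((m+n)k)$, which is the linear scaling in the data size that the abstract targets.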
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - A General Algorithm for Solving Rank-one Matrix Sensing [15.543065204102714]
The goal of matrix sensing is to recover a matrix $A_\star \in \mathbb{R}^{n \times n}$, based on a sequence of measurements.
In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.
arXiv Detail & Related papers (2023-03-22T04:07:26Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(\, d^2/\mathrm{polylog}(d)\,)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Scalable First-Order Bayesian Optimization via Structured Automatic
Differentiation [4.061135251278187]
We show that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2 d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2 d^2)$ for Hessian observations.
Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels.
arXiv Detail & Related papers (2022-06-16T17:59:48Z) - Spectrum of inner-product kernel matrices in the polynomial regime and
multiple descent phenomenon in kernel ridge regression [3.997680012976965]
The kernel matrix is well approximated by its degree-$\ell$ approximation.
We show that the spectrum of the matrix converges in distribution to a Marchenko-Pastur law.
arXiv Detail & Related papers (2022-04-21T22:20:52Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input sparsity time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Faster Kernel Matrix Algebra via Density Estimation [46.253698241653254]
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points.
arXiv Detail & Related papers (2021-02-16T18:25:47Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$; a sketch of the resulting low-rank kernel application appears after this list.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
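Following up on the forward reference above: a ground cost $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ makes the Gibbs kernel $K_{ij} = \langle\varphi(x_i),\varphi(y_j)\rangle$ an explicit rank-$r$ product, so Sinkhorn iterations can apply $K$ through its factors. The sketch below is an illustrative assumption of how that is typically done (the function name, feature dimensions, and the random nonnegative features in the example are hypothetical), not the cited paper's implementation.

```python
import numpy as np

def sinkhorn_positive_features(phi_x, phi_y, a, b, n_iters=200, tiny=1e-30):
    # phi_x: (m, r) and phi_y: (n, r) nonnegative feature maps, so the Gibbs
    # kernel K = phi_x @ phi_y.T is entrywise positive and has rank <= r.
    # a (m,) and b (n,) are the source/target marginals; K is never formed.
    u = np.ones(phi_x.shape[0])
    v = np.ones(phi_y.shape[0])
    for _ in range(n_iters):
        Kv = phi_x @ (phi_y.T @ v)       # K @ v   in O((m + n) r)
        u = a / np.maximum(Kv, tiny)
        Ktu = phi_y @ (phi_x.T @ u)      # K.T @ u in O((m + n) r)
        v = b / np.maximum(Ktu, tiny)
    return u, v                          # transport plan: diag(u) K diag(v)

# Example with hypothetical random nonnegative features of dimension r << n.
m, n, r = 5000, 4000, 64
phi_x = np.abs(np.random.randn(m, r))
phi_y = np.abs(np.random.randn(n, r))
a = np.full(m, 1.0 / m)
b = np.full(n, 1.0 / n)
u, v = sinkhorn_positive_features(phi_x, phi_y, a, b)
```

Each iteration touches only the factor products, so the per-iteration cost is $O((m+n)r)$ rather than the $O(mn)$ cost of applying a dense kernel matrix.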