Data-Driven Linear Complexity Low-Rank Approximation of General Kernel
Matrices: A Geometric Approach
- URL: http://arxiv.org/abs/2212.12674v2
- Date: Wed, 28 Jun 2023 19:03:48 GMT
- Title: Data-Driven Linear Complexity Low-Rank Approximation of General Kernel
Matrices: A Geometric Approach
- Authors: Difeng Cai, Edmond Chow, Yuanzhe Xi
- Abstract summary: A kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
- Score: 0.9453554184019107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A general, {\em rectangular} kernel matrix may be defined as $K_{ij} =
\kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where
$X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this
paper, we seek a low-rank approximation to a kernel matrix where the sets of
points $X$ and $Y$ are large and are arbitrarily distributed, such as away from
each other, ``intermingled'', identical, etc. Such rectangular kernel matrices
may arise, for example, in Gaussian process regression where $X$ corresponds to
the training data and $Y$ corresponds to the test data. In this case, the
points are often high-dimensional. Since the point sets are large, we must
exploit the fact that the matrix arises from a kernel function and avoid
forming the matrix explicitly, which rules out most algebraic techniques. In
particular, we seek methods that scale linearly, or nearly linearly, with the
size of the data for a fixed approximation rank. The main idea in this paper is
to {\em geometrically} select appropriate subsets of points to construct a
low-rank approximation. An analysis in this paper guides how this selection
should be performed.
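The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the idea can be realized: a skeleton (Nyström/CUR-style) factorization whose landmark points are chosen by farthest-point sampling. The kernel, the selection rule, and all function names here are illustrative assumptions, not the paper's method; what the sketch shares with the abstract is that the kernel is evaluated only on geometrically chosen subsets, so the full $m \times n$ matrix is never formed.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); X is (m, d), Y is (n, d).
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def farthest_point_sampling(P, k, seed=0):
    # Greedily pick k spread-out points: a simple stand-in for a geometric
    # selection rule, not the selection analyzed in the paper.
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(P)))]
    dist = np.linalg.norm(P - P[idx[0]], axis=1)
    for _ in range(k - 1):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.linalg.norm(P - P[idx[-1]], axis=1))
    return np.array(idx)

def skeleton_lowrank(X, Y, kernel, k):
    # Rank-<=k factorization K ~= U @ R built from k landmarks in X and in Y;
    # only O((m + n) k) kernel entries are ever evaluated.
    ix = farthest_point_sampling(X, k)
    iy = farthest_point_sampling(Y, k)
    C = kernel(X, Y[iy])                 # m x k block of selected columns
    M = kernel(X[ix], Y[iy])             # k x k intersection block
    R = kernel(X[ix], Y)                 # k x n block of selected rows
    U = C @ np.linalg.pinv(M)            # m x k factor
    return U, R                          # K ~= U @ R

# Example: rectangular case with training points X and test points Y.
X = np.random.randn(2000, 10)
Y = np.random.randn(1500, 10) + 0.5
U, R = skeleton_lowrank(X, Y, gaussian_kernel, k=50)
v = np.random.randn(1500)
Kv_approx = U @ (R @ v)                  # apply K ~= U R in O((m + n) k) time
```

For a fixed rank $k$, both the construction and each subsequent matrix-vector product cost $O((m+n)k)$, which is the linear scaling in the data size that the abstract targets.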
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - A General Algorithm for Solving Rank-one Matrix Sensing [15.543065204102714]
The goal of matrix sensing is to recover a matrix $A_\star \in \mathbb{R}^{n \times n}$, based on a sequence of measurements.
In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.
arXiv Detail & Related papers (2023-03-22T04:07:26Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(\, d^2/\mathrm{polylog}(d)\,)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Scalable First-Order Bayesian Optimization via Structured Automatic
Differentiation [4.061135251278187]
We show that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2 d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2 d^2)$ for Hessian observations.
Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels.
arXiv Detail & Related papers (2022-06-16T17:59:48Z) - Spectrum of inner-product kernel matrices in the polynomial regime and
multiple descent phenomenon in kernel ridge regression [3.997680012976965]
The kernel matrix is well approximated by its degree-$\ell$ approximation.
We show that the spectrum of the matrix converges in distribution to a Marchenko-Pastur law.
arXiv Detail & Related papers (2022-04-21T22:20:52Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input sparsity time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Faster Kernel Matrix Algebra via Density Estimation [46.253698241653254]
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points.
arXiv Detail & Related papers (2021-02-16T18:25:47Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$; a sketch of the resulting low-rank kernel application appears after this list.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
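Following up on the forward reference above: a ground cost $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ makes the Gibbs kernel $K_{ij} = \langle\varphi(x_i),\varphi(y_j)\rangle$ an explicit rank-$r$ product, so Sinkhorn iterations can apply $K$ through its factors. The sketch below is an illustrative assumption of how that is typically done (the function name, feature dimensions, and the random nonnegative features in the example are hypothetical), not the cited paper's implementation.

```python
import numpy as np

def sinkhorn_positive_features(phi_x, phi_y, a, b, n_iters=200, tiny=1e-30):
    # phi_x: (m, r) and phi_y: (n, r) nonnegative feature maps, so the Gibbs
    # kernel K = phi_x @ phi_y.T is entrywise positive and has rank <= r.
    # a (m,) and b (n,) are the source/target marginals; K is never formed.
    u = np.ones(phi_x.shape[0])
    v = np.ones(phi_y.shape[0])
    for _ in range(n_iters):
        Kv = phi_x @ (phi_y.T @ v)       # K @ v   in O((m + n) r)
        u = a / np.maximum(Kv, tiny)
        Ktu = phi_y @ (phi_x.T @ u)      # K.T @ u in O((m + n) r)
        v = b / np.maximum(Ktu, tiny)
    return u, v                          # transport plan: diag(u) K diag(v)

# Example with hypothetical random nonnegative features of dimension r << n.
m, n, r = 5000, 4000, 64
phi_x = np.abs(np.random.randn(m, r))
phi_y = np.abs(np.random.randn(n, r))
a = np.full(m, 1.0 / m)
b = np.full(n, 1.0 / n)
u, v = sinkhorn_positive_features(phi_x, phi_y, a, b)
```

Each iteration touches only the factor products, so the per-iteration cost is $O((m+n)r)$ rather than the $O(mn)$ cost of applying a dense kernel matrix.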