An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings
- URL: http://arxiv.org/abs/2205.06308v2
- Date: Fri, 5 May 2023 21:23:54 GMT
- Title: An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings
- Authors: Yue M. Lu and Horng-Tzer Yau
- Abstract summary: This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical spectral distribution of these random matrices in a polynomial scaling regime.
The limiting spectrum can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
- Score: 21.727073594338297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate random matrices whose entries are obtained by applying a
nonlinear kernel function to pairwise inner products between $n$ independent
data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This
study is motivated by applications in machine learning and statistics, where
these kernel random matrices and their spectral properties play significant
roles. We establish the weak limit of the empirical spectral distribution of
these matrices in a polynomial scaling regime, where $d, n \to \infty$ such
that $n / d^\ell \to \kappa$, for some fixed $\ell \in \mathbb{N}$ and $\kappa
\in (0, \infty)$. Our findings generalize an earlier result by Cheng and
Singer, who examined the same model in the linear scaling regime (with $\ell =
1$).
Our work reveals an equivalence principle: the spectrum of the random kernel
matrix is asymptotically equivalent to that of a simpler matrix model,
constructed as a linear combination of a (shifted) Wishart matrix and an
independent matrix sampled from the Gaussian orthogonal ensemble. The aspect
ratio of the Wishart matrix and the coefficients of the linear combination are
determined by $\ell$ and the expansion of the kernel function in the orthogonal
Hermite polynomial basis. Consequently, the limiting spectrum of the random
kernel matrix can be characterized as the free additive convolution between a
Marchenko-Pastur law and a semicircle law. We also extend our results to cases
with data vectors sampled from isotropic Gaussian distributions instead of
spherical distributions.
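As a quick numerical sanity check of the equivalence principle, the sketch below compares the eigenvalue distribution of such a kernel matrix with that of a shifted-Wishart-plus-GOE surrogate. The normalizations and the mixing coefficients a and b are illustrative assumptions; in the paper they are determined by $\ell$ and the Hermite expansion of the kernel, which this sketch does not compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Polynomial scaling regime: n / d^ell -> kappa (here ell = 2, kappa = 1).
ell, kappa, d = 2, 1.0, 40
n = int(kappa * d**ell)

# n i.i.d. data vectors, uniform on the unit sphere in R^d.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Kernel matrix: a nonlinearity applied to the standardized pairwise
# inner products (sqrt(d) * <x_i, x_j> is approximately N(0, 1)), with
# the diagonal removed and an illustrative 1/sqrt(n) normalization.
f = np.tanh
K = f(np.sqrt(d) * (X @ X.T)) / np.sqrt(n)
np.fill_diagonal(K, 0.0)
kernel_eigs = np.linalg.eigvalsh(K)

# Surrogate model: a * (centered Wishart) + b * (independent GOE).
# a and b are placeholders; the paper derives them from the Hermite
# coefficients of f, which is not reproduced here.
a, b = 0.5, 0.5
m = d**ell  # Wishart factor dimension at level ell, so n/m -> kappa
Z = rng.standard_normal((n, m))
wishart = (Z @ Z.T - m * np.eye(n)) / np.sqrt(n * m)
H = rng.standard_normal((n, n))
goe = (H + H.T) / np.sqrt(2 * n)  # semicircle supported on [-2, 2]
model_eigs = np.linalg.eigvalsh(a * wishart + b * goe)

# Compare a few quantiles of the two spectra.
print(np.percentile(kernel_eigs, [5, 50, 95]))
print(np.percentile(model_eigs, [5, 50, 95]))
```

With the paper's coefficients in place of the placeholders, the two sets of quantiles should approach each other as $d$ grows.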
Related papers
- Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel ridge regression (KRR) to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z) - Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
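For context, here is a minimal numpy illustration of the object these bounds concern: the rank-$r$ approximation of a kernel matrix obtained by truncating its eigendecomposition, and the entrywise (max-norm) error. The RBF kernel and bandwidth are assumptions for the toy example, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy kernel matrix: RBF kernel on random one-dimensional inputs.
x = rng.uniform(-1.0, 1.0, size=200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)

# Rank-r approximation from the truncated eigendecomposition:
# keep the r largest eigenvalues and their eigenvectors.
r = 10
vals, vecs = np.linalg.eigh(K)  # eigenvalues in ascending order
V = vecs[:, -r:]
K_r = (V * vals[-r:]) @ V.T

# Entrywise error -- the quantity such bounds control.
print(np.max(np.abs(K - K_r)))
```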
arXiv Detail & Related papers (2024-05-23T12:26:25Z) - Universality for the global spectrum of random inner-product kernel
matrices in the polynomial regime [12.221087476416056]
In this paper, we show that this limiting-spectrum phenomenon is universal, holding as soon as $X$ has i.i.d. entries with all finite moments.
In the case of non-integer $\ell$, the Marchenko-Pastur term disappears.
arXiv Detail & Related papers (2023-10-27T17:15:55Z) - Quantitative deterministic equivalent of sample covariance matrices with
a general dependence structure [0.0]
We prove quantitative bounds involving both the dimensions and the spectral parameter, in particular allowing it to get closer to the real positive semi-line.
As applications, we obtain a new bound for the convergence in Kolmogorov distance of the empirical spectral distributions of these general models.
arXiv Detail & Related papers (2022-11-23T15:50:31Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time.
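The Gram matrix in question has useful structure: for column-wise tensor (Khatri-Rao) products, it factors as the entrywise (Hadamard) product of the individual Gram matrices, so it can be formed without materializing the exponentially tall product matrix. A small sketch verifying this identity (the paper's sampling algorithm itself is more involved and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# q matrices sharing the same number of columns.
q, rows, cols = 3, 5, 4
mats = [rng.standard_normal((rows, cols)) for _ in range(q)]

# Explicit q-fold column-wise tensor (Khatri-Rao) product: column j is
# the Kronecker product of the j-th columns of all q matrices.
kr = mats[0]
for M in mats[1:]:
    kr = np.einsum('ij,kj->ikj', kr, M).reshape(-1, cols)

# Gram matrix identity: (A1 * ... * Aq)^T (A1 * ... * Aq) equals the
# Hadamard product of the small Gram matrices Ai^T Ai.
gram_fast = np.ones((cols, cols))
for M in mats:
    gram_fast *= M.T @ M

print(np.allclose(kr.T @ kr, gram_fast))  # True
```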
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural
Networks [0.0]
We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles.
This may be of interest for applications to machine learning.
arXiv Detail & Related papers (2022-01-13T00:48:20Z) - Eigenvalue Distribution of Large Random Matrices Arising in Deep Neural
Networks: Orthogonal Case [1.6244541005112747]
The paper deals with the distribution of singular values of the input-output Jacobian of deep untrained neural networks in the limit of their infinite width.
It was claimed that in these cases the singular value distribution of the Jacobian in the limit of infinite width coincides with that of the analog of the Jacobian with special random but weight-independent diagonal matrices.
arXiv Detail & Related papers (2022-01-12T16:33:47Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ ...
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k \times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)