Universality for the global spectrum of random inner-product kernel
matrices in the polynomial regime
- URL: http://arxiv.org/abs/2310.18280v1
- Date: Fri, 27 Oct 2023 17:15:55 GMT
- Title: Universality for the global spectrum of random inner-product kernel
matrices in the polynomial regime
- Authors: Sofiia Dubova, Yue M. Lu, Benjamin McKenna, Horng-Tzer Yau
- Abstract summary: In this paper, we show that this phenomenon is universal, holding as soon as $X$ has i.i.d. entries with all finite moments.
In the case of non-integer $\ell$, the Mar\v{c}enko-Pastur term disappears.
- Score: 12.221087476416056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider certain large random matrices, called random inner-product kernel
matrices, which are essentially given by a nonlinear function $f$ applied
entrywise to a sample-covariance matrix, $f(X^TX)$, where $X \in \mathbb{R}^{d
\times N}$ is random and normalized in such a way that $f$ typically has
order-one arguments. We work in the polynomial regime, where $N \asymp d^\ell$
for some $\ell > 0$, not just the linear regime where $\ell = 1$. Earlier work
by various authors showed that, when the columns of $X$ are either uniform on
the sphere or standard Gaussian vectors, and when $\ell$ is an integer (the
linear regime $\ell = 1$ is particularly well-studied), the bulk eigenvalues of
such matrices behave in a simple way: They are asymptotically given by the free
convolution of the semicircular and Mar\v{c}enko-Pastur distributions, with
relative weights given by expanding $f$ in the Hermite basis. In this paper, we
show that this phenomenon is universal, holding as soon as $X$ has i.i.d.
entries with all finite moments. In the case of non-integer $\ell$, the
Mar\v{c}enko-Pastur term disappears (its weight in the free convolution
vanishes), and the spectrum is just semicircular.
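The universality claim can be probed numerically. The following is a minimal sketch in the linear regime, not the authors' construction: the normalization (off-diagonal entries $f(\langle x_i, x_j \rangle / \sqrt{d}) / \sqrt{N}$, zero diagonal), the test function $f = He_1 + 0.5\,He_2$, the sizes `d` and `N`, and the helper names `kernel_matrix`, `gaussian`, `rademacher`, and `spectrum` are all assumptions chosen for illustration. It compares low-order spectral moments for Gaussian and Rademacher entries.

```python
import numpy as np


def kernel_matrix(X, f):
    """Assumed normalization: A_ij = f(<x_i, x_j> / sqrt(d)) / sqrt(N) for i != j, A_ii = 0."""
    d, N = X.shape
    G = X.T @ X / np.sqrt(d)      # off-diagonal arguments of f are typically of order one
    A = f(G) / np.sqrt(N)
    np.fill_diagonal(A, 0.0)      # drop the diagonal, which is not of order one
    return A


def gaussian(rng, d, N):
    """d x N matrix with i.i.d. standard Gaussian entries."""
    return rng.standard_normal((d, N))


def rademacher(rng, d, N):
    """d x N matrix with i.i.d. +/-1 entries (mean zero, unit variance)."""
    return rng.choice([-1.0, 1.0], size=(d, N))


def spectrum(sampler, f, d, N, seed):
    """Eigenvalues of the kernel matrix built from one random draw of X."""
    rng = np.random.default_rng(seed)
    return np.linalg.eigvalsh(kernel_matrix(sampler(rng, d, N), f))


def f(t):
    """Hypothetical test function He_1(t) + 0.5 * He_2(t), with He_1(t) = t, He_2(t) = t^2 - 1."""
    return t + 0.5 * (t**2 - 1.0)


d, N = 200, 400                   # linear regime (ell = 1), N / d = 2
ev_gauss = spectrum(gaussian, f, d, N, seed=0)
ev_rad = spectrum(rademacher, f, d, N, seed=1)

# Empirical spectral moments; universality suggests the two columns should be close.
for k in (1, 2, 3, 4):
    print(k, np.mean(ev_gauss**k), np.mean(ev_rad**k))
```

Plotting histograms of `ev_gauss` and `ev_rad` side by side gives a visual version of the same comparison. Agreement at these finite sizes is only suggestive of the asymptotic statement.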
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical distribution of these random matrices in a scaling regime.
Our results can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z)
- Spectrum of inner-product kernel matrices in the polynomial regime and
multiple descent phenomenon in kernel ridge regression [3.997680012976965]
The kernel matrix is well approximated by its degree-$\ell$ approximation.
We show that the spectrum of the matrix converges in distribution to a Marchenko-Pastur law.
arXiv Detail & Related papers (2022-04-21T22:20:52Z)
- Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Algebraic and geometric structures inside the Birkhoff polytope [0.0]
The Birkhoff polytope $\mathcal{B}_d$ consists of all bistochastic matrices of order $d$.
We prove that $\mathcal{L}_d$ and $\mathcal{F}_d$ are star-shaped with respect to the flat matrix.
arXiv Detail & Related papers (2021-01-27T09:51:24Z)
- Emergent universality in critical quantum spin chains: entanglement
Virasoro algebra [1.9336815376402714]
Entanglement entropy and entanglement spectrum have been widely used to characterize quantum entanglement in extended many-body systems.
We show that the Schmidt vectors $|v_\alpha\rangle$ display an emergent universal structure, corresponding to a realization of the Virasoro algebra of a boundary CFT.
arXiv Detail & Related papers (2020-09-23T21:22:51Z)
- Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss [76.02734481158458]
It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $n^{\Omega(1)}$ number of columns.
We show that under certain minimal and realistic distributional settings, it is possible to obtain a $(k/\epsilon)$-approximation with a nearly linear running time and poly$(k/\epsilon)+O(k\log n)$ columns.
This is the first algorithm of any kind for achieving a $(k/\epsilon)$-approximation for entrywise
arXiv Detail & Related papers (2020-04-16T22:57:06Z)