Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns
- URL: http://arxiv.org/abs/2109.02644v1
- Date: Mon, 6 Sep 2021 14:21:43 GMT
- Title: Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns
- Authors: Cosme Louart and Romain Couillet
- Abstract summary: It was previously shown that the functionals $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$.
Here, we show that $|mathbb E[R(z)] - tilde R(z)|_F
- Score: 50.053491972003656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a random matrix $X= (x_1,\ldots, x_n)\in \mathcal M_{p,n}$ with
independent columns and satisfying concentration of measure hypotheses and a
parameter $z$ whose distance to the spectrum of $\frac{1}{n} XX^T$ should not
depend on $p,n$, it was previously shown that the functionals
$\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T- zI_p)^{-1}$ and $A\in
\mathcal M_{p}$ deterministic, have a standard deviation of order $O(\|A\|_* /
\sqrt n)$. Here, we show that $\|\mathbb E[R(z)] - \tilde R(z)\|_F \leq
O(1/\sqrt n)$, where $\tilde R(z)$ is a deterministic matrix depending only on
$z$ and on the means and covariances of the column vectors $x_1,\ldots, x_n$
(that do not have to be identically distributed). This estimation is key to
providing accurate fluctuation rates of functionals of $X$ of interest (mostly
related to its spectral properties) and is proved thanks to the introduction of
a semi-metric $d_s$ defined on the set $\mathcal D_n(\mathbb H)$ of diagonal
matrices with complex entries and positive imaginary part and satisfying, for
all $D,D' \in \mathcal D_n(\mathbb H)$: $d_s(D,D') = \max_{i\in[n]} |D_i -
D_i'|/ (\Im(D_i) \Im(D_i'))^{1/2}$. Possibly most importantly, the underlying
concentration of measure assumption on the columns of $X$ finds an extremely
natural ground for application in modern statistical machine learning
algorithms where non-linear Lipschitz mappings and high number of classes form
the base ingredients.
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$.
As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - Universality for the global spectrum of random inner-product kernel
matrices in the polynomial regime [12.221087476416056]
In this paper, we show that this phenomenon is universal, holding as soon as $X$ has i.i.d. entries with all finite moments.
In the case of non-integer $ell$, the Marvcenko-Pastur term disappears.
arXiv Detail & Related papers (2023-10-27T17:15:55Z) - Learning linear dynamical systems under convex constraints [4.4351901934764975]
We consider the problem of identification of linear dynamical systems from $T$ samples of a single trajectory.
$A*$ can be reliably estimated for values $T$ smaller than what is needed for unconstrained setting.
arXiv Detail & Related papers (2023-03-27T11:49:40Z) - A General Algorithm for Solving Rank-one Matrix Sensing [15.543065204102714]
The goal of matrix sensing is to recover a matrix $A_star in mathbbRn times n$, based on a sequence of measurements.
In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.
arXiv Detail & Related papers (2023-03-22T04:07:26Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $ell_4$ norm differs from that of a Gaussian vector with the same $ell$ norm.
We show that in the regime $n rho gg sqrtN$, any spectral method from a large class (and more generally, any low-degree of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z) - Two-way kernel matrix puncturing: towards resource-efficient PCA and
spectral clustering [43.50783459690612]
The method consists in randomly "puncturing" both the data matrix $XinmathbbCptimes n$ and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks.
We empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains.
arXiv Detail & Related papers (2021-02-24T14:01:58Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(epsilon,delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator $(epsilon,delta)$-unbiased for $(Atop A)-1$ with a sketch of size $m=O(d+sqrt d/
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP property for $Mtimes N$ matrices with i.i.d. $mathcalN(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.