Spectral Statistics of the Sample Covariance Matrix for High Dimensional
Linear Gaussians
- URL: http://arxiv.org/abs/2312.05794v1
- Date: Sun, 10 Dec 2023 06:55:37 GMT
- Authors: Muhammad Abdullah Naeem, Miroslav Pajic
- Abstract summary: Performance of the ordinary least squares (OLS) method for the estimation of a high dimensional stable state transition matrix.
The OLS estimator incurs a phase transition and becomes transient: increasing the iteration count only worsens the estimation error.
- Score: 12.524855369455421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance of the ordinary least squares (OLS) method for the \emph{estimation of
a high dimensional stable state transition matrix} $A$ (i.e., spectral radius
$\rho(A)<1$) from a single noisy observed trajectory of the linear time
invariant (LTI)\footnote{Linear Gaussian (LG) in Markov chain literature} system
$X_{-}:(x_0,x_1, \ldots,x_{N-1})$ satisfying \begin{equation}
x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{ where } w_{t} \thicksim
N(0,I_{n}), \end{equation}
relies heavily on negative moments of the sample covariance matrix
$(X_{-}X_{-}^{*})=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and on the singular values of
$EX_{-}^{*}$, where $E$ is a rectangular Gaussian ensemble $E=[w_0, \ldots,
w_{N-1}]$. Negative moments require sharp estimates on all the eigenvalues
$\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq
\lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Building on recent results on the
spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along
with the concentration of measure phenomenon and perturbation theory (Gershgorin's
circle theorem and Cauchy's interlacing theorem), we show that only when $A=A^{*}$
is the typical order $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N},
N+n\sqrt{N}\big]$ for all $j \in [n]$. However, in \emph{high dimensions}, when
$A$ has only one distinct eigenvalue $\lambda$ with geometric multiplicity
one, then as soon as that eigenvalue leaves the \emph{complex half unit disc}, the
largest eigenvalue suffers from the curse of dimensionality:
$\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big( \lfloor\frac{N}{n}\rfloor
e^{\alpha_{\lambda}n} \big)$, while the smallest eigenvalue stays
$\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, the OLS
estimator incurs a \emph{phase transition} and becomes \emph{transient:
increasing the number of iterations only worsens the estimation error}, all while
the dynamics are generated from stable systems.
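As a quick numerical illustration of the claimed spectral dichotomy (a hypothetical sketch with assumed parameters $n=20$, $N=2000$, $\lambda=0.9$, not code from the paper): simulate one trajectory of the LTI system and compare the spectrum of the Gram matrix $X_{-}X_{-}^{*}$ for a symmetric stable $A$ against a single Jordan block whose lone eigenvalue has geometric multiplicity one.

```python
# Hypothetical illustration (not from the paper): simulate the LTI system
# x_{t+1} = A x_t + w_t, w_t ~ N(0, I_n), and inspect the spectrum of the
# Gram matrix X_- X_-^* for (a) a symmetric stable A, where all eigenvalues
# should concentrate at order N, and (b) a single Jordan block, where
# lambda_1 blows up exponentially in n while lambda_n stays of order N.
import numpy as np

def gram_spectrum(A, N, rng):
    """Eigenvalues (descending) of X_- X_-^* for one length-N trajectory."""
    n = A.shape[0]
    x = np.zeros(n)
    X = np.empty((n, N))
    for t in range(N):
        X[:, t] = x                      # collect x_0, ..., x_{N-1}
        x = A @ x + rng.standard_normal(n)
    return np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]

rng = np.random.default_rng(0)
n, N = 20, 2000

# (a) symmetric stable A = 0.5 I: the whole spectrum stays of order N.
ev_sym = gram_spectrum(0.5 * np.eye(n), N, rng)

# (b) Jordan block with lambda = 0.9 on the diagonal (still stable, since
# rho(A) = 0.9 < 1): hidden non-normality inflates lambda_1 enormously.
A_jordan = 0.9 * np.eye(n) + np.eye(n, k=1)
ev_jordan = gram_spectrum(A_jordan, N, rng)

print(f"symmetric: lambda_1 = {ev_sym[0]:.3g}, lambda_n = {ev_sym[-1]:.3g}")
print(f"Jordan:    lambda_1 = {ev_jordan[0]:.3g}, lambda_n = {ev_jordan[-1]:.3g}")
```

With these assumed parameters, the symmetric case keeps every eigenvalue within a constant factor of $N$, while the Jordan case produces a $\lambda_{1}$ that is many orders of magnitude larger even though $\rho(A)<1$, consistent in spirit with the $\Omega\big(\lfloor\frac{N}{n}\rfloor e^{\alpha_{\lambda}n}\big)$ lower bound above.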
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle (FO) calls that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha \geq 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Dimension Independent Disentanglers from Unentanglement and Applications [55.86191108738564]
We construct a dimension-independent k-partite disentangler (like) channel from bipartite unentangled input.
We show that to capture NEXP, it suffices to have unentangled proofs of the form $|\psi\rangle = \sqrt{a}\,|\psi_{-}\rangle + \sqrt{1-a}\,|\psi_{+}\rangle$, where $|\psi_{+}\rangle$ has non-negative amplitudes.
arXiv Detail & Related papers (2024-02-23T12:22:03Z) - Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1, \ldots, \mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Randomized and Deterministic Attention Sparsification Algorithms for
Over-parameterized Feature Dimension [18.57735939471469]
We consider the sparsification of the attention problem.
For any super-large feature dimension, we can reduce it to a size nearly linear in the sentence length.
arXiv Detail & Related papers (2023-04-10T05:52:38Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ achieving $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - On the speed of uniform convergence in Mercer's theorem [6.028247638616059]
A continuous positive definite kernel $K(\mathbf{x}, \mathbf{y})$ on a compact set can be represented as $\sum_{i=1}^{\infty} \lambda_i \phi_i(\mathbf{x}) \phi_i(\mathbf{y})$, where $(\lambda_i, \phi_i)$ are eigenvalue-eigenfunction pairs of the corresponding integral operator.
We estimate the speed of this convergence in terms of the decay rate of the eigenvalues.
arXiv Detail & Related papers (2022-05-01T15:07:57Z) - The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian
Mixtures [36.91281862322494]
We show that the EM algorithm adaptively achieves the minimax error rate $\tilde{O}\Big(\min\Big\{\frac{1}{(1-2\delta_{*})}\sqrt{\frac{d}{n}},\ \frac{1}{\|\theta_{*}\|}\sqrt{\frac{d}{n}},\ \big(\frac{d}{n}\big)^{1/4}\Big\}\Big)$ in no more than $O\Big(\frac{1}{\|\theta_{*}\|^{2}}\Big)$ iterations.
arXiv Detail & Related papers (2021-03-29T14:28:17Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^{\top}\Omega^{*} + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^{d}$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles and achieves complexity bounds of $\tilde{O}\big((n^{2} + n\sqrt{L/\mu})\log(1/\epsilon)\big)$ and $O(n\sqrt{L/\epsilon})$, for $\mu > 0$ and $\mu = 0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.