A Concentration of Measure and Random Matrix Approach to Large
Dimensional Robust Statistics
- URL: http://arxiv.org/abs/2006.09728v2
- Date: Sat, 9 Apr 2022 03:10:21 GMT
- Title: A Concentration of Measure and Random Matrix Approach to Large
Dimensional Robust Statistics
- Authors: Cosme Louart and Romain Couillet
- Abstract summary: This article studies the robust covariance matrix estimation of a data collection $X = (x_1,\ldots,x_n)$ with $x_i = \sqrt{\tau_i} z_i + m$.
We exploit a so-called stable semi-metric along with concentration of measure arguments to prove the existence and uniqueness of the robust estimator as well as evaluate its limiting spectral distribution.
- Score: 45.24358490877106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article studies the \emph{robust covariance matrix estimation} of a data
collection $X = (x_1,\ldots,x_n)$ with $x_i = \sqrt{\tau_i} z_i + m$, where $z_i
\in \mathbb R^p$ is a \textit{concentrated vector} (e.g., an elliptical random
vector), $m\in \mathbb R^p$ a deterministic signal and $\tau_i\in \mathbb R$ a
scalar perturbation of possibly large amplitude, under the assumption that
both $n$ and $p$ are large. This estimator is defined as the fixed point of a
function which we show is contracting for a so-called \textit{stable
semi-metric}. We exploit this semi-metric along with concentration of measure
arguments to prove the existence and uniqueness of the robust estimator as well
as evaluate its limiting spectral distribution.
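
The fixed-point definition mentioned in the abstract can be illustrated with a minimal numerical sketch. Everything specific below is an assumption made for illustration and is not taken from the article: the Maronna-type update $C \mapsto \frac{1}{n}\sum_i u(\frac{1}{p} x_i^T C^{-1} x_i)\, x_i x_i^T$, the weight function $u(t) = (1+\alpha)/(\alpha+t)$, the Gaussian choice for $z_i$, the gamma law for $\tau_i$, and all function names. The article's own estimator and its stable semi-metric may differ.

```python
import numpy as np


def simulate_data(p, n, rng):
    """Draw X = (x_1, ..., x_n) with x_i = sqrt(tau_i) z_i + m (illustrative choices)."""
    z = rng.standard_normal((p, n))                 # assumed: Gaussian z_i
    tau = rng.gamma(shape=0.5, scale=2.0, size=n)   # assumed: heavy-tailed scalar perturbations
    m = np.ones(p) / np.sqrt(p)                     # assumed: unit-norm deterministic signal
    return np.sqrt(tau) * z + m[:, None]


def maronna_weight(t, alpha=0.2):
    """Assumed weight function u(t) = (1 + alpha) / (alpha + t)."""
    return (1.0 + alpha) / (alpha + t)


def robust_scatter(X, u, tol=1e-8, max_iter=1000):
    """Iterate C <- (1/n) sum_i u(x_i^T C^{-1} x_i / p) x_i x_i^T until it stabilizes."""
    p, n = X.shape
    C = np.eye(p)
    for it in range(1, max_iter + 1):
        # Quadratic forms q_i = x_i^T C^{-1} x_i / p for all columns at once.
        q = np.einsum("ij,ij->j", X, np.linalg.solve(C, X)) / p
        C_next = (X * u(q)) @ X.T / n
        # Stop when the relative change in spectral norm falls below tol.
        if np.linalg.norm(C_next - C, 2) <= tol * np.linalg.norm(C, 2):
            return C_next, it
        C = C_next
    return C, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = simulate_data(p=50, n=500, rng=rng)
    C_hat, n_iter = robust_scatter(X, maronna_weight)
    eigenvalues = np.linalg.eigvalsh(C_hat)
    print("iterations:", n_iter)
    print("smallest and largest eigenvalue:", eigenvalues[0], eigenvalues[-1])
```

A histogram of `eigenvalues` for growing $p$ and $n$ gives an empirical view of the spectral distribution whose limit the article characterizes; the sketch itself makes no claim about that limit.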
Related papers
- Near-Optimal and Tractable Estimation under Shift-Invariance [0.21756081703275998]
The class of all such signals is extremely rich: it contains all exponential oscillations over $\mathbb{C}^n$ with total degree $s$.
We show that the statistical complexity of this class, as measured by the radius squared minimax frequencies of the $(\delta)$-confidence $\ell$-ball, is nearly the same as for the class of $s$-sparse signals, namely $O\left(s\log(en) + \log(\delta^{-1})\right) \cdot \log(en/s)$.
arXiv Detail & Related papers (2024-11-05T18:11:23Z)
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Beyond Independent Measurements: General Compressed Sensing with GNN Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. entries of ${\bf w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n \rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z)
- Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n^{-\frac{k}{2k + d}}$ for the point estimation and $n^{-\frac{k-1}{2k + d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n, d, \delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)