Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians
- URL: http://arxiv.org/abs/2301.09024v1
- Date: Sat, 21 Jan 2023 23:28:55 GMT
- Title: Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians
- Authors: Arshak Minasyan and Nikita Zhivotovskiy
- Abstract summary: In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assume that $X_{1}, \ldots, X_{N}$ is an $\varepsilon$-contaminated sample of
$N$ independent Gaussian vectors in $\mathbb{R}^d$ with mean $\mu$ and
covariance $\Sigma$. In the strong $\varepsilon$-contamination model we assume
that the adversary replaced an $\varepsilon$ fraction of vectors in the
original Gaussian sample by any other vectors. We show that there is an
estimator $\widehat \mu$ of the mean satisfying, with probability at least $1 -
\delta$, a bound of the form \[ \|\widehat{\mu} - \mu\|_2 \le
c\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} +
\sqrt{\frac{\|\Sigma\|\log(1/\delta)}{N}} +
\varepsilon\sqrt{\|\Sigma\|}\right), \] where $c > 0$ is an absolute constant
and $\|\Sigma\|$ denotes the operator norm of $\Sigma$. In the same
contaminated Gaussian setup, we construct an estimator $\widehat \Sigma$ of the
covariance matrix $\Sigma$ that satisfies, with probability at least $1 -
\delta$, \[ \left\|\widehat{\Sigma} - \Sigma\right\| \le
c\left(\sqrt{\frac{\|\Sigma\|\operatorname{Tr}(\Sigma)}{N}} +
\|\Sigma\|\sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon\|\Sigma\|\right). \]
Both results are optimal up to multiplicative constant factors. Despite the
recent significant interest in robust statistics, achieving both dimension-free
bounds in the canonical Gaussian case remained open. In fact, several
previously known results were either dimension-dependent and required $\Sigma$
to be close to identity, or had a sub-optimal dependence on the contamination
level $\varepsilon$.
As a part of the analysis, we derive sharp concentration inequalities for
central order statistics of Gaussian, folded normal, and chi-squared
distributions.
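To see what the mean bound gives in a familiar special case (a worked specialization, not stated in the abstract), set $\Sigma = \sigma^2 I_d$, so that $\operatorname{Tr}(\Sigma) = d\sigma^2$ and $\|\Sigma\| = \sigma^2$. The bound then reads \[ \|\widehat{\mu} - \mu\|_2 \le c\sigma\left(\sqrt{\frac{d}{N}} + \sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon\right), \] i.e., the usual $\sqrt{d/N}$ parametric rate, a $\sqrt{\log(1/\delta)/N}$ confidence term, and an additive $\varepsilon$ price for contamination that no estimator can avoid.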
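The strong $\varepsilon$-contamination model is easy to simulate. The sketch below is illustrative only: the coordinate-wise median is a simple robust baseline, not the statistically optimal estimator constructed in the paper, and all names and parameters here are our own choices. It shows how an $\varepsilon$ fraction of adversarial points ruins the empirical mean while a robust estimate degrades gracefully.

    # Simulate the strong eps-contamination model from the abstract.
    # Coordinate-wise median = simple robust baseline (NOT the paper's
    # estimator); it illustrates the failure of the empirical mean.
    import numpy as np

    rng = np.random.default_rng(0)
    N, d, eps = 2000, 20, 0.05

    # Anisotropic covariance with eigenvalues 1, 1/2, ..., 1/d.
    Sigma = np.diag(1.0 / np.arange(1, d + 1))
    mu = np.zeros(d)
    X = rng.multivariate_normal(mu, Sigma, size=N)

    # Strong contamination: the adversary may replace an eps fraction
    # of the sample by arbitrary vectors; here, a distant cluster.
    n_bad = int(eps * N)
    X[:n_bad] = 50.0

    emp_mean = X.mean(axis=0)          # breaks down under contamination
    cw_median = np.median(X, axis=0)   # degrades gracefully

    print("||empirical mean - mu||    =", np.linalg.norm(emp_mean - mu))
    print("||coordwise median - mu||  =", np.linalg.norm(cw_median - mu))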
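The last sentence of the abstract points to concentration of central order statistics as a key analytic tool. As a quick empirical illustration (ours, not the paper's inequality), the sample median of $N$ standard normals, the simplest central order statistic, fluctuates at the $1/\sqrt{N}$ scale, matching the asymptotic standard deviation $\sqrt{\pi/(2N)}$:

    # Empirical check: the sample median of N standard normals
    # concentrates at the 1/sqrt(N) rate (illustration only).
    import numpy as np

    rng = np.random.default_rng(1)
    reps = 2000
    for N in (100, 1000, 10000):
        # Each row gives one sample median of N standard normals.
        medians = np.median(rng.standard_normal((reps, N)), axis=1)
        # Empirical std vs the asymptotic sqrt(pi/(2N)) prediction.
        print(N, medians.std(), np.sqrt(np.pi / (2 * N)))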
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
Computing the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(A R(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Nonasymptotic one-and two-sample tests in high dimension with unknown
covariance structure [0.0]
We consider the problem of testing whether $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$, against $\|\mu\| \geq (\eta + \delta)$.
The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that we can control both the Type I and Type II errors at a given level.
arXiv Detail & Related papers (2021-09-01T06:22:53Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication
Time [14.990725929840892]
We show an algorithm that runs in time $\widetilde{O}(T(N, d) \log \kappa / \mathrm{poly}(\varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose.
Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
arXiv Detail & Related papers (2020-06-23T20:21:27Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - On the robustness of the minimum $\ell_2$ interpolator [2.918940961856197]
We analyse the interpolator with minimal $\ell_2$-norm $\hat{\beta}$ in a general high dimensional linear regression framework.
We prove that, with high probability, the prediction loss of this estimator is bounded from above by $(\|\beta^*\|_2^2 r_{cn}(\Sigma) \vee \|\xi\|^2)/n$, where $r_k(\Sigma) = \sum_{i \geq k} \lambda_i(\Sigma)$ are the rests of the
arXiv Detail & Related papers (2020-03-12T15:12:28Z)