The Sample Complexity of Robust Covariance Testing
- URL: http://arxiv.org/abs/2012.15802v1
- Date: Thu, 31 Dec 2020 18:24:41 GMT
- Title: The Sample Complexity of Robust Covariance Testing
- Authors: Ilias Diakonikolas and Daniel M. Kane
- Abstract summary: We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma = 1/2$.
- Score: 56.98280399449707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of testing the covariance matrix of a high-dimensional
Gaussian in a robust setting, where the input distribution has been corrupted
in Huber's contamination model. Specifically, we are given i.i.d. samples from
a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a
zero-mean and unknown covariance Gaussian $\mathcal{N}(0, \Sigma)$, $B$ is a
fixed but unknown noise distribution, and $\epsilon>0$ is an arbitrarily small
constant representing the proportion of contamination. We want to distinguish
between the cases that $\Sigma$ is the identity matrix versus $\gamma$-far from
the identity in Frobenius norm.
In the absence of contamination, prior work gave a simple tester for this
hypothesis testing task that uses $O(d)$ samples. Moreover, this sample upper
bound was shown to be best possible, within constant factors. Our main result
is that the sample complexity of covariance testing dramatically increases in
the contaminated setting. In particular, we prove a sample complexity lower
bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma
= 1/2$. This lower bound is best possible, as $O(d^2)$ samples suffice to even
robustly {\em learn} the covariance. The conceptual implication of our result
is that, for the natural setting we consider, robust hypothesis testing is at
least as hard as robust estimation.
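To make the setup concrete, the following minimal sketch simulates the Huber-contaminated model $Z = (1-\epsilon) X + \epsilon B$ and evaluates the natural plug-in test statistic $\|\hat{\Sigma} - I\|_F$. The particular contamination distribution `B`, the parameter values, and the statistic are illustrative assumptions, not the paper's lower-bound construction.

```python
import numpy as np

def sample_contaminated(n, d, eps, Sigma, rng):
    """Draw n samples from Z = (1 - eps) * N(0, Sigma) + eps * B.

    B is an illustrative noise distribution (a point mass pushed far out
    along one coordinate); the paper allows B to be arbitrary.
    """
    mask = rng.random(n) < eps                       # which samples are corrupted
    clean = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    spike = np.zeros(d)
    spike[0] = 10.0                                  # adversarial point mass
    return np.where(mask[:, None], spike, clean)

def frobenius_test_stat(Z):
    """Non-robust plug-in statistic: || empirical covariance - I ||_F."""
    n, d = Z.shape
    emp_cov = Z.T @ Z / n
    return np.linalg.norm(emp_cov - np.eye(d), ord="fro")

rng = np.random.default_rng(0)
d, n, eps, gamma = 50, 5000, 0.05, 0.5

# Null: Sigma = I.  Alternative: Sigma is gamma-far from I in Frobenius norm.
stat_null = frobenius_test_stat(sample_contaminated(n, d, eps, np.eye(d), rng))
Sigma_alt = np.eye(d)
Sigma_alt[0, 0] += gamma                             # ||Sigma_alt - I||_F = gamma
stat_alt = frobenius_test_stat(sample_contaminated(n, d, eps, Sigma_alt, rng))
print(stat_null, stat_alt)   # contamination inflates both statistics
```

Since a small fraction of adversarially placed points can shift the empirical covariance by a constant in Frobenius norm, the plug-in statistic is unreliable under contamination; the paper's result shows that no tester can circumvent this with fewer than $\Omega(d^2)$ samples.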
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework; a data-generation sketch for the NGCA task appears after this list.
arXiv Detail & Related papers (2024-10-28T18:19:13Z)
- The Sample Complexity of Simple Binary Hypothesis Testing [7.127829790714167]
The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either the prior-free or the Bayesian setting.
This problem has previously been studied only when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian).
arXiv Detail & Related papers (2024-03-25T17:42:32Z)
- Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions [8.40077201352607]
We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation.
Our algorithm produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_\Sigma \leq \alpha$ as long as $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon} + \frac{d \log 1/\delta}{\varepsilon}$.
arXiv Detail & Related papers (2023-01-28T16:57:46Z)
- Gaussian Mean Testing Made Simple [46.03021473600576]
Given i.i.d. samples from a distribution $p$ on $\mathbb{R}^d$, the task is to distinguish, with high probability, between the cases that $p$ is the standard Gaussian $\mathcal{N}(0, I_d)$ and that $p$ is a Gaussian whose mean is bounded away from zero.
We give an extremely simple algorithm for Gaussian mean testing with a one-page analysis; a toy baseline tester is sketched after this list.
arXiv Detail & Related papers (2022-10-25T01:59:13Z)
- Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$; a gradient-descent sketch for this setup appears after this list.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory; a minimal randomized-smoothing sketch appears after this list.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
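For the Non-Gaussian Component Analysis entry above, the sketch below generates a canonical NGCA instance: data that is standard Gaussian in every direction orthogonal to a hidden unit vector $v$, with a non-Gaussian (here uniform) marginal along $v$. The uniform choice and the kurtosis check are illustrative, not the cited paper's construction.

```python
import numpy as np

def sample_ngca(n, d, rng):
    """Generate an NGCA instance: standard Gaussian in all directions
    except a hidden unit vector v, along which the marginal is uniform
    (any non-Gaussian law would do). Returns the data and the planted v."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                       # hidden non-Gaussian direction
    gauss = rng.standard_normal((n, d))
    gauss -= np.outer(gauss @ v, v)              # project out the v-component
    nongauss = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)  # variance-1 uniform
    return gauss + np.outer(nongauss, v), v

rng = np.random.default_rng(1)
X, v = sample_ngca(10_000, 20, rng)
# Along v the fourth moment differs from Gaussian; in random directions it does not.
proj = X @ v
print(np.mean(proj**4) - 3.0)   # negative excess kurtosis reveals the direction
```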
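For the Gaussian mean testing entry, here is a toy tester that thresholds the squared norm of the empirical mean. It is a natural baseline for the task, not the cited paper's one-page algorithm, and the threshold constant is an illustrative assumption.

```python
import numpy as np

def mean_test(X, eps):
    """Toy Gaussian mean tester: accept 'mean = 0' if the squared norm of
    the empirical mean is close to its null expectation d/n.
    A baseline sketch, not the cited paper's algorithm."""
    n, d = X.shape
    stat = np.linalg.norm(X.mean(axis=0)) ** 2
    # Under the null, E[stat] = d/n; under 'far' means, stat concentrates
    # near d/n + ||mu||^2 >= d/n + eps^2.
    threshold = d / n + eps**2 / 2               # illustrative threshold
    return "null" if stat < threshold else "mean far from zero"

rng = np.random.default_rng(2)
n, d, eps = 2000, 100, 0.5
X0 = rng.standard_normal((n, d))                 # mean zero
mu = np.full(d, eps / np.sqrt(d))                # ||mu||_2 = eps
X1 = rng.standard_normal((n, d)) + mu
print(mean_test(X0, eps), "|", mean_test(X1, eps))
```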
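For the single-neuron entry, the sketch below runs plain gradient descent on the empirical square loss of a single ReLU neuron. The Gaussian data model, noise level, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(3)
n, d = 5000, 10
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = relu(X @ w_star) + 0.1 * rng.standard_normal(n)   # noisy labels

w = 0.1 * rng.standard_normal(d)   # small random init (zero init stalls: no active units)
lr = 0.1
for _ in range(500):
    pred = relu(X @ w)
    # Gradient of the empirical square loss (1/n) * sum (relu(<w,x>) - y)^2,
    # using relu'(z) = 1{z > 0} (subgradient 0 at z = 0).
    grad = (2.0 / n) * X.T @ ((pred - y) * (X @ w > 0))
    w -= lr * grad

print(np.mean((relu(X @ w) - y) ** 2))   # empirical square loss after training
```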
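For the randomized smoothing entry, the sketch below shows the basic smoothing construction the curse-of-dimensionality result concerns: classify by majority vote of a base classifier under Gaussian input noise. The base classifier and noise level are illustrative stand-ins.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Randomized smoothing: return the majority-vote class of the base
    classifier over Gaussian perturbations of x, plus the vote fraction.
    Larger vote margins translate into larger certified L2 radii
    (Cohen et al.-style certification)."""
    noise = rng.standard_normal((n_samples, x.shape[0])) * sigma
    votes = np.array([base_classifier(x + eta) for eta in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)], counts.max() / n_samples

# Illustrative base classifier: a fixed linear separator in R^3.
w = np.array([1.0, -2.0, 0.5])
base = lambda z: int(z @ w > 0)

rng = np.random.default_rng(4)
x = np.array([0.3, -0.1, 0.2])
label, vote_frac = smoothed_predict(base, x, sigma=0.25, n_samples=1000, rng=rng)
print(label, vote_frac)
```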