Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
- URL: http://arxiv.org/abs/2006.13312v1
- Date: Tue, 23 Jun 2020 20:21:27 GMT
- Title: Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
- Authors: Jerry Li, Guanghao Ye
- Abstract summary: We show an algorithm that runs in time $\widetilde{O}(T(N, d) \log \kappa)$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose and $\kappa$ is the condition number of $\Sigma$.
When $\Sigma$ is reasonably conditioned, our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
- Score: 14.990725929840892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust covariance estimation is the following, well-studied problem in high
dimensional statistics: given $N$ samples from a $d$-dimensional Gaussian
$\mathcal{N}(\boldsymbol{0}, \Sigma)$, but where an $\varepsilon$-fraction of
the samples have been arbitrarily corrupted, output $\widehat{\Sigma}$
minimizing the total variation distance between $\mathcal{N}(\boldsymbol{0},
\Sigma)$ and $\mathcal{N}(\boldsymbol{0}, \widehat{\Sigma})$. This corresponds
to learning $\Sigma$ in a natural affine-invariant variant of the Frobenius
norm known as the \emph{Mahalanobis norm}. Previous work of Cheng et al.
demonstrated an algorithm that, given $N = \Omega (d^2 / \varepsilon^2)$
samples, achieved a near-optimal error of $O(\varepsilon \log 1 /
\varepsilon)$, and moreover, their algorithm ran in time $\widetilde{O}(T(N, d)
\log \kappa / \mathrm{poly} (\varepsilon))$, where $T(N, d)$ is the time it
takes to multiply a $d \times N$ matrix by its transpose, and $\kappa$ is the
condition number of $\Sigma$. When $\varepsilon$ is relatively small, their
polynomial dependence on $1/\varepsilon$ in the runtime is prohibitively large.
In this paper, we demonstrate a novel algorithm that achieves the same
statistical guarantees, but which runs in time $\widetilde{O} (T(N, d) \log
\kappa)$. In particular, our runtime has no dependence on $\varepsilon$. When
$\Sigma$ is reasonably conditioned, our runtime matches that of the fastest
algorithm for covariance estimation without outliers, up to poly-logarithmic
factors, showing that we can get robustness essentially "for free."
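The setting above is easy to simulate. Below is a minimal NumPy sketch of the $\varepsilon$-contamination model and of the Mahalanobis-norm error metric; it is an illustration only, not the paper's algorithm, and the corruption scheme, dimension, sample size, and outlier magnitude are all illustrative assumptions.
```python
# Minimal sketch (NumPy): epsilon-corrupted Gaussian samples and the Mahalanobis-norm
# error of the naive empirical covariance. Not the paper's robust estimator.
import numpy as np

rng = np.random.default_rng(0)
d, eps = 20, 0.1
N = 10 * d * d  # the abstract requires N = Omega(d^2 / eps^2); constants here are arbitrary

# Ground-truth covariance Sigma (symmetric positive definite) and clean samples.
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d + np.eye(d)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)  # shape (N, d)

# Stand-in for the adversary: replace an eps-fraction of the rows arbitrarily.
n_bad = int(eps * N)
X[:n_bad] = 100.0 * rng.standard_normal((n_bad, d))

# Naive, non-robust estimator: the empirical covariance X^T X / N. Forming it costs
# exactly T(N, d), the d x N matrix-transpose product appearing in the runtime bound.
Sigma_hat = X.T @ X / N

# Mahalanobis-norm error ||Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I||_F, the
# affine-invariant Frobenius error that controls the TV distance between the Gaussians.
w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T
err = np.linalg.norm(Sigma_inv_half @ Sigma_hat @ Sigma_inv_half - np.eye(d), "fro")
print(f"Mahalanobis error of the naive estimate: {err:.2f}")
```
On uncorrupted data this error would be roughly on the order of $d / \sqrt{N}$, whereas the planted outliers inflate it with their magnitude; the point of the result above is that a robust estimator can drive the error back down to $O(\varepsilon \log 1/\varepsilon)$ while spending only $\widetilde{O}(T(N, d) \log \kappa)$ time, essentially the cost of the single matrix product in the sketch.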
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians [7.04316974339151]
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set.
We develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples.
arXiv Detail & Related papers (2024-10-02T15:21:07Z) - Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z) - Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1, \ldots, \mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians [3.5788754401889014]
In the strong $\varepsilon$-contamination model, we assume that the adversary replaced an $\varepsilon$-fraction of vectors in the original Gaussian sample by arbitrary other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies an error guarantee holding with probability at least $1 - \delta$.
arXiv Detail & Related papers (2023-01-21T23:28:55Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma, \Xi)$ nor the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)