Robust Scatter Matrix Estimation for Elliptical Distributions in Polynomial Time
- URL: http://arxiv.org/abs/2502.06564v1
- Date: Mon, 10 Feb 2025 15:31:57 GMT
- Title: Robust Scatter Matrix Estimation for Elliptical Distributions in Polynomial Time
- Authors: Gleb Novikov
- Abstract summary: We design polynomial time algorithms that achieve dimension-independent error in Frobenius norm.
Under a mild assumption on the eigenvalues of the scatter matrix $\Sigma$, for every $t \in \mathbb{N}$, we design an estimator that, given $n = d^{O(t)}$ samples, in time $n^{O(t)}$ finds $\hat{\Sigma}$ such that $\Vert\Sigma^{-1/2}(\hat{\Sigma} - \Sigma)\Sigma^{-1/2}\Vert_{\text{F}} \le O(t \cdot \varepsilon^{1-\frac{1}{t}})$.
- Score: 2.311583680973075
- Abstract: We study the problem of computationally efficient robust estimation of scatter matrices of elliptical distributions under the strong contamination model. We design polynomial time algorithms that achieve dimension-independent error in Frobenius norm. Our first result is a sequence of efficient algorithms that approaches nearly optimal error. Specifically, under a mild assumption on the eigenvalues of the scatter matrix $\Sigma$, for every $t \in \mathbb{N}$, we design an estimator that, given $n = d^{O(t)}$ samples, in time $n^{O(t)}$ finds $\hat{\Sigma}$ such that $ \Vert{\Sigma^{-1/2}\, ({\hat{\Sigma} - \Sigma})\, \Sigma^{-1/2}}\Vert_{\text{F}} \le O(t \cdot \varepsilon^{1-\frac{1}{t}})$, where $\varepsilon$ is the fraction of corruption. We do not require any assumptions on the moments of the distribution, while all previously known computationally efficient algorithms for robust covariance/scatter estimation with dimension-independent error rely on strong assumptions on the moments, such as sub-Gaussianity or (certifiable) hypercontractivity. Furthermore, under a stronger assumption on the eigenvalues of $\Sigma$ (that, in particular, is satisfied by all matrices with constant condition number), we provide a fast (sub-quadratic in the input size) algorithm that, given a nearly optimal number of samples $n = \tilde{O}(d^2/\varepsilon)$, in time $\tilde{O}(nd^2 \cdot \mathrm{poly}(1/\varepsilon))$ finds $\hat{\Sigma}$ such that $\Vert\hat{\Sigma} - \Sigma\Vert_{\text{F}} \le O(\Vert{\Sigma}\Vert \cdot \sqrt{\varepsilon})$. Our approach is based on robust covariance estimation of the spatial sign (the projection onto the sphere of radius $\sqrt{d}$) of elliptical distributions.
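As a concrete reading of the first guarantee: $t = 2$ already gives dimension-independent error $O(\sqrt{\varepsilon})$ from polynomially many samples in polynomial time, while letting $t$ grow (at the cost of $n = d^{O(t)}$ samples and $n^{O(t)}$ time) drives the exponent $1 - 1/t$ toward $1$; for example, $t \approx \log(1/\varepsilon)$ makes $\varepsilon^{-1/t} = O(1)$ and hence the bound $O(\varepsilon \cdot \log(1/\varepsilon))$, which is the sense in which the sequence of estimators approaches nearly optimal error.

The sketch below illustrates only the spatial-sign idea mentioned in the last sentence of the abstract, not the paper's actual estimators: samples are centered, projected onto the sphere of radius $\sqrt{d}$, and the covariance of the projections is estimated with a naive trimming heuristic that stands in for the paper's robust covariance step (which relies on much more careful machinery). The data model (multivariate t), the contamination pattern, and all parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 20, 20_000, 0.05

# Heavy-tailed elliptical data: multivariate t with 3 degrees of freedom and
# scatter matrix Sigma (the spatial-sign approach needs no moment assumptions).
A = rng.standard_normal((d, d)) / np.sqrt(d)
Sigma = A @ A.T + np.eye(d)
g = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
X = g / np.sqrt(rng.chisquare(3, size=n) / 3)[:, None]

# Strong contamination: replace an eps-fraction with points along one direction.
k = int(eps * n)
X[:k] = 100.0 * np.eye(d)[0] + rng.standard_normal((k, d))

# Spatial sign: project centered samples onto the sphere of radius sqrt(d).
center = np.median(X, axis=0)                      # crude robust centering
Y = X - center
Y = np.sqrt(d) * Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Naive trimming on the sign vectors (a placeholder, not the paper's algorithm):
# repeatedly score points by deviation from the current isotropic part and keep
# the (1 - 2*eps) fraction with the smallest scores.
mask = np.ones(n, dtype=bool)
for _ in range(5):
    S = Y[mask].T @ Y[mask] / mask.sum()
    M = S - np.trace(S) / d * np.eye(d)
    scores = np.einsum('ij,jk,ik->i', Y, M, Y)
    mask = scores <= np.quantile(scores, 1 - 2 * eps)

S_hat = Y[mask].T @ Y[mask] / mask.sum()           # trimmed sign-covariance estimate

# Ground-truth proxy: sign covariance of a large clean sample.
gc = rng.multivariate_normal(np.zeros(d), Sigma, size=200_000)
Z = gc / np.sqrt(rng.chisquare(3, size=200_000) / 3)[:, None]
Z = np.sqrt(d) * Z / np.linalg.norm(Z, axis=1, keepdims=True)
S_true = Z.T @ Z / Z.shape[0]

print('Frobenius error with trimming:   ', np.linalg.norm(S_hat - S_true, 'fro'))
print('Frobenius error without trimming:', np.linalg.norm(Y.T @ Y / n - S_true, 'fro'))
```

One reason the projection is convenient: every sign vector has norm exactly $\sqrt{d}$, so corrupted points cannot inflate the estimate through unbounded norms and can only bias directions, which makes the sign covariance a natural target for moment-free robust scatter estimation.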
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation [6.853165736531941]
We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers.
Our main contribution is an algorithm for robust sparse mean estimation which runs in subquadratic time using $\mathrm{poly}(k, \log d, 1/\epsilon)$ samples.
arXiv Detail & Related papers (2024-03-07T18:23:51Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians [3.5788754401889014]
In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample with arbitrary other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$, a near-optimal error bound.
arXiv Detail & Related papers (2023-01-21T23:28:55Z) - A Fast Algorithm for Adaptive Private Mean Estimation [5.090363690988394]
We design an $(\varepsilon, \delta)$-differentially private algorithm that is adaptive to $\Sigma$.
The estimator achieves optimal rates of convergence with respect to the induced Mahalanobis norm $\|\cdot\|_{\Sigma}$.
arXiv Detail & Related papers (2023-01-17T18:44:41Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication
Time [14.990725929840892]
We show an algorithm that runs in time $\widetilde{O}(T(N, d) \log \kappa / \mathrm{poly}(\varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose.
Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
arXiv Detail & Related papers (2020-06-23T20:21:27Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.