List-Decodable Sparse Mean Estimation
- URL: http://arxiv.org/abs/2205.14337v1
- Date: Sat, 28 May 2022 05:13:45 GMT
- Title: List-Decodable Sparse Mean Estimation
- Authors: Shiwei Zeng and Jie Shen
- Abstract summary: A surge of recent research interest has been focusing on the list-decodable setting where $\alpha \in (0, \frac12]$, and the goal is to output a finite number of estimates among which at least one approximates the target mean.
In this paper, we consider that the underlying distribution is Gaussian and the target mean is $k$-sparse. Our main contribution is the first polynomial-time algorithm that enjoys sample complexity $O\big(\mathrm{poly}(k, \log d)\big)$, i.e. poly-logarithmic in the dimension.
- Score: 7.594050968868919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust mean estimation is one of the most important problems in statistics:
given a set of samples $\{x_1, \dots, x_n\} \subset \mathbb{R}^d$ where an
$\alpha$ fraction are drawn from some distribution $D$ and the rest are
adversarially corrupted, it aims to estimate the mean of $D$. A surge of recent
research interest has been focusing on the list-decodable setting where $\alpha
\in (0, \frac12]$, and the goal is to output a finite number of estimates among
which at least one approximates the target mean. In this paper, we consider
that the underlying distribution is Gaussian and the target mean is $k$-sparse.
Our main contribution is the first polynomial-time algorithm that enjoys sample
complexity $O\big(\mathrm{poly}(k, \log d)\big)$, i.e. poly-logarithmic in the
dimension. One of the main algorithmic ingredients is using low-degree sparse
polynomials to filter outliers, which may be of independent interest.
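As a rough illustration of the filtering ingredient, the sketch below runs a toy weighted filter that repeatedly finds a high-variance direction supported on $k$ coordinates and downweights points that project far along it. This is a degree-2 simplification under a Gaussian-inlier assumption, not the paper's low-degree sparse-polynomial filter, and it returns a single estimate rather than a list; all thresholds and the synthetic data are illustrative.

```python
import numpy as np

def sparse_filter_step(X, w, k):
    # Weighted mean and covariance under the current weights w.
    mu = (w[:, None] * X).sum(0) / w.sum()
    Xc = X - mu
    cov = (w[:, None] * Xc).T @ Xc / w.sum()
    # Crude stand-in for searching over k-sparse directions: keep the k
    # highest-variance coordinates and take the top eigenvector there.
    S = np.argsort(np.diag(cov))[-k:]
    evals, evecs = np.linalg.eigh(cov[np.ix_(S, S)])
    if evals[-1] <= 2.0:                 # variance near the Gaussian-inlier level
        return w, mu, False
    v = np.zeros(X.shape[1])
    v[S] = evecs[:, -1]                  # a k-sparse high-variance direction
    scores = (Xc @ v) ** 2
    w = w * (1.0 - scores / scores.max())  # soft downweighting of outliers
    return w, mu, True

rng = np.random.default_rng(0)
n, d, k, alpha = 2000, 200, 5, 0.5
mu_true = np.zeros(d)
mu_true[:k] = 3.0
X = rng.standard_normal((n, d))
X[: int(alpha * n)] += mu_true                       # inliers with k-sparse mean
X[int(alpha * n):] += 8.0 * rng.standard_normal(d)   # one far-away outlier cluster
w = np.ones(n)
for _ in range(100):
    w, mu_hat, moved = sparse_filter_step(X, w, k)
    if not moved:
        break
print("estimation error:", np.linalg.norm(mu_hat - mu_true))  # small vs ||mu_true|| ~ 6.7
```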
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
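For concreteness, below is a minimal, assumption-heavy sketch of this setting: gradient descent on the empirical squared loss $F(\mathbf{w}) = \frac{1}{n}\sum_i (\sigma(\mathbf{w} \cdot \mathbf{x}_i) - y_i)^2$ with $\sigma = \mathrm{ReLU}$ and a small fraction of corrupted labels. The step size, iteration count, and noise model are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 20
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_true, 0.0)          # clean labels sigma(w* . x), sigma = ReLU
bad = rng.random(n) < 0.05               # a 5% adversarially-flavored label corruption
y[bad] += rng.standard_normal(bad.sum())

w = 0.01 * rng.standard_normal(d)        # small nonzero init (ReLU'(0) = 0 at w = 0)
eta = 0.1
for _ in range(500):
    margin = X @ w
    pred = np.maximum(margin, 0.0)
    # gradient of F(w) = mean((ReLU(Xw) - y)^2)
    grad = (2.0 / n) * X.T @ ((pred - y) * (margin > 0))
    w -= eta * grad
print("final empirical loss:", np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```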
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$ with quasi-polynomial sample and computational complexity.
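The name refers to a simple but useful fact, illustrated below under a toy Gaussian model: the difference of two inliers $x_i, x_j \sim N(\mu, I)$ is distributed as $N(0, 2I)$ whatever the unknown mean is, so pairs of points can be screened against a known zero-mean reference before any estimate of $\mu$ exists. The data and constants here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 100, 5
mu = np.zeros(d)
mu[:k] = 10.0                                    # unknown k-sparse mean
inliers = mu + rng.standard_normal((500, d))
outliers = 25.0 + rng.standard_normal((500, d))
diffs_in = inliers[::2] - inliers[1::2]          # inlier-inlier pairs: ~ N(0, 2I)
diffs_mix = inliers[:250] - outliers[:250]       # inlier-outlier pairs
# Inlier-inlier differences concentrate: squared norm ~ 2d regardless of mu.
print(np.mean(np.sum(diffs_in ** 2, 1)) / (2 * d))    # ~1
print(np.mean(np.sum(diffs_mix ** 2, 1)) / (2 * d))   # >> 1, flags the pair
```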
arXiv Detail & Related papers (2022-06-10T17:38:18Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\epsilon)$.
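A quick numerical look at inversion bias, assuming a dense Gaussian sketch (one instance of the sub-gaussian case): averaging $\big((SA)^\top (SA)\big)^{-1}$ over independent sketches drifts multiplicatively away from $(A^\top A)^{-1}$, and rescaling by roughly $(m-d)/m$ removes most of the drift. The sizes and the exact rescaling constant below are illustrative, not the paper's precise statement.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m, trials = 500, 10, 50, 1000
A = rng.standard_normal((n, d)) / np.sqrt(n)
true_inv = np.linalg.inv(A.T @ A)
avg = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((m, n)) / np.sqrt(m)       # dense Gaussian sketch
    avg += np.linalg.inv((S @ A).T @ (S @ A))
avg /= trials
rel = lambda B: np.linalg.norm(B - true_inv) / np.linalg.norm(true_inv)
print(rel(avg))                   # noticeable multiplicative bias (~25-30%)
print(rel((m - d) / m * avg))     # mostly corrected after rescaling
```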
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - List-Decodable Mean Estimation via Iterative Multi-Filtering [44.805549762166926]
We are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $\alpha$-fraction of points in $T$ are drawn from an unknown mean and bounded covariance distribution $D$.
We output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$.
In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error.
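A toy version of the multi-filtering recursion, under strong simplifying assumptions (identity-covariance clusters and hand-picked thresholds): if the empirical variance looks bounded, emit the mean as one hypothesis; otherwise split along the top variance direction into two overlapping halves and recurse, so the inliers survive intact in at least one branch. This is a cartoon of the mechanism, not the paper's algorithm.

```python
import numpy as np

def multifilter(X, depth=0):
    """Toy recursive multi-filter: returns a list of candidate means."""
    if len(X) < 20 or depth > 6:
        return [X.mean(0)] if len(X) else []
    mu = X.mean(0)
    evals, evecs = np.linalg.eigh(np.cov(X.T))
    if evals[-1] <= 4.0:                    # variance near the inlier level: emit
        return [mu]
    proj = (X - mu) @ evecs[:, -1]
    med = np.median(proj)
    # Overlapping split so the inlier cluster lands whole in one branch.
    left = X[proj <= med + 1.0]
    right = X[proj >= med - 1.0]
    if min(len(left), len(right)) == len(X):  # no progress; stop
        return [mu]
    return multifilter(left, depth + 1) + multifilter(right, depth + 1)

rng = np.random.default_rng(4)
inl = 3.0 + rng.standard_normal((300, 10))   # alpha = 0.3 "inliers"
out = -6.0 + rng.standard_normal((700, 10))  # majority cluster
hyps = multifilter(np.vstack([inl, out]))
errs = [np.linalg.norm(h - 3.0) for h in hyps]
print(len(hyps), min(errs))                  # short list; one mean lands near 3
```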
arXiv Detail & Related papers (2020-06-18T17:47:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.