Consistent Estimation for PCA and Sparse Regression with Oblivious
Outliers
- URL: http://arxiv.org/abs/2111.02966v1
- Date: Thu, 4 Nov 2021 15:59:44 GMT
- Title: Consistent Estimation for PCA and Sparse Regression with Oblivious
Outliers
- Authors: Tommaso d'Orsi, Chih-Hung Liu, Rajai Nasser, Gleb Novikov, David
Steurer, Stefan Tiegel
- Abstract summary: We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k\log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
- Score: 13.244654316770815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop machinery to design efficiently computable and consistent
estimators, achieving estimation error approaching zero as the number of
observations grows, when facing an oblivious adversary that may corrupt
responses in all but an $\alpha$ fraction of the samples. As concrete examples,
we investigate two problems: sparse regression and principal component analysis
(PCA). For sparse regression, we achieve consistency for optimal sample size
$n\gtrsim (k\log d)/\alpha^2$ and optimal error rate $O(\sqrt{(k\log d)/(n\cdot
\alpha^2)})$ where $n$ is the number of observations, $d$ is the number of
dimensions and $k$ is the sparsity of the parameter vector, allowing the
fraction of inliers to be inverse-polynomial in the number of samples. Prior to
this work, no estimator was known to be consistent when the fraction of inliers
$\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design
matrices. Results holding under weak design assumptions and in the presence of
such general noise have only been shown in the dense setting (i.e., general linear
regression) very recently by d'Orsi et al. [dNS21]. In the context of PCA, we
attain optimal error guarantees under broad spikiness assumptions on the
parameter matrix (usually used in matrix completion). Previous works could
obtain non-trivial guarantees only under the assumptions that the measurement
noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian
with variance $1/n^2$).
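In display form, the sparse regression model and guarantee stated above read roughly as follows (a paraphrase of the abstract; the precise noise assumptions and error measure are those defined in the paper). The observations follow $y = X\beta^* + \eta$ with $\beta^* \in \mathbb{R}^d$ and $\|\beta^*\|_0 \le k$, where the noise $\eta$ is chosen independently of $X$ (obliviously) and only an $\alpha$ fraction of its entries needs to be well-behaved; the estimator $\hat{\beta}$ then satisfies $\mathrm{err}(\hat{\beta}, \beta^*) \le O\big(\sqrt{(k\log d)/(n\cdot\alpha^2)}\big)$ once $n \gtrsim (k\log d)/\alpha^2$.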
To devise our estimators, we equip the Huber loss with non-smooth
regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi
et al.'s approach [dNS21] in a novel way to analyze the loss function. Our
machinery appears to be easily applicable to a wide range of estimation
problems.
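As a rough illustration of this approach for sparse regression (Huber loss plus an $\ell_1$ regularizer), the following Python sketch minimizes the regularized objective with proximal gradient descent (ISTA). The solver, step size, regularization level, and toy data are illustrative assumptions, not the paper's prescription; for the PCA estimator the $\ell_1$ proximal step would be replaced by soft-thresholding of singular values (the proximal operator of the nuclear norm).

    import numpy as np

    def huber_grad(r, delta):
        # Derivative of the Huber penalty h_delta, applied entrywise to residuals r.
        return np.clip(r, -delta, delta)

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1 (entrywise soft-thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def huber_l1(X, y, lam, delta=1.0, n_iter=3000):
        # Minimize  sum_i h_delta(y_i - <x_i, beta>) + lam * ||beta||_1  via ISTA.
        n, d = X.shape
        step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1/L for the smooth Huber part
        beta = np.zeros(d)
        for _ in range(n_iter):
            grad = -X.T @ huber_grad(y - X @ beta, delta)
            beta = soft_threshold(beta - step * grad, step * lam)
        return beta

    # Toy usage: k-sparse signal, all but an alpha fraction of responses corrupted.
    rng = np.random.default_rng(0)
    n, d, k, alpha = 1000, 100, 5, 0.2
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:k] = 1.0
    y = X @ beta_star + 0.1 * rng.standard_normal(n)
    corrupted = rng.random(n) > alpha
    y[corrupted] += 100.0 * rng.standard_normal(corrupted.sum())
    beta_hat = huber_l1(X, y, lam=2.0 * np.sqrt(n * np.log(d)))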
Related papers
- Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination [42.526664955704746]
We study sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression.
For each of these tasks, we give the first sample-efficient and computationally efficient robust estimators with optimal error guarantees.
At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
arXiv Detail & Related papers (2024-03-15T15:51:27Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$ ...
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Outlier-robust sparse/low-rank least-squares regression and robust
matrix completion [1.0878040851637998]
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise.
We also present a novel theory of trace-regression with matrix decomposition based on a new application of the product process.
arXiv Detail & Related papers (2020-12-12T07:42:47Z) - Computationally and Statistically Efficient Truncated Regression [36.3677715543994]
We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression.
Our estimator uses Projected Stochastic Gradient Descent (PSGD) without replacement on the negative log-likelihood of the truncated sample.
As a corollary, we show that SGD learns the parameters of single-layer neural networks with noisy activation functions.
arXiv Detail & Related papers (2020-10-22T19:31:30Z) - Consistent regression when oblivious outliers overwhelm [8.873449722727026]
Prior to our work, even for Gaussian $X$, no estimator for $\beta^*$ was known to be consistent in this model.
We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction.
The model studied here also captures heavy-tailed noise distributions that may not even have a first moment.
arXiv Detail & Related papers (2020-09-30T16:21:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is ...
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)