Consistent regression when oblivious outliers overwhelm
- URL: http://arxiv.org/abs/2009.14774v2
- Date: Tue, 25 May 2021 17:20:11 GMT
- Title: Consistent regression when oblivious outliers overwhelm
- Authors: Tommaso d'Orsi, Gleb Novikov, David Steurer
- Abstract summary: Prior to our work, even for Gaussian $X$, no estimator for $\beta^*$ was known to be consistent in this model.
We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction.
The model studied here also captures heavy-tailed noise distributions that may not even have a first moment.
- Score: 8.873449722727026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a robust linear regression model $y=X\beta^* + \eta$, where an
adversary oblivious to the design $X\in \mathbb{R}^{n\times d}$ may choose
$\eta$ to corrupt all but an $\alpha$ fraction of the observations $y$ in an
arbitrary way. Prior to our work, even for Gaussian $X$, no estimator for
$\beta^*$ was known to be consistent in this model except for quadratic sample
size $n \gtrsim (d/\alpha)^2$ or for logarithmic inlier fraction $\alpha\ge
1/\log n$. We show that consistent estimation is possible with nearly linear
sample size and inverse-polynomial inlier fraction. Concretely, we show that
the Huber loss estimator is consistent for every sample size $n=
\omega(d/\alpha^2)$ and achieves an error rate of $O\left((d/(\alpha^2 n))^{1/2}\right)$.
Both bounds are optimal (up to constant factors). Our results extend to designs
far beyond the Gaussian case and only require the column span of $X$ to not
contain approximately sparse vectors (similar to the kind of assumption
commonly made about the kernel space for compressed sensing). We provide two
technically similar proofs. One proof is phrased in terms of strong convexity,
extending work of [Tsakonas et al.'14], and is particularly short. The other proof
highlights a connection between the Huber loss estimator and high-dimensional
median computations. In the special case of Gaussian designs, this connection
leads us to a strikingly simple algorithm based on computing coordinate-wise
medians that achieves optimal guarantees in nearly-linear time, and that can
exploit sparsity of $\beta^*$. The model studied here also captures
heavy-tailed noise distributions that may not even have a first moment.
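As a concrete illustration of the model and the Huber-loss estimator described above, here is a minimal simulation sketch in Python. The sample size, inlier fraction, corruption law, and Huber transition parameter are illustrative assumptions, not choices made in the paper.

```python
# Sketch: robust regression y = X beta* + eta under oblivious corruptions,
# comparing the Huber-loss estimator with ordinary least squares.
# All parameters below (n, d, alpha, delta, corruption law) are illustrative
# assumptions, not taken from the paper.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, alpha = 5000, 20, 0.2            # sample size, dimension, inlier fraction
beta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))            # Gaussian design

# Oblivious noise: chosen independently of X; only an alpha fraction of
# observations are (near-)inliers, the rest are wildly corrupted.
inlier = rng.random(n) < alpha
eta = np.where(inlier,
               0.1 * rng.normal(size=n),          # inliers: small noise
               100.0 * rng.standard_cauchy(n))    # outliers: heavy-tailed junk
y = X @ beta_star + eta

def huber_loss(beta, delta=1.0):
    """Huber penalty: quadratic for |r| <= delta, linear beyond."""
    r = y - X @ beta
    a = np.abs(r)
    return np.sum(np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta)))

def huber_grad(beta, delta=1.0):
    # d/dbeta of sum_i h(y_i - x_i . beta) is -X^T clip(r, -delta, delta)
    r = y - X @ beta
    return -X.T @ np.clip(r, -delta, delta)

huber_hat = minimize(huber_loss, np.zeros(d), jac=huber_grad,
                     method="L-BFGS-B").x
ols_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("Huber error:", np.linalg.norm(huber_hat - beta_star))  # small
print("OLS error:  ", np.linalg.norm(ols_hat - beta_star))    # blown up
```

Consistent with the bound quoted above, the Huber estimator's error should shrink like $(d/(\alpha^2 n))^{1/2}$ once $n = \omega(d/\alpha^2)$, while least squares is dragged away by the heavy-tailed corruptions.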
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Better and Simpler Lower Bounds for Differentially Private Statistical Estimation [7.693388437377614]
We prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples.
Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \cdots\right)$ samples.
arXiv Detail & Related papers (2023-10-10T04:02:43Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability (a generic gradient-descent sketch for this setup appears after this list).
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n = \Omega\left(\min(s \log d, 1/\gamma^4)\right)$ samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z)
- Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z)
- Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers [13.244654316770815]
We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
arXiv Detail & Related papers (2021-11-04T15:59:44Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, $\delta$ (a classical median-of-means baseline for this heavy-tailed setting is sketched after this list).
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is …
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
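Two of the entries above lend themselves to short, self-contained sketches. Both are hedged: they illustrate the general technique named in each blurb, not the exact estimators or analyses of those papers.

First, for "Optimal Mean Estimation without a Variance": the classical median-of-means baseline already estimates a heavy-tailed mean given only a first moment. The bucket count and the data distribution below are illustrative assumptions.

```python
# Classical median-of-means for heavy-tailed mean estimation. This is the
# standard baseline technique, not the estimator of the paper above; the
# bucket count k and the Student-t data law are illustrative assumptions.
import numpy as np

def median_of_means(samples, k=20):
    """Split samples into k buckets, average each bucket, and return the
    coordinate-wise median of the bucket means."""
    n = (len(samples) // k) * k                 # trim so k divides n
    buckets = samples[:n].reshape(k, -1, *samples.shape[1:])
    return np.median(buckets.mean(axis=1), axis=0)

rng = np.random.default_rng(1)
# Student-t with 1.5 degrees of freedom: finite mean (zero), infinite variance.
data = rng.standard_t(df=1.5, size=(100_000, 5))
print("median-of-means:", median_of_means(data))  # concentrates near zero
print("empirical mean: ", data.mean(axis=0))      # typically far noisier
```

Second, for "Learning a Single Neuron with Adversarial Label Noise via Gradient Descent": a plain gradient-descent loop on the squared loss conveys the setup. The logistic activation, noise rate, step size, and iteration count are assumptions here, and this is not the algorithm analyzed in that paper.

```python
# Gradient descent on F(w) = mean of (sigma(w.x) - y)^2 / 2 for a single
# neuron with logistic activation and a small fraction of corrupted labels.
# Activation, noise rate, step size, and iteration count are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d = 20_000, 10
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(n, d))
y = sigmoid(X @ w_star)
flip = rng.random(n) < 0.05          # corrupt 5% of labels (random relabeling
y[flip] = rng.random(flip.sum())     # here; an adversary could do worse)

w = np.zeros(d)
for _ in range(500):
    pred = sigmoid(X @ w)
    residual = (pred - y) * pred * (1.0 - pred)   # chain rule through sigmoid
    w -= 2.0 * (residual[:, None] * X).mean(axis=0)

print("distance to w*:", np.linalg.norm(w - w_star))  # shrinks toward noise floor
```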
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.