Out-of-sample error estimate for robust M-estimators with convex penalty
- URL: http://arxiv.org/abs/2008.11840v5
- Date: Thu, 30 Mar 2023 17:41:03 GMT
- Title: Out-of-sample error estimate for robust M-estimators with convex penalty
- Authors: Pierre C Bellec
- Abstract summary: A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty.
General differentiable loss functions $\rho$ are allowed provided that $\psi=\rho'$ is 1-Lipschitz.
- Score: 5.33024001730262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A generic out-of-sample error estimate is proposed for robust $M$-estimators
regularized with a convex penalty in high-dimensional linear regression where
$(X,y)$ is observed and $p,n$ are of the same order. If $\psi$ is the
derivative of the robust data-fitting loss $\rho$, the estimate depends on the
observed data only through the quantities $\hat\psi = \psi(y-X\hat\beta)$,
$X^\top \hat\psi$ and the derivatives $(\partial/\partial y) \hat\psi$ and
$(\partial/\partial y) X\hat\beta$ for fixed $X$.
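To make these quantities concrete, here is a minimal numpy/scipy sketch (an illustration under stated assumptions, not the paper's implementation): it fits a Huber M-estimator with a ridge penalty, a case covered by the strong-convexity assumption discussed below, and computes $\hat\psi$ and $X^\top\hat\psi$; the trace of $(\partial/\partial y)\hat\psi$ is approximated by finite differences rather than by the paper's closed-form expressions, and the helper names and tuning constants are hypothetical.

```python
# Minimal sketch (not the paper's code): ridge-penalized Huber M-estimator.
# psi = rho' is 1-Lipschitz and the ridge penalty makes the objective
# strongly convex, matching the paper's assumptions.
import numpy as np
from scipy.optimize import minimize

def huber_rho(r, delta=1.345):
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * a - 0.5 * delta**2)

def huber_psi(r, delta=1.345):
    # psi = rho': linear in the middle, clipped at +/- delta (1-Lipschitz)
    return np.clip(r, -delta, delta)

def fit(X, y, lam=1.0, delta=1.345):
    # hat{beta} = argmin_b  sum_i rho(y_i - x_i^T b) + (lam/2) ||b||^2
    def obj(b):
        return huber_rho(y - X @ b, delta).sum() + 0.5 * lam * (b @ b)
    def grad(b):
        return -X.T @ huber_psi(y - X @ b, delta) + lam * b
    return minimize(obj, np.zeros(X.shape[1]), jac=grad, method="L-BFGS-B").x

rng = np.random.default_rng(0)
n, p = 60, 30                           # p and n of the same order
beta = np.zeros(p); beta[:5] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

beta_hat = fit(X, y)
psi_hat = huber_psi(y - X @ beta_hat)   # hat{psi} = psi(y - X hat{beta})
score = X.T @ psi_hat                   # X^T hat{psi}

# Trace of (d/dy) hat{psi}, approximated entrywise by refitting with a
# perturbed y (slow, illustrative only; (d/dy) X hat{beta} is analogous).
eps, trace = 1e-4, 0.0
for i in range(n):
    y_pert = y.copy(); y_pert[i] += eps
    psi_pert = huber_psi(y_pert - X @ fit(X, y_pert))
    trace += (psi_pert[i] - psi_hat[i]) / eps
print(score.shape, trace)
```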
The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$
in a linear model with Gaussian covariates and independent noise, either
non-asymptotically when $p/n\le \gamma$ or asymptotically in the
high-dimensional asymptotic regime $p/n\to\gamma'\in(0,\infty)$. General
differentiable loss functions $\rho$ are allowed provided that $\psi=\rho'$ is
1-Lipschitz. The validity of the out-of-sample error estimate holds either
under a strong convexity assumption, or for the $\ell_1$-penalized Huber
M-estimator if the number of corrupted observations and sparsity of the true
$\beta$ are bounded from above by $s_*n$ for some small enough constant
$s_*\in(0,1)$ independent of $n,p$.
For the square loss and in the absence of corruption in the response, the
results additionally yield $n^{-1/2}$-consistent estimates of the noise
variance and of the generalization error. This generalizes, to arbitrary convex
penalty, estimates that were previously known for the Lasso.
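As a hedged illustration of the square-loss special case, the following scikit-learn snippet computes the degrees-of-freedom-corrected noise variance estimate that was previously known for the Lasso: there, the trace of $(\partial/\partial y)X\hat\beta$ equals the number of active coordinates, so the estimate reduces to the familiar form $\hat\sigma^2 = \|y-X\hat\beta\|^2/(n-\widehat{\mathrm{df}})$. The penalty level `alpha=0.1` is an arbitrary illustrative choice, not one taken from the paper.

```python
# Sketch of the square-loss case: noise variance estimate via the Lasso's
# degrees of freedom (number of nonzero coefficients).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, sigma = 300, 150, 2.0
beta = np.zeros(p); beta[:10] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta + sigma * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
resid = y - lasso.predict(X)
df = np.count_nonzero(lasso.coef_)      # trace (d/dy) X hat{beta} for the Lasso
sigma2_hat = resid @ resid / (n - df)   # degrees-of-freedom correction
print(sigma2_hat, sigma**2)             # estimate vs. true noise variance
```

For a general convex penalty, the paper replaces the nonzero count by the trace of $(\partial/\partial y)X\hat\beta$.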
Related papers
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
- A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, given a corrupted sample of $n$ label-feature pairs in which at most $\epsilon n$ points are arbitrary outliers.
We propose a near-optimal, computationally tractable estimator based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor of the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z)
- Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z)
- Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
- Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers [13.244654316770815]
We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k\log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
arXiv Detail & Related papers (2021-11-04T15:59:44Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-Gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt{d}/\dots$
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
- Consistent regression when oblivious outliers overwhelm [8.873449722727026]
Prior to our work, even for Gaussian $X$, no estimator for $\beta^*$ was known to be consistent in this model.
We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction.
The model studied here also captures heavy-tailed noise distributions that may not even have a first moment.
arXiv Detail & Related papers (2020-09-30T16:21:34Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.