Asymptotic Characterisation of Robust Empirical Risk Minimisation
Performance in the Presence of Outliers
- URL: http://arxiv.org/abs/2305.18974v2
- Date: Wed, 27 Sep 2023 09:50:48 GMT
- Title: Asymptotic Characterisation of Robust Empirical Risk Minimisation
Performance in the Presence of Outliers
- Authors: Matteo Vilucchio, Emanuele Troiani, Vittorio Erba, Florent Krzakala
- Abstract summary: We study robust linear regression in high dimension, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha=n/d$, and study a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses.
- Score: 18.455890316339595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study robust linear regression in high dimension, when both the dimension
$d$ and the number of data points $n$ diverge with a fixed ratio $\alpha=n/d$,
and study a data model that includes outliers. We provide exact asymptotics for
the performance of empirical risk minimisation (ERM) using
$\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses, which are the
standard approaches to such problems. We focus on two metrics for the
performance: the generalisation error to similar datasets with outliers, and
the estimation error of the original, unpolluted function. Our results are
compared with the information theoretic Bayes-optimal estimation bound. For the
generalisation error, we find that optimally-regularised ERM is asymptotically
consistent in the large sample complexity limit if one performs a simple
calibration, and compute the rates of convergence. For the estimation error,
however, we show that due to a norm calibration mismatch, the consistency of
the estimator requires an oracle estimate of the optimal norm, or the presence
of a cross-validation set not corrupted by the outliers. We examine in detail
how performance depends on the loss function and on the degree of outlier
corruption in the training set and identify a region of parameters where the
optimal performance of the Huber loss is identical to that of the $\ell_2$
loss, offering insights into the use cases of different loss functions.
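
As a concrete illustration of the setting described above, here is a minimal sketch (not the authors' code) of $\ell_2$-regularised ERM with the $\ell_2$, $\ell_1$, and Huber losses on a synthetic linear model in which a fraction of the training labels are replaced by outliers. The data model, problem sizes, and regularisation strength below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (not the authors' code): l2-regularised ERM for robust linear
# regression with l2, l1, and Huber losses, on a synthetic model where a
# fraction eps of training labels are replaced by outliers. The data model,
# sizes, and regularisation strength are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, alpha, eps = 50, 3.0, 0.1             # dimension, sample ratio alpha = n/d, outlier fraction
n = int(alpha * d)

w_star = rng.normal(size=d) / np.sqrt(d)             # ground-truth ("unpolluted") weights
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = X @ w_star + 0.1 * rng.normal(size=n)            # clean labels with small noise
outliers = rng.random(n) < eps
y[outliers] = 5.0 * rng.normal(size=outliers.sum())  # corrupted labels

def huber(r, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def erm_objective(w, loss, lam):
    r = y - X @ w
    per_sample = {"l2": 0.5 * r**2, "l1": np.abs(r), "huber": huber(r)}[loss]
    return per_sample.sum() + 0.5 * lam * np.dot(w, w)      # l2 (ridge) regularisation

for loss in ("l2", "l1", "huber"):
    res = minimize(erm_objective, np.zeros(d), args=(loss, 0.5), method="L-BFGS-B")
    est_err = np.linalg.norm(res.x - w_star) ** 2 / d       # estimation error vs clean signal
    print(f"{loss:>5s} loss: estimation error {est_err:.4f}")
```

In the paper's terminology, the estimation error printed above is measured against the original, unpolluted function, whereas the generalisation error would instead be evaluated on fresh data drawn from the same outlier-contaminated model.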
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z) - Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Optimal convex $M$-estimation via score matching [6.115859302936817]
We construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal variance in the downstream estimation of the regression coefficients.
Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution.
arXiv Detail & Related papers (2024-03-25T12:23:19Z) - On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z) - The Adaptive $τ$-Lasso: Robustness and Oracle Properties [12.06248959194646]
This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets.
The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points.
In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance.
arXiv Detail & Related papers (2023-04-18T21:34:14Z) - A Huber loss-based super learner with applications to healthcare
expenditures [0.0]
We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to downweight the influence of outlying observations.
We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings.
arXiv Detail & Related papers (2022-05-13T19:57:50Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Robust Algorithms for GMM Estimation: A Finite Sample Viewpoint [30.839245814393724]
A generic method of solving moment conditions is the Generalized Method of Moments (GMM).
We develop a GMM estimator that can tolerate a constant fraction $\epsilon$ of corrupted samples and achieves an $\ell_2$ recovery guarantee of $O(\sqrt{\epsilon})$.
Our algorithm and assumptions apply to instrumental variables linear and logistic regression.
arXiv Detail & Related papers (2021-10-06T21:06:22Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Evaluating representations by the complexity of learning low-loss
predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)