The Adaptive $\tau$-Lasso: Robustness and Oracle Properties
- URL: http://arxiv.org/abs/2304.09310v2
- Date: Thu, 19 Oct 2023 16:18:50 GMT
- Title: The Adaptive $\tau$-Lasso: Robustness and Oracle Properties
- Authors: Emadaldin Mozafari-Majd, Visa Koivunen
- Abstract summary: This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets.
The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points.
In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance.
- Score: 14.250233515645782
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a new regularized version of the robust
$\tau$-regression estimator for analyzing high-dimensional datasets subject to
gross contamination in the response variables and covariates (explanatory
variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to
outliers and high-leverage points. It also incorporates an adaptive
$\ell_1$-norm penalty term, which enables the selection of relevant variables
and reduces the bias associated with large true regression coefficients. More
specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each
regression coefficient. For a fixed number of predictors $p$, we show that the
adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection
consistency and asymptotic normality. Asymptotic normality applies only to the
entries of the regression vector corresponding to the true support, assuming
knowledge of the true regression vector support. We characterize its robustness
via the finite-sample breakdown point and the influence function. We carry out
extensive simulations and observe that the class of $\tau$-Lasso estimators
exhibits robustness and reliable performance in both contaminated and
uncontaminated data settings. We also validate our theoretical findings on
robustness properties through simulation experiments. In the face of outliers
and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators
achieve the best performance or close-to-best performance in terms of
prediction and variable selection accuracy compared to other competing
regularized estimators for all scenarios considered in this study. Therefore,
the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively
employed for a variety of sparse linear regression problems, particularly in
high-dimensional settings and when the data is contaminated by outliers and
high-leverage points.
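The adaptive $\ell_1$ penalty described in the abstract assigns a data-dependent weight to each coefficient, so that large pilot coefficients receive small penalties and thus less shrinkage bias. The following is a minimal sketch of that adaptive-weighting idea using a plain squared loss, an ISTA solver, and a least-squares pilot fit; it is not the paper's robust $\tau$-estimator (which replaces both the loss and the pilot with robust $\tau$-scale counterparts), and all function names are illustrative.

```python
import numpy as np

def weighted_lasso_ista(X, y, weights, lam, n_iter=500):
    """Solve min_b 0.5/n * ||y - X b||^2 + lam * sum_j w_j |b_j| via ISTA.
    Illustrative sketch of an adaptively weighted L1 penalty; the paper's
    tau-Lasso uses a robust tau-scale loss instead of the squared loss."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L
        thresh = lam * weights / L
        # coordinate-wise soft-thresholding with per-coefficient thresholds
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    # Step 1: pilot estimate (least squares here; the paper uses a robust
    # tau-estimator so the pilot itself resists outliers).
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    # Step 2: data-dependent weights; large pilot coefficients get small
    # penalties, which reduces bias on the true support.
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)
    return weighted_lasso_ista(X, y, w, lam)
```

On sparse data the heavy penalty on noise coordinates drives them exactly to zero, while the true support is estimated with little shrinkage, which is the mechanism behind the oracle property discussed above.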
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution [0.0]
We introduce an iterative strategy to correct bias effectively when the dimension $p$ is less than the sample size $n$.
For $p>n$, our method optimally mitigates the bias such that any remaining bias in the proposed de-biased estimator is unattainable.
Our method offers a transformative solution to the bias challenge in ridge regression inferences across various disciplines.
arXiv Detail & Related papers (2024-05-01T10:05:19Z) - Asymptotic Characterisation of Robust Empirical Risk Minimisation
Performance in the Presence of Outliers [18.455890316339595]
We study robust linear regression in high dimension, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n/d$, and study a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses.
arXiv Detail & Related papers (2023-05-30T12:18:39Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - High-dimensional regression with potential prior information on variable
importance [0.0]
We propose a simple scheme involving fitting a sequence of models indicated by the ordering.
We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
arXiv Detail & Related papers (2021-09-23T10:34:37Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
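The clipped gradient descent mentioned in this entry can be sketched in a minimal scalar form: estimate the mean of a heavy-tailed stream by bounding each gradient before applying it, so that no single extreme sample can move the iterate far. The function name and its defaults are illustrative, not taken from the paper.

```python
import numpy as np

def clipped_sgd(data, grad_fn, theta0, step=0.01, clip=5.0):
    """Streaming estimation via clipped gradient descent: a sketch of the
    clipping idea for heavy-tailed gradients (illustrative, not the
    paper's exact algorithm or tuning)."""
    theta = float(theta0)
    for x in data:
        g = grad_fn(theta, x)
        g = np.clip(g, -clip, clip)  # bound the influence of heavy-tailed samples
        theta -= step * g
    return theta
```

With the squared loss, the per-sample gradient for mean estimation is simply `theta - x`, so the update is an exponentially weighted average whose extreme increments are truncated.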
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Computationally and Statistically Efficient Truncated Regression [36.3677715543994]
We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression.
Our estimator uses Projected Stochastic Gradient Descent (PSGD) without replacement on the negative log-likelihood of the truncated sample.
As a corollary, we show that SGD learns the parameters of single-layer neural networks with noisy activation functions.
arXiv Detail & Related papers (2020-10-22T19:31:30Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)