The Adaptive $\tau$-Lasso: Robustness and Oracle Properties
- URL: http://arxiv.org/abs/2304.09310v2
- Date: Thu, 19 Oct 2023 16:18:50 GMT
- Title: The Adaptive $\tau$-Lasso: Robustness and Oracle Properties
- Authors: Emadaldin Mozafari-Majd, Visa Koivunen
- Abstract summary: This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets.
The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points.
In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance.
- Score: 14.250233515645782
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a new regularized version of the robust
$\tau$-regression estimator for analyzing high-dimensional datasets subject to
gross contamination in the response variables and covariates (explanatory
variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to
outliers and high-leverage points. It also incorporates an adaptive
$\ell_1$-norm penalty term, which enables the selection of relevant variables
and reduces the bias associated with large true regression coefficients. More
specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each
regression coefficient. For a fixed number of predictors $p$, we show that the
adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection
consistency and asymptotic normality. Asymptotic normality applies only to the
entries of the regression vector corresponding to the true support, assuming
knowledge of the true regression vector support. We characterize its robustness
via the finite-sample breakdown point and the influence function. We carry out
extensive simulations and observe that the class of $\tau$-Lasso estimators
exhibits robustness and reliable performance in both contaminated and
uncontaminated data settings. We also validate our theoretical findings on
robustness properties through simulation experiments. In the face of outliers
and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators
achieve the best performance or close-to-best performance in terms of
prediction and variable selection accuracy compared to other competing
regularized estimators for all scenarios considered in this study. Therefore,
the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively
employed for a variety of sparse linear regression problems, particularly in
high-dimensional settings and when the data is contaminated by outliers and
high-leverage points.
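The adaptive $\ell_1$ penalty described in the abstract assigns a data-dependent weight to each coefficient, so that large pilot coefficients receive small penalties and thus less shrinkage bias. The following is a minimal sketch of that adaptive-weighting idea using a plain squared loss, an ISTA solver, and a least-squares pilot fit; it is not the paper's robust $\tau$-estimator (which replaces both the loss and the pilot with robust $\tau$-scale counterparts), and all function names are illustrative.

```python
import numpy as np

def weighted_lasso_ista(X, y, weights, lam, n_iter=500):
    """Solve min_b 0.5/n * ||y - X b||^2 + lam * sum_j w_j |b_j| via ISTA.
    Illustrative sketch of an adaptively weighted L1 penalty; the paper's
    tau-Lasso uses a robust tau-scale loss instead of the squared loss."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L
        thresh = lam * weights / L
        # coordinate-wise soft-thresholding with per-coefficient thresholds
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    # Step 1: pilot estimate (least squares here; the paper uses a robust
    # tau-estimator so the pilot itself resists outliers).
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    # Step 2: data-dependent weights; large pilot coefficients get small
    # penalties, which reduces bias on the true support.
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)
    return weighted_lasso_ista(X, y, w, lam)
```

On sparse data the heavy penalty on noise coordinates drives them exactly to zero, while the true support is estimated with little shrinkage, which is the mechanism behind the oracle property discussed above.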
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution [0.0]
We introduce an iterative strategy to correct bias effectively when the dimension $p$ is less than the sample size $n$.
For $p>n$, our method optimally mitigates the bias such that any remaining bias in the proposed de-biased estimator is unattainable.
Our method offers a transformative solution to the bias challenge in ridge regression inferences across various disciplines.
arXiv Detail & Related papers (2024-05-01T10:05:19Z) - Asymptotic Characterisation of Robust Empirical Risk Minimisation
Performance in the Presence of Outliers [18.455890316339595]
We study robust linear regression in high dimension, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n/d$, and study a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses.
arXiv Detail & Related papers (2023-05-30T12:18:39Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - High-dimensional regression with potential prior information on variable
importance [0.0]
We propose a simple scheme involving fitting a sequence of models indicated by the ordering.
We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
arXiv Detail & Related papers (2021-09-23T10:34:37Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
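The clipped gradient descent mentioned in this entry can be sketched in a minimal scalar form: estimate the mean of a heavy-tailed stream by bounding each gradient before applying it, so that no single extreme sample can move the iterate far. The function name and its defaults are illustrative, not taken from the paper.

```python
import numpy as np

def clipped_sgd(data, grad_fn, theta0, step=0.01, clip=5.0):
    """Streaming estimation via clipped gradient descent: a sketch of the
    clipping idea for heavy-tailed gradients (illustrative, not the
    paper's exact algorithm or tuning)."""
    theta = float(theta0)
    for x in data:
        g = grad_fn(theta, x)
        g = np.clip(g, -clip, clip)  # bound the influence of heavy-tailed samples
        theta -= step * g
    return theta
```

With the squared loss, the per-sample gradient for mean estimation is simply `theta - x`, so the update is an exponentially weighted average whose extreme increments are truncated.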
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Computationally and Statistically Efficient Truncated Regression [36.3677715543994]
We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression.
Our estimator uses Projected Stochastic Gradient Descent (PSGD) without replacement on the negative log-likelihood of the truncated sample.
As a corollary, we show that SGD learns the parameters of single-layer neural networks with noisy activation functions.
arXiv Detail & Related papers (2020-10-22T19:31:30Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)