Imputation for High-Dimensional Linear Regression
- URL: http://arxiv.org/abs/2001.09180v1
- Date: Fri, 24 Jan 2020 19:54:09 GMT
- Title: Imputation for High-Dimensional Linear Regression
- Authors: Kabir Aladin Chandrasekher, Ahmed El Alaoui, Andrea Montanari
- Abstract summary: We show that conditional-mean imputation coupled with the LASSO retains the minimax estimation rate in the random-design setting.
We further show that the square-root LASSO remains pivotal in this setting.
- Score: 8.841513006680886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study high-dimensional regression with missing entries in the covariates.
A common strategy in practice is to \emph{impute} the missing entries with an
appropriate substitute and then implement a standard statistical procedure
acting as if the covariates were fully observed. Recent literature on this
subject proposes instead to design a specific, often complicated or non-convex,
algorithm tailored to the case of missing covariates. We investigate a simpler
approach where we fill in the missing entries with their conditional mean given
the observed covariates. We show that this imputation scheme coupled with
standard off-the-shelf procedures such as the LASSO and square-root LASSO
retains the minimax estimation rate in the random-design setting where the
covariates are i.i.d.\ sub-Gaussian. We further show that the square-root LASSO
remains \emph{pivotal} in this setting.
It is often the case that the conditional expectation cannot be computed
exactly and must be approximated from data. We study two cases where the
covariates either follow an autoregressive (AR) process, or are jointly
Gaussian with sparse precision matrix. We propose tractable estimators for the
conditional expectation and then perform linear regression via LASSO, and show
similar estimation rates in both cases. We complement our theoretical results
with simulations on synthetic and semi-synthetic examples, illustrating not
only the sharpness of our bounds, but also the broader utility of this strategy
beyond our theoretical assumptions.
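The recipe described above (fill in each missing covariate entry with its conditional mean given the observed entries, then run an off-the-shelf LASSO on the imputed design) is easy to sketch in the jointly Gaussian case mentioned in the abstract. The sketch below assumes the covariance matrix is known so the conditional mean has a closed form; the dimensions, missingness probability, and regularization level are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of conditional-mean imputation followed by the LASSO,
# assuming zero-mean jointly Gaussian covariates with a *known* covariance.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 50, 5                     # samples, dimension, sparsity (illustrative)
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)-style covariance

beta = np.zeros(p)
beta[:s] = 1.0                           # s-sparse signal
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = X @ beta + 0.5 * rng.standard_normal(n)

mask = rng.random((n, p)) < 0.8          # each entry observed independently w.p. 0.8

# Conditional-mean imputation: for each row, fill the missing block M with
# E[x_M | x_O] = Sigma_{M,O} Sigma_{O,O}^{-1} x_O  (zero-mean Gaussian case).
X_imp = X * mask
for i in range(n):
    obs, mis = np.where(mask[i])[0], np.where(~mask[i])[0]
    if len(obs) and len(mis):
        X_imp[i, mis] = Sigma[np.ix_(mis, obs)] @ np.linalg.solve(
            Sigma[np.ix_(obs, obs)], X[i, obs])

# Standard off-the-shelf LASSO on the imputed design.
lasso = Lasso(alpha=0.1).fit(X_imp, y)
print("l2 estimation error:", np.linalg.norm(lasso.coef_ - beta))
```

When the covariance is not known, the two tractable cases studied in the abstract (AR covariates, or jointly Gaussian covariates with sparse precision matrix) amount to replacing `Sigma` above with an estimate before imputing. The square-root LASSO mentioned above is "pivotal" in the sense that its regularization parameter can be chosen without knowledge of the noise level.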
Related papers
- Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - On Model Identification and Out-of-Sample Prediction of Principal
Component Regression: Applications to Synthetic Controls [20.96904429337912]
We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design.
We establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates.
arXiv Detail & Related papers (2020-10-27T17:07:36Z) - Robust High Dimensional Expectation Maximization Algorithm via Trimmed
Hard Thresholding [24.184520829631587]
We study the problem of estimating latent variable models with arbitrarily corrupted samples in high dimensional space.
We propose a method called Trimmed (Gradient) Expectation Maximization which adds a trimming gradient step.
We show that the algorithm is robust to corruptions and converges geometrically to the (near-)optimal statistical rate.
arXiv Detail & Related papers (2020-10-19T15:00:35Z) - Robust regression with covariate filtering: Heavy tails and adversarial
contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z) - Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered Weighted $L_1$ (OWL) regularized regression is a new regression analysis method for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
arXiv Detail & Related papers (2020-06-29T23:35:53Z) - Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)