Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps
- URL: http://arxiv.org/abs/2402.15409v1
- Date: Fri, 23 Feb 2024 16:16:38 GMT
- Title: Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps
- Authors: Jonathan Kelner, Frederic Koehler, Raghu Meka, Dhruv Rohatgi
- Abstract summary: We propose a natural sparse linear regression setting where strong correlations arise from unobserved latent variables.
In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix.
The sample complexity of the resulting "rescaled Lasso" algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal.
- Score: 29.13944209460543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well-known that the statistical performance of Lasso can suffer
significantly when the covariates of interest have strong correlations. In
particular, the prediction error of Lasso becomes much worse than
computationally inefficient alternatives like Best Subset Selection. Due to a
large conjectured computational-statistical tradeoff in the problem of sparse
linear regression, it may be impossible to close this gap in general.
In this work, we propose a natural sparse linear regression setting where
strong correlations between covariates arise from unobserved latent variables.
In this setting, we analyze the problem caused by strong correlations and
design a surprisingly simple fix. While Lasso with standard normalization of
covariates fails, there exists a heterogeneous scaling of the covariates with
which Lasso will suddenly obtain strong provable guarantees for estimation.
Moreover, we design a simple, efficient procedure for computing such a "smart
scaling."
The sample complexity of the resulting "rescaled Lasso" algorithm incurs (in
the worst case) quadratic dependence on the sparsity of the underlying signal.
While this dependence is not information-theoretically necessary, we give
evidence that it is optimal among the class of polynomial-time algorithms, via
the method of low-degree polynomials. This argument reveals a new connection
between sparse linear regression and a special version of sparse PCA with a
near-critical negative spike. The latter problem can be thought of as a
real-valued analogue of learning a sparse parity. Using it, we also establish
the first computational-statistical gap for the closely related problem of
learning a Gaussian Graphical Model.
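The algorithmic idea described above amounts to running Lasso after a heterogeneous per-coordinate rescaling of the covariates instead of the standard unit-norm normalization. Below is a minimal illustrative sketch of that contrast in Python; the single-latent-factor data model, the stand-in diagonal rescaling computed from the empirical precision matrix, and the choice of regularization parameter are all assumptions made for illustration, not the paper's actual "smart scaling" procedure or its guarantees.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative sketch only (not the paper's algorithm): sparse regression where
# strong correlations between covariates come from one unobserved latent variable.
rng = np.random.default_rng(0)
n, d, k = 500, 200, 5
z = rng.normal(size=(n, 1))                       # unobserved latent variable
loadings = rng.normal(size=(1, d))                # how strongly each covariate loads on z
X = z @ loadings + 0.1 * rng.normal(size=(n, d))  # covariates = latent part + idiosyncratic noise
beta = np.zeros(d)
beta[:k] = 1.0                                    # k-sparse ground-truth signal
y = X @ beta + 0.01 * rng.normal(size=n)

# Standard normalization: every column rescaled to unit Euclidean norm.
X_std = X / np.linalg.norm(X, axis=0)

# Hypothetical heterogeneous rescaling (a stand-in for the paper's "smart scaling"):
# divide each column by an estimate of its conditional standard deviation given the
# other covariates, read off the diagonal of the empirical precision matrix.
precision = np.linalg.pinv(X.T @ X / n)
cond_sd = np.sqrt(np.maximum(1.0 / np.diag(precision), 1e-12))
X_resc = X / cond_sd

for name, Xs in [("standard normalization", X_std), ("heterogeneous rescaling", X_resc)]:
    fit = Lasso(alpha=0.01, max_iter=50_000).fit(Xs, y)
    mse = np.mean((Xs @ fit.coef_ + fit.intercept_ - y) ** 2)
    print(f"{name:>25s}: in-sample MSE = {mse:.4f}, nonzeros = {np.count_nonzero(fit.coef_)}")
```

The sketch only contrasts standard normalization with an arbitrary heterogeneous diagonal scaling applied before the Lasso step; the paper's contribution is an efficient procedure for computing a scaling under which Lasso provably succeeds in the latent-variable setting.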
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- Feature Adaptation for Sparse Linear Regression [20.923321050404827]
Sparse linear regression is a central problem in high-dimensional statistics.
We provide an algorithm that adapts to tolerate a small number of approximate dependencies.
Our approach fits into a broader framework of feature adaptation for sparse linear regression.
arXiv Detail & Related papers (2023-05-26T12:53:13Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order, proximal Newton approach that yields an efficient solver by exploiting several algorithmic features.
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Scalable Gaussian-process regression and variable selection using Vecchia approximations [3.4163060063961255]
We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators.
arXiv Detail & Related papers (2022-02-25T21:22:38Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result shows that sparsity-aware methods can make batch RL more sample-efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- Robust regression with covariate filtering: Heavy tails and adversarial contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z)
- Ridge Regression Revisited: Debiasing, Thresholding and Bootstrap [4.142720557665472]
Ridge regression may be worth another look since, after debiasing and thresholding, it may offer some advantages over the Lasso.
In this paper, we define a debiased and thresholded ridge regression method, and prove a consistency result and a Gaussian approximation theorem.
In addition to estimation, we consider the problem of prediction, and present a novel, hybrid bootstrap algorithm tailored for prediction intervals.
arXiv Detail & Related papers (2020-09-17T05:04:10Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of instability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)