High-dimensional regression with potential prior information on variable importance
- URL: http://arxiv.org/abs/2109.11281v1
- Date: Thu, 23 Sep 2021 10:34:37 GMT
- Title: High-dimensional regression with potential prior information on variable importance
- Authors: Benjamin G. Stokell, Rajen D. Shah
- Abstract summary: We propose a simple scheme involving fitting a sequence of models indicated by the ordering.
We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are a variety of settings where vague prior information may be
available on the importance of predictors in high-dimensional regression
settings. Examples include ordering on the variables offered by their empirical
variances (which is typically discarded through standardisation), the lag of
predictors when fitting autoregressive models in time series settings, or the
level of missingness of the variables. Whilst such orderings may not match the
true importance of variables, we argue that there is little to be lost, and
potentially much to be gained, by using them. We propose a simple scheme
involving fitting a sequence of models indicated by the ordering. We show that
the computational cost for fitting all models when ridge regression is used is
no more than for a single fit of ridge regression, and describe a strategy for
Lasso regression that makes use of previous fits to greatly speed up fitting
the entire sequence of models. We propose to select a final estimator by
cross-validation and provide a general result on the quality of the best
performing estimator on a test set selected from among a number $M$ of
competing estimators in a high-dimensional linear regression setting. Our
result requires no sparsity assumptions and shows that only a $\log M$ price is
incurred compared to the unknown best estimator. We demonstrate the
effectiveness of our approach when applied to missing or corrupted data, and
time series settings. An R package is available on GitHub.
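The scheme in the abstract lends itself to a short illustration. The following is a minimal Python sketch, not the authors' R implementation; the names, the train/validation split, and the single fixed penalty lam are assumptions for illustration. It fits ridge regression on each nested prefix of a supplied importance ordering and selects a prefix on held-out data. The naive loop refits each model from scratch; the paper's result that the whole ridge sequence costs no more than a single fit is not reproduced here.

    import numpy as np

    def ridge_fit(X, y, lam):
        # Closed-form ridge solution: beta = (X'X + lam * I)^{-1} X'y.
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    def fit_ordered_sequence(X_tr, y_tr, X_val, y_val, ordering, lam=1.0):
        # `ordering` ranks columns from most to least presumed-important,
        # e.g. by empirical variance or by lag in an autoregressive model.
        # Fit ridge on each nested prefix and keep the best validation error.
        best_err, best_cols, best_beta = np.inf, None, None
        for k in range(1, len(ordering) + 1):
            cols = list(ordering[:k])
            beta = ridge_fit(X_tr[:, cols], y_tr, lam)
            err = np.mean((X_val[:, cols] @ beta - y_val) ** 2)
            if err < best_err:
                best_err, best_cols, best_beta = err, cols, beta
        return best_cols, best_beta

    # Example ordering (a hypothetical choice): empirical variance, largest first.
    # ordering = np.argsort(-X_tr.var(axis=0))
    # cols, beta = fit_ordered_sequence(X_tr, y_tr, X_val, y_val, ordering)

Selecting among the $M = p$ nested fits on held-out data is an instance of the paper's general result on choosing from $M$ competing estimators, where only a $\log M$-type price is paid relative to the unknown best candidate.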
Related papers
- Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z)
- The Adaptive $\tau$-Lasso: Robustness and Oracle Properties [12.06248959194646]
This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets.
The resulting estimator, termed the adaptive $\tau$-Lasso, is robust to outliers and high-leverage points.
In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best or close-to-best performance.
arXiv Detail & Related papers (2023-04-18T21:34:14Z)
- ecpc: An R-package for generic co-data models for high-dimensional prediction [0.0]
The R-package ecpc originally accommodated various and possibly multiple co-data sources.
We present an extension to the method and software for generic co-data models.
We show how ridge penalties may be transformed to elastic net penalties with the R-package squeezy.
arXiv Detail & Related papers (2022-05-16T12:55:19Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Bayesian Regression Approach for Building and Stacking Predictive Models in Time Series Analytics [0.0]
The paper describes the use of Bayesian regression for building time series models and stacking different predictive models for time series.
It makes it possible to estimate the uncertainty of time series predictions and to calculate value-at-risk characteristics.
arXiv Detail & Related papers (2022-01-06T12:58:23Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Ridge Regression Revisited: Debiasing, Thresholding and Bootstrap [4.142720557665472]
Ridge regression may be worth another look since, after debiasing and thresholding, it may offer some advantages over the Lasso.
In this paper, we define a debiased and thresholded ridge regression method, and prove a consistency result and a Gaussian approximation theorem.
In addition to estimation, we consider the problem of prediction, and present a novel, hybrid bootstrap algorithm tailored for prediction intervals.
arXiv Detail & Related papers (2020-09-17T05:04:10Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)