Least Squares Estimation Using Sketched Data with Heteroskedastic Errors
- URL: http://arxiv.org/abs/2007.07781v3
- Date: Wed, 22 Jun 2022 10:14:25 GMT
- Title: Least Squares Estimation Using Sketched Data with Heteroskedastic Errors
- Authors: Sokbae Lee, Serena Ng
- Abstract summary: We show that estimates using data sketched by random projections will behave as if the errors were homoskedastic.
Inference, including first-stage F tests for instrument relevance, can be simpler than the full sample case if the sketching scheme is appropriately chosen.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers may perform regressions using a sketch of data of size $m$
instead of the full sample of size $n$ for a variety of reasons. This paper
considers the case when the regression errors do not have constant variance and
heteroskedasticity robust standard errors would normally be needed for test
statistics to provide accurate inference. We show that estimates using data
sketched by random projections will behave "as if" the errors were
homoskedastic. Estimation by random sampling would not have this property. The
result arises because the sketched estimates in the case of random projections
can be expressed as degenerate $U$-statistics, and under certain conditions,
these statistics are asymptotically normal with homoskedastic variance. We
verify that the conditions hold not only in the case of least squares
regression when the covariates are exogenous, but also in instrumental
variables estimation when the covariates are endogenous. The result implies
that inference, including first-stage F tests for instrument relevance, can be
simpler than the full sample case if the sketching scheme is appropriately
chosen.
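The contrast drawn in the abstract, sketching by random projection versus random sampling, can be illustrated with a minimal simulation. The sketch below is not from the paper; the data-generating process, the Gaussian projection matrix, and all tolerances are illustrative assumptions. It compares full-sample OLS, OLS on a random-projection sketch, and OLS on a uniformly subsampled dataset under heteroskedastic errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 5000, 500, 3  # full sample, sketch size, number of regressors

# Linear model with heteroskedastic errors: the error scale grows with
# |x1| (an illustrative choice, not taken from the paper).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
sigma = 0.5 + np.abs(X[:, 1])              # per-observation error scale
y = X @ beta + sigma * rng.normal(size=n)

# Full-sample OLS.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Random-projection sketch: Pi is m x n with i.i.d. N(0, 1/m) entries,
# so E[Pi'Pi] = I and the sketched moments approximate the full-sample ones.
Pi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
beta_sketch = np.linalg.lstsq(Pi @ X, Pi @ y, rcond=None)[0]

# Uniform random sampling without replacement (the alternative scheme,
# which the paper shows does NOT inherit homoskedastic-style inference).
idx = rng.choice(n, size=m, replace=False)
beta_sample = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

print("full:  ", beta_full)
print("sketch:", beta_sketch)
print("sample:", beta_sample)
```

All three estimators are consistent for `beta`; the paper's point concerns the form of their asymptotic variances, with the projection-sketched estimator admitting homoskedastic-style standard errors.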
Related papers
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile [0.0]
We study the predictive risk of the ridge estimator for linear regression with a variance profile.
For certain class of variance profile, our work highlights the emergence of the well-known double descent phenomenon.
We also investigate the similarities and differences that exist with the standard setting of independent and identically distributed data.
arXiv Detail & Related papers (2024-03-29T14:24:49Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- The out-of-sample $R^2$: estimation and inference [0.0]
We define the out-of-sample $R^2$ as a comparison of two predictive models.
We exploit recent theoretical advances on the uncertainty of data-splitting estimates to provide a standard error for the $\hat{R}^2$.
arXiv Detail & Related papers (2023-02-10T09:29:57Z)
- Efficient Truncated Linear Regression with Unknown Noise Variance [26.870279729431328]
We provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown.
Our estimator is based on an efficient implementation of Projected Gradient Descent on the negative-likelihood of the truncated sample.
arXiv Detail & Related papers (2022-08-25T12:17:37Z)
- Robust Variable Selection and Estimation Via Adaptive Elastic Net S-Estimators for Linear Regression [0.0]
We propose a new robust regularized estimator for simultaneous variable selection and coefficient estimation.
Adaptive PENSE possesses the oracle property without prior knowledge of the scale of the residuals.
Numerical studies on simulated and real data sets highlight superior finite-sample performance in a vast range of settings.
arXiv Detail & Related papers (2021-07-07T16:04:08Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional-independence-test-based algorithm that separates causal variables using a seed variable as a prior, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.