Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- URL: http://arxiv.org/abs/2106.03344v2
- Date: Wed, 28 Jun 2023 20:15:52 GMT
- Title: Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- Authors: Fei Xue, Rong Ma, Hongzhe Li
- Abstract summary: Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
- Score: 13.48481978963297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blockwise missing data occurs frequently when we integrate multisource or
multimodality data where different sources or modalities contain complementary
information. In this paper, we consider a high-dimensional linear regression
model with blockwise missing covariates and a partially observed response
variable. Under this framework, we propose a computationally efficient
estimator for the regression coefficient vector based on carefully constructed
unbiased estimating equations and a blockwise imputation procedure, and obtain
its rate of convergence. Furthermore, building upon an innovative projected
estimating equation technique that intrinsically achieves bias-correction of
the initial estimator, we propose a nearly unbiased estimator for each
individual regression coefficient, which is asymptotically normally distributed
under mild conditions. Based on these debiased estimators, asymptotically valid
confidence intervals and statistical tests about each regression coefficient
are constructed. Numerical studies and application analysis of the Alzheimer's
Disease Neuroimaging Initiative data show that the proposed method performs
better and benefits more from unsupervised samples than existing methods.
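The paper's bias correction relies on projected estimating equations tailored to blockwise missing data, which are not reproduced here. As a rough illustration of the general idea of coordinate-wise debiasing of an initial penalized estimator, the following sketch applies a generic one-step correction on fully observed synthetic data; the ridge-regularized inverse `M` stands in for the more careful precision-matrix constructions used in the debiasing literature, and all names and tuning values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 50, 3                      # samples, dimension, sparsity
beta = np.zeros(p)
beta[:s] = [2.0, -1.5, 1.0]               # true nonzero coefficients
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: initial sparse estimator (biased toward zero by the penalty).
beta_init = Lasso(alpha=0.1).fit(X, y).coef_

# Step 2: one-step bias correction.  M approximates the inverse of the
# sample covariance; a small ridge term keeps the inversion stable.
Sigma = X.T @ X / n
M = np.linalg.inv(Sigma + 0.05 * np.eye(p))
beta_debiased = beta_init + M @ X.T @ (y - X @ beta_init) / n

# The corrected estimator is approximately unbiased coordinate-wise, so
# normal-based confidence intervals can be formed for each coefficient.
print(beta_init[:s], beta_debiased[:s])
```

The correction term undoes the shrinkage induced by the penalty: at the lasso solution the residual correlation `X.T @ (y - X @ beta_init) / n` equals the penalty level on active coordinates, and multiplying by `M` maps it back to the coefficient scale.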
Related papers
- RieszBoost: Gradient Boosting for Riesz Regression [49.737777802061984]
We propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form.
We show that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals.
arXiv Detail & Related papers (2025-01-08T23:04:32Z)
- Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness [10.470114319701576]
We introduce a model-free debiasing method for smooth nonparametric estimators derived from any nonparametric regression approach.
We obtain a debiased estimator with proven pointwise normality and uniform convergence.
arXiv Detail & Related papers (2024-12-28T15:01:19Z)
- Debiased Regression for Root-N-Consistent Conditional Mean Estimation [10.470114319701576]
We introduce a debiasing method for regression estimators, including high-dimensional and nonparametric regression estimators.
Our theoretical analysis demonstrates that the proposed estimator achieves $\sqrt{n}$-consistency and asymptotic normality under a mild convergence rate condition.
The proposed method offers several advantages, including improved estimation accuracy and simplified construction of confidence intervals.
arXiv Detail & Related papers (2024-11-18T17:25:06Z)
- Progression: an extrapolation principle for regression [0.0]
We propose a novel statistical extrapolation principle.
It assumes a simple relationship between predictors and the response at the boundary of the training predictor samples.
Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range.
arXiv Detail & Related papers (2024-10-30T17:29:51Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We analyze the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.
We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
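The GCV estimator whose failure under correlated samples is analyzed above can be sketched in a few lines for the standard i.i.d. setting; this is the textbook construction, not the paper's correlated-sample analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 100, 20, 1.0
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# Ridge smoother matrix S, with fitted values y_hat = S @ y.
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
y_hat = S @ y

# Generalized cross-validation: the training error inflated by the
# effective degrees of freedom tr(S).  Under i.i.d. sampling GCV tracks
# the out-of-sample risk; with correlated samples it can be misleading.
train_mse = np.mean((y - y_hat) ** 2)
gcv = train_mse / (1.0 - np.trace(S) / n) ** 2
print(gcv)
```

Since `tr(S) < p < n`, the denominator is strictly between 0 and 1, so GCV always inflates the raw training error.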
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
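A plain nearest-neighbor matching estimator of an average treatment effect, the baseline that the modifications above improve upon, can be sketched as follows (the paper's bias-removing modification is not reproduced; data-generating values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.standard_normal((n, 2))           # covariates
T = (rng.random(n) < 0.5).astype(int)     # binary treatment indicator
tau = 1.0                                  # true treatment effect
y = X @ np.array([1.0, -0.5]) + tau * T + 0.1 * rng.standard_normal(n)

def match_impute(X, y, T, group):
    """Impute each unit's outcome under `group` from its nearest
    neighbor (Euclidean distance) among units with T == group."""
    donors = np.where(T == group)[0]
    out = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X[donors] - X[i], axis=1)
        out[i] = y[donors[np.argmin(d)]]
    return out

# Use the observed outcome where available, a matched one otherwise.
y1 = np.where(T == 1, y, match_impute(X, y, T, 1))
y0 = np.where(T == 0, y, match_impute(X, y, T, 0))
ate_hat = np.mean(y1 - y0)
print(ate_hat)
```

No bandwidth or other smoothing parameter appears anywhere; the only tuning-free ingredient is the nearest-neighbor match itself, which is the point the abstract emphasizes.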
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression into a least-squares optimization problem.
An efficient algorithm is developed, which enjoys high computation and communication efficiency.
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression [9.815103550891463]
This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion.
We propose an iterative procedure for estimating the two regression vectors and establish their rates of convergence.
A large-scale multiple testing procedure is proposed for testing the regression coefficients and is shown to control the false discovery rate (FDR) asymptotically.
arXiv Detail & Related papers (2020-11-06T21:17:41Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.