Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- URL: http://arxiv.org/abs/2106.03344v2
- Date: Wed, 28 Jun 2023 20:15:52 GMT
- Title: Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- Authors: Fei Xue, Rong Ma, Hongzhe Li
- Abstract summary: Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blockwise missing data occurs frequently when we integrate multisource or
multimodality data where different sources or modalities contain complementary
information. In this paper, we consider a high-dimensional linear regression
model with blockwise missing covariates and a partially observed response
variable. Under this framework, we propose a computationally efficient
estimator for the regression coefficient vector based on carefully constructed
unbiased estimating equations and a blockwise imputation procedure, and obtain
its rate of convergence. Furthermore, building upon an innovative projected
estimating equation technique that intrinsically achieves bias-correction of
the initial estimator, we propose a nearly unbiased estimator for each
individual regression coefficient, which is asymptotically normally distributed
under mild conditions. Based on these debiased estimators, asymptotically valid
confidence intervals and statistical tests about each regression coefficient
are constructed. Numerical studies and application analysis of the Alzheimer's
Disease Neuroimaging Initiative data show that the proposed method performs
better and benefits more from unsupervised samples than existing methods.
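The blockwise imputation step can be illustrated with a toy simulation. This is a sketch of the general idea only, not the authors' estimating-equation construction; the dimensions, variable names, and plain least-squares fit (in place of a penalized high-dimensional estimator) are all simplifications of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 500, 3, 2                      # samples, block-1 and block-2 sizes
beta = np.array([1.0, -2.0, 0.5, 0.8, -0.6])

X = rng.normal(size=(n, p1 + p2))
# give block 2 some dependence on block 1 so imputation has signal
W = rng.normal(scale=0.5, size=(p1, p2))
X[:, p1:] += X[:, :p1] @ W
y = X @ beta + rng.normal(scale=0.1, size=n)

# one source (first half of the sample) is missing block 2 entirely
miss = np.zeros(n, dtype=bool)
miss[: n // 2] = True

# blockwise imputation: regress block 2 on block 1 over complete cases,
# then fill the missing block with predicted values
B, *_ = np.linalg.lstsq(X[~miss, :p1], X[~miss, p1:], rcond=None)
X_imp = X.copy()
X_imp[miss, p1:] = X[miss, :p1] @ B

# least-squares fit on the imputed design (the paper instead uses
# carefully constructed unbiased estimating equations with a penalty)
beta_hat, *_ = np.linalg.lstsq(X_imp, y, rcond=None)
```

Because the imputation error is independent of the observed block, it acts like extra noise rather than a systematic distortion, so the fit on the imputed design still recovers the coefficients on this toy example.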
Related papers
- Debiased Regression for Root-N-Consistent Conditional Mean Estimation [10.470114319701576]
We introduce a debiasing method for regression estimators, including high-dimensional and nonparametric regression estimators.
Our theoretical analysis demonstrates that the proposed estimator achieves $\sqrt{n}$-consistency and asymptotic normality under a mild convergence rate condition.
The proposed method offers several advantages, including improved estimation accuracy and simplified construction of confidence intervals.
arXiv Detail & Related papers (2024-11-18T17:25:06Z)
- Progression: an extrapolation principle for regression [0.0]
We propose a novel statistical extrapolation principle.
It assumes a simple relationship between predictors and the response at the boundary of the training predictor samples.
Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range.
arXiv Detail & Related papers (2024-10-30T17:29:51Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression problem into a least-squares optimization.
An efficient algorithm is developed, which enjoys high computational and communication efficiency.
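For context, the check (pinball) loss that defines quantile regression is nonsmooth, which is what the paper's least-squares transformation is designed to avoid. A minimal baseline sketch of direct subgradient descent on the check loss (not the paper's algorithm; all names and tuning constants are mine):

```python
import numpy as np

def quantile_loss_grad(X, y, beta, tau):
    """Subgradient of (1/n) * sum_i rho_tau(y_i - x_i'beta),
    where rho_tau(r) = r * (tau - 1{r < 0})."""
    r = y - X @ beta
    w = np.where(r < 0, tau - 1.0, tau)    # d rho_tau / d r
    return -X.T @ w / len(y)

rng = np.random.default_rng(1)
n, p, tau = 2000, 3, 0.5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.standard_t(df=3, size=n)  # heavy-tailed noise

# plain subgradient descent on the nonsmooth check loss
beta = np.zeros(p)
for _ in range(2000):
    beta -= 0.5 * quantile_loss_grad(X, y, beta, tau)
```

With symmetric noise, the tau = 0.5 (median regression) fit recovers the true coefficients despite heavy tails, which least squares on y would handle less robustly.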
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
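The abstention idea can be sketched with a crude windowed variance estimate standing in for the paper's variance predictor. The threshold, window width, and the use of the true regression function as a stand-in predictor are all simplifications of mine, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.uniform(-1, 1, size=n)
sigma = np.where(x > 0, 1.0, 0.1)          # noisy right half, quiet left half
y = np.sin(2 * x) + sigma * rng.normal(size=n)

def local_variance(x0, h=0.1):
    """Crude estimate of Var(y | x = x0) from a +-h window around x0."""
    win = np.abs(x - x0) < h
    return y[win].var()

def predict_or_abstain(x0, threshold=0.25):
    # abstain wherever the estimated conditional variance is too large
    if local_variance(x0) > threshold:
        return None
    return np.sin(2 * x0)                  # stand-in for a fitted regressor
```

On this toy data the procedure abstains in the high-noise region (x > 0) and predicts in the low-noise region.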
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss–Markov theorem states that the weighted least squares estimator is the linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
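The averaging motivation is easy to see numerically: averaging many independent unbiased estimates drives the error toward zero, while a shared systematic bias survives averaging no matter how many estimates are combined. A small sketch (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, K, n = 2.0, 100, 50                 # true value, #estimates, samples each

# K independent unbiased estimates (sample means) of the same unknown theta
unbiased = np.array([rng.normal(theta, 1.0, n).mean() for _ in range(K)])
biased = unbiased + 0.3                    # the same estimates with a shared offset

err_unbiased = abs(unbiased.mean() - theta)  # shrinks like 1/sqrt(K*n)
err_biased = abs(biased.mean() - theta)      # floors at the 0.3 bias
```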
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse [0.76146285961466]
We consider a nonparametric approach to imputation using the kernel ridge regression technique and propose consistent variance estimation.
The proposed variance estimator is based on a linearization approach which employs the entropy method to estimate the density ratio.
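Kernel ridge regression imputation itself can be sketched in a few lines: fit KRR on the respondents and predict the nonrespondents. This is a generic illustration, not the paper's variance-estimation procedure; the kernel and hyperparameter choices are arbitrary choices of mine.

```python
import numpy as np

def krr_fit_predict(X_obs, y_obs, X_new, lam=0.1, gamma=1.0):
    """Kernel ridge regression with an RBF kernel:
    alpha = (K + lam*I)^{-1} y, prediction = K(new, obs) @ alpha."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = rbf(X_obs, X_obs)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_obs)), y_obs)
    return rbf(X_new, X_obs) @ alpha

rng = np.random.default_rng(4)
n = 300
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

resp = rng.random(n) < 0.7                 # ~70% respond; the rest are imputed
y_imp = y.copy()
y_imp[~resp] = krr_fit_predict(X[resp], y[resp], X[~resp])
```

The observed responses are left untouched; only the nonrespondents receive the KRR predictions.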
arXiv Detail & Related papers (2021-01-29T20:46:33Z)
- Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression [9.815103550891463]
This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion.
We propose an iterative procedure for estimating the two regression vectors and establish their rates of convergence.
A large-scale multiple testing procedure is proposed for testing the regression coefficients and is shown to control the false discovery rate (FDR) asymptotically.
arXiv Detail & Related papers (2020-11-06T21:17:41Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy guarantees based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.