Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- URL: http://arxiv.org/abs/2106.03344v2
- Date: Wed, 28 Jun 2023 20:15:52 GMT
- Title: Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data
- Authors: Fei Xue, Rong Ma, Hongzhe Li
- Abstract summary: Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
- Score: 13.48481978963297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blockwise missing data occurs frequently when we integrate multisource or
multimodality data where different sources or modalities contain complementary
information. In this paper, we consider a high-dimensional linear regression
model with blockwise missing covariates and a partially observed response
variable. Under this framework, we propose a computationally efficient
estimator for the regression coefficient vector based on carefully constructed
unbiased estimating equations and a blockwise imputation procedure, and obtain
its rate of convergence. Furthermore, building upon an innovative projected
estimating equation technique that intrinsically achieves bias-correction of
the initial estimator, we propose a nearly unbiased estimator for each
individual regression coefficient, which is asymptotically normally distributed
under mild conditions. Based on these debiased estimators, asymptotically valid
confidence intervals and statistical tests about each regression coefficient
are constructed. Numerical studies and application analysis of the Alzheimer's
Disease Neuroimaging Initiative data show that the proposed method performs
better and benefits more from unsupervised samples than existing methods.
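The paper's bias correction relies on projected estimating equations tailored to blockwise missing data, which are not reproduced here. As a rough illustration of the general idea of coordinate-wise debiasing of an initial penalized estimator, the following sketch applies a generic one-step correction on fully observed synthetic data; the ridge-regularized inverse `M` stands in for the more careful precision-matrix constructions used in the debiasing literature, and all names and tuning values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 50, 3                      # samples, dimension, sparsity
beta = np.zeros(p)
beta[:s] = [2.0, -1.5, 1.0]               # true nonzero coefficients
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: initial sparse estimator (biased toward zero by the penalty).
beta_init = Lasso(alpha=0.1).fit(X, y).coef_

# Step 2: one-step bias correction.  M approximates the inverse of the
# sample covariance; a small ridge term keeps the inversion stable.
Sigma = X.T @ X / n
M = np.linalg.inv(Sigma + 0.05 * np.eye(p))
beta_debiased = beta_init + M @ X.T @ (y - X @ beta_init) / n

# The corrected estimator is approximately unbiased coordinate-wise, so
# normal-based confidence intervals can be formed for each coefficient.
print(beta_init[:s], beta_debiased[:s])
```

The correction term undoes the shrinkage induced by the penalty: at the lasso solution the residual correlation `X.T @ (y - X @ beta_init) / n` equals the penalty level on active coordinates, and multiplying by `M` maps it back to the coefficient scale.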
Related papers
- RieszBoost: Gradient Boosting for Riesz Regression [49.737777802061984]
We propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form.
We show that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals.
arXiv Detail & Related papers (2025-01-08T23:04:32Z)
- Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness [10.470114319701576]
We introduce a model-free debiasing method for smooth nonparametric estimators derived from any nonparametric regression approach.
We obtain a debiased estimator with proven pointwise normality and uniform convergence.
arXiv Detail & Related papers (2024-12-28T15:01:19Z)
- Debiased Regression for Root-N-Consistent Conditional Mean Estimation [10.470114319701576]
We introduce a debiasing method for regression estimators, including high-dimensional and nonparametric regression estimators.
Our theoretical analysis demonstrates that the proposed estimator achieves $\sqrt{n}$-consistency and asymptotic normality under a mild convergence rate condition.
The proposed method offers several advantages, including improved estimation accuracy and simplified construction of confidence intervals.
arXiv Detail & Related papers (2024-11-18T17:25:06Z)
- Progression: an extrapolation principle for regression [0.0]
We propose a novel statistical extrapolation principle.
It assumes a simple relationship between predictors and the response at the boundary of the training predictor samples.
Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range.
arXiv Detail & Related papers (2024-10-30T17:29:51Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We analyze the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.
We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
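The GCV estimator whose failure under correlated samples is analyzed above can be sketched in a few lines for the standard i.i.d. setting; this is the textbook construction, not the paper's correlated-sample analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 100, 20, 1.0
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# Ridge smoother matrix S, with fitted values y_hat = S @ y.
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
y_hat = S @ y

# Generalized cross-validation: the training error inflated by the
# effective degrees of freedom tr(S).  Under i.i.d. sampling GCV tracks
# the out-of-sample risk; with correlated samples it can be misleading.
train_mse = np.mean((y - y_hat) ** 2)
gcv = train_mse / (1.0 - np.trace(S) / n) ** 2
print(gcv)
```

Since `tr(S) < p < n`, the denominator is strictly between 0 and 1, so GCV always inflates the raw training error.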
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
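A plain nearest-neighbor matching estimator of an average treatment effect, the baseline that the modifications above improve upon, can be sketched as follows (the paper's bias-removing modification is not reproduced; data-generating values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.standard_normal((n, 2))           # covariates
T = (rng.random(n) < 0.5).astype(int)     # binary treatment indicator
tau = 1.0                                  # true treatment effect
y = X @ np.array([1.0, -0.5]) + tau * T + 0.1 * rng.standard_normal(n)

def match_impute(X, y, T, group):
    """Impute each unit's outcome under `group` from its nearest
    neighbor (Euclidean distance) among units with T == group."""
    donors = np.where(T == group)[0]
    out = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X[donors] - X[i], axis=1)
        out[i] = y[donors[np.argmin(d)]]
    return out

# Use the observed outcome where available, a matched one otherwise.
y1 = np.where(T == 1, y, match_impute(X, y, T, 1))
y0 = np.where(T == 0, y, match_impute(X, y, T, 0))
ate_hat = np.mean(y1 - y0)
print(ate_hat)
```

No bandwidth or other smoothing parameter appears anywhere; the only tuning-free ingredient is the nearest-neighbor match itself, which is the point the abstract emphasizes.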
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression into a least-squares optimization problem.
An efficient algorithm is developed, which enjoys high computation and communication efficiency.
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression [9.815103550891463]
This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion.
We propose an iterative procedure for estimating the two regression vectors and establish their rates of convergence.
A large-scale multiple testing procedure is proposed for testing the regression coefficients and is shown to control the false discovery rate (FDR) asymptotically.
arXiv Detail & Related papers (2020-11-06T21:17:41Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.