Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for
High-Dimensional Mixed Linear Regression
- URL: http://arxiv.org/abs/2011.03598v1
- Date: Fri, 6 Nov 2020 21:17:41 GMT
- Title: Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for
High-Dimensional Mixed Linear Regression
- Authors: Linjun Zhang, Rong Ma, T. Tony Cai and Hongzhe Li
- Abstract summary: This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion.
We propose an iterative procedure for estimating the two regression vectors and establish their rates of convergence.
A large-scale multiple testing procedure is proposed for testing the regression coefficients and is shown to control the false discovery rate (FDR) algorithmally.
- Score: 9.815103550891463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the high-dimensional mixed linear regression (MLR) where
the output variable comes from one of the two linear regression models with an
unknown mixing proportion and an unknown covariance structure of the random
covariates. Building upon a high-dimensional EM algorithm, we propose an
iterative procedure for estimating the two regression vectors and establish
their rates of convergence. Based on the iterative estimators, we further
construct debiased estimators and establish their asymptotic normality. For
individual coordinates, confidence intervals centered at the debiased
estimators are constructed.
Furthermore, a large-scale multiple testing procedure is proposed for testing
the regression coefficients and is shown to control the false discovery rate
(FDR) asymptotically. Simulation studies are carried out to examine the
numerical performance of the proposed methods and their superiority over
existing methods. The proposed methods are further illustrated through an
analysis of a dataset of multiplex image cytometry, which investigates the
interaction networks among the cellular phenotypes that include the expression
levels of 20 epitopes or combinations of markers.
Related papers
- Statistical Inference in Classification of High-dimensional Gaussian Mixture [1.2354076490479515]
We investigate the behavior of a general class of regularized convex classifiers in the high-dimensional limit.
Our focus is on the generalization error and variable selection properties of the estimators.
arXiv Detail & Related papers (2024-10-25T19:58:36Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $sqrt n $-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent parameters smoothing.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression [5.883916678819683]
We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR)
Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high SNR settings.
arXiv Detail & Related papers (2024-05-28T14:46:20Z) - Estimation of multiple mean vectors in high dimension [4.2466572124753]
We endeavour to estimate numerous multi-dimensional means of various probability distributions on a common space based on independent samples.
Our approach involves forming estimators through convex combinations of empirical means derived from these samples.
arXiv Detail & Related papers (2024-03-22T08:42:41Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry [0.0]
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold.
We develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget.
Evaluations of our approach using data from physical applications demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
arXiv Detail & Related papers (2023-01-31T16:33:46Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data [13.48481978963297]
Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
arXiv Detail & Related papers (2021-06-07T05:12:42Z) - Estimation of Switched Markov Polynomial NARX models [75.91002178647165]
We identify a class of models for hybrid dynamical systems characterized by nonlinear autoregressive (NARX) components.
The proposed approach is demonstrated on a SMNARX problem composed by three nonlinear sub-models with specific regressors.
arXiv Detail & Related papers (2020-09-29T15:00:47Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.