Debiased high-dimensional regression calibration for errors-in-variables log-contrast models
- URL: http://arxiv.org/abs/2409.07568v1
- Date: Wed, 11 Sep 2024 18:47:28 GMT
- Title: Debiased high-dimensional regression calibration for errors-in-variables log-contrast models
- Authors: Huali Zhao, Tianying Wang,
- Abstract summary: Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models.
This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data.
- Score: 0.999726509256195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.
Related papers
- Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks [16.064233621959538]
We propose a query-efficient and computation-efficient MIA that directly textbfRe-levertextbfAges the original membershitextbfP scores to mtextbfItigate the errors in textbfDifficulty calibration.
arXiv Detail & Related papers (2024-08-31T11:59:42Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Perturbation-based Effect Measures for Compositional Data [3.9543275888781224]
Existing effect measures for compositional features are inadequate for many modern applications.
We propose a framework based on hypothetical data perturbations that addresses both issues.
We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization.
arXiv Detail & Related papers (2023-11-30T12:27:15Z) - Errors-in-variables Fr\'echet Regression with Low-rank Covariate
Approximation [2.1756081703276]
Fr'echet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables.
Our proposed framework combines the concepts of global Fr'echet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator.
arXiv Detail & Related papers (2023-05-16T08:37:54Z) - High-dimensional Measurement Error Models for Lipschitz Loss [2.6415509201394283]
We develop high-dimensional measurement error models for a class of Lipschitz loss functions.
Our estimator is designed to minimize the $L_1$ norm among all estimators belonging to suitable feasible sets.
We derive theoretical guarantees in terms of finite sample statistical error bounds and sign consistency, even when the dimensionality increases exponentially with the sample size.
arXiv Detail & Related papers (2022-10-26T20:06:05Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - An Investigation of Why Overparameterization Exacerbates Spurious
Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause over parameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Measurement Error in Nutritional Epidemiology: A Survey [0.0]
This article reviews bias-correction models for measurement error of exposure variables in the field of nutritional epidemiology.
Due to the influence of measurement error, inference of parameter estimate is conservative and confidence interval of the slope parameter is too narrow.
arXiv Detail & Related papers (2020-04-14T12:31:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.