Assumption-Lean Post-Integrated Inference with Negative Control Outcomes
- URL: http://arxiv.org/abs/2410.04996v1
- Date: Mon, 7 Oct 2024 12:52:38 GMT
- Title: Assumption-Lean Post-Integrated Inference with Negative Control Outcomes
- Authors: Jin-Hong Du, Kathryn Roeder, Larry Wasserman,
- Abstract summary: We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our assumption-lean semi inference method extends robustness and generality to projected direct effect estimands that account for mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions, facilitating data-adaptive estimation with machine learning algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data integration has become increasingly common in aligning multiple heterogeneous datasets. With high-dimensional outcomes, data integration methods aim to extract low-dimensional embeddings of observations to remove unwanted variations, such as batch effects and unmeasured covariates, inherent in data collected from different sources. However, multiple hypothesis testing after data integration can be substantially biased due to the data-dependent integration processes. To address this challenge, we introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. By leveraging causal interpretations, we derive nonparametric identification conditions that form the basis of our PII approach. Our assumption-lean semiparametric inference method extends robustness and generality to projected direct effect estimands that account for mediators, confounders, and moderators. These estimands remain statistically meaningful under model misspecifications and with error-prone embeddings. We provide deterministic quantifications of the bias of target estimands induced by estimated embeddings and finite-sample linear expansions of the estimators with uniform concentration bounds on the residuals for all outcomes. The proposed doubly robust estimators are consistent and efficient under minimal assumptions, facilitating data-adaptive estimation with machine learning algorithms. Using random forests, we evaluate empirical statistical errors in simulations and analyze single-cell CRISPR perturbed datasets with potential unmeasured confounders.
Related papers
- Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data [1.0650780147044159]
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources.
Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution.
We extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution.
arXiv Detail & Related papers (2024-09-16T04:10:44Z) - Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z) - Synthetic Tabular Data Validation: A Divergence-Based Approach [8.062368743143388]
Divergences quantify discrepancies between data distributions.
Traditional approaches calculate divergences independently for each feature.
We propose a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons.
arXiv Detail & Related papers (2024-05-13T15:07:52Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Statistical Inference for High-Dimensional Linear Regression with
Blockwise Missing Data [13.48481978963297]
Blockwise missing data occurs when we integrate multisource or multimodality data where different sources or modalities contain complementary information.
We propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations.
Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
arXiv Detail & Related papers (2021-06-07T05:12:42Z) - Robust Bayesian Inference for Discrete Outcomes with the Total Variation
Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination.
Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD)
We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.