Perturbation-based Effect Measures for Compositional Data
- URL: http://arxiv.org/abs/2311.18501v2
- Date: Tue, 18 Jun 2024 12:21:48 GMT
- Title: Perturbation-based Effect Measures for Compositional Data
- Authors: Anton Rask Lundborg, Niklas Pfister,
- Abstract summary: Existing effect measures for compositional features are inadequate for many modern applications.
We propose a framework based on hypothetical data perturbations that addresses both issues.
We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization.
- Score: 3.9543275888781224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing effect measures for compositional features are inadequate for many modern applications for two reasons. First, modern datasets with compositional covariates, for example in microbiome research, display traits such as high-dimensionality and sparsity that can be poorly modelled with traditional parametric approaches. Second, assessing -- in an unbiased way -- how summary statistics of a composition (e.g., racial diversity) affect a response variable is not straightforward. In this work, we propose a framework based on hypothetical data perturbations that addresses both issues. Unlike many existing effect measures for compositional features, we do not define our effects based on a parametric model or a transformation of the data. Instead, we use perturbations to define interpretable statistical functionals on the compositions themselves, which we call average perturbation effects. These effects naturally account for confounding that biases frequently used marginal dependence analyses. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data. For all proposed estimators, we provide confidence intervals with uniform asymptotic coverage guarantees.
Related papers
- Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Debiased high-dimensional regression calibration for errors-in-variables log-contrast models [0.999726509256195]
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models.
This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data.
arXiv Detail & Related papers (2024-09-11T18:47:28Z) - Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions [0.0]
Estimating causal effect using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks.
We show how we can adapt machine learning (DML) for panel data in the presence of unobserved heterogeneity.
We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role for the performance of most alternative methods.
arXiv Detail & Related papers (2024-09-02T13:59:54Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling.
We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures.
Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z) - Combining Observational and Randomized Data for Estimating Heterogeneous
Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Nonparametric inference for interventional effects with multiple
mediators [0.0]
We provide theory that allows for more flexible, possibly machine learning-based, estimation techniques.
We demonstrate multiple robustness properties of the proposed estimators.
Our work thus provides a means of leveraging modern statistical learning techniques in estimation of interventional mediation effects.
arXiv Detail & Related papers (2020-01-16T19:05:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.