Perturbation-based Effect Measures for Compositional Data
- URL: http://arxiv.org/abs/2311.18501v4
- Date: Wed, 12 Feb 2025 19:17:20 GMT
- Title: Perturbation-based Effect Measures for Compositional Data
- Authors: Anton Rask Lundborg, Niklas Pfister,
- Abstract summary: We propose a framework based on hypothetical data perturbations which defines interpretable statistical functionals on compositions.
We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization.
We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data.
- Score: 3.9543275888781224
- License:
- Abstract: Existing effect measures for compositional features are inadequate for many modern applications, for example, in microbiome research, since they display traits such as high-dimensionality and sparsity that can be poorly modelled with traditional parametric approaches. Further, assessing -- in an unbiased way -- how summary statistics of a composition (e.g., racial diversity) affect a response variable is not straightforward. We propose a framework based on hypothetical data perturbations which defines interpretable statistical functionals on the compositions themselves, which we call average perturbation effects. These effects naturally account for confounding that biases frequently used marginal dependence analyses. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data.
Related papers
- Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Debiased high-dimensional regression calibration for errors-in-variables log-contrast models [0.999726509256195]
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models.
This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data.
arXiv Detail & Related papers (2024-09-11T18:47:28Z) - Data-Driven Estimation of Heterogeneous Treatment Effects [15.140272661540655]
Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in empirical sciences.
We provide a survey of state-of-the-art data-driven methods for heterogeneous treatment effect estimation using machine learning.
arXiv Detail & Related papers (2023-01-16T21:36:49Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Combining Observational and Randomized Data for Estimating Heterogeneous
Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z) - Instrumental Variable Estimation for Compositional Treatments [4.656302602746229]
compositional data include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research.
Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause.
arXiv Detail & Related papers (2021-06-21T16:29:41Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Performance metrics for intervention-triggering prediction models do not
reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z) - Nonparametric inference for interventional effects with multiple
mediators [0.0]
We provide theory that allows for more flexible, possibly machine learning-based, estimation techniques.
We demonstrate multiple robustness properties of the proposed estimators.
Our work thus provides a means of leveraging modern statistical learning techniques in estimation of interventional mediation effects.
arXiv Detail & Related papers (2020-01-16T19:05:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.