Instrumental Variable Estimation for Compositional Treatments
- URL: http://arxiv.org/abs/2106.11234v3
- Date: Tue, 28 May 2024 13:25:04 GMT
- Title: Instrumental Variable Estimation for Compositional Treatments
- Authors: Elisabeth Ailer, Christian L. Müller, Niki Kilbertus
- Abstract summary: Compositional data include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research.
Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause.
- Score: 4.656302602746229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.
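The abstract mentions statistical data transformations and regression techniques without naming them here. As a minimal sketch, assuming an isometric log-ratio (ilr) transform followed by plain two-stage least squares (both chosen for illustration and not confirmed by this page as the authors' method), the idea can be tried on synthetic data. All function names, the simulated data, and the coefficient values below are hypothetical.

```python
import numpy as np

def helmert_basis(p):
    """Orthonormal basis (p x (p-1)) of the subspace orthogonal to the ones vector."""
    V = np.zeros((p, p - 1))
    for j in range(1, p):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(x, V):
    """Map compositions (rows positive, summing to 1) to p-1 unconstrained coordinates."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=1, keepdims=True)  # centered log-ratio
    return clr @ V

def ilr_inverse(coords, V):
    """Map ilr coordinates back to the simplex via a numerically stable softmax."""
    w = coords @ V.T
    e = np.exp(w - w.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def add_intercept(M):
    return np.column_stack([np.ones(len(M)), M])

def two_stage_least_squares(Z, X, y):
    """Stage 1: regress X on Z. Stage 2: regress y on the fitted X."""
    Z1 = add_intercept(Z)
    X_hat = Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]
    coef = np.linalg.lstsq(add_intercept(X_hat), y, rcond=None)[0]
    return coef[1:]  # drop the intercept

# Synthetic example (assumed data-generating process, for illustration only).
rng = np.random.default_rng(0)
n, p = 5000, 3
V = helmert_basis(p)
Z = rng.normal(size=(n, 2))            # two instruments for p - 1 = 2 coordinates
U = rng.normal(size=n)                 # unobserved confounder
A = np.array([[1.0, 0.5], [-0.5, 1.0]])
latent = Z @ A + U[:, None] * np.array([1.0, 1.0]) + 0.5 * rng.normal(size=(n, 2))
X_comp = ilr_inverse(latent, V)        # observed compositional treatment
beta_true = np.array([1.0, -2.0])
y = latent @ beta_true + 2.0 * U + 0.5 * rng.normal(size=n)

X_ilr = ilr(X_comp, V)
beta_2sls = two_stage_least_squares(Z, X_ilr, y)
beta_ols = np.linalg.lstsq(add_intercept(X_ilr), y, rcond=None)[0][1:]

print("true:", beta_true)
print("2SLS:", beta_2sls)
print("OLS :", beta_ols)
```

In this sketch the confounder biases the ordinary least squares estimate, while the instrumented estimate should land near the simulated coefficients. An ilr coefficient vector `beta` can be mapped back to a log-contrast on the original parts via `V @ beta`, whose entries sum to zero, which is one way to keep the result interpretable per component.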
Related papers
- Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis [1.361248247831476]
This paper proposes a hierarchical Bayesian multitask learning model that is applicable to the general multi-task binary classification learning problem.
We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution.
We demonstrate the potential of the new approach on various synthetic datasets and for predicting human health status based on microbiome profile.
arXiv Detail & Related papers (2025-02-04T18:23:22Z)
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models.
arXiv Detail & Related papers (2024-06-04T16:31:43Z)
- Perturbation-based Effect Measures for Compositional Data [3.9543275888781224]
We propose a framework based on hypothetical data perturbations which defines interpretable statistical functionals on compositions.
We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization.
We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data.
arXiv Detail & Related papers (2023-11-30T12:27:15Z)
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z)
- Supervised Learning and Model Analysis with Compositional Data [4.082799056366927]
KernelBiome is a kernel-based non-parametric regression and classification framework for compositional data.
We demonstrate on par or improved performance compared with state-of-the-art machine learning methods.
arXiv Detail & Related papers (2022-05-15T12:33:43Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Statistical Analytics and Regional Representation Learning for COVID-19 Pandemic Understanding [4.731074162093199]
The rapid spread of the novel coronavirus (COVID-19) has severely impacted almost all countries around the world.
This paper combines and processes an extensive collection of publicly available datasets to provide a unified information source.
A specific RNN-based inference pipeline called DoubleWindowLSTM-CP is proposed in this work for predictive event modeling.
arXiv Detail & Related papers (2020-08-08T03:35:16Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.