Related papers: Perturbation-based Effect Measures for Compositional Data

Perturbation-based Effect Measures for Compositional Data

URL: http://arxiv.org/abs/2311.18501v2
Date: Tue, 18 Jun 2024 12:21:48 GMT
Title: Perturbation-based Effect Measures for Compositional Data
Authors: Anton Rask Lundborg, Niklas Pfister,
Abstract summary: Existing effect measures for compositional features are inadequate for many modern applications. We propose a framework based on hypothetical data perturbations that addresses both issues. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization.
Score: 3.9543275888781224
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing effect measures for compositional features are inadequate for many modern applications for two reasons. First, modern datasets with compositional covariates, for example in microbiome research, display traits such as high-dimensionality and sparsity that can be poorly modelled with traditional parametric approaches. Second, assessing -- in an unbiased way -- how summary statistics of a composition (e.g., racial diversity) affect a response variable is not straightforward. In this work, we propose a framework based on hypothetical data perturbations that addresses both issues. Unlike many existing effect measures for compositional features, we do not define our effects based on a parametric model or a transformation of the data. Instead, we use perturbations to define interpretable statistical functionals on the compositions themselves, which we call average perturbation effects. These effects naturally account for confounding that biases frequently used marginal dependence analyses. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data. For all proposed estimators, we provide confidence intervals with uniform asymptotic coverage guarantees.

Related papers

Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that Prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes.<n>Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z)
Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions.<n>Is the causal effect positive or negative? and How severe must assumption violations be to overturn this conclusion?<n>We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z)
Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z)
Debiased high-dimensional regression calibration for errors-in-variables log-contrast models [0.999726509256195]
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data.
arXiv Detail & Related papers (2024-09-11T18:47:28Z)
Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions [0.0]
Estimating causal effect using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. We show how we can adapt machine learning (DML) for panel data in the presence of unobserved heterogeneity. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role for the performance of most alternative methods.
arXiv Detail & Related papers (2024-09-02T13:59:54Z)
Data-Driven Estimation of Heterogeneous Treatment Effects [15.140272661540655]
Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in empirical sciences. We provide a survey of state-of-the-art data-driven methods for heterogeneous treatment effect estimation using machine learning.
arXiv Detail & Related papers (2023-01-16T21:36:49Z)
Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem. We examine the performance of various debiasing methods across multiple tasks. We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling. We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z)
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem. Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools. We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains. Currently, most existing works rely exclusively on observational data. We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
Instrumental Variable Estimation for Compositional Treatments [4.656302602746229]
compositional data include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause.
arXiv Detail & Related papers (2021-06-21T16:29:41Z)
Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects. We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders. We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials. We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models. Standard metrics calculated from retrospective data are only related to model utility under certain assumptions. When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
Nonparametric inference for interventional effects with multiple mediators [0.0]
We provide theory that allows for more flexible, possibly machine learning-based, estimation techniques. We demonstrate multiple robustness properties of the proposed estimators. Our work thus provides a means of leveraging modern statistical learning techniques in estimation of interventional mediation effects.
arXiv Detail & Related papers (2020-01-16T19:05:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.