Multiple imputation and test-wise deletion for causal discovery with
incomplete cohort data
- URL: http://arxiv.org/abs/2108.13331v1
- Date: Mon, 30 Aug 2021 15:51:30 GMT
- Title: Multiple imputation and test-wise deletion for causal discovery with
incomplete cohort data
- Authors: Janine Witte, Ronja Foraita, Vanessa Didelez
- Abstract summary: Causal discovery algorithms estimate causal graphs from observational data.
Until recently, these algorithms have been unable to handle missing values.
We investigate two alternative solutions: Test-wise deletion and multiple imputation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal discovery algorithms estimate causal graphs from observational data.
This can provide a valuable complement to analyses focussing on the causal
relation between individual treatment-outcome pairs. Constraint-based causal
discovery algorithms rely on conditional independence testing when building the
graph. Until recently, these algorithms have been unable to handle missing
values. In this paper, we investigate two alternative solutions: Test-wise
deletion and multiple imputation. We establish necessary and sufficient
conditions for the recoverability of causal structures under test-wise
deletion, and argue that multiple imputation is more challenging in the context
of causal discovery than for estimation. We conduct an extensive comparison by
simulating from benchmark causal graphs: As one might expect, we find that
test-wise deletion and multiple imputation both clearly outperform list-wise
deletion and single imputation. Crucially, our results further suggest that
multiple imputation is especially useful in settings with a small number of
either Gaussian or discrete variables, but when the dataset contains a mix of
both, neither method is uniformly best. The methods we compare include random
forest imputation and a hybrid procedure combining test-wise deletion and
multiple imputation. An application to data from the IDEFICS cohort study on
diet- and lifestyle-related diseases in European children serves as an
illustrative example.
Related papers
- Adaptive Online Experimental Design for Causal Discovery [9.447864414136905]
Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs.
We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning.
We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system.
arXiv Detail & Related papers (2024-05-19T13:26:33Z)
- Membership Testing in Markov Equivalence Classes via Independence Query Oracles [17.84347390320128]
We show that it is relatively easier to test causal relationships than to learn them.
In particular, it requires exponentially fewer independence tests in graphs featuring high in-degrees and small clique sizes.
arXiv Detail & Related papers (2024-03-09T02:10:08Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals to tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper [7.570246812206772]
Causal discovery from observational data is an important tool in many branches of science.
In the large sample limit, sound and complete causal discovery algorithms have been previously introduced.
However, only finite training data is available, which limits the power of statistical tests used by these algorithms.
arXiv Detail & Related papers (2021-07-11T09:24:49Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- On the Sample Complexity of Causal Discovery and the Value of Domain Expertise [0.0]
Causal discovery methods seek to identify causal relations between random variables from purely observational data.
In this paper, we analyze the sample complexity of causal discovery algorithms without a CI oracle.
Our methods allow us to quantify the value of domain expertise in terms of data samples.
arXiv Detail & Related papers (2021-02-05T16:26:17Z)
- Group Testing with a Graph Infection Spread Model [61.48558770435175]
Infection spreads via connections between individuals and this results in a probabilistic cluster formation structure as well as a non-i.i.d. infection status for individuals.
We propose a class of two-step sampled group testing algorithms where we exploit the known probabilistic infection spread model.
Our results imply that, by exploiting information on the connections of individuals, group testing can be used to reduce the number of required tests significantly even when infection rate is high.
arXiv Detail & Related papers (2021-01-14T18:51:32Z)
- A Single Iterative Step for Anytime Causal Discovery [7.570246812206772]
We present a sound and complete algorithm for recovering causal graphs from observed, non-interventional data.
We rely on the causal Markov and faithfulness assumptions and recover the equivalence class of the underlying causal graph.
We demonstrate that our algorithm requires significantly fewer CI tests and smaller condition sets compared to the FCI algorithm.
arXiv Detail & Related papers (2020-12-14T13:46:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.