The Impact of Missing Data on Causal Discovery: A Multicentric Clinical
Study
- URL: http://arxiv.org/abs/2305.10050v2
- Date: Fri, 3 Nov 2023 14:37:39 GMT
- Title: The Impact of Missing Data on Causal Discovery: A Multicentric Clinical
Study
- Authors: Alessio Zanga, Alice Bernasconi, Peter J.F. Lucas, Hanny Pijnenborg,
Casper Reijnen, Marco Scutari, Fabio Stella
- Abstract summary: We use data from a multi-centric study on endometrial cancer to analyze the impact of different missingness mechanisms on the recovered causal graph.
We validate the recovered graph with expert physicians, showing that our approach finds clinically-relevant solutions.
- Score: 1.173358409934101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal inference for testing clinical hypotheses from observational data
presents many difficulties because the underlying data-generating model and the
associated causal graph are not usually available. Furthermore, observational
data may contain missing values, which impact the recovery of the causal graph
by causal discovery algorithms: a crucial issue often ignored in clinical
studies. In this work, we use data from a multi-centric study on endometrial
cancer to analyze the impact of different missingness mechanisms on the
recovered causal graph. This is achieved by extending state-of-the-art causal
discovery algorithms to exploit expert knowledge without sacrificing
theoretical soundness. We validate the recovered graph with expert physicians,
showing that our approach finds clinically-relevant solutions. Finally, we
discuss the goodness of fit of our graph and its consistency from a clinical
decision-making perspective using graphical separation to validate causal
pathways.
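The abstract refers to "different missingness mechanisms" without listing them. The sketch below assumes the standard taxonomy is meant: MCAR (missing completely at random), MAR (missing at random, depending only on observed values), and MNAR (missing not at random, depending on the unobserved value itself). It is an illustration on synthetic data, not the authors' procedure; the variable names `x` and `y`, the helpers `make_data`, `apply_missingness`, and `observed_mean`, and the 0.3 base rate are all hypothetical.

```python
import random


def make_data(n=2000, seed=1):
    """Synthetic data (assumption): y depends linearly on x plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append({"x": x, "y": x + rng.gauss(0, 1)})
    return rows


def apply_missingness(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows in which some 'y' values are replaced by None.

    MCAR: every y is dropped with the same probability.
    MAR:  y is dropped only when the always-observed x is above its median.
    MNAR: y is dropped only when y itself is above its median.
    """
    rng = random.Random(seed)
    x_median = sorted(r["x"] for r in rows)[len(rows) // 2]
    y_median = sorted(r["y"] for r in rows)[len(rows) // 2]
    out = []
    for r in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if r["x"] > x_median else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if r["y"] > y_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new_row = dict(r)
        if rng.random() < p:
            new_row["y"] = None
        out.append(new_row)
    return out


def observed_mean(rows):
    """Complete-case mean of y: rows with a missing y are ignored."""
    values = [r["y"] for r in rows if r["y"] is not None]
    return sum(values) / len(values)


if __name__ == "__main__":
    data = make_data()
    print("full data:", round(observed_mean(data), 3))
    for mechanism in ("MCAR", "MAR", "MNAR"):
        masked = apply_missingness(data, mechanism)
        print(mechanism, round(observed_mean(masked), 3))
```

Under these assumptions, the complete-case mean of `y` stays close to the full-data mean under MCAR but shifts downward under MAR and MNAR, which hints at why a causal discovery algorithm that simply discards incomplete records may recover a different graph depending on the mechanism.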
Related papers
- Causal Graph Aided Causal Discovery in an Observational Aneurysmal Subarachnoid Hemorrhage Study [0.9217021281095907]
Causal inference methods for observational data are increasingly recognized as a valuable complement to randomized clinical trials (RCTs).
We present and illustrate methods that provide "midway insights" during a study's course.
Concepts are illustrated through an analysis of data generated by patients with aneurysmal subarachnoid hemorrhage (aSAH).
In addition, we propose a method for multicenter studies, to monitor the impact of changes in practice at an individual center's level.
arXiv Detail & Related papers (2024-08-12T19:31:16Z)
- Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data [23.850817918011863]
We exploit causal discovery algorithms to investigate how perturbations in the genome can affect the survival of patients diagnosed with breast cancer.
Our findings reveal important factors related to the vital status of patients using causal discovery algorithms.
Results are validated through language models trained on biomedical literature.
arXiv Detail & Related papers (2023-05-28T17:07:46Z) - Optimizing Data-driven Causal Discovery Using Knowledge-guided Search [3.7489744097107316]
This study introduces a knowledge-guided causal structure search (KGS) approach that utilizes observational data and structural priors as constraints to learn the causal graph.
We extensively evaluate KGS in multiple settings using synthetic and benchmark real-world datasets, as well as in a real-life healthcare application related to oxygen therapy treatment.
arXiv Detail & Related papers (2023-04-11T20:56:33Z) - Learning interpretable causal networks from very large datasets,
application to 400,000 medical records of breast cancer patients [1.2647816797166165]
We report a more reliable and scalable causal discovery method (iMIIC) based on a general mutual information supremum principle.
We showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program.
arXiv Detail & Related papers (2023-03-11T15:18:19Z) - Active Bayesian Causal Inference [72.70593653185078]
We propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning.
ABCI jointly infers a posterior over causal models and queries of interest.
We show that our approach is more data-efficient than several baselines that only focus on learning the full causal graph.
arXiv Detail & Related papers (2022-06-04T22:38:57Z) - Intelligent Sight and Sound: A Chronic Cancer Pain Dataset [74.77784420691937]
This paper introduces the first chronic cancer pain dataset, collected as part of the Intelligent Sight and Sound (ISS) clinical trial.
The data collected to date consists of 29 patients, 509 smartphone videos, 189,999 frames, and self-reported affective and activity pain scores.
Using static images and multi-modal data to predict self-reported pain levels, early models show significant gaps between current methods available to predict pain.
arXiv Detail & Related papers (2022-04-07T22:14:37Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address dimensionality reduction, data visualization, clustering, feature selection, and the quantification of geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z) - Estimation of Causal Effects in the Presence of Unobserved Confounding
in the Alzheimer's Continuum [3.2489082010225494]
We derive a causal graph from the current clinical knowledge on cause and effect in the Alzheimer's disease continuum.
We show that identifiability of the causal effect requires all confounders to be known and measured.
In our theoretical analysis, we prove that using the substitute confounder enables identifiability of the causal effect of neuroanatomy on cognition.
arXiv Detail & Related papers (2020-06-23T16:29:54Z) - Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete
Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)