Related papers: A tutorial on discovering and quantifying the effect of latent causal sources of multimodal EHR data

URL: http://arxiv.org/abs/2510.16026v2
Date: Fri, 31 Oct 2025 14:35:48 GMT
Title: A tutorial on discovering and quantifying the effect of latent causal sources of multimodal EHR data
Authors: Marco Barbero-Mota, Eric V. Strobl, John M. Still, William W. Stead, Thomas A. Lasko
Abstract summary: We provide an accessible description of a generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health records observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistic independent latent sources, and used to train task-specific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.
Score: 2.9033848132822726
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We provide an accessible description of a peer-reviewed generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health records observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistic independent latent sources, and used to train task-specific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.
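Illustrative sketch (not taken from the paper): the abstract describes decomposing observations into independent latent sources and then modelling their effect on an outcome. The toy example below assumes a linear mixing model, uses a small NumPy FastICA as a stand-in for the decomposition step, and uses ordinary least squares as a stand-in for the task-specific model. The synthetic data, the mixing matrix, and all function names are hypothetical, and the pipeline the authors actually use may differ.

```python
import numpy as np

def whiten(x):
    """Center each row of x (features x samples) and decorrelate to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x

def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ w

def fast_ica(x, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity; returns unit-variance source estimates."""
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[g(Wz) z^T] - diag(E[g'(Wz)]) W
        w_new = g @ z.T / m - np.diag(g_prime.mean(axis=1)) @ w
        w = sym_decorrelate(w_new)
    return w @ z

# Hypothetical synthetic data: two unit-variance, non-Gaussian latent sources.
rng = np.random.default_rng(42)
n_samples = 5000
s = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples),
    rng.laplace(0.0, 1.0 / np.sqrt(2), n_samples),
])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])  # assumed linear mixing, illustration only
x = mixing @ s                               # "observed" variables
y = 2.0 * s[0] + 0.1 * rng.standard_normal(n_samples)  # outcome driven by source 0 only

s_hat = fast_ica(x)

# Absolute cross-correlation between true (rows) and recovered (columns) sources.
corr = np.abs(np.corrcoef(s, s_hat)[:2, 2:])

# Stand-in effect model: least-squares regression of the outcome on recovered sources.
coef, *_ = np.linalg.lstsq(s_hat.T, y - y.mean(), rcond=None)

print(np.round(corr, 2))
print(np.round(np.abs(coef), 2))
```

Because ICA recovers sources only up to order, sign, and scale, the sketch compares absolute correlations and absolute coefficients. Reading a regression coefficient as a causal effect additionally assumes that the recovered sources are unconfounded root causes of the outcome, which holds here only because the data were simulated that way.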

Related papers

Robust Multimodal Representation Learning in Healthcare [12.190907451083765]
Real-world medical datasets commonly contain systematic biases from multiple sources. We propose a Dual-Stream Feature Decorrelation Framework that identifies and handles the biases. Our method employs a causal-biased decorrelation framework with dual-stream neural networks to disentangle causal features from spurious correlations.
arXiv Detail & Related papers (2026-01-29T16:27:54Z)
Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities [6.02318066285653]
Real-world medical datasets often suffer from missing modalities due to cost, protocol, or patient-specific constraints. Our method consists of two key components: (1) a missingness deconfounding module that approximates causal intervention based on backdoor adjustment and (2) a dual-branch neural network that explicitly disentangles causal features from spurious correlations.
arXiv Detail & Related papers (2025-09-06T06:27:10Z)
Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z)
A data-driven approach to discover and quantify systemic lupus erythematosus etiological heterogeneity from electronic health records [4.167173990365707]
Systemic lupus erythematosus (SLE) is a complex disease with many manifestational facets. We propose a data-driven approach to discover probabilistic independent sources from multimodal imperfect EHR data.
arXiv Detail & Related papers (2025-01-13T11:00:31Z)
Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. The key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
Discovering and Reasoning of Causality in the Hidden World with Large Language Models [109.62442253177376]
We develop a new framework termed Causal representatiOn AssistanT (COAT) to propose useful measured variables for causal discovery. Instead of directly inferring causality with large language models (LLMs), COAT constructs feedback from intermediate causal discovery results to LLMs to refine the proposed variables.
arXiv Detail & Related papers (2024-02-06T12:18:54Z)
Why Do Probabilistic Clinical Models Fail To Transport Between Sites? [6.660458629649825]
A computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. We present common sources for this failure to transport, which we divide into sources under the control of the experimenter and sources inherent to the clinical data-generating process.
arXiv Detail & Related papers (2023-11-08T16:09:25Z)
Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z)
Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients [1.2647816797166165]
We report a more reliable and scalable causal discovery method (iMIIC) based on a general mutual information supremum principle. We showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program.
arXiv Detail & Related papers (2023-03-11T15:18:19Z)
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks. We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data. We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism. We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects. We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders. We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.