Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift
- URL: http://arxiv.org/abs/2501.18798v1
- Date: Thu, 30 Jan 2025 23:21:25 GMT
- Title: Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift
- Authors: Yi Liu, Alexander W. Levis, Ke Zhu, Shu Yang, Peter B. Gilbert, Larry Han
- Abstract summary: Causal inference across multiple data sources has the potential to improve the generalizability, transportability, and replicability of scientific findings.
Existing data fusion methods focus on binary or continuous outcomes.
We propose two novel approaches for multi-source causal survival analysis.
- Score: 46.84912148188679
- Abstract: Causal inference across multiple data sources has the potential to improve the generalizability, transportability, and replicability of scientific findings. However, data integration methods for time-to-event outcomes -- common in medical contexts such as clinical trials -- remain underdeveloped. Existing data fusion methods focus on binary or continuous outcomes, neglecting the distinct challenges of survival analysis, including right-censoring and the unification of discrete and continuous time frameworks. To address these gaps, we propose two novel approaches for multi-source causal survival analysis. First, considering a target site-specific causal effect, we introduce a semiparametric efficient estimator for scenarios where data-sharing is feasible. Second, we develop a federated learning framework tailored to privacy-constrained environments. This framework dynamically adjusts source site-specific contributions, downweighting biased sources and upweighting less biased ones relative to the target population. Both approaches incorporate nonparametric machine learning models to enhance robustness and efficiency, with theoretical guarantees applicable to both continuous and discrete time-to-event outcomes. We demonstrate the practical utility of our methods through extensive simulations and an application to two randomized trials of a monoclonal neutralizing antibody for HIV-1 prevention: HVTN 704/HPTN 085 (cisgender men and transgender persons in the Americas and Switzerland) and HVTN 703/HPTN 081 (women in sub-Saharan Africa). The results highlight the potential of our approaches to efficiently estimate causal effects while addressing heterogeneity across data sources and adhering to privacy and robustness constraints.
Related papers
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Multiply Robust Federated Estimation of Targeted Average Treatment Effects [0.0]
We propose a novel approach to derive valid causal inferences for a target population using multi-site data.
Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites.
arXiv Detail & Related papers (2023-09-22T03:15:08Z)
- Robust Direct Learning for Causal Data Fusion [14.462235940634969]
We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions.
We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
arXiv Detail & Related papers (2022-11-01T03:33:22Z)
- Decentralized Distributed Learning with Privacy-Preserving Data Synthesis [9.276097219140073]
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data.
Recent privacy regulations hinder the possibility to share data, and consequently, to come up with machine learning-based solutions that support diagnosis and prognosis.
We present a decentralized distributed method that integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy.
arXiv Detail & Related papers (2022-06-20T23:49:38Z)
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.