A Counterfactual Fair Model for Longitudinal Electronic Health Records
via Deconfounder
- URL: http://arxiv.org/abs/2308.11819v3
- Date: Mon, 2 Oct 2023 17:46:40 GMT
- Title: A Counterfactual Fair Model for Longitudinal Electronic Health Records
via Deconfounder
- Authors: Zheng Liu, Xiaohan Li and Philip Yu
- Abstract summary: We propose a novel model called Fair Longitudinal Medical Deconfounder (FLMD).
FLMD aims to achieve both fairness and accuracy in longitudinal Electronic Health Records (EHR) modeling.
We conducted comprehensive experiments on two real-world EHR datasets to demonstrate the effectiveness of FLMD.
- Score: 5.198621505969445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fairness issue of clinical data modeling, especially on Electronic Health
Records (EHRs), is of utmost importance due to EHR's complex latent structure
and potential selection bias. It is frequently necessary to mitigate health
disparity while keeping the model's overall accuracy in practice. However,
traditional methods often encounter the trade-off between accuracy and
fairness, as they fail to capture the underlying factors beyond observed data.
To tackle this challenge, we propose a novel model called Fair Longitudinal
Medical Deconfounder (FLMD) that aims to achieve both fairness and accuracy in
longitudinal Electronic Health Records (EHR) modeling. Drawing inspiration from
the deconfounder theory, FLMD employs a two-stage training process. In the
first stage, FLMD captures unobserved confounders for each encounter, which
effectively represents underlying medical factors beyond observed EHR, such as
patient genotypes and lifestyle habits. This unobserved confounder is crucial
for addressing the accuracy/fairness dilemma. In the second stage, FLMD
combines the learned latent representation with other relevant features to make
predictions. By incorporating appropriate fairness criteria, such as
counterfactual fairness, FLMD ensures that it maintains high prediction
accuracy while simultaneously minimizing health disparities. We conducted
comprehensive experiments on two real-world EHR datasets to demonstrate the
effectiveness of FLMD. Apart from the comparison of baseline methods and FLMD
variants in terms of fairness and accuracy, we assessed the performance of all
models on disturbed/imbalanced and synthetic datasets to showcase the
superiority of FLMD across different settings and provide valuable insights
into its capabilities.
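The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: a truncated-SVD factor model stands in for the stage-one latent model, a plain logistic regression stands in for the stage-two predictor, and the hypothetical helper `flip_gap` is only a simplified proxy for counterfactual fairness (it flips a binary sensitive attribute while holding the latent factors fixed, without propagating the change through a causal model). All function names and the synthetic data are invented here.

```python
import numpy as np


def fit_substitute_confounder(X, k):
    """Stage 1 stand-in: fit a linear factor model with truncated SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # top-k principal directions


def infer_confounder(X, mean, components):
    """Project observed features onto the factor directions to get latent z."""
    return (X - mean) @ components.T


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(F, y, lr=0.1, steps=500):
    """Stage 2 stand-in: logistic regression trained by batch gradient descent."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(F @ w + b) - y
        w -= lr * (F.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def flip_gap(Z, a, w, b):
    """Mean absolute change in the prediction when the binary sensitive
    attribute a is flipped and the latent factors Z are held fixed."""
    p = sigmoid(np.column_stack([Z, a]) @ w + b)
    p_cf = sigmoid(np.column_stack([Z, 1 - a]) @ w + b)
    return float(np.mean(np.abs(p - p_cf)))


if __name__ == "__main__":
    # Synthetic data (assumption): two hidden factors drive eight observed features.
    rng = np.random.default_rng(0)
    n = 500
    a = rng.integers(0, 2, size=n)
    z_true = rng.normal(size=(n, 2))
    X = z_true @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(n, 8))
    y = (z_true[:, 0] + 0.5 * a + 0.3 * rng.normal(size=n) > 0).astype(float)

    mean, comps = fit_substitute_confounder(X, k=2)
    Z = infer_confounder(X, mean, comps)
    w, b = fit_logistic(np.column_stack([Z, a]), y)
    print("flip gap:", flip_gap(Z, a, w, b))
```

A gap of zero means the prediction does not change when only the sensitive attribute changes. In this sketch that happens exactly when the learned weight on the attribute is zero, which is one simple way to see why a predictor built on well-estimated latent factors can reduce direct dependence on the attribute.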
Related papers
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data [52.55123685248105]
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment.
Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial data with high quality.
This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
arXiv Detail & Related papers (2024-10-28T02:24:01Z)
- DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency [18.291267748113142]
We propose DrFuse to achieve effective clinical multi-modal fusion.
We address the missing modality issue by disentangling the features shared across modalities and those unique within each modality.
We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR.
arXiv Detail & Related papers (2024-03-10T12:41:34Z)
- FairEHR-CLP: Towards Fairness-Aware Clinical Predictions with Contrastive Learning in Multimodal Electronic Health Records [15.407593899656762]
We present FairEHR-CLP: a framework for fairness-aware Clinical Predictions with Contrastive Learning in EHRs.
FairEHR-CLP operates through a two-stage process, utilizing patient demographics, longitudinal data, and clinical notes.
We introduce a novel fairness metric to effectively measure error rate disparities across subgroups.
arXiv Detail & Related papers (2024-02-01T19:24:45Z)
- Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data [6.596656267996196]
We propose the Fair Mixed Effects Deep Learning (Fair MEDL) framework.
This framework quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through: 1) a cluster adversary for learning invariant FE, 2) a Bayesian neural network for RE, and 3) a mixing function combining FE and RE for final predictions.
The Fair MEDL framework improves fairness by 86.4% for Age, 64.9% for Race, 57.8% for Sex, and 36.2% for Marital status, while maintaining robust predictive performance.
arXiv Detail & Related papers (2023-10-04T20:18:45Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Mitigating Health Disparities in EHR via Deconfounder [5.511343163506091]
We propose a novel framework, Parity Medical Deconfounder (PriMeD), to deal with the disparity issue in healthcare datasets.
PriMeD adopts a Conditional Variational Autoencoder (CVAE) to learn latent factors (substitute confounders) for observational data.
arXiv Detail & Related papers (2022-10-28T05:16:50Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- EVA: Generating Longitudinal Electronic Health Records Using Conditional Variational Autoencoders [34.22731849545798]
We propose EHR Variational Autoencoder (EVA) for synthesizing sequences of discrete EHR encounters and encounter features.
We illustrate that EVA can produce realistic sequences, account for individual differences among patients, and can be conditioned on specific disease conditions.
We assess the utility of the methods on large real-world EHR repositories containing over 250,000 patients.
arXiv Detail & Related papers (2020-12-18T02:37:49Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)