Related papers: A Counterfactual Fair Model for Longitudinal Electronic Health Records via Deconfounder

A Counterfactual Fair Model for Longitudinal Electronic Health Records via Deconfounder

URL: http://arxiv.org/abs/2308.11819v3
Date: Mon, 2 Oct 2023 17:46:40 GMT
Title: A Counterfactual Fair Model for Longitudinal Electronic Health Records via Deconfounder
Authors: Zheng Liu, Xiaohan Li and Philip Yu
Abstract summary: We propose a novel model called Fair Longitudinal Medical Deconfounder (FLMD) FLMD aims to achieve both fairness and accuracy in longitudinal Electronic Health Records (EHR) modeling. We conducted comprehensive experiments on two real-world EHR datasets to demonstrate the effectiveness of FLMD.
Score: 5.198621505969445
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The fairness issue of clinical data modeling, especially on Electronic Health Records (EHRs), is of utmost importance due to EHR's complex latent structure and potential selection bias. It is frequently necessary to mitigate health disparity while keeping the model's overall accuracy in practice. However, traditional methods often encounter the trade-off between accuracy and fairness, as they fail to capture the underlying factors beyond observed data. To tackle this challenge, we propose a novel model called Fair Longitudinal Medical Deconfounder (FLMD) that aims to achieve both fairness and accuracy in longitudinal Electronic Health Records (EHR) modeling. Drawing inspiration from the deconfounder theory, FLMD employs a two-stage training process. In the first stage, FLMD captures unobserved confounders for each encounter, which effectively represents underlying medical factors beyond observed EHR, such as patient genotypes and lifestyle habits. This unobserved confounder is crucial for addressing the accuracy/fairness dilemma. In the second stage, FLMD combines the learned latent representation with other relevant features to make predictions. By incorporating appropriate fairness criteria, such as counterfactual fairness, FLMD ensures that it maintains high prediction accuracy while simultaneously minimizing health disparities. We conducted comprehensive experiments on two real-world EHR datasets to demonstrate the effectiveness of FLMD. Apart from the comparison of baseline methods and FLMD variants in terms of fairness and accuracy, we assessed the performance of all models on disturbed/imbalanced and synthetic datasets to showcase the superiority of FLMD across different settings and provide valuable insights into its capabilities.

Related papers

Equitable Electronic Health Record Prediction with FAME: Fairness-Aware Multimodal Embedding [4.383326688441244]
We introduce FAME (Fairness-Aware Multimodal Embeddings), a framework that explicitly weights each modality according to its fairness contribution.<n>We leverage the Error Distribution Disparity Index (EDDI) to measure fairness across subgroups.<n>We demonstrate FAME's effectiveness in terms of performance and fairness compared with other baselines.
arXiv Detail & Related papers (2025-06-16T05:23:42Z)
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data [52.55123685248105]
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment. Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial data with high quality. This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
arXiv Detail & Related papers (2024-10-28T02:24:01Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency [18.291267748113142]
We propose DrFuse to achieve effective clinical multi-modal fusion. We address the missing modality issue by disentangling the features shared across modalities and those unique within each modality. We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR.
arXiv Detail & Related papers (2024-03-10T12:41:34Z)
FairEHR-CLP: Towards Fairness-Aware Clinical Predictions with Contrastive Learning in Multimodal Electronic Health Records [15.407593899656762]
We present FairEHR-CLP: a framework for fairness-aware Clinical Predictions with Contrastive Learning in EHRs. FairEHR-CLP operates through a two-stage process, utilizing patient demographics, longitudinal data, and clinical notes. We introduce a novel fairness metric to effectively measure error rate disparities across subgroups.
arXiv Detail & Related papers (2024-02-01T19:24:45Z)
Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data [6.596656267996196]
We propose the Fair Mixed Effects Deep Learning (Fair MEDL) framework. This framework quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through: 1) a cluster adversary for learning invariant FE, 2) a Bayesian neural network for RE, and 3) a mixing function combining FE and RE for final predictions. Fair MEDL framework improves fairness by 86.4% for Age, 64.9% for Race, 57.8% for Sex, and 36.2% for Marital status, while maintaining robust predictive performance.
arXiv Detail & Related papers (2023-10-04T20:18:45Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials. We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
Fair Patient Model: Mitigating Bias in the Patient Representation Learned from the Electronic Health Records [7.467693938220289]
We applied the proposed model, called Fair Patient Model (FPM), to a sample of 34,739 patients from the MIMIC-III dataset. FPM outperformed the baseline models in terms of three fairness metrics: demographic parity, equality of opportunity difference, and equalized odds ratio.
arXiv Detail & Related papers (2023-06-05T18:40:35Z)
Mitigating Health Disparities in EHR via Deconfounder [5.511343163506091]
We propose a novel framework, Parity Medical Deconfounder (PriMeD), to deal with the disparity issue in healthcare datasets. PriMeD adopts a Conditional Variational Autoencoder (CVAE) to learn latent factors (substitute confounders) for observational data.
arXiv Detail & Related papers (2022-10-28T05:16:50Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
EVA: Generating Longitudinal Electronic Health Records Using Conditional Variational Autoencoders [34.22731849545798]
We propose EHR Variational Autoencoder (EVA) for synthesizing sequences of discrete EHR encounters and encounter features. We illustrate that EVA can produce realistic sequences, account for individual differences among patients, and can be conditioned on specific disease conditions. We assess the utility of the methods on large real-world EHR repositories containing over 250, 000 patients.
arXiv Detail & Related papers (2020-12-18T02:37:49Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD) UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.