Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks
- URL: http://arxiv.org/abs/2406.02510v2
- Date: Sun, 22 Sep 2024 19:54:14 GMT
- Title: Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks
- Authors: Mirza Farhan Bin Tarek, Raphael Poulain, Rahmatollah Beheshti,
- Abstract summary: We present a new pipeline that generates synthetic EHR data consistent with (faithful to) the real EHR data.
We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets.
Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications.
- Score: 2.089191490381739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Among various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the wide spread of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While such a broad problem (mitigating fairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data, which is not only consistent with (faithful to) the real EHR data but also can reduce the fairness concerns (defined by the end-user) in the downstream tasks, when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications, such as those modifying the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks [3.3903891679981593]
We present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain.
Our results demonstrate that Bt-GAN achieves SOTA accuracy while significantly improving fairness and minimizing bias.
arXiv Detail & Related papers (2024-04-21T12:16:38Z) - Guided Discrete Diffusion for Electronic Health Record Generation [47.129056768385084]
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite wide usability, their sensitive nature raises privacy and confidentially concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
arXiv Detail & Related papers (2024-04-18T16:50:46Z) - HealthGAT: Node Classifications in Electronic Health Records using Graph Attention Networks [2.2026317523029193]
HealthGAT is a graph attention network framework that generates embeddings from EHR.
Our model iteratively refines the embeddings for medical codes, resulting in improved EHR data analysis.
Our model shows outstanding performance in node classification and downstream tasks such as predicting readmissions and diagnosis classifications.
arXiv Detail & Related papers (2024-03-26T22:17:01Z) - Automated Multi-Task Learning for Joint Disease Prediction on Electronic Health Records [4.159498069487535]
We propose an automated approach named AutoDP, which can search for the optimal configuration of task grouping and architectures simultaneously.
It achieves significant performance improvements over both hand-crafted and automated state-of-the-art methods, also maintains a feasible search cost at the same time.
arXiv Detail & Related papers (2024-03-06T22:32:48Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - A review of Generative Adversarial Networks for Electronic Health
Records: applications, evaluation measures and data sources [8.319639237899155]
Generative Adversarial Networks (GANs) show great promise in generating synthetic EHR data by learning underlying data distributions.
This work aims to review the major developments in various applications of GANs for EHRs and provides an overview of the proposed methodologies.
We conclude by discussing challenges in GANs for EHRs development and proposing recommended practices.
arXiv Detail & Related papers (2022-03-14T11:56:47Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Deep-HR: Fast Heart Rate Estimation from Face Video Under Realistic
Conditions [62.68522031911656]
We propose a simple yet efficient approach to benefit the advantages of the Deep Neural Network (DNN) by simplifying HR estimation from a complex task to learning from very correlated representation to HR.
To be more accurate and work well on low-quality videos, two deep encoder-decoder networks are trained to refine the output of FE.
Experimental results on HR-D and MAHNOB datasets confirm that our method could run as a real-time method and estimate the average HR better than state-of-the-art ones.
arXiv Detail & Related papers (2020-02-12T07:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.