Generating Synthetic Mixed-type Longitudinal Electronic Health Records
for Artificial Intelligent Applications
- URL: http://arxiv.org/abs/2112.12047v1
- Date: Wed, 22 Dec 2021 17:17:34 GMT
- Title: Generating Synthetic Mixed-type Longitudinal Electronic Health Records
for Artificial Intelligent Applications
- Authors: Jin Li, Benjamin J. Cairns, Jingsong Li, Tingting Zhu
- Abstract summary: A generative adversarial network (GAN) entitled EHR-M-GAN is proposed, which synthesizes mixed-type timeseries EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
- Score: 9.374416143268892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent availability of electronic health records (EHRs) has provided
enormous opportunities to develop artificial intelligence (AI) algorithms.
However, patient privacy has become a major concern that limits data sharing
across hospital settings and subsequently hinders the advances in AI.
Synthetic data, which benefits from the development and proliferation
of generative models, has served as a promising substitute for real patient EHR
data. However, current generative models are limited as they generate only a
single type of clinical data, i.e., either continuous-valued or
discrete-valued. In this paper, we propose a generative adversarial network
(GAN) entitled EHR-M-GAN which synthesizes mixed-type timeseries EHR
data. EHR-M-GAN is capable of capturing the multidimensional, heterogeneous,
and correlated temporal dynamics in patient trajectories. We have validated
EHR-M-GAN on three publicly-available intensive care unit databases with
records from a total of 141,488 unique patients, and performed privacy risk
evaluation of the proposed model. EHR-M-GAN has demonstrated its superiority in
performance over state-of-the-art benchmarks for synthesizing clinical
timeseries with high fidelity. Notably, prediction models for outcomes of
intensive care performed significantly better when training data was augmented
with the addition of EHR-M-GAN-generated timeseries. EHR-M-GAN may have use in
developing AI algorithms in resource-limited settings, lowering the barrier for
data acquisition while preserving patient privacy.
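To make the term "mixed-type timeseries" concrete, the following is a minimal sketch of what such a record and a simple augmentation step could look like. It is not the EHR-M-GAN implementation: the `MixedTypeRecord` class, the `toy_record` and `augment` helpers, and the example features (heart rate, SpO2, a ventilation flag) are all hypothetical names chosen for illustration, and the "synthetic" records here are random placeholders rather than output from a trained generator.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MixedTypeRecord:
    """One hypothetical patient trajectory with aligned time steps."""
    continuous: List[List[float]]  # [T][n_cont], e.g. heart rate, SpO2
    discrete: List[List[int]]      # [T][n_disc], e.g. ventilation on/off


def toy_record(steps: int, rng: random.Random) -> MixedTypeRecord:
    """Draw a random toy record (a stand-in for real or generated data)."""
    continuous = [[rng.gauss(80.0, 10.0), rng.gauss(96.0, 2.0)]
                  for _ in range(steps)]
    discrete = [[rng.randint(0, 1)] for _ in range(steps)]
    return MixedTypeRecord(continuous, discrete)


def augment(real: List[MixedTypeRecord],
            synthetic: List[MixedTypeRecord]) -> List[MixedTypeRecord]:
    """Build an augmented training set by appending synthetic records."""
    return list(real) + list(synthetic)


rng = random.Random(0)
real = [toy_record(24, rng) for _ in range(5)]       # placeholder "real" data
synthetic = [toy_record(24, rng) for _ in range(3)]  # placeholder "generated" data
train = augment(real, synthetic)
```

Keeping the continuous and discrete channels in separate fields, aligned on the same time axis, reflects the distinction the abstract draws between the two data types; how the two are actually modeled jointly is described in the paper itself.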
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Guided Discrete Diffusion for Electronic Health Record Generation [47.129056768385084]
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite their wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
arXiv Detail & Related papers (2024-04-18T16:50:46Z)
- CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines [14.386260536090628]
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation.
This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
arXiv Detail & Related papers (2024-02-06T20:58:36Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Multi-Label Clinical Time-Series Generation via Conditional GAN [23.380183382491495]
We propose a Multi-label Time-series GAN (MTGAN) to generate EHR data and imbalanced uncommon diseases.
The critic gives scores using Wasserstein distance to recognize real samples from synthetic samples by considering both data and temporal features.
Experimental results demonstrate the quality of the synthetic data and the effectiveness of MTGAN in generating realistic sequential EHR data.
arXiv Detail & Related papers (2022-04-10T23:30:07Z)
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)