EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models
- URL: http://arxiv.org/abs/2303.05656v3
- Date: Sun, 24 Mar 2024 02:43:55 GMT
- Title: EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models
- Authors: Hongyi Yuan, Songchi Zhou, Sheng Yu
- Abstract summary: Privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers.
Recent research has delved into synthesizing realistic EHR data through generative modeling techniques.
In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records (EHR) contain a wealth of biomedical information and are valuable resources for developing precision medicine systems. However, privacy concerns limit researchers' access to high-quality, large-scale EHR data, impeding methodological progress. Recent research has explored synthesizing realistic EHR data with generative models, and most proposed methods rely on generative adversarial networks (GANs) and their variants. Although GAN-based methods achieve state-of-the-art performance in generating EHR data, they are difficult to train and prone to mode collapse. Diffusion models, a more recent class of generative models, have achieved cutting-edge performance in image generation, but their efficacy for EHR data synthesis remains largely unexplored. In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff. In extensive experiments, EHRDiff sets a new state of the art in the quality of synthetic EHR data while also protecting private information.
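For readers new to diffusion models, the sketch below shows the generic forward (noising) process of a DDPM-style diffusion model applied to a toy EHR record: in closed form, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), where abar_t is the cumulative product of (1 - beta_s). It is an illustration under stated assumptions, not EHRDiff's implementation: the multi-hot encoding of diagnosis codes, the linear beta schedule (1e-4 to 0.02 over 1000 steps, the common DDPM default) and the function names `linear_beta_schedule`, `alpha_bars` and `q_sample` are all hypothetical choices made here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one step."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical patient record: multi-hot vector over 8 diagnosis codes.
    x0 = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
    abar = alpha_bars(linear_beta_schedule(1000))
    for t in (0, 250, 999):
        xt = q_sample(x0, t, abar, rng)
        print(t, round(abar[t], 4), [round(v, 2) for v in xt])
```

Early steps leave the record almost intact, while by the last step abar_t is close to zero and x_t is essentially pure Gaussian noise. A trained model would learn to reverse this process, denoising noise back toward a realistic record; that learned reverse step is not shown here.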
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
(2024-06-20T02:20:23Z)
- Guided Discrete Diffusion for Electronic Health Record Generation
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite their wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
(2024-04-18T16:50:46Z)
- Recent Advances in Predictive Modeling with Electronic Health Records
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
(2024-02-02T00:31:01Z)
- Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions
We propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies.
We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework achieves significant performance improvement over existing state-of-the-art methods.
(2024-01-20T15:14:14Z)
- Reliable Generation of Privacy-preserving Synthetic Electronic Health Record Time Series via Diffusion Models
Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis.
However, privacy concerns often restrict access to EHRs, hindering downstream analysis.
This study aims to overcome these challenges by generating realistic and privacy-preserving synthetic EHR time series efficiently.
(2023-10-23T18:56:01Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
(2023-10-04T01:36:30Z)
- MedDiff: Generating Electronic Health Records using Accelerated Denoising Diffusion Model
We present a novel generative model based on diffusion models, the first successful application of diffusion models to electronic health records.
Our model includes a class-conditional sampling mechanism to preserve label information.
(2023-02-08T22:06:34Z)
- Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications
EHR-M-GAN is a generative adversarial network (GAN) that synthesizes mixed-type time-series EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
(2021-12-22T17:17:34Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
(2021-11-12T18:13:45Z)