MedDiff: Generating Electronic Health Records using Accelerated
Denoising Diffusion Model
- URL: http://arxiv.org/abs/2302.04355v1
- Date: Wed, 8 Feb 2023 22:06:34 GMT
- Title: MedDiff: Generating Electronic Health Records using Accelerated
Denoising Diffusion Model
- Authors: Huan He, Shifan Zhao, Yuanzhe Xi, Joyce C Ho
- Abstract summary: We present a novel generative model based on diffusion models that is the first successful application on electronic health records.
Our model proposes a mechanism to perform class-conditional sampling to preserve label information.
- Score: 5.677138915301383
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to patient privacy protection concerns, machine learning research in
healthcare has been undeniably slower and more limited than in other application
domains. High-quality, realistic, synthetic electronic health records (EHRs)
can be leveraged to accelerate methodological developments for research
purposes while mitigating privacy concerns associated with data sharing. The
current state-of-the-art model for synthetic EHR generation is generative
adversarial networks, which are notoriously difficult to train and can suffer
from mode collapse. Denoising Diffusion Probabilistic Models, a class of
generative models inspired by statistical thermodynamics, have recently been
shown to generate high-quality synthetic samples in certain domains. It is
unknown whether these can generalize to generation of large-scale,
high-dimensional EHRs. In this paper, we present a novel generative model based
on diffusion models that is the first successful application on electronic
health records. Our model proposes a mechanism to perform class-conditional
sampling to preserve label information. We also introduce a new sampling
strategy to accelerate the inference speed. We empirically show that our model
outperforms existing state-of-the-art synthetic EHR generation methods.
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Guided Discrete Diffusion for Electronic Health Record Generation [47.129056768385084]
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
arXiv Detail & Related papers (2024-04-18T16:50:46Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models [8.799590232822752]
Privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers.
Recent research has delved into synthesizing realistic EHR data through generative modeling techniques.
In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff.
arXiv Detail & Related papers (2023-03-10T02:15:58Z)
- Synthesizing Mixed-type Electronic Health Records using Diffusion Models [10.973115905786129]
Synthetic data generation is a promising solution to mitigate privacy concerns when sharing sensitive patient information.
Recent studies have shown that diffusion models offer several advantages over GANs, such as generation of more realistic synthetic data and stable training in generating data modalities, including image, text, and sound.
Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics, except for privacy, which confirms the trade-off between privacy and utility.
arXiv Detail & Related papers (2023-02-28T15:42:30Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications [9.374416143268892]
generative adversarial network (GAN) entitled EHR-M-GAN which synthesizes mixed-type timeseries EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
arXiv Detail & Related papers (2021-12-22T17:17:34Z)