Conditional Generation of Medical Time Series for Extrapolation to
Underrepresented Populations
- URL: http://arxiv.org/abs/2201.08186v1
- Date: Thu, 20 Jan 2022 14:04:21 GMT
- Title: Conditional Generation of Medical Time Series for Extrapolation to
Underrepresented Populations
- Authors: Simon Bing, Andrea Dittadi, Stefan Bauer, Patrick Schwab
- Abstract summary: HealthGen generates synthetic cohorts that are more faithful to real patient EHRs than the current state-of-the-art.
Augmenting real data sets with conditionally generated cohorts of underrepresented subpopulations of patients can significantly enhance the generalisability of models.
- Score: 27.49371449726921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread adoption of electronic health records (EHRs) and subsequent
increased availability of longitudinal healthcare data has led to significant
advances in our understanding of health and disease with direct and immediate
impact on the development of new diagnostics and therapeutic treatment options.
However, access to EHRs is often restricted due to their perceived sensitive
nature and associated legal concerns, and the cohorts therein typically are
those seen at a specific hospital or network of hospitals and therefore not
representative of the wider population of patients. Here, we present HealthGen,
a new approach for the conditional generation of synthetic EHRs that maintains
an accurate representation of real patient characteristics, temporal
information and missingness patterns. We demonstrate experimentally that
HealthGen generates synthetic cohorts that are significantly more faithful to
real patient EHRs than the current state-of-the-art, and that augmenting real
data sets with conditionally generated cohorts of underrepresented
subpopulations of patients can significantly enhance the generalisability of
models derived from these data sets to different patient populations. Synthetic
conditionally generated EHRs could help increase the accessibility of
longitudinal healthcare data sets and improve the generalisability of
inferences made from these data sets to underrepresented populations.
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Guided Discrete Diffusion for Electronic Health Record Generation [47.129056768385084]
EHRs are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
Despite their wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases.
To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs.
arXiv Detail & Related papers (2024-04-18T16:50:46Z)
- CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines [14.386260536090628]
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation.
This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
arXiv Detail & Related papers (2024-02-06T20:58:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Integrated Convolutional and Recurrent Neural Networks for Health Risk Prediction using Patient Journey Data with Many Missing Values [9.418011774179794]
This paper proposes a novel end-to-end approach to modeling EHR patient journey data with Integrated Convolutional and Recurrent Neural Networks.
Our model can capture both long- and short-term temporal patterns within each patient journey and effectively handle the high degree of missingness in EHR data without any imputation data generation.
arXiv Detail & Related papers (2022-11-11T07:36:18Z)
- Cumulative Stay-time Representation for Electronic Health Records in Medical Event Time Prediction [8.261597797345342]
We propose a novel data representation for EHRs called cumulative stay-time representation (CTR).
CTR directly models such cumulative health conditions.
We derive a trainable construction of CTR based on neural networks that has the flexibility to fit the target data.
arXiv Detail & Related papers (2022-04-28T12:34:41Z)
- Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications [9.374416143268892]
A generative adversarial network (GAN) entitled EHR-M-GAN synthesizes mixed-type time-series EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
arXiv Detail & Related papers (2021-12-22T17:17:34Z)
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)