Synthetic Time Series Data Generation for Healthcare Applications: A PCG Case Study
- URL: http://arxiv.org/abs/2412.16207v1
- Date: Tue, 17 Dec 2024 18:07:40 GMT
- Title: Synthetic Time Series Data Generation for Healthcare Applications: A PCG Case Study
- Authors: Ainaz Jamshidi, Muhammad Arif, Sabir Ali Kalhoro, Alexander Gelbukh
- Abstract summary: We employ and compare three state-of-the-art generative models to generate PCG data.
Our results demonstrate that the generated PCG data closely resembles the original datasets.
In our future work, we plan to incorporate this method into a data augmentation pipeline to synthesize abnormal PCG signals with heart murmurs.
- Score: 43.28613210217385
- Abstract: The generation of high-quality medical time series data is essential for advancing healthcare diagnostics and safeguarding patient privacy. Specifically, synthesizing realistic phonocardiogram (PCG) signals offers significant potential as a cost-effective and efficient tool for cardiac disease pre-screening. Despite this potential, the synthesis of PCG signals for this specific application has received limited attention in research. In this study, we employ and compare three state-of-the-art generative models from different categories - WaveNet, DoppelGANger, and DiffWave - to generate high-quality PCG data. We use data from the George B. Moody PhysioNet Challenge 2022. Our methods are evaluated using metrics widely used in prior work on time series data generation, such as mean absolute error and maximum mean discrepancy. Our results demonstrate that the generated PCG data closely resembles the original datasets, indicating the effectiveness of our generative models in producing realistic synthetic PCG data. In future work, we plan to incorporate this method into a data augmentation pipeline to synthesize abnormal PCG signals with heart murmurs, in order to address the current scarcity of abnormal data. We hope to improve the robustness and accuracy of diagnostic tools in cardiology, enhancing their effectiveness in detecting heart murmurs.
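The abstract names mean absolute error and maximum mean discrepancy (MMD) as evaluation metrics but does not say how they are computed. The following is a minimal sketch of both, not the authors' implementation. It assumes real and synthetic signals are stored as arrays of shape (n_signals, n_samples), that MAE is taken elementwise between equally shaped arrays, and that MMD uses an RBF kernel with a hand-picked bandwidth. The function names and the gamma parameter are hypothetical.

```python
import numpy as np


def mean_absolute_error(real, synthetic):
    """Elementwise MAE between two equally shaped arrays (assumed pairing)."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    return float(np.mean(np.abs(real - synthetic)))


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd_squared(real, synthetic, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel.

    gamma is an assumed bandwidth; in practice it is often set from the
    median pairwise distance of the data.
    """
    x = np.atleast_2d(np.asarray(real, dtype=float))
    y = np.atleast_2d(np.asarray(synthetic, dtype=float))
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(32, 64))               # stand-in for real signals
    close = real + 0.05 * rng.normal(size=(32, 64))  # near-copy of the real set
    far = rng.normal(loc=2.0, size=(32, 64))       # clearly shifted distribution
    print("MAE close:", mean_absolute_error(real, close))
    print("MMD^2 close:", mmd_squared(real, close, gamma=0.01))
    print("MMD^2 far:", mmd_squared(real, far, gamma=0.01))
```

Lower values on both metrics indicate that the synthetic set sits closer to the real one. MMD compares the two sets as distributions, so it does not require a one-to-one pairing of signals the way elementwise MAE does.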
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines [14.386260536090628]
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation.
This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
arXiv Detail & Related papers (2024-02-06T20:58:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Multi-Label Clinical Time-Series Generation via Conditional GAN [23.380183382491495]
We propose a Multi-label Time-series GAN (MTGAN) to generate EHR data and imbalanced uncommon diseases.
The critic gives scores using Wasserstein distance to recognize real samples from synthetic samples by considering both data and temporal features.
Experimental results demonstrate the quality of the synthetic data and the effectiveness of MTGAN in generating realistic sequential EHR data.
arXiv Detail & Related papers (2022-04-10T23:30:07Z)
- Synthetic ECG Signal Generation Using Generative Neural Networks [7.122393663641668]
We studied the synthetic ECG generation capability of 5 different models from the generative adversarial network (GAN) family.
The results show that all the tested models can, to an extent, mass-generate acceptable heartbeats with high similarity in morphological features.
arXiv Detail & Related papers (2021-12-05T20:28:55Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- Training neural networks with synthetic electrocardiograms [3.1583465114791105]
We present a method for training neural networks with synthetic electrocardiograms that mimic signals produced by a wearable single lead electrocardiogram monitor.
We use domain randomization where the synthetic signal properties such as the waveform shape, RR-intervals and noise are varied for every training example.
Models trained with synthetic data are compared to their counterparts trained with real data.
Experiments show robust performance with different seeds and training examples on different test sets without any test set specific tuning.
arXiv Detail & Related papers (2021-11-11T12:39:33Z)
- Improving the efficacy of Deep Learning models for Heart Beat detection on heterogeneous datasets [0.0]
We investigate the issues related to applying a Deep Learning model on heterogeneous datasets.
We show that the performance of a model trained on data from healthy subjects decreases when applied to patients with cardiac conditions.
We then evaluate the use of Transfer Learning to adapt the model to the different datasets.
arXiv Detail & Related papers (2021-10-26T14:26:55Z)
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)