Generating Synthetic Clinical Data that Capture Class Imbalanced
Distributions with Generative Adversarial Networks: Example using
Antiretroviral Therapy for HIV
- URL: http://arxiv.org/abs/2208.08655v1
- Date: Thu, 18 Aug 2022 06:19:46 GMT
- Authors: Nicholas I-Hsien Kuo, Louisa Jorm and Sebastiano Barbieri
- Abstract summary: We extend the classic GAN setup with an external memory to replay features from real samples.
We show that our extended setup increases convergence and, more importantly, is effective in capturing the severely class-imbalanced distributions common to real-world clinical data.
- Score: 2.140861702387444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical data usually cannot be freely distributed due to their highly
confidential nature, and this hampers the development of machine learning in the
healthcare domain. One way to mitigate this problem is by generating realistic
synthetic datasets using generative adversarial networks (GANs). However, GANs
are known to suffer from mode collapse and thus create outputs of low
diversity. In this paper, we extend the classic GAN setup with an external
memory to replay features from real samples. Using antiretroviral therapy for
human immunodeficiency virus (ART for HIV) as a case study, we show that our
extended setup increases convergence and, more importantly, is effective in
capturing the severely class-imbalanced distributions common to real-world
clinical data.
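
The abstract describes an external memory that replays features from real samples during GAN training, but gives no architectural details. The following is only a minimal sketch of what such a replay memory could look like, not the authors' implementation. The class `FeatureMemory`, the function `generator_inputs`, the random-overwrite policy, and the choice to concatenate replayed features with noise are all assumptions made for illustration.

```python
import random


class FeatureMemory:
    """Fixed-size store of feature vectors taken from real samples.

    Hypothetical design: the listing does not describe the memory layout.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.slots = []
        self.rng = random.Random(seed)

    def write(self, features):
        # Append until full, then overwrite a random slot (assumed policy).
        if len(self.slots) < self.capacity:
            self.slots.append(list(features))
        else:
            self.slots[self.rng.randrange(self.capacity)] = list(features)

    def replay(self, batch_size):
        # Sample stored real features uniformly, with replacement.
        return [list(self.rng.choice(self.slots)) for _ in range(batch_size)]


def generator_inputs(memory, batch_size, noise_dim):
    """Build generator inputs: Gaussian noise followed by replayed real features."""
    replayed = memory.replay(batch_size)
    return [
        [memory.rng.gauss(0.0, 1.0) for _ in range(noise_dim)] + feats
        for feats in replayed
    ]


# Usage: store features from ten made-up "real" records, then draw a batch.
memory = FeatureMemory(capacity=4)
for i in range(10):
    memory.write([i / 10.0, 1.0 - i / 10.0])
batch = generator_inputs(memory, batch_size=3, noise_dim=5)
```

In this sketch each generator input carries a copy of features seen in real data, which is one plausible way replay could keep rare classes in view of the generator; how the paper actually uses the memory is not stated in this listing.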
Related papers
- A graph neural network-based model with Out-of-Distribution Robustness
for enhancing Antiretroviral Therapy Outcome Prediction for HIV-1 [5.111166539327379]
We introduce a novel joint fusion model, which combines features from a Fully Connected Neural Network and a Graph Neural Network.
We evaluate these models' robustness against Out-of-Distribution drugs in the test set.
Date: 2023-12-29T08:02:13Z
- Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted
Imaging Data via Anatomic-Conditional Controlled Latent Diffusion [68.45407109385306]
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.
There has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data.
In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy.
Date: 2023-11-30T15:11:03Z
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- Generative Adversarial Networks for Data Augmentation [0.0]
GANs have been utilized in medical image analysis for various tasks, including data augmentation, image creation, and domain adaptation.
GANs can generate synthetic samples that can be used to increase the available dataset.
It is essential to note that the use of GANs in medical imaging is still an active area of research to ensure that the produced images are of high quality and suitable for use in clinical settings.
Date: 2023-06-03T06:33:33Z
- Generative Adversarial Network Based Synthetic Learning and a Novel
Domain Relevant Loss Term for Spine Radiographs [0.0]
Based on a literature review, we accomplish GAN generation of synthetic spine radiographs without meaningful input for the first time.
The introduction of a new clinical loss term for the generator was found to increase generation recall as well as accelerate model training.
Date: 2022-05-05T03:58:19Z
- The Health Gym: Synthetic Health-Related Datasets for the Development of
Reinforcement Learning Algorithms [2.032684842401705]
Health Gym is a collection of synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms.
The datasets were created using a novel generative adversarial network (GAN).
The risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
Date: 2022-03-12T07:28:02Z
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited
Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
Date: 2021-11-12T18:13:45Z
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
Date: 2021-04-07T06:02:04Z
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
Date: 2021-02-09T20:28:35Z
- Distribution Approximation and Statistical Estimation Guarantees of
Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
Date: 2020-02-10T16:47:57Z