Categorical EHR Imputation with Generative Adversarial Nets
- URL: http://arxiv.org/abs/2108.01701v2
- Date: Thu, 5 Aug 2021 07:50:47 GMT
- Title: Categorical EHR Imputation with Generative Adversarial Nets
- Authors: Yinchong Yang, Zhiliang Wu, Volker Tresp, Peter A. Fasching
- Abstract summary: We propose a simple and yet effective approach that is based on previous work on GANs for data imputation.
We show that our imputation approach largely improves the prediction accuracy, compared to more traditional data imputation approaches.
- Score: 11.171712535005357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic Health Records often suffer from missing data, which poses a major
problem in clinical practice and clinical studies. A novel approach for dealing
with missing data are Generative Adversarial Nets (GANs), which have been
generating huge research interest in image generation and transformation.
Recently, researchers have attempted to apply GANs to missing data generation
and imputation for EHR data: a major challenge here is the categorical nature
of the data. State-of-the-art solutions to the GAN-based generation of
categorical data involve either reinforcement learning, or learning a
bidirectional mapping between the categorical and the real latent feature
space, so that the GANs only need to generate real-valued features. However,
these methods are designed to generate complete feature vectors instead of
imputing only the subsets of missing features. In this paper we propose a
simple and yet effective approach that is based on previous work on GANs for
data imputation. We first motivate our solution by discussing the reason why
adversarial training often fails in case of categorical features. Then we
derive a novel way to re-code the categorical features to stabilize the
adversarial training. Based on experiments on two real-world EHR data with
multiple settings, we show that our imputation approach largely improves the
prediction accuracy, compared to more traditional data imputation approaches.
Related papers
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Generative Adversarial Network Based Synthetic Learning and a Novel
Domain Relevant Loss Term for Spine Radiographs [0.0]
We accomplish GAN generation of synthetic spine radiographs without meaningful input for the first time from a literature review.
The introduction of a new clinical loss term for the generator was found to increase generation recall as well as accelerate model training.
arXiv Detail & Related papers (2022-05-05T03:58:19Z) - Analysis of lifelog data using optimal feature selection based
unsupervised logistic regression (OFS-ULR) for chronic disease classification [2.3909933791900326]
Chronic disease classification models are now harnessing the potential of lifelog data to explore better healthcare practices.
This paper is to construct an optimal feature selection-based unsupervised logistic regression model (OFS-ULR) to classify chronic diseases.
arXiv Detail & Related papers (2022-04-04T07:11:26Z) - Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited
Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z) - Convolutional generative adversarial imputation networks for
spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GANs) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method as Con Conval Generative Adversarial Imputation Nets (Conv-GAIN)
arXiv Detail & Related papers (2021-11-03T03:50:48Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - The Imaginative Generative Adversarial Network: Automatic Data
Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action
Recognition [27.795763107984286]
We present a novel automatic data augmentation model, which approximates the distribution of the input data and samples new data from this distribution.
Our results show that the augmentation strategy is fast to train and can improve classification accuracy for both neural networks and state-of-the-art methods.
arXiv Detail & Related papers (2021-05-27T11:07:09Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation
Networks for Incomplete Data [19.952411963344556]
PC-GAIN is a novel unsupervised missing data imputation method named PC-GAIN.
We first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data.
Then an auxiliary classifier is determined using the synthetic pseudo-labels.
arXiv Detail & Related papers (2020-11-16T08:08:26Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Generation of Differentially Private Heterogeneous Electronic Health
Records [9.926231893220061]
We explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs.
We will explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets.
arXiv Detail & Related papers (2020-06-05T13:21:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.