PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation
Networks for Incomplete Data
- URL: http://arxiv.org/abs/2011.07770v2
- Date: Tue, 6 Apr 2021 08:41:36 GMT
- Title: PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation
Networks for Incomplete Data
- Authors: Yufeng Wang, Dan Li, Xiang Li, Min Yang
- Abstract summary: PC-GAIN is a novel unsupervised missing data imputation method.
We first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data.
Then an auxiliary classifier is determined using the synthetic pseudo-labels.
- Score: 19.952411963344556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Datasets with missing values are very common in real world applications.
GAIN, a recently proposed deep generative model for missing data imputation,
has been proved to outperform many state-of-the-art methods. But GAIN only uses
a reconstruction loss in the generator to minimize the imputation error of the
non-missing part, ignoring the potential category information which can reflect
the relationship between samples. In this paper, we propose a novel
unsupervised missing data imputation method named PC-GAIN, which utilizes
potential category information to further enhance the imputation power.
Specifically, we first propose a pre-training procedure to learn potential
category information contained in a subset of low-missing-rate data. Then an
auxiliary classifier is determined using the synthetic pseudo-labels. Further,
this classifier is incorporated into the generative adversarial framework to
help the generator to yield higher quality imputation results. The proposed
method can improve the imputation quality of GAIN significantly. Experimental
results on various benchmark datasets show that our method is also superior to
other baseline approaches. Our code is available at
https://github.com/WYu-Feng/pc-gain.
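The first two steps described in the abstract (restricting the reconstruction loss to observed entries, then deriving pseudo-labels from a low-missing-rate subset) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is available at the repository above. The function names, the `fraction` parameter, and the use of plain k-means for pseudo-labelling are assumptions made for illustration; the abstract does not state which clustering method is used.

```python
import numpy as np

def observed_mse(x, x_hat, mask):
    """Mean squared error over observed entries only (mask == 1)."""
    mask = mask.astype(float)
    return float(np.sum(mask * (x - x_hat) ** 2) / np.sum(mask))

def low_missing_subset(mask, fraction=0.2):
    """Indices of the `fraction` of rows with the lowest missing rate."""
    missing_rate = 1.0 - mask.mean(axis=1)
    n_keep = max(1, int(round(fraction * mask.shape[0])))
    return np.argsort(missing_rate, kind="stable")[:n_keep]

def kmeans_pseudo_labels(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the pseudo-labelling step)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers leaves x untouched.
    centers = x[rng.choice(x.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels
```

In a full pipeline along the lines of the abstract, the rows returned by `low_missing_subset` would first be imputed by a pre-trained generator, the imputed rows would be clustered to obtain pseudo-labels, and a classifier trained on those labels would then supply an additional loss term for the generator. Those later steps are omitted here because the abstract gives no further detail on them.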
Related papers
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
Cross entropy loss can easily lead us to find models which demonstrate severe overfitting behavior.
In this paper, we prove that the existing cross entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train DNN classifiers via learning the mutual information between the label and input.
arXiv Detail & Related papers (2022-10-03T15:09:19Z) - A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We found that node classification suffers greater performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z) - Convolutional generative adversarial imputation networks for
spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
The results show that the proposed strategies outperform classification based on observed data and maintain high accuracy even as the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Categorical EHR Imputation with Generative Adversarial Nets [11.171712535005357]
We propose a simple and yet effective approach that is based on previous work on GANs for data imputation.
We show that our imputation approach largely improves the prediction accuracy, compared to more traditional data imputation approaches.
arXiv Detail & Related papers (2021-08-03T18:50:26Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - IFGAN: Missing Value Imputation using Feature-specific Generative
Adversarial Networks [14.714106979097222]
We propose IFGAN, a missing value imputation algorithm based on Feature-specific Generative Adversarial Networks (GAN).
A feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones.
We empirically show on several real-life datasets that IFGAN outperforms current state-of-the-art algorithms under various missing conditions.
arXiv Detail & Related papers (2020-12-23T10:14:35Z) - Imputation of Missing Data with Class Imbalance using Conditional
Generative Adversarial Networks [24.075691766743702]
We propose a new method for imputing missing data based on its class-specific characteristics.
Our Conditional Generative Adversarial Imputation Network (CGAIN) imputes the missing data using class-specific distributions.
We tested our approach on benchmark datasets and achieved superior performance compared with the state-of-the-art and popular imputation approaches.
arXiv Detail & Related papers (2020-12-01T02:26:54Z) - Missing Features Reconstruction Using a Wasserstein Generative
Adversarial Imputation Network [0.0]
We experimentally research the use of generative and non-generative models for feature reconstruction.
Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were studied as representatives of generative models.
We introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to 30%.
arXiv Detail & Related papers (2020-06-21T11:53:55Z) - Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled
Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.