Missing Features Reconstruction Using a Wasserstein Generative
Adversarial Imputation Network
- URL: http://arxiv.org/abs/2006.11783v1
- Date: Sun, 21 Jun 2020 11:53:55 GMT
- Title: Missing Features Reconstruction Using a Wasserstein Generative
Adversarial Imputation Network
- Authors: Magda Friedjungová, Daniel Vašata, Maksym Balatsko and Marcel Jiřina
- Abstract summary: We experimentally research the use of generative and non-generative models for feature reconstruction.
Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were researched as representatives of generative models.
We introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to 30%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing data is one of the most common preprocessing problems. In this paper,
we experimentally research the use of generative and non-generative models for
feature reconstruction. Variational Autoencoder with Arbitrary Conditioning
(VAEAC) and Generative Adversarial Imputation Network (GAIN) were researched as
representatives of generative models, while the denoising autoencoder (DAE)
represented non-generative models. Performance of the models is compared to
traditional methods k-nearest neighbors (k-NN) and Multiple Imputation by
Chained Equations (MICE). Moreover, we introduce WGAIN as the Wasserstein
modification of GAIN, which turns out to be the best imputation model when the
degree of missingness is less than or equal to 30%. Experiments were performed
on real-world and artificial datasets with continuous features where different
percentages of features, varying from 10% to 50%, were missing. Evaluation of
algorithms was done by measuring the accuracy of the classification model
previously trained on the uncorrupted dataset. The results show that GAIN and
especially WGAIN are the best imputers regardless of the conditions. In
general, they outperform or are comparable to MICE, k-NN, DAE, and VAEAC.
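
To make the Wasserstein modification of GAIN concrete, here is a minimal, illustrative PyTorch sketch, not the authors' exact architecture or training schedule: a generator fills the missing entries of a feature vector conditioned on the observed values and the mask, while a critic scores completed vectors with a Wasserstein-style objective. Weight clipping stands in for the 1-Lipschitz constraint, features are assumed scaled to [0, 1], and the layer sizes and reconstruction weight alpha are placeholders.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                 nn.Linear(d, d), nn.Sigmoid())

    def forward(self, x, m):
        # x: features with noise in the missing slots, m: mask (1 = observed)
        return self.net(torch.cat([x, m], dim=1))

class Critic(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                 nn.Linear(d, 1))

    def forward(self, x_hat, m):
        return self.net(torch.cat([x_hat, m], dim=1))

def wgain_step(G, C, opt_g, opt_c, x, m, clip=0.01, alpha=10.0):
    """One adversarial step on a batch x with mask m (1 = observed)."""
    x_in = x * m + torch.rand_like(x) * (1 - m)      # random noise in missing slots
    # Critic: minimize E[C(imputed)] - E[C(real)]
    x_hat = x * m + G(x_in, m).detach() * (1 - m)
    loss_c = C(x_hat, m).mean() - C(x, m).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    for p in C.parameters():                         # crude Lipschitz constraint
        p.data.clamp_(-clip, clip)
    # Generator: fool the critic and reconstruct the observed entries
    g_out = G(x_in, m)
    x_hat = x * m + g_out * (1 - m)
    loss_g = -C(x_hat, m).mean() + alpha * ((m * (x - g_out)) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_c.item(), loss_g.item()

At test time the imputed row is x * m + G(x_in, m) * (1 - m); the evaluation described above would then feed such rows to a classifier trained on the uncorrupted data and measure its accuracy.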
Related papers
- Predicting Critical Heat Flux with Uncertainty Quantification and Domain Generalization Using Conditional Variational Autoencoders and Deep Neural Networks [2.517043342442487]
We develop a conditional variational autoencoder model to augment the critical heat flux measurement data.
A fine-tuned deep neural network (DNN) regression model was created and evaluated with the same dataset.
The CVAE model was shown to have significantly less variability and a higher confidence after assessment of the prediction-wise relative standard deviations.
arXiv Detail & Related papers (2024-09-09T16:50:41Z)
- Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
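
As a rough, hedged illustration of the ensemble idea (with a scikit-learn Gaussian mixture standing in for a deep generative model, and all hyperparameters as placeholders): train several generative models with different seeds, draw a synthetic training set from each, fit a downstream classifier per member, and average the predictions so that downstream uncertainty also reflects uncertainty about the generative process.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

def dge_predict_proba(X, y, X_test, n_members=5, n_synth=1000):
    """Ensemble over generative models: each member is fit with its own seed,
    generates its own synthetic training set, and trains its own downstream
    classifier; predicted probabilities are averaged across members."""
    classes = np.unique(y)
    member_probas = []
    for k in range(n_members):
        Xs, ys = [], []
        for c in classes:
            gm = GaussianMixture(n_components=3, random_state=k).fit(X[y == c])
            Xc, _ = gm.sample(n_synth // len(classes))
            Xs.append(Xc)
            ys.append(np.full(len(Xc), c))
        clf = LogisticRegression(max_iter=1000).fit(np.vstack(Xs), np.concatenate(ys))
        member_probas.append(clf.predict_proba(X_test))
    return np.mean(member_probas, axis=0)      # "posterior-averaged" prediction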
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Compound Density Networks for Risk Prediction using Electronic Health Records [1.1786249372283562]
We propose an integrated end-to-end approach utilizing a Compound Density Network (CDNet).
CDNet allows the imputation method and prediction model to be tuned together within a single framework.
We validate CDNet on the mortality prediction task on the MIMIC-III dataset.
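
The abstract does not spell out the CDNet internals, so the following is only a schematic PyTorch sketch of the general idea of tuning imputation and prediction together: a learnable imputation component (here, simple per-feature fill values as a stand-in) is optimized end-to-end with the predictor using the prediction loss alone.

import torch
import torch.nn as nn

class JointImputeAndPredict(nn.Module):
    """Learnable imputation values trained jointly with the predictor.
    Illustrative stand-in for joint imputation/prediction, not CDNet itself."""
    def __init__(self, d_in, n_classes):
        super().__init__()
        self.fill = nn.Parameter(torch.zeros(d_in))     # learnable imputation values
        self.predictor = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                       nn.Linear(64, n_classes))

    def forward(self, x, m):
        # m: mask (1 = observed); missing entries come from the learnable fill values
        x_imputed = x * m + self.fill * (1 - m)
        return self.predictor(x_imputed)

# Training uses only the prediction loss, so the imputation is tuned for the
# downstream task rather than for reconstruction:
#   logits = model(x, m); loss = nn.functional.cross_entropy(logits, y)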
arXiv Detail & Related papers (2022-08-02T09:04:20Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
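
Given exact class-conditional densities p(x | c), as the abstract describes for flow-based models, the Bayes error is E_x[1 - max_c p(c | x)] and can be estimated by Monte Carlo. A minimal sketch with frozen Gaussians standing in for the learned flows (names and priors are placeholders):

import numpy as np
from scipy.stats import multivariate_normal

def bayes_error_mc(class_densities, priors, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the Bayes error from exact class-conditional
    densities p(x | c) and class priors; samples are drawn from the mixture."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n_samples, priors)
    errs = []
    for c, n_c in enumerate(counts):
        x = class_densities[c].rvs(size=n_c, random_state=rng)
        joint = np.stack([w * d.pdf(x) for d, w in zip(class_densities, priors)], axis=1)
        posterior = joint / joint.sum(axis=1, keepdims=True)
        errs.append(1.0 - posterior.max(axis=1))
    return float(np.concatenate(errs).mean())

# Two overlapping Gaussian classes as stand-ins for exact flow densities:
densities = [multivariate_normal(mean=[0.0, 0.0]), multivariate_normal(mean=[1.0, 1.0])]
print(bayes_error_mc(densities, priors=[0.5, 0.5]))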
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
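
The "diversity as instance discrimination" idea can be sketched with a standard InfoNCE-style term computed over embeddings of two views of the synthesized batch; this is an assumption-laden illustration of the diversity objective only, not the paper's full inversion pipeline.

import torch
import torch.nn.functional as F

def contrastive_diversity_loss(emb_a, emb_b, tau=0.1):
    """InfoNCE over two views of the synthesized batch: each synthetic sample
    must be distinguishable from every other one, which rewards diversity."""
    z_a = F.normalize(emb_a, dim=1)
    z_b = F.normalize(emb_b, dim=1)
    logits = z_a @ z_b.t() / tau                         # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# During inversion this term would be minimized alongside the usual data-free
# distillation objectives (e.g., matching the teacher's predictions).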
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
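
The Cauchy-Schwarz divergence D_CS(p, q) = -log( ∫ p q dx / sqrt(∫ p^2 dx ∫ q^2 dx) ) is indeed analytic for Gaussian mixtures, since ∫ N(x; m1, S1) N(x; m2, S2) dx = N(m1; m2, S1 + S2). A small numerical sketch of that computation, independent of the autoencoder itself:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_overlap(m1, S1, m2, S2):
    """Closed form of the integral of the product of two Gaussian densities."""
    return multivariate_normal(mean=m2, cov=S1 + S2).pdf(m1)

def cs_divergence_gmm(w_p, mu_p, cov_p, w_q, mu_q, cov_q):
    """Analytic Cauchy-Schwarz divergence between two Gaussian mixtures; all
    inner products expand into pairwise Gaussian overlap integrals."""
    def inner(wa, ma, Sa, wb, mb, Sb):
        return sum(wa[i] * wb[j] * gaussian_overlap(ma[i], Sa[i], mb[j], Sb[j])
                   for i in range(len(wa)) for j in range(len(wb)))
    pq = inner(w_p, mu_p, cov_p, w_q, mu_q, cov_q)
    pp = inner(w_p, mu_p, cov_p, w_p, mu_p, cov_p)
    qq = inner(w_q, mu_q, cov_q, w_q, mu_q, cov_q)
    return -np.log(pq / np.sqrt(pp * qq))

# Example: same two components, different mixture weights
mu = [np.zeros(2), np.ones(2)]
cov = [np.eye(2), np.eye(2)]
print(cs_divergence_gmm([0.5, 0.5], mu, cov, [0.7, 0.3], mu, cov))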
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks [14.714106979097222]
We propose IFGAN, a missing value imputation algorithm based on feature-specific Generative Adversarial Networks (GANs).
A feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones.
We empirically show on several real-life datasets that IFGAN outperforms current state-of-the-art algorithms under various missingness conditions.
arXiv Detail & Related papers (2020-12-23T10:14:35Z)
- PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation Networks for Incomplete Data [19.952411963344556]
We propose PC-GAIN, a novel unsupervised missing-data imputation method.
We first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data.
Then an auxiliary classifier is determined using the synthetic pseudo-labels.
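
A hedged sketch of the pre-training step only (the adversarial imputation itself is analogous to GAIN/WGAIN above): keep the rows with few missing entries, impute them crudely, cluster them to obtain pseudo-labels, and fit the auxiliary classifier that later supplies category information. The threshold, clustering method, and classifier below are placeholders, not the paper's exact choices.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def pretrain_auxiliary_classifier(X, max_missing_rate=0.1, n_clusters=4, seed=0):
    """Learn pseudo-labels from the low-missing-rate subset and train an
    auxiliary classifier on them (illustrative stand-in for the pre-training)."""
    row_missing_rate = np.isnan(X).mean(axis=1)
    X_low = X[row_missing_rate <= max_missing_rate]
    X_filled = SimpleImputer(strategy="mean").fit_transform(X_low)
    pseudo_labels = KMeans(n_clusters=n_clusters, random_state=seed,
                           n_init=10).fit_predict(X_filled)
    aux_clf = LogisticRegression(max_iter=1000).fit(X_filled, pseudo_labels)
    return aux_clf, pseudo_labels

# aux_clf.predict_proba(...) can then condition the generator on the inferred
# category information during adversarial imputation training.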
arXiv Detail & Related papers (2020-11-16T08:08:26Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.