Establishing strong imputation performance of a denoising autoencoder in
a wide range of missing data problems
- URL: http://arxiv.org/abs/2004.02584v1
- Date: Mon, 6 Apr 2020 12:00:30 GMT
- Title: Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems
- Authors: Najmeh Abiri, Björn Linse, Patrik Edén and Mattias Ohlsson
- Abstract summary: We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Dealing with missing data in data analysis is inevitable. Although powerful
imputation methods that address this problem exist, there is still much room
for improvement. In this study, we examined single imputation based on deep
autoencoders, motivated by the apparent success of deep learning to efficiently
extract useful dataset features. We have developed a consistent framework for
both training and imputation. Moreover, we benchmarked the results against
state-of-the-art imputation methods on different data sizes and
characteristics. The work was not limited to the one-type variable dataset; we
also imputed missing data with multi-type variables, e.g., a combination of
binary, categorical, and continuous attributes. To evaluate the imputation
methods, we randomly corrupted the complete data, with varying degrees of
corruption, and then compared the imputed and original values. In all
experiments, the developed autoencoder obtained the smallest error for all
ranges of initial data corruption.
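The evaluation protocol described above (randomly corrupt a complete dataset, impute the masked entries, then compare imputed and original values) can be sketched as follows. This is a minimal illustration, not the authors' code: the MCAR-style masking and the RMSE metric are assumptions, and simple column-mean imputation stands in for the paper's denoising autoencoder (or any other imputer with the same interface).

```python
import numpy as np

def corrupt(X, frac, rng):
    """Mask a random fraction of entries (MCAR-style corruption)."""
    mask = rng.random(X.shape) < frac
    Xc = X.copy()
    Xc[mask] = np.nan
    return Xc, mask

def mean_impute(Xc):
    """Column-mean imputation; a stand-in for any imputer
    (e.g. a trained denoising autoencoder)."""
    col_means = np.nanmean(Xc, axis=0)
    return np.where(np.isnan(Xc), col_means, Xc)

def imputation_rmse(X, frac, impute, seed=0):
    """Corrupt complete data X, impute, and score only the
    entries that were actually masked."""
    rng = np.random.default_rng(seed)
    Xc, mask = corrupt(X, frac, rng)
    Xi = impute(Xc)
    return float(np.sqrt(np.mean((Xi[mask] - X[mask]) ** 2)))
```

Sweeping `frac` over a range of corruption levels and plotting `imputation_rmse` per method reproduces the kind of benchmark comparison the abstract describes.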
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that validation accuracy on a ground-truth-labeled dataset is inadequate for correcting the labels of previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Explainable Data Imputation using Constraints [4.674053902991301]
We present a new algorithm for data imputation based on different data type values and their association constraints in data.
Our algorithm not only imputes the missing values but also generates human readable explanations describing the significance of attributes used for every imputation.
arXiv Detail & Related papers (2022-05-10T08:06:26Z) - Latent Vector Expansion using Autoencoder for Anomaly Detection [1.370633147306388]
We use the features of the autoencoder to train latent vectors from low to high dimensionality.
We propose a latent vector expansion autoencoder model that improves classification performance at imbalanced data.
arXiv Detail & Related papers (2022-01-05T02:28:38Z) - Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 points in AUC and 12.5 points in average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z) - Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z) - Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z) - Causal Discovery from Incomplete Data using An Encoder and Reinforcement Learning [2.4469484645516837]
We propose an approach to discover causal structures from incomplete data by using a novel encoder and reinforcement learning (RL).
The encoder is designed for missing data imputation as well as feature extraction.
Our method takes the incomplete observational data as input and generates a causal structure graph.
arXiv Detail & Related papers (2020-06-09T23:33:47Z) - Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback [0.0]
We propose a Multiple Imputation model using Denoising Autoencoders to learn the internal representation of data.
We use the novel mechanisms of Metamorphic Truth and Imputation Feedback to maintain statistical integrity of attributes.
Our approach explores the effects of imputation on various missingness mechanisms and patterns of missing data, outperforming other methods in many standard test cases.
arXiv Detail & Related papers (2020-02-19T18:26:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.