Transformed Distribution Matching for Missing Value Imputation
- URL: http://arxiv.org/abs/2302.10363v2
- Date: Fri, 23 Jun 2023 01:26:36 GMT
- Title: Transformed Distribution Matching for Missing Value Imputation
- Authors: He Zhao, Ke Sun, Amir Dezfouli, Edwin Bonilla
- Abstract summary: The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly.
In this paper, we propose to impute the missing values of two batches of data by transforming them into a latent space through deep invertible functions and matching them distributionally.
To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed.
- Score: 7.754689608872696
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the problem of imputing missing values in a dataset, which has
important applications in many domains. The key to missing value imputation is
to capture the data distribution with incomplete samples and impute the missing
values accordingly. In this paper, by leveraging the fact that any two batches
of data with missing values come from the same data distribution, we propose to
impute the missing values of two batches of samples by transforming them into a
latent space through deep invertible functions and matching them
distributionally. To learn the transformations and impute the missing values
simultaneously, a simple and well-motivated algorithm is proposed. Our
algorithm has fewer hyperparameters to fine-tune and generates high-quality
imputations regardless of how missing values are generated. Extensive
experiments over a large number of datasets and competing benchmark algorithms
show that our method achieves state-of-the-art performance.
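To make the abstract's idea concrete, below is a minimal PyTorch sketch of the kind of training loop it describes: the missing entries and an invertible transformation are learned jointly so that two random batches, once completed and mapped into the latent space, match distributionally. The coupling-layer architecture, the Sinkhorn-style matching loss, and all names and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: RealNVP-style couplings and an entropic-OT cost
# stand in for the paper's invertible transform and distribution-matching
# objective, whose exact forms may differ.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Invertible coupling layer: transforms half of x conditioned on the other half."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(torch.tanh(s)) + t], dim=-1)

class Flip(nn.Module):
    """Reverse feature order so stacked couplings transform every dimension."""
    def forward(self, x):
        return x.flip(-1)

def sinkhorn_cost(a, b, eps=0.1, iters=50):
    """Entropy-regularized OT cost between two equal-size batches (illustrative)."""
    C = torch.cdist(a, b) ** 2
    C = C / C.detach().max().clamp(min=1e-8)   # normalize for numerical stability
    n = a.shape[0]
    K = torch.exp(-C / eps)
    u = v = torch.full((n,), 1.0 / n)
    for _ in range(iters):                     # Sinkhorn fixed-point iterations
        u = (1.0 / n) / (K @ v)
        v = (1.0 / n) / (K.t() @ u)
    P = u[:, None] * K * v[None, :]            # transport plan
    return (P * C).sum()

# Toy data with roughly 30% of entries missing completely at random.
N, D = 256, 8
X = torch.randn(N, D)
mask = (torch.rand(N, D) > 0.3).float()        # 1 = observed, 0 = missing

imputed = nn.Parameter(torch.zeros(N, D))      # learnable fill-in values
flow = nn.Sequential(AffineCoupling(D), Flip(), AffineCoupling(D))
opt = torch.optim.Adam([imputed, *flow.parameters()], lr=1e-2)

for step in range(500):
    Xhat = mask * X + (1 - mask) * imputed     # complete the data
    # Any two batches come from the same distribution, so their latent
    # representations should match distributionally.
    i = torch.randperm(N)[:64]
    j = torch.randperm(N)[:64]
    loss = sinkhorn_cost(flow(Xhat[i]), flow(Xhat[j]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, `mask * X + (1 - mask) * imputed` yields the completed dataset; learning the imputations and the transformation jointly is what distinguishes this scheme from matching batches in the raw data space.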
Related papers
- Iterative missing value imputation based on feature importance [6.300806721275004]
We have designed an imputation method that considers feature importance.
This algorithm alternates between matrix completion and feature importance learning; specifically, matrix completion is driven by a filling loss that incorporates feature importance.
Experimental results consistently show that the proposed method outperforms five existing imputation algorithms.
arXiv Detail & Related papers (2023-11-14T09:03:33Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while using far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets [1.5076964620370268]
Missing values are a common problem in data science and machine learning.
We consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests.
Our algorithm effectively imputes missing values using only information already present in the dataset.
arXiv Detail & Related papers (2023-01-02T10:53:52Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes prediction intractable for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks [14.714106979097222]
We propose IFGAN, a missing value imputation algorithm based on feature-specific Generative Adversarial Networks (GANs).
A feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones.
We empirically show on several real-life datasets that IFGAN outperforms current state-of-the-art algorithms under various missing conditions.
arXiv Detail & Related papers (2020-12-23T10:14:35Z)
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)
- A Unified Framework of Surrogate Loss by Refactoring and Interpolation [65.60014616444623]
We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent.
We validate the effectiveness of UniLoss on three tasks and four datasets.
arXiv Detail & Related papers (2020-07-27T21:16:51Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Missing Data Imputation using Optimal Transport [43.14084843713895]
We leverage optimal transport distances to quantify a criterion and turn it into a loss function to impute missing data values.
We propose practical methods to minimize these losses using end-to-end learning.
Experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high proportions of missing values.
arXiv Detail & Related papers (2020-02-10T15:23:42Z)
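The optimal-transport entry above is the closest precursor to the main paper, so for contrast here is a flow-free counterpart of the earlier sketch: the missing entries are the only learnable parameters, trained so that random pairs of completed batches are close in OT cost in the original data space. It assumes the illustrative `sinkhorn_cost` helper defined in the sketch above is in scope, and is again a sketch under stated assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

# Assumes sinkhorn_cost from the earlier sketch is already defined.
N, D = 256, 8
X = torch.randn(N, D)
mask = (torch.rand(N, D) > 0.3).float()    # 1 = observed, 0 = missing

imputed = nn.Parameter(torch.zeros(N, D))  # only the fill-ins are learned
opt = torch.optim.Adam([imputed], lr=1e-2)

for step in range(500):
    Xhat = mask * X + (1 - mask) * imputed
    i = torch.randperm(N)[:64]
    j = torch.randperm(N)[:64]
    loss = sinkhorn_cost(Xhat[i], Xhat[j])  # match batches directly, no latent map
    opt.zero_grad()
    loss.backward()
    opt.step()
```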
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.