Deep Generative Imputation Model for Missing Not At Random Data
- URL: http://arxiv.org/abs/2308.08158v1
- Date: Wed, 16 Aug 2023 06:01:12 GMT
- Title: Deep Generative Imputation Model for Missing Not At Random Data
- Authors: Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang
- Abstract summary: We exploit a deep generative imputation model, namely GNR, to process the real-world missing mechanism in the latent space.
The experimental results show that our GNR surpasses state-of-the-art MNAR baselines by significant margins.
- Score: 13.56794299885683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data analysis usually suffers from the Missing Not At Random (MNAR) problem,
where the cause of the missingness is not fully observed. Compared to the
naive Missing Completely At Random (MCAR) setting, MNAR better reflects
realistic scenarios while being more complex and challenging. Existing
statistical methods model the MNAR mechanism via different decompositions of
the joint distribution of the complete data and the missing mask. However, we
empirically find that directly incorporating these statistical methods into
deep generative models is sub-optimal: doing so neglects the confidence of the
reconstructed mask during MNAR imputation, leading to insufficient information
extraction and lower imputation quality. In
this paper, we revisit the MNAR problem from a novel perspective that the
complete data and missing mask are two modalities of incomplete data on an
equal footing. Along with this line, we put forward a generative-model-specific
joint probability decomposition method, conjunction model, to represent the
distributions of two modalities in parallel and extract sufficient information
from both complete data and missing mask. Taking a step further, we exploit a
deep generative imputation model, namely GNR, to process the real-world missing
mechanism in the latent space and concurrently impute the incomplete data and
reconstruct the missing mask. The experimental results show that our GNR
surpasses state-of-the-art MNAR baselines by significant margins (average RMSE
improvements of 9.9% to 18.8%) and consistently achieves better mask
reconstruction accuracy, which makes the imputation more principled.
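As a concrete illustration of the MCAR/MNAR distinction discussed in the abstract, the toy sketch below (NumPy; all parameters are illustrative, not from the paper) draws one mask independently of the data and one mask whose missingness probability depends on the value itself. Only the MNAR mechanism leaves a biased observed sample.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)  # complete data values, true mean 0

# MCAR: the mask is drawn independently of the data.
mask_mcar = rng.random(5000) < 0.3            # True = missing

# MNAR: the probability of being missing depends on the (possibly
# unobserved) value itself, e.g. larger values go missing more often.
p_miss = 1.0 / (1.0 + np.exp(-x))             # sigmoid of the value
mask_mnar = rng.random(5000) < p_miss

print(x[~mask_mcar].mean())  # close to the true mean 0
print(x[~mask_mnar].mean())  # noticeably below 0: a biased sample
```

This is why imputation methods that ignore the mechanism are biased under MNAR: the observed entries alone misrepresent the underlying distribution, and the mask carries information about the missing values.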
Related papers
- Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms [4.211128681972148]
Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data.
We consider an MNAR model that generalizes several prior popular MNAR models in two ways.
We propose methods for testing the independence restrictions encoded in such models using odds ratio as our parameter of interest.
arXiv Detail & Related papers (2023-06-10T13:46:16Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
- Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data.
Several MNAR models are discussed, in which the cause of the missingness can depend both on the values of the missing variables themselves and on the class membership.
We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
arXiv Detail & Related papers (2021-12-20T09:52:12Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic data and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Identifiable Generative Models for Missing Not at Random Data Imputation [13.790820495804567]
Many imputation methods do not take into account the missingness mechanism, resulting in biased imputation values when MNAR data is present.
In this work, we analyze the identifiability of generative models under MNAR.
We propose a practical deep generative model which can provide identifiability guarantees under mild assumptions.
arXiv Detail & Related papers (2021-10-27T18:51:38Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
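The summary above names the two ingredients, reconstruction error plus $l_{2,p}$ regularization, only at a high level. The sketch below shows one common form such an objective can take; the self-reconstruction term $\|X - XWW^\top\|_F^2$, the function names, and the default parameters are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def l2p_norm(W, p=0.5):
    # l_{2,p} norm: l2 norm of each row of W, raised to the power p,
    # then summed; p < 1 drives entire rows of W to zero.
    return float((np.linalg.norm(W, axis=1) ** p).sum())

def objective(X, W, lam=0.1, p=0.5):
    # Hypothetical feature-selection objective: reconstruction error
    # of the projection X W W^T plus the row-sparsity penalty on W.
    recon = np.linalg.norm(X - X @ W @ W.T, "fro") ** 2
    return recon + lam * l2p_norm(W, p)
```

Rows of `W` are indexed by features, so a row that the penalty shrinks to zero corresponds to a feature being dropped, which is what makes the method usable for unsupervised feature selection.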
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Masksembles for Uncertainty Estimation [60.400102501013784]
Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered as being one of the best methods for generating uncertainty estimates but are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
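A minimal sketch of the MC-Dropout idea mentioned above: dropout is kept active at prediction time, and the spread over repeated stochastic forward passes serves as the uncertainty estimate. The single linear layer and inverted-dropout scaling here are illustrative simplifications, not the paper's setup.

```python
import numpy as np

def mc_dropout_predict(x, W, rng, n_samples=100, p_drop=0.5):
    # Run several forward passes with a fresh dropout mask each time;
    # the mean is the prediction, the std a cheap uncertainty estimate.
    preds = []
    for _ in range(n_samples):
        keep = rng.random(W.shape) >= p_drop   # Bernoulli keep-mask
        preds.append(x @ (W * keep) / (1.0 - p_drop))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
W = np.array([[0.5, -1.0], [2.0, 0.0]])
mean, std = mc_dropout_predict(x, W, rng, n_samples=50)
```

With `p_drop=0.0` every pass is identical, so the std collapses to zero; the uncertainty signal comes entirely from the dropout noise, which is why it is cheap but, as the summary notes, less reliable than full deep ensembles.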
arXiv Detail & Related papers (2020-12-15T14:39:57Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - NeuMiss networks: differentiable programming for supervised learning
with missing values [0.0]
We derive the analytical form of the optimal predictor under a linearity assumption.
We propose a new principled architecture, named NeuMiss networks.
They have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
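To my understanding, the NeuMiss architecture is built around truncated Neumann-series approximations of a matrix inverse, which is how the parameter count stays independent of the number of missingness patterns. The sketch below shows that numerical idea in isolation; the function name and depth parameter are illustrative, and the network itself learns the matrices rather than taking them as given.

```python
import numpy as np

def neumann_solve(S, b, depth=20):
    # Approximate S^{-1} b with the truncated Neumann series
    #   S^{-1} = sum_k (I - S)^k,
    # valid when the spectral radius of (I - S) is below 1; each extra
    # term sharpens the approximation, mirroring a deeper network.
    v = b.copy()       # current term (I - S)^k b
    out = b.copy()     # running partial sum
    for _ in range(depth):
        v = v - S @ v  # v <- (I - S) v
        out = out + v
    return out
```

Unrolling such an iteration for a fixed depth gives a differentiable computation whose cost grows with the depth, not with the combinatorial number of missing-data patterns.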
arXiv Detail & Related papers (2020-07-03T11:42:25Z) - VAEs in the Presence of Missing Data [6.397263087026567]
We develop a novel latent variable model of a corruption process which generates missing data, and derive a corresponding tractable evidence lower bound (ELBO).
Our model is straightforward to implement, can handle both missing completely at random (MCAR) and missing not at random (MNAR) data, scales to high dimensional inputs and gives both the VAE encoder and decoder access to indicator variables for whether a data element is missing or not.
On the MNIST and SVHN datasets we demonstrate improved marginal log-likelihood of observed data and better missing data imputation, compared to existing approaches.
arXiv Detail & Related papers (2020-06-09T14:40:00Z)
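The mask-aware ELBO described in the entry above splits into a reconstruction term evaluated only on observed entries and a KL term on the latent posterior. Below is a minimal sketch of both pieces, assuming a unit-variance Gaussian decoder and a diagonal Gaussian posterior; the function names and the `miss == 1` convention for missing entries are illustrative, not the paper's API.

```python
import numpy as np

def masked_gaussian_nll(x, x_hat, miss):
    # Reconstruction term of the ELBO, summed only over observed
    # entries (miss == 1 marks a missing entry); unit-variance
    # Gaussian decoder, additive constants dropped.
    observed = miss == 0
    return float(0.5 * ((x - x_hat)[observed] ** 2).sum())

def standard_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), the other ELBO term.
    return float(0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar).sum())
```

Restricting the reconstruction loss to observed entries is what lets the bound stay tractable without imputing inside the objective, while feeding the mask to the encoder and decoder lets the model exploit MNAR structure.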
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.