Sufficient Identification Conditions and Semiparametric Estimation under
Missing Not at Random Mechanisms
- URL: http://arxiv.org/abs/2306.06443v1
- Date: Sat, 10 Jun 2023 13:46:16 GMT
- Title: Sufficient Identification Conditions and Semiparametric Estimation under
Missing Not at Random Mechanisms
- Authors: Anna Guo, Jiwei Zhao, Razieh Nabi
- Abstract summary: Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data.
We consider a MNAR model that generalizes several prior popular MNAR models in two ways.
We propose methods for testing the independence restrictions encoded in such models using odds ratio as our parameter of interest.
- Score: 4.211128681972148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conducting valid statistical analyses is challenging in the presence of
missing-not-at-random (MNAR) data, where the missingness mechanism is dependent
on the missing values themselves even conditioned on the observed data. Here,
we consider a MNAR model that generalizes several prior popular MNAR models in
two ways: first, it is less restrictive in terms of statistical independence
assumptions imposed on the underlying joint data distribution, and second, it
allows for all variables in the observed sample to have missing values. This
MNAR model corresponds to a so-called criss-cross structure considered in the
literature on graphical models of missing data that prevents nonparametric
identification of the entire missing data model. Nonetheless, part of the
complete-data distribution remains nonparametrically identifiable. By
exploiting this fact and considering a rich class of exponential family
distributions, we establish sufficient conditions for identification of the
complete-data distribution as well as the entire missingness mechanism. We then
propose methods for testing the independence restrictions encoded in such
models using odds ratio as our parameter of interest. We adopt two
semiparametric approaches for estimating the odds ratio parameter and establish
the corresponding asymptotic theories: one involves maximizing a conditional
likelihood with order statistics and the other uses estimating equations. The
utility of our methods is illustrated via simulation studies.
Related papers
- Nonparametric Identifiability of Causal Representations from Unknown
Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Bayesian Networks for the robust and unbiased prediction of depression
and its symptoms utilizing speech and multimodal data [65.28160163774274]
We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.
arXiv Detail & Related papers (2022-11-09T14:48:13Z)
- Mathematical Theory of Bayesian Statistics for Unknown Information
Source [0.0]
In statistical inference, uncertainty is unknown and all models are wrong.
We show general properties of cross validation, information criteria, and marginal likelihood.
The derived theory holds even if an unknown uncertainty is unrealizable by a statistical model or even if the posterior distribution cannot be approximated by any normal distribution.
arXiv Detail & Related papers (2022-06-11T23:35:06Z)
- Evaluating Aleatoric Uncertainty via Conditional Generative Models [15.494774321257939]
We study conditional generative models for aleatoric uncertainty estimation.
We introduce two metrics to measure the discrepancy between two conditional distributions.
We demonstrate numerically how our metrics provide correct measurements of conditional distributional discrepancies.
arXiv Detail & Related papers (2022-06-09T05:39:04Z)
- MissDAG: Causal Discovery in the Presence of Missing Data with
Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data.
Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership.
We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
arXiv Detail & Related papers (2021-12-20T09:52:12Z)
- Variational Gibbs Inference for Statistical Model Estimation from
Incomplete Data [7.4250022679087495]
We introduce variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data.
We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data.
arXiv Detail & Related papers (2021-11-25T17:22:22Z)
- Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Uncertainty-Gated Stochastic Sequential Model for EHR Mortality
Prediction [6.170898159041278]
We present a novel variational recurrent network that estimates the distribution of missing variables, updates hidden states, and predicts the possibility of in-hospital mortality.
It is noteworthy that our model can conduct these procedures in a single stream and learn all network parameters jointly in an end-to-end manner.
arXiv Detail & Related papers (2020-03-02T04:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.