Identification and Estimation for Nonignorable Missing Data: A Data
Fusion Approach
- URL: http://arxiv.org/abs/2311.09015v2
- Date: Wed, 28 Feb 2024 18:36:15 GMT
- Title: Identification and Estimation for Nonignorable Missing Data: A Data
Fusion Approach
- Authors: Zixiao Wang, AmirEmad Ghassami, Ilya Shpitser
- Abstract summary: We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR).
In this paper, we take an alternative approach, where information in an MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR).
We derive an inverse probability weighted (IPW) estimator for identified parameters, and evaluate the performance of our estimation strategies via simulation studies, and a data application.
- Score: 16.57879794516524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the task of identifying and estimating a parameter of interest in
settings where data is missing not at random (MNAR). In general, such
parameters are not identified without strong assumptions on the missing data
model. In this paper, we take an alternative approach and introduce a method
inspired by data fusion, where information in an MNAR dataset is augmented by
information in an auxiliary dataset subject to missingness at random (MAR). We
show that even if the parameter of interest cannot be identified given either
dataset alone, it can be identified given pooled data, under two complementary
sets of assumptions. We derive an inverse probability weighted (IPW) estimator
for identified parameters, and evaluate the performance of our estimation
strategies via simulation studies, and a data application.
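To make the idea of inverse probability weighting concrete, here is a minimal sketch of a generic IPW estimator of a mean in a single dataset whose outcome is missing at random given a covariate. It is not the paper's data fusion estimator, whose form the abstract does not give. The data-generating model, the logistic propensity model, and the names `fit_logistic`, `ipw_mean`, and `simulate` are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Fit P(R=1 | X) by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)                              # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X       # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_mean(y, r, pi_hat):
    """Normalized (Hajek) IPW estimate of E[Y]; y may be NaN where r == 0."""
    w = r / pi_hat
    y_filled = np.where(r == 1, y, 0.0)                   # unobserved rows get weight 0
    return np.sum(w * y_filled) / np.sum(w)

def simulate(n=20000, seed=0):
    """Hypothetical MAR setup: missingness of Y depends only on the observed X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 + x + rng.normal(size=n)                      # true E[Y] = 2
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    r = rng.binomial(1, p_obs)                            # r = 1 means Y is observed
    y_obs = np.where(r == 1, y, np.nan)
    return x, y_obs, r

x, y_obs, r = simulate()
X = np.column_stack([np.ones_like(x), x])
beta_hat = fit_logistic(X, r)
pi_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))

cc_est = np.nanmean(y_obs)                                # complete-case mean, biased here
ipw_est = ipw_mean(y_obs, r, pi_hat)
print(f"complete-case: {cc_est:.3f}, IPW: {ipw_est:.3f}, truth: 2.000")
```

In this simulated setup, rows with larger X are more likely to be observed, so the complete-case mean overestimates E[Y], while reweighting each observed row by the inverse of its estimated observation probability corrects for this. Under MNAR the observation probability depends on the missing value itself, so this single-dataset recipe is not enough, which is the gap the abstract's pooled-data approach addresses.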
Related papers
- Personalized Federated Learning via Active Sampling [50.456464838807115]
This paper proposes a novel method for sequentially identifying similar (or relevant) data generators.
Our method evaluates the relevance of a data generator by evaluating the effect of a gradient step using its local dataset.
We extend this method to non-parametric models by a suitable generalization of the gradient step to update a hypothesis using the local dataset provided by a data generator.
arXiv Detail & Related papers (2024-09-03T17:12:21Z)
- Truthful Dataset Valuation by Pointwise Mutual Information [28.63827288801458]
We propose a new data valuation method that provably guarantees the following: data providers always maximize their expected score by truthfully reporting their observed data.
Our method, following the paradigm of proper scoring rules, measures the pointwise mutual information (PMI) of the test dataset and the evaluated dataset.
arXiv Detail & Related papers (2024-05-28T15:04:17Z)
- Efficient semi-supervised inference for logistic regression under case-control studies [3.5485531932219243]
We consider an inference problem in semi-supervised settings where the outcome in the labeled data is binary.
Case-control sampling is an effective sampling scheme for alleviating imbalance structure in binary data.
We find that, when unlabeled data are available, the intercept parameter can be identified in the semi-supervised learning setting.
arXiv Detail & Related papers (2024-02-23T14:55:58Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms [4.211128681972148]
Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data.
We consider an MNAR model that generalizes several prior popular MNAR models in two ways.
We propose methods for testing the independence restrictions encoded in such models using odds ratio as our parameter of interest.
arXiv Detail & Related papers (2023-06-10T13:46:16Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Uncertainty-guided Source-free Domain Adaptation [77.3844160723014]
Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model.
We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation.
arXiv Detail & Related papers (2022-08-16T08:03:30Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Bayesian data combination model with Gaussian process latent variable model for mixed observed variables under NMAR missingness [0.0]
It is difficult to obtain a "(quasi) single-source dataset" in which the variables of interest are simultaneously observed.
It is necessary to utilize these datasets as a single-source dataset with missing variables.
We propose a data fusion method that does not assume that datasets are homogeneous.
arXiv Detail & Related papers (2021-09-01T16:09:55Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Learning from missing data with the Latent Block Model [0.5735035463793007]
We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of Missing Not At Random data.
A variational expectation-maximization algorithm is derived to perform inference and a model selection criterion is presented.
arXiv Detail & Related papers (2020-10-23T08:11:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.