Label Shift Estimators for Non-Ignorable Missing Data
- URL: http://arxiv.org/abs/2310.18261v1
- Date: Fri, 27 Oct 2023 16:50:13 GMT
- Authors: Andrew C. Miller and Joseph Futoma
- Abstract summary: We consider the problem of estimating the mean of a random variable Y subject to non-ignorable missingness, i.e., where the missingness mechanism depends on Y.
We use our approach to estimate disease prevalence using a large health survey, comparing ignorable and non-ignorable approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of estimating the mean of a random variable Y subject
to non-ignorable missingness, i.e., where the missingness mechanism depends on
Y. We connect the auxiliary proxy variable framework for non-ignorable
missingness (West and Little, 2013) to the label shift setting (Saerens et al.,
2002). Exploiting this connection, we construct an estimator for non-ignorable
missing data that uses high-dimensional covariates (or proxies) without the
need for a generative model. In synthetic and semi-synthetic experiments, we
study the behavior of the proposed estimator, comparing it to commonly used
ignorable estimators in both well-specified and misspecified settings.
Additionally, we develop a score to assess how consistent the data are with the
label shift assumption. We use our approach to estimate disease prevalence
using a large health survey, comparing ignorable and non-ignorable approaches.
We show that failing to account for non-ignorable missingness can have profound
consequences on conclusions drawn from non-representative samples.
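
The label shift setting that the abstract refers to can be illustrated with the classical EM prior re-estimation of Saerens et al. (2002). The sketch below is not the estimator proposed in this paper. It assumes a binary Y, treats respondents as the labeled source population and non-respondents as the unlabeled target, and uses a one-dimensional Gaussian proxy X whose distribution given Y is the same in both groups (the label shift assumption). All names (`em_label_shift_prior`, `source_posterior`) and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np


def em_label_shift_prior(probs, source_prior, n_iter=500, tol=1e-10):
    """EM re-estimation of class priors under label shift (Saerens et al., 2002).

    probs: (n, K) posteriors p_source(y | x) evaluated on unlabeled target rows.
    source_prior: (K,) class prior of the labeled source population.
    Returns the estimated (K,) class prior of the target population.
    """
    probs = np.asarray(probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    prior = source_prior.copy()
    for _ in range(n_iter):
        # E-step: reweight source posteriors by the current prior ratio.
        weighted = probs * (prior / source_prior)
        post = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average adjusted posterior.
        new_prior = post.mean(axis=0)
        converged = np.max(np.abs(new_prior - prior)) < tol
        prior = new_prior
        if converged:
            break
    return prior


def source_posterior(x, prior):
    """Exact p(y | x) for X | Y=0 ~ N(0, 1) and X | Y=1 ~ N(2, 1) (assumed model)."""
    lik = np.stack([np.exp(-0.5 * x**2), np.exp(-0.5 * (x - 2.0) ** 2)], axis=1)
    joint = lik * prior
    return joint / joint.sum(axis=1, keepdims=True)


# Synthetic example: prevalence is 0.2 among respondents (source)
# but 0.6 among non-respondents (target), so missingness depends on Y.
rng = np.random.default_rng(0)
source_prior = np.array([0.8, 0.2])
true_target_prior = np.array([0.4, 0.6])
y_target = rng.choice(2, size=20000, p=true_target_prior)
x_target = rng.normal(loc=2.0 * y_target, scale=1.0)

probs = source_posterior(x_target, source_prior)
naive_prior = probs.mean(axis=0)  # ignorable-style estimate, no prior correction
estimated_prior = em_label_shift_prior(probs, source_prior)

print("naive prevalence among non-respondents:", naive_prior[1])
print("label shift prevalence among non-respondents:", estimated_prior[1])
```

In this toy setup the uncorrected average of the source posteriors is pulled toward the respondents' prevalence, while the EM correction recovers a value close to the true non-respondent prevalence. A real application would replace `source_posterior` with a calibrated classifier trained on respondents.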
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection [7.174572371800215]
Anomaly detection methods identify examples that do not follow the expected behaviour.
The proportion of examples marked as anomalies equals the expected proportion of anomalies, called contamination factor.
We introduce a method for estimating the posterior distribution of the contamination factor of a given unlabeled dataset.
arXiv Detail & Related papers (2022-10-19T11:51:25Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding [38.09565581056218]
We study the problem of learning conditional average treatment effects (CATE) from high-dimensional, observational data with unobserved confounders.
We present a new parametric interval estimator suited for high-dimensional data.
arXiv Detail & Related papers (2021-03-08T15:58:06Z)
- Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness [0.0]
We propose a variational autoencoder architecture to model both ignorable and nonignorable missing data.
Our model explicitly learns to cluster the missing data into missingness pattern sets based on the observed data and missingness masks.
Our setup trades off the characteristics of ignorable and nonignorable missingness and can thus be applied to data of both types.
arXiv Detail & Related papers (2021-03-05T08:21:35Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- The Hidden Uncertainty in a Neural Networks Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs.
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
- Semi-supervised learning and the question of true versus estimated propensity scores [0.456877715768796]
We propose a simple procedure that reconciles the strong intuition that a known propensity function should be useful for estimating treatment effects.
Further, simulation studies suggest that direct regression may be preferable to inverse-propensity weight estimators in many circumstances.
arXiv Detail & Related papers (2020-09-14T04:13:12Z)
- Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models [1.52292571922932]
We provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism.
In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.
arXiv Detail & Related papers (2020-09-03T15:04:27Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)