The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference
with Partially Labeled Data
- URL: http://arxiv.org/abs/2305.12789v2
- Date: Sun, 31 Dec 2023 11:35:11 GMT
- Title: The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference
with Partially Labeled Data
- Authors: Yuqian Zhang, Abhishek Chakrabortty and Jelena Bradic
- Abstract summary: In real-world scenarios, data collection limitations often result in partially labeled datasets, leading to difficulties in drawing reliable causal inferences.
Traditional approaches in the semi-supervised (SS) and missing data literature may not adequately handle these complexities, leading to biased estimates.
The proposed decaying missing-at-random (MAR) framework tackles missing outcomes in high-dimensional settings and accounts for selection bias arising from the dependence of the labeling probability on covariates.
- Score: 10.021381302215062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world scenarios, data collection limitations often result in
partially labeled datasets, leading to difficulties in drawing reliable causal
inferences. Traditional approaches in the semi-supervised (SS) and missing data
literature may not adequately handle these complexities, leading to biased
estimates. To address these challenges, our paper introduces a novel decaying
missing-at-random (decaying MAR) framework. This framework tackles missing
outcomes in high-dimensional settings and accounts for selection bias arising
from the dependence of labeling probability on covariates. Notably, we relax
the need for a positivity condition, commonly required in the missing data
literature, and allow uniform decay of labeling propensity scores with sample
size, accommodating faster growth of unlabeled data. Our decaying MAR framework
enables easy rate double-robust (DR) estimation of average treatment effects,
succeeding where other methods fail, even with correctly specified nuisance
models. Additionally, it facilitates asymptotic normality under model
misspecification. To achieve this, we propose new adaptive, targeted
bias-reducing nuisance estimators and asymmetric cross-fitting, along with a
novel semi-parametric approach that fully leverages large volumes of unlabeled
data. Our approach requires weak sparsity conditions. Numerical results confirm
our estimators' efficacy and versatility, addressing selection bias and model
misspecification.
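As a rough illustration of the estimation recipe the abstract describes, the sketch below implements a standard cross-fitted, AIPW-style doubly robust estimator of the average treatment effect when outcomes are observed only for labeled units, using an inverse labeling-propensity correction. It is a minimal sketch under generic assumptions, not the paper's targeted bias-reducing nuisance estimators or asymmetric cross-fitting; all function names and modeling choices (lasso-type nuisance fits) are illustrative.

import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import KFold


def dr_ate_partially_labeled(X, A, Y, R, n_splits=5, eps=1e-6):
    """Cross-fitted AIPW-style ATE estimate with outcomes observed only when R == 1.

    X: (n, p) covariates; A: (n,) binary treatment; Y: (n,) outcomes (arbitrary
    where R == 0); R: (n,) labeling indicator.
    """
    n = X.shape[0]
    psi = np.zeros(n)
    Y_filled = np.where(R == 1, Y, 0.0)  # placeholder; never used where R == 0
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        # Treatment propensity e(x) = P(A = 1 | X), fitted with a logistic lasso.
        e_model = LogisticRegressionCV(penalty="l1", solver="saga", max_iter=5000)
        e_model.fit(X[train], A[train])
        e = np.clip(e_model.predict_proba(X[test])[:, 1], 0.01, 0.99)

        # Labeling propensity pi(x) = P(R = 1 | X).  Under the decaying MAR setting
        # pi may shrink with n, so it is only floored slightly for numerical safety.
        pi_model = LogisticRegressionCV(penalty="l1", solver="saga", max_iter=5000)
        pi_model.fit(X[train], R[train])
        pi = np.clip(pi_model.predict_proba(X[test])[:, 1], eps, 1.0)

        # Outcome regressions m_a(x) = E[Y | X, A = a], fitted on labeled units only.
        m = {}
        for a in (0, 1):
            idx = train[(R[train] == 1) & (A[train] == a)]
            m[a] = LassoCV(cv=3).fit(X[idx], Y[idx]).predict(X[test])

        # AIPW score: outcome-model contrast plus an inverse-probability-weighted
        # residual correction to which only labeled units contribute.
        resid = np.where(A[test] == 1,
                         (Y_filled[test] - m[1]) / e,
                         -(Y_filled[test] - m[0]) / (1.0 - e))
        psi[test] = m[1] - m[0] + (R[test] / pi) * resid
    return psi.mean()

A call such as tau_hat = dr_ate_partially_labeled(X, A, Y, R) returns the point estimate; the paper's contribution lies precisely in replacing the plug-in nuisance fits and symmetric cross-fitting above with estimators that remain valid when the labeling propensity decays to zero with the sample size.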
Related papers
- Learning from Noisy Labels via Conditional Distributionally Robust Optimization [5.85767711644773]
Crowdsourcing has emerged as a practical solution for labeling large datasets.
However, noisy labels from annotators with varying levels of expertise pose a significant challenge to learning accurate models.
arXiv Detail & Related papers (2024-11-26T05:03:26Z)
- ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data [1.194799054956877]
Two key tasks in high-dimensional regularized regression are tuning the regularization strength for accurate predictions and estimating the out-of-sample risk.
We introduce a new framework, ROTI-GCV, for reliably performing cross-validation under challenging conditions.
arXiv Detail & Related papers (2024-06-17T15:50:00Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a user-item pair, AUR makes recommendations according to the estimated uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- Holistic Robust Data-Driven Decisions [0.0]
In practice, overfitting typically cannot be attributed to a single cause; instead, it is driven by several factors at once.
We consider here three overfitting sources: (i) statistical error as a result of working with finite sample data, (ii) data noise which occurs when the data points are measured only with finite precision, and finally (iii) data misspecification in which a small fraction of all data may be wholly corrupted.
We argue that although existing data-driven formulations may be robust against one of these three sources in isolation they do not provide holistic protection against all overfitting sources simultaneously.
arXiv Detail & Related papers (2022-07-19T21:28:51Z)
- Gray Learning from Non-IID Data with Out-of-distribution Samples [45.788789553551176]
The integrity of training data, even when annotated by experts, is far from guaranteed.
We introduce a novel approach, termed Gray Learning (GL), which leverages both ground-truth and complementary labels.
By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings.
arXiv Detail & Related papers (2022-06-19T10:46:38Z)
- Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap [11.758346319792361]
Semi-supervised (SS) inference has received much attention in recent years.
Most of the SS literature implicitly assumes the labeled (L) and unlabeled (U) data to be equally distributed.
Inferential challenges under missing-at-random (MAR) labeling that allows for selection bias are inevitably exacerbated by the decaying nature of the propensity score (PS); a textbook doubly robust mean estimator of the kind this work refines is sketched after this list.
arXiv Detail & Related papers (2021-04-14T07:27:27Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Unsupervised Robust Domain Adaptation without Source Data [75.85602424699447]
We study the problem of robust domain adaptation in the context of unavailable target labels and source data.
We show a consistent performance improvement of over 10% in accuracy against the tested baselines on four benchmark datasets.
arXiv Detail & Related papers (2021-03-26T16:42:28Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
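For orientation on the Double Robust Semi-Supervised Inference entry above (the precursor to the decaying MAR paper), the textbook doubly robust (AIPW-type) estimator of an outcome mean under MAR labeling, which that line of work refines to handle decaying labeling propensities, can be written as

\hat{\mu}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \Big\{ \hat{m}(X_i) + \frac{R_i}{\hat{\pi}(X_i)} \big( Y_i - \hat{m}(X_i) \big) \Big\},

where \hat{m}(x) estimates E[Y | X = x], \hat{\pi}(x) estimates the labeling propensity P(R = 1 | X = x), and R_i indicates whether Y_i is observed. The estimator is consistent if either nuisance model is correctly specified; this is the standard form, given for context, and not necessarily the exact estimator of that paper.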