Statistical Inference Under Constrained Selection Bias
- URL: http://arxiv.org/abs/2306.03302v3
- Date: Sat, 4 Nov 2023 16:56:02 GMT
- Title: Statistical Inference Under Constrained Selection Bias
- Authors: Santiago Cortes-Gomez, Mateo Dulce, Carlos Patino, Bryan Wilder
- Abstract summary: We propose a framework that enables statistical inference in the presence of selection bias.
The output is high-probability bounds on the value of an estimand for the target distribution.
We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks.
- Score: 20.862583584531322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale datasets are increasingly being used to inform decision making.
While this effort aims to ground policy in real-world evidence, challenges have
arisen as selection bias and other forms of distribution shifts often plague
observational data. Previous attempts to provide robust inference have given
guarantees depending on a user-specified amount of possible distribution shift
(e.g., the maximum KL divergence between the observed and target
distributions). However, decision makers will often have additional knowledge
about the target distribution which constrains the kind of possible shifts. To
leverage such information, we propose a framework that enables statistical
inference in the presence of selection bias which obeys user-specified
constraints in the form of functions whose expectation is known under the
target distribution. The output is high-probability bounds on the value of an
estimand for the target distribution. Hence, our method leverages domain
knowledge in order to partially identify a wide class of estimands. We analyze
the computational and statistical properties of methods to estimate these
bounds and show that our method can produce informative bounds on a variety of
simulated and semisynthetic tasks, as well as in a real-world use case.
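The core computation the abstract describes, extremizing an estimand over sample reweightings that satisfy known target-distribution moments, can be illustrated as a small linear program. The sketch below is a minimal toy version under stated assumptions, not the paper's implementation: the data, the known target mean, and the weight cap `c_shift` (a crude stand-in for a bounded-shift constraint) are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)            # covariate observed under selection bias
y = 2.0 * x + rng.normal(size=n)  # outcome; the estimand is E[y] under the target

# Domain knowledge: the target mean of x is known (e.g., from a census).
target_mean_x = 0.5

# Decision variables: a reweighting w of the observed sample.
# Bound E_target[y] = sum_i w_i y_i subject to:
#   sum_i w_i = 1,  sum_i w_i x_i = target_mean_x,  0 <= w_i <= c_shift/n.
c_shift = 5.0
A_eq = np.vstack([np.ones(n), x])
b_eq = np.array([1.0, target_mean_x])
bounds = [(0.0, c_shift / n)] * n

lower = linprog(y, A_eq=A_eq, b_eq=b_eq, bounds=bounds)    # minimize sum w_i y_i
upper = linprog(-y, A_eq=A_eq, b_eq=b_eq, bounds=bounds)   # maximize sum w_i y_i
lo, hi = lower.fun, -upper.fun
print(f"bounds on E_target[y]: [{lo:.2f}, {hi:.2f}]")
```

Tightening `c_shift` (less allowed shift) or adding more known moments to `A_eq` shrinks the interval, which mirrors how the framework trades off assumptions for informativeness.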
Related papers
- Distributionally robust risk evaluation with an isotonic constraint [20.74502777102024]
Distributionally robust learning aims to control the worst-case statistical performance within an uncertainty set of candidate distributions.
We propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate.
Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.
arXiv Detail & Related papers (2024-07-09T13:56:34Z)
- Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift [9.387706860375461]
A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance.
Prediction intervals serve as a crucial tool for characterizing the uncertainty induced by the underlying distribution.
We propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain.
arXiv Detail & Related papers (2024-05-16T17:55:42Z)
- Distributional Counterfactual Explanations With Optimal Transport [7.597676579494146]
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models.
This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data.
arXiv Detail & Related papers (2024-01-23T21:48:52Z)
- Probabilistic Test-Time Generalization by Variational Neighbor-Labeling [62.158807685159736]
This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains.
Probabilistic pseudo-labeling of target samples generalizes the source-trained model to the target domain at test time.
Variational neighbor labels incorporate information from neighboring target samples to generate more robust pseudo-labels.
arXiv Detail & Related papers (2023-07-08T18:58:08Z)
- Inferential Moments of Uncertain Multivariable Systems [0.0]
We treat Bayesian probability updating as a random process and uncover intrinsic quantitative features of joint probability distributions called inferential moments.
Inferential moments quantify shape information about how a prior distribution is expected to update in response to yet to be obtained information.
We find a power series expansion of the mutual information in terms of inferential moments, which suggests a connection between this inferential framework and elements of information theory.
arXiv Detail & Related papers (2023-05-03T00:56:12Z)
- Data-Driven Approximations of Chance Constrained Programs in Nonstationary Environments [3.126118485851773]
We study sample average approximations (SAA) of chance constrained programs.
We consider a nonstationary variant of this problem, where the random samples are assumed to be independently drawn in a sequential fashion.
We propose a novel robust SAA method exploiting information about the Wasserstein distance between the sequence of data-generating distributions and the actual chance constraint distribution.
arXiv Detail & Related papers (2022-05-08T01:01:57Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
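The ATC rule described above admits a short sketch. The toy confidences below are hypothetical; the paper's version computes confidence scores (e.g., max softmax) from a real model.

```python
import numpy as np

rng = np.random.default_rng(1)

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    # ATC, schematically: choose the threshold t so that the fraction of
    # source examples with confidence above t matches source accuracy,
    # then predict target accuracy as the fraction of target confidences above t.
    t = np.quantile(source_conf, 1.0 - source_correct.mean())
    return (target_conf > t).mean()

# toy confidences: the target distribution is shifted (model is less confident)
source_conf = rng.beta(5, 2, size=1000)
source_correct = rng.random(1000) < source_conf   # roughly calibrated source model
target_conf = rng.beta(4, 2, size=1000)

est = atc_predict_accuracy(source_conf, source_correct, target_conf)
print(f"predicted target accuracy: {est:.2f}")
```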
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Personalized Trajectory Prediction via Distribution Discrimination [78.69458579657189]
Trajectory prediction faces the challenge of capturing the multi-modal nature of future dynamics.
We present a distribution discrimination (DisDis) method to predict personalized motion patterns.
Our method can be integrated with existing multi-modal predictive models as a plug-and-play module.
arXiv Detail & Related papers (2021-07-29T17:42:12Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
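A minimal sketch of the DoC statistic on hypothetical confidence values (the paper computes confidences from an actual classifier and then maps DoC to an expected accuracy change with a simple regressor):

```python
import numpy as np

rng = np.random.default_rng(2)

def difference_of_confidences(source_conf, target_conf):
    # DoC: the drop in the model's average confidence between the
    # source distribution and a shifted one.
    return source_conf.mean() - target_conf.mean()

source_conf = rng.beta(6, 2, size=1000)   # confident on in-distribution data
target_conf = rng.beta(3, 2, size=1000)   # less confident under shift

doc = difference_of_confidences(source_conf, target_conf)
print(f"DoC = {doc:.3f}")
```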
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
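Density ratios of this kind are often estimated with a domain classifier; the sketch below uses that standard trick on toy one-dimensional data and is not necessarily the estimator used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Train a logistic model to tell source from target samples; with equal
# sample sizes, its odds recover the ratio p_target(x) / p_source(x).
x_src = rng.normal(0.0, 1.0, size=(1000, 1))   # source (training) samples
x_tgt = rng.normal(0.5, 1.0, size=(1000, 1))   # target (test) samples, shifted

X = np.vstack([x_src, x_tgt])
d = np.concatenate([np.zeros(1000), np.ones(1000)])  # domain label, 1 = target

# plain logistic regression by gradient descent (feature + bias)
Xb = np.hstack([X, np.ones((2000, 1))])
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - d) / len(d)

def density_ratio(x):
    # exp(logit) = P(target|x) / P(source|x) = density ratio (equal priors)
    return np.exp(w[0] * x + w[1])

# samples typical of the target get a larger ratio than source-typical ones
print(f"ratio at x=0: {density_ratio(0.0):.2f}, at x=0.5: {density_ratio(0.5):.2f}")
```

A low ratio flags a test point as far from the training distribution, which is exactly the signal used to temper the model's confidence there.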
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
- Estimating Generalization under Distribution Shifts via Domain-Invariant Representations [75.74928159249225]
We use a set of domain-invariant predictors as a proxy for the unknown, true target labels.
The error of the resulting risk estimate depends on the target risk of the proxy model.
arXiv Detail & Related papers (2020-07-06T17:21:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.