Sample Selection Bias in Evaluation of Prediction Performance of Causal Models
- URL: http://arxiv.org/abs/2106.01921v1
- Date: Thu, 3 Jun 2021 15:15:30 GMT
- Authors: James P. Long and Min Jin Ha
- Abstract summary: Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding.
We revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren.
We find that sample selection bias is likely a key driver of model performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal models are notoriously difficult to validate because they make
untestable assumptions regarding confounding. New scientific experiments offer
the possibility of evaluating causal models using prediction performance.
Prediction performance measures are typically robust to violations in causal
assumptions. However, prediction performance does depend on the selection of
training and test sets. In particular, biased training sets can lead to
optimistic assessments of model performance. In this work, we revisit the
prediction performance of several recently proposed causal models tested on a
genetic perturbation data set of Kemmeren [Kemmeren et al., 2014]. We find that
sample selection bias is likely a key driver of model performance. We propose
using a less-biased evaluation set for assessing prediction performance on
Kemmeren and compare models on this new set. In this setting, the causal models
tested have similar performance to standard association-based estimators such
as Lasso. Finally, we compare the performance of causal estimators in simulation
studies which reproduce the Kemmeren structure of genetic knockout experiments
but without any sample selection bias. These results provide an improved
understanding of the performance of several causal models and offer guidance on
how future studies should use Kemmeren.
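
The abstract's point that a non-randomly selected evaluation set can give an optimistic performance estimate can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' procedure and not the Kemmeren data: the data-generating process, the fixed decision rule, the selection threshold, and all names (`simulate`, `accuracy`, `biased_eval`) are hypothetical.

```python
import random


def simulate(n=20000, seed=0):
    """Toy population (assumption): a binary label y and one noisy feature x."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y = rng.random() < 0.5                          # true label
        x = (1.0 if y else -1.0) + rng.gauss(0.0, 1.5)  # noisy signal for y
        units.append((x, y))
    return units


def accuracy(units):
    """Accuracy of a fixed rule: predict a positive label when x > 0."""
    correct = sum((x > 0) == y for x, y in units)
    return correct / len(units)


units = simulate()

# Evaluation set 1: a simple random sample of the population.
random_eval = random.Random(1).sample(units, 2000)

# Evaluation set 2: selected non-randomly, keeping only units with an
# extreme feature value (hypothetical selection rule).
biased_eval = [u for u in units if abs(u[0]) > 2.0]

acc_random = accuracy(random_eval)
acc_biased = accuracy(biased_eval)

print("accuracy on random evaluation set:  ", round(acc_random, 3))
print("accuracy on selected evaluation set:", round(acc_biased, 3))
```

In this toy setup the same decision rule looks markedly better on the selected set, because selection keeps the cases that are easiest to classify. The selection mechanism in a real data set may be quite different, which is why the rule here should be read only as an illustration.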
Related papers
- An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification [0.0]
This paper examines the impact of balancing methods on predictive multiplicity using the Rashomon effect.
This matters because blindly selecting one model from a set of approximately equally accurate models is risky in data-centric AI.
(2024-03-22T13:08:22Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
(2024-02-12T16:15:25Z)
- Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities [17.082695183953486]
A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
Here, the underlying assumption is that the biased model resorts to shortcut features.
We introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts loss function.
(2023-02-06T15:21:41Z)
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
(2023-01-11T11:10:32Z)
- Post-Selection Confidence Bounds for Prediction Performance [2.28438857884398]
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks.
We propose an algorithm to compute valid lower confidence bounds for multiple models that have been selected based on their prediction performance on the evaluation set.
(2022-10-24T13:28:43Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples show strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
(2022-04-17T15:55:18Z)
- Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost the model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-$K$ predictions to calibrate the model.
(2022-03-21T10:20:21Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
(2022-02-08T02:59:12Z)
- Expected Validation Performance and Estimation of a Random Variable's Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
(2021-10-01T18:48:47Z)
- Model Selection for Time Series Forecasting: Empirical Analysis of Different Estimators [1.6328866317851185]
We compare a set of estimation methods for model selection in time series forecasting tasks.
We empirically found that the accuracy of the estimators for selecting the best solution is low.
Some factors, such as the sample size, are important in the relative performance of the estimators.
(2021-04-01T16:08:25Z)
- Counterfactual Predictions under Runtime Confounding [74.90756694584839]
We study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data.
We propose a doubly-robust procedure for learning counterfactual prediction models in this setting.
(2020-06-30T15:49:05Z)