Performance Prediction Under Dataset Shift
- URL: http://arxiv.org/abs/2206.10697v1
- Date: Tue, 21 Jun 2022 19:40:58 GMT
- Title: Performance Prediction Under Dataset Shift
- Authors: Simona Maggio, Victor Bouvier and Léo Dreyfus-Schmidt
- Abstract summary: We study the generalization capabilities of various performance prediction models to new domains by learning on generated synthetic perturbations.
We propose a natural and effortless uncertainty estimation of the predicted accuracy that ensures reliable use of performance predictors.
- Score: 1.1602089225841632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ML models deployed in production often have to face unknown domain changes,
fundamentally different from their training settings. Performance prediction
models carry out the crucial task of measuring the impact of these changes on
model performance. We study the generalization capabilities of various
performance prediction models to new domains by learning on generated synthetic
perturbations. Empirical validation on a benchmark of ten tabular datasets
shows that models based upon state-of-the-art shift detection metrics are not
expressive enough to generalize to unseen domains, while Error Predictors bring
a consistent improvement in performance prediction under shift. We additionally
propose a natural and effortless uncertainty estimation of the predicted
accuracy that ensures reliable use of performance predictors. Our
implementation is available at https://github.com/dataiku-research/performance_prediction_under_shift.
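The abstract does not spell out how the Error Predictors or the uncertainty estimate are built. As a rough, hedged illustration of the general idea (the function names, feature choices, and the use of gradient boosting below are assumptions, not the authors' implementation), one can train a secondary model on held-out labeled data to predict whether the primary model classifies each sample correctly, then average the predicted correctness over unlabeled, possibly shifted data:

```python
# Hedged sketch of an "Error Predictor" style accuracy estimator.
# Assumptions: tabular numeric features and a scikit-learn-like primary model;
# this is not the paper's code.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_error_predictor(primary_model, X_val, y_val):
    """Learn to predict per-sample correctness of the primary model."""
    correct = (primary_model.predict(X_val) == y_val).astype(int)
    # Use the inputs plus the primary model's class probabilities as features.
    feats = np.hstack([X_val, primary_model.predict_proba(X_val)])
    return GradientBoostingClassifier().fit(feats, correct)

def estimate_accuracy(primary_model, error_predictor, X_target):
    """Predicted accuracy on unlabeled target data = mean predicted correctness."""
    feats = np.hstack([X_target, primary_model.predict_proba(X_target)])
    p_correct = error_predictor.predict_proba(feats)[:, 1]
    # Naive uncertainty proxy (standard error of the mean), purely illustrative;
    # the paper proposes its own uncertainty estimate for the predicted accuracy.
    return p_correct.mean(), p_correct.std() / np.sqrt(len(p_correct))
```

In the setting described in the abstract, such predictors would be fit and evaluated on synthetically perturbed copies of the validation data, so that they see a range of shifts before being applied to genuinely new domains.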
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Estimating Model Performance Under Covariate Shift Without Labels [9.804680621164168]
- Estimating Model Performance Under Covariate Shift Without Labels [9.804680621164168]
We introduce Probabilistic Adaptive Performance Estimation (PAPE) for evaluating classification models on unlabeled data.
PAPE provides more accurate performance estimates than other evaluated methodologies.
arXiv Detail & Related papers (2024-01-16T13:29:30Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
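For the conformal prediction entry above, the summary describes using an auxiliary self-supervised error as an extra signal when forming nonconformity scores. The sketch below shows one hedged way such a signal could act as a difficulty estimate in normalized split conformal regression; the variable names and the normalization scheme are assumptions, not the paper's construction:

```python
import numpy as np

def conformal_interval(y_cal, pred_cal, aux_err_cal, pred_test, aux_err_test, alpha=0.1):
    """Split conformal regression intervals scaled by an auxiliary difficulty signal
    (here a stand-in for the self-supervised pretext-task error)."""
    eps = 1e-8
    # Normalized nonconformity scores on the calibration split.
    scores = np.abs(y_cal - pred_cal) / (aux_err_cal + eps)
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level)
    half_width = q * (aux_err_test + eps)
    return pred_test - half_width, pred_test + half_width
```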
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned on only a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations caution that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect the domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
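For the Difference of Confidences (DoC) entry above, the idea summarized there can be sketched directly (array names are illustrative): the drop in average max-softmax confidence from a labeled source validation set to the unlabeled target set is subtracted from the source accuracy to predict the target accuracy:

```python
import numpy as np

def doc_predicted_accuracy(probs_src, y_src, probs_tgt):
    """Predict target accuracy as source accuracy minus the difference of confidences."""
    src_acc = np.mean(np.argmax(probs_src, axis=1) == y_src)
    doc = np.max(probs_src, axis=1).mean() - np.max(probs_tgt, axis=1).mean()
    return src_acc - doc
```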
- Closer Look at the Uncertainty Estimation in Semantic Segmentation under Distributional Shift [2.05617385614792]
Uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift.
It was shown that simple color transformations already provide a strong baseline.
An ensemble of models was utilized in the self-training setting to improve pseudo-label generation.
arXiv Detail & Related papers (2021-05-31T19:50:43Z)
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
- Learning Prediction Intervals for Model Performance [1.433758865948252]
We propose a method to compute prediction intervals for model performance.
We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
arXiv Detail & Related papers (2020-12-15T21:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.