Learning Prediction Intervals for Model Performance
- URL: http://arxiv.org/abs/2012.08625v1
- Date: Tue, 15 Dec 2020 21:32:03 GMT
- Title: Learning Prediction Intervals for Model Performance
- Authors: Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil
- Abstract summary: We propose a method to compute prediction intervals for model performance.
We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
- Score: 1.433758865948252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding model performance on unlabeled data is a fundamental challenge
of developing, deploying, and maintaining AI systems. Model performance is
typically evaluated using test sets or periodic manual quality assessments,
both of which require laborious manual data labeling. Automated performance
prediction techniques aim to mitigate this burden, but potential inaccuracy and
a lack of trust in their predictions have prevented their widespread adoption.
We address this core problem of performance prediction uncertainty with a
method to compute prediction intervals for model performance. Our methodology
uses transfer learning to train an uncertainty model to estimate the
uncertainty of model performance predictions. We evaluate our approach across a
wide range of drift conditions and show substantial improvement over
competitive baselines. We believe this result makes prediction intervals, and
performance prediction in general, significantly more practical for real-world
use.
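To make the idea of a prediction interval for model performance concrete, here is a minimal Python sketch. It is not the authors' method: it assumes a performance predictor has already produced a point estimate of accuracy and that a separate uncertainty model has produced a half-width for it. The symmetric interval and the names `prediction_interval` and `coverage_and_width` are hypothetical choices made for illustration. The sketch only shows how such intervals might be formed and then scored by coverage and average width.

```python
from typing import List, Sequence, Tuple


def prediction_interval(predicted_accuracy: float,
                        predicted_uncertainty: float) -> Tuple[float, float]:
    """Build a symmetric interval around a predicted accuracy.

    Assumption: the uncertainty model outputs a non-negative half-width.
    The interval is clipped to [0, 1] because accuracy is a proportion.
    """
    lower = max(0.0, predicted_accuracy - predicted_uncertainty)
    upper = min(1.0, predicted_accuracy + predicted_uncertainty)
    return lower, upper


def coverage_and_width(intervals: Sequence[Tuple[float, float]],
                       true_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Return (fraction of intervals containing the true accuracy, mean width)."""
    if len(intervals) != len(true_accuracies) or not intervals:
        raise ValueError("need equally sized, non-empty inputs")
    covered = sum(1 for (lo, hi), acc in zip(intervals, true_accuracies)
                  if lo <= acc <= hi)
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(intervals), mean_width


# Hypothetical numbers: two evaluation sets with predicted accuracy and half-width.
intervals: List[Tuple[float, float]] = [
    prediction_interval(0.9, 0.05),
    prediction_interval(0.7, 0.1),
]
coverage, width = coverage_and_width(intervals, [0.88, 0.85])
```

A useful interval is one that covers the true accuracy often while staying narrow, so coverage and width are usually read together.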
Related papers
- Certified Human Trajectory Prediction [66.1736456453465]
Trajectory prediction plays an essential role in autonomous vehicles.
We propose a certification approach tailored for the task of trajectory prediction.
We address the inherent challenges associated with trajectory prediction, including unbounded outputs and multi-modality.
arXiv Detail & Related papers (2024-03-20T17:41:35Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Performance Prediction Under Dataset Shift [1.1602089225841632]
We study the generalization capabilities of various performance prediction models to new domains by learning on generated synthetic perturbations.
We propose a natural and effortless uncertainty estimation of the predicted accuracy that ensures reliable use of performance predictors.
arXiv Detail & Related papers (2022-06-21T19:40:58Z)
- Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose using Bayesian approximation to quantify the uncertainty in forecasting, which deterministic approaches fail to capture.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
- Data Uncertainty without Prediction Models [0.8223798883838329]
We propose an uncertainty estimation method, named Distance-weighted Class Impurity, that does not make explicit use of prediction models.
We verified that the Distance-weighted Class Impurity works effectively regardless of prediction models.
arXiv Detail & Related papers (2022-04-25T13:26:06Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Evaluation of Machine Learning Techniques for Forecast Uncertainty Quantification [0.13999481573773068]
Ensemble forecasting is, so far, the most successful approach to produce relevant forecasts along with an estimation of their uncertainty.
The main limitations of ensemble forecasting are the high computational cost and the difficulty of capturing and quantifying different sources of uncertainty.
In this work, proof-of-concept model experiments are conducted to examine the performance of ANNs trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input.
arXiv Detail & Related papers (2021-11-29T16:52:17Z)
- Uncertainty-Aware Time-to-Event Prediction using Deep Kernel Accelerated Failure Time Models [11.171712535005357]
We propose Deep Kernel Accelerated Failure Time models for the time-to-event prediction task.
Our model shows better point estimate performance than recurrent neural network based baselines in experiments on two real-world datasets.
arXiv Detail & Related papers (2021-07-26T14:55:02Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Counterfactual Predictions under Runtime Confounding [74.90756694584839]
We study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data.
We propose a doubly-robust procedure for learning counterfactual prediction models in this setting.
arXiv Detail & Related papers (2020-06-30T15:49:05Z)
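The Average Thresholded Confidence (ATC) entry in the list above describes learning a threshold on model confidence and predicting accuracy as a fraction of unlabeled examples. A minimal Python sketch of that idea follows, assuming the score is the model's maximum class probability and that the threshold is chosen so the share of labeled source examples at or above it matches the source accuracy. The function names `learn_threshold` and `estimate_accuracy` and all numbers are hypothetical, and this is not the reference implementation from that paper.

```python
from typing import Sequence


def learn_threshold(source_conf: Sequence[float],
                    source_correct: Sequence[bool]) -> float:
    """Choose a confidence threshold from labeled source data.

    The threshold is set so that the fraction of source examples with
    confidence >= threshold equals the source accuracy (up to ties).
    """
    if len(source_conf) != len(source_correct) or not source_conf:
        raise ValueError("need equally sized, non-empty inputs")
    n_correct = sum(1 for c in source_correct if c)
    if n_correct == 0:
        # No example is correct, so no confidence should pass the threshold.
        return float("inf")
    ranked = sorted(source_conf, reverse=True)
    # The n_correct-th highest confidence is the lowest score that still passes.
    return ranked[n_correct - 1]


def estimate_accuracy(target_conf: Sequence[float], threshold: float) -> float:
    """Estimate target accuracy as the share of unlabeled examples
    whose confidence is at or above the learned threshold."""
    if not target_conf:
        raise ValueError("need at least one target example")
    return sum(1 for c in target_conf if c >= threshold) / len(target_conf)


# Hypothetical labeled source data: confidences and whether each prediction was right.
threshold = learn_threshold([0.9, 0.8, 0.7, 0.6], [True, True, False, True])
# Hypothetical unlabeled target data: confidences only.
estimate = estimate_accuracy([0.95, 0.5, 0.65, 0.72], threshold)
```

Here the source accuracy is 0.75, so the threshold lands on the third-highest source confidence, and the target estimate is the fraction of target confidences that clear it.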