Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited
- URL: http://arxiv.org/abs/2008.03033v1
- Date: Fri, 7 Aug 2020 08:22:26 GMT
- Title: Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited
- Authors: Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan
- Abstract summary: We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
Corpor is based on non-parametric isotonic regression and implemented via the Pool-adjacent-violators (PAV) algorithm.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A probability forecast or probabilistic classifier is reliable or calibrated
if the predicted probabilities are matched by ex post observed frequencies, as
examined visually in reliability diagrams. The classical binning and counting
approach to plotting reliability diagrams has been hampered by a lack of
stability under unavoidable, ad hoc implementation decisions. Here we introduce
the CORP approach, which generates provably statistically Consistent, Optimally
binned, and Reproducible reliability diagrams in an automated way. CORP is
based on non-parametric isotonic regression and implemented via the
Pool-adjacent-violators (PAV) algorithm - essentially, the CORP reliability
diagram shows the graph of the PAV- (re)calibrated forecast probabilities. The
CORP approach allows for uncertainty quantification via either resampling
techniques or asymptotic theory, furnishes a new numerical measure of
miscalibration, and provides a CORP based Brier score decomposition that
generalizes to any proper scoring rule. We anticipate that judicious uses of
the PAV algorithm yield improved tools for diagnostics and inference for a very
wide range of statistical and machine learning methods.
Related papers
- Calibrated Probabilistic Forecasts for Arbitrary Sequences [58.54729945445505]
Real-world data streams can change unpredictably due to distribution shifts, feedback loops and adversarial actors.
We present a forecasting framework ensuring valid uncertainty estimates regardless of how data evolves.
arXiv Detail & Related papers (2024-09-27T21:46:42Z) - Probabilistic Scores of Classifiers, Calibration is not Enough [0.32985979395737786]
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications.
In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions.
Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
arXiv Detail & Related papers (2024-08-06T19:53:00Z) - When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z) - On Uncertainty Calibration and Selective Generation in Probabilistic
Neural Summarization: A Benchmark Study [14.041071717005362]
Modern deep models for summarization attains impressive benchmark performance, but they are prone to generating miscalibrated predictive uncertainty.
This means that they assign high confidence to low-quality predictions, leading to compromised reliability and trustworthiness in real-world applications.
Probabilistic deep learning methods are common solutions to the miscalibration problem, but their relative effectiveness in complex autoregressive summarization tasks are not well-understood.
arXiv Detail & Related papers (2023-04-17T23:06:28Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy.
arXiv Detail & Related papers (2022-06-16T06:13:53Z) - Random Noise vs State-of-the-Art Probabilistic Forecasting Methods : A
Case Study on CRPS-Sum Discrimination Ability [4.9449660544238085]
We show that the statistical properties of target data affect the discrimination ability of CRPS-Sum.
We highlight that CRPS-Sum calculation overlooks the performance of the model on each dimension.
We show that it is easily possible to have a better CRPS-Sum for a dummy model, which looks like random noise.
arXiv Detail & Related papers (2022-01-21T12:36:58Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.