Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited
- URL: http://arxiv.org/abs/2008.03033v1
- Date: Fri, 7 Aug 2020 08:22:26 GMT
- Title: Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited
- Authors: Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan
- Abstract summary: We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
CORP is based on non-parametric isotonic regression and implemented via the Pool-adjacent-violators (PAV) algorithm.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A probability forecast or probabilistic classifier is reliable or calibrated
if the predicted probabilities are matched by ex post observed frequencies, as
examined visually in reliability diagrams. The classical binning and counting
approach to plotting reliability diagrams has been hampered by a lack of
stability under unavoidable, ad hoc implementation decisions. Here we introduce
the CORP approach, which generates provably statistically Consistent, Optimally
binned, and Reproducible reliability diagrams in an automated way. CORP is
based on non-parametric isotonic regression and implemented via the
Pool-adjacent-violators (PAV) algorithm - essentially, the CORP reliability
diagram shows the graph of the PAV- (re)calibrated forecast probabilities. The
CORP approach allows for uncertainty quantification via either resampling
techniques or asymptotic theory, furnishes a new numerical measure of
miscalibration, and provides a CORP based Brier score decomposition that
generalizes to any proper scoring rule. We anticipate that judicious uses of
the PAV algorithm yield improved tools for diagnostics and inference for a very
wide range of statistical and machine learning methods.
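The abstract's central idea, recalibrating forecasts with the PAV algorithm and reading a score decomposition off the result, can be sketched in a few lines. The following is a minimal illustration in pure Python, not the authors' implementation. The function names (`pav_recalibrate`, `brier`, `corp_brier_decomposition`) and the toy data are hypothetical, and the decomposition terms (miscalibration MCB, discrimination DSC, uncertainty UNC, with mean score = MCB - DSC + UNC) follow our reading of the paper rather than anything stated in the abstract itself.

```python
from itertools import groupby


def pav_recalibrate(probs, outcomes):
    """Isotonic (PAV) fit of binary outcomes on forecast probabilities.

    Returns the recalibrated probabilities in the original order of `probs`.
    """
    pairs = sorted(zip(probs, outcomes))
    blocks = []  # each block is [sum of outcomes, number of cases]
    # Tied forecast values are pooled first so they always share one bin.
    for _, group in groupby(pairs, key=lambda pair: pair[0]):
        ys = [y for _, y in group]
        blocks.append([sum(ys), len(ys)])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand block means back to one value per case (sorted order).
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)
    # Map the fitted values back to the original case order.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    recalibrated = [0.0] * len(probs)
    for rank, i in enumerate(order):
        recalibrated[i] = fitted_sorted[rank]
    return recalibrated


def brier(probs, outcomes):
    """Mean Brier score of probability forecasts for binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def corp_brier_decomposition(probs, outcomes):
    """Assumed CORP-style decomposition: score = MCB - DSC + UNC."""
    n = len(outcomes)
    recalibrated = pav_recalibrate(probs, outcomes)
    base_rate = sum(outcomes) / n
    s_x = brier(probs, outcomes)             # original forecasts
    s_c = brier(recalibrated, outcomes)      # PAV-recalibrated forecasts
    s_r = brier([base_rate] * n, outcomes)   # constant reference forecast
    return {"score": s_x, "MCB": s_x - s_c, "DSC": s_r - s_c, "UNC": s_r}


# Toy data (invented for illustration only).
probs = [0.1, 0.3, 0.2, 0.8, 0.6, 0.9]
outcomes = [0, 1, 0, 1, 0, 1]
recalibrated = pav_recalibrate(probs, outcomes)
parts = corp_brier_decomposition(probs, outcomes)
```

Plotting `recalibrated` against `probs` (sorted by forecast value) gives the step function that a CORP-style reliability diagram would display; the bins are the blocks that PAV pools, so no bin count or bin edges need to be chosen by hand.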
Related papers
- When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z)
- On Uncertainty Calibration and Selective Generation in Probabilistic
Neural Summarization: A Benchmark Study [14.041071717005362]
Modern deep models for summarization attain impressive benchmark performance, but they are prone to generating miscalibrated predictive uncertainty.
This means that they assign high confidence to low-quality predictions, leading to compromised reliability and trustworthiness in real-world applications.
Probabilistic deep learning methods are common solutions to the miscalibration problem, but their relative effectiveness in complex autoregressive summarization tasks is not well understood.
arXiv Detail & Related papers (2023-04-17T23:06:28Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Mathematical Properties of Continuous Ranked Probability Score
Forecasting [0.0]
We study the rate of convergence in terms of CRPS of distributional regression methods.
We show that the k-nearest neighbor method and the kernel method for the distributional regression reach the optimal rate of convergence in dimension $d \geq 2$.
arXiv Detail & Related papers (2022-05-09T15:01:13Z)
- Random Noise vs State-of-the-Art Probabilistic Forecasting Methods : A
Case Study on CRPS-Sum Discrimination Ability [4.9449660544238085]
We show that the statistical properties of target data affect the discrimination ability of CRPS-Sum.
We highlight that CRPS-Sum calculation overlooks the performance of the model on each dimension.
We show that it is easily possible to have a better CRPS-Sum for a dummy model, which looks like random noise.
arXiv Detail & Related papers (2022-01-21T12:36:58Z)
- PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
We propose the Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z)
- Revisiting One-vs-All Classifiers for Predictive Uncertainty and
Out-of-Distribution Detection in Neural Networks [22.34227625637843]
We investigate how the parametrization of the probabilities in discriminative classifiers affects the uncertainty estimates.
We show that one-vs-all formulations can improve calibration on image classification tasks.
arXiv Detail & Related papers (2020-07-10T01:55:02Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.