Calibration tests beyond classification
- URL: http://arxiv.org/abs/2210.13355v1
- Date: Fri, 21 Oct 2022 09:49:57 GMT
- Title: Calibration tests beyond classification
- Authors: David Widmann, Fredrik Lindsten, Dave Zachariah
- Abstract summary: Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
- Score: 30.616624345970973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most supervised machine learning tasks are subject to irreducible prediction
errors. Probabilistic predictive models address this limitation by providing
probability distributions that represent a belief over plausible targets,
rather than point estimates. Such models can be a valuable tool in
decision-making under uncertainty, provided that the model output is meaningful
and interpretable. Calibrated models guarantee that the probabilistic
predictions are neither over- nor under-confident. In the machine learning
literature, different measures and statistical tests have been proposed and
studied for evaluating the calibration of classification models. For regression
problems, however, research has been focused on a weaker condition of
calibration based on predicted quantiles for real-valued targets. In this
paper, we propose the first framework that unifies calibration evaluation and
tests for general probabilistic predictive models. It applies to any such
model, including classification and regression models of arbitrary dimension.
Furthermore, the framework generalizes existing measures and provides a more
intuitive reformulation of a recently proposed framework for calibration in
multi-class classification. In particular, we reformulate and generalize the
kernel calibration error, its estimators, and hypothesis tests using
scalar-valued kernels, and evaluate the calibration of real-valued regression
problems.
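To make the reformulated kernel calibration error concrete, the sketch below computes an unbiased estimate of a squared kernel calibration error for one-dimensional Gaussian regression predictions. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the tensor-product Gaussian RBF kernel, the Monte Carlo approximation of the inner expectations, and the names `rbf`, `h_term`, and `skce_unbiased` are choices made here for the example; for particular kernel and model combinations the expectations can instead be evaluated in closed form.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) for scalars or small vectors."""
    d = np.atleast_1d(x).astype(float) - np.atleast_1d(y).astype(float)
    return float(np.exp(-gamma * np.dot(d, d)))

def h_term(mu_i, sig_i, y_i, mu_j, sig_j, y_j, rng, n_mc=64, gamma=1.0):
    """Monte Carlo estimate of the pairwise term h((p_i, y_i), (p_j, y_j)) of the
    squared kernel calibration error, for Gaussian predictions p = N(mu, sig^2) and
    a tensor-product kernel k((p, z), (p', z')) = k_p(p, p') * k_y(z, z')."""
    k_pred = rbf([mu_i, sig_i], [mu_j, sig_j], gamma)  # kernel between predicted distributions (via their parameters)
    zi = rng.normal(mu_i, sig_i, n_mc)                 # Z  ~ p_i
    zj = rng.normal(mu_j, sig_j, n_mc)                 # Z' ~ p_j, independent of Z
    e_zz = np.mean([rbf(a, b, gamma) for a, b in zip(zi, zj)])  # E[k_y(Z, Z')]
    e_zy = np.mean([rbf(a, y_j, gamma) for a in zi])            # E[k_y(Z, y_j)]
    e_yz = np.mean([rbf(y_i, b, gamma) for b in zj])            # E[k_y(y_i, Z')]
    return k_pred * (e_zz - e_zy - e_yz + rbf(y_i, y_j, gamma))

def skce_unbiased(mus, sigs, ys, n_mc=64, gamma=1.0, seed=0):
    """Unbiased U-statistic estimator: average of h over all distinct pairs of
    (predicted distribution, observed target) observations."""
    rng = np.random.default_rng(seed)
    n = len(ys)
    total = sum(h_term(mus[i], sigs[i], ys[i], mus[j], sigs[j], ys[j], rng, n_mc, gamma)
                for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))

# Toy check: targets drawn from the predicted distributions (a calibrated model)
# should give an estimate close to zero, up to Monte Carlo noise.
rng = np.random.default_rng(1)
mus = rng.normal(size=50)
sigs = np.full(50, 0.5)
ys = rng.normal(mus, sigs)
print(skce_unbiased(mus, sigs, ys))
```

A calibration test can then be layered on top of such an estimator, for instance by comparing the observed value against a resampled or asymptotic approximation of its distribution under the null hypothesis that the model is calibrated.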
Related papers
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear seemingly state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z) - Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - Rigorous Assessment of Model Inference Accuracy using Language Cardinality [5.584832154027001]
We develop a systematic approach that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures.
We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools.
arXiv Detail & Related papers (2022-11-29T21:03:26Z) - Stability of clinical prediction models developed using statistical or machine learning methods [0.5482532589225552]
Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors.
Many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks).
We show that instability in a model's estimated risks is often considerable, and manifests itself as miscalibration of predictions in new data.
arXiv Detail & Related papers (2022-11-02T11:55:28Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when using them in fully/semi/weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z) - Estimating Expected Calibration Errors [1.52292571922932]
Uncertainty in probabilistic predictions is a key concern when models are used to support human decision making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of ECE estimators, and use it to decide which estimator should be used in practice for different settings (a minimal binned estimator of this kind is sketched after this list).
arXiv Detail & Related papers (2021-09-08T08:00:23Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
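Several of the entries above (in particular "Estimating Expected Calibration Errors" and "T-Cal") revolve around estimators of the expected calibration error for classifiers. As a point of reference, here is a minimal sketch of the common binned ECE estimator for top-label confidence calibration; the equal-width binning, the toy data, and the name `binned_ece` are illustrative assumptions rather than the procedure of any listed paper.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=10):
    """Binned estimator of the expected calibration error: the weighted average,
    over equal-width confidence bins, of |empirical accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clamp so that confidence 1.0 falls in the last bin.
    bin_idx = np.minimum(np.digitize(confidences, edges[1:]), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        weight = mask.mean()  # fraction of samples landing in bin b
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += weight * gap
    return ece

# Toy usage: top-label confidences of a classifier and whether each prediction was correct.
conf = np.array([0.95, 0.90, 0.80, 0.75, 0.60, 0.55])
hit = np.array([1, 1, 1, 0, 1, 0])
print(binned_ece(conf, hit, n_bins=5))
```

The framework of the main paper can be read as generalizing calibration measures of this kind beyond confidence calibration for classification, to probabilistic predictive models with arbitrary target spaces.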
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.