Calibrate: Interactive Analysis of Probabilistic Model Output
- URL: http://arxiv.org/abs/2207.13770v1
- Date: Wed, 27 Jul 2022 20:01:27 GMT
- Title: Calibrate: Interactive Analysis of Probabilistic Model Output
- Authors: Peter Xenopoulos, Joao Rulff, Luis Gustavo Nonato, Brian Barr, Claudio Silva
- Abstract summary: We present Calibrate, an interactive reliability diagram that is resistant to the drawbacks of traditional approaches.
We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data.
- Score: 5.444048397001003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing classification model performance is a crucial task for machine
learning practitioners. While practitioners often use count-based metrics
derived from confusion matrices, like accuracy, many applications, such as
weather prediction, sports betting, or patient risk prediction, rely on a
classifier's predicted probabilities rather than predicted labels. In these
instances, practitioners are concerned with producing a calibrated model, that
is, one which outputs probabilities that reflect those of the true
distribution. Model calibration is often analyzed visually through static reliability diagrams; however, this traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregation it requires. Furthermore, count-based metrics are unable to sufficiently characterize model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to the drawbacks of traditional approaches, and allows for interactive subgroup analysis and instance-level inspection. We
demonstrate the utility of Calibrate through use cases on both real-world and
synthetic data. We further validate Calibrate by presenting the results of a
think-aloud experiment with data scientists who routinely analyze model
calibration.
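For context on the visualization the abstract refers to, the sketch below shows the binned computation that underlies a static reliability diagram: group predictions by predicted probability and compare each bin's mean prediction with its observed outcome frequency. This is a minimal, generic Python illustration, not the Calibrate tool's implementation; the function name `reliability_bins` and the synthetic data are invented for the example.

```python
# Minimal sketch of the binned computation behind a static reliability diagram.
# Illustrative only; this is not the Calibrate tool's implementation.
import numpy as np

def reliability_bins(y_true, y_prob, n_bins=10):
    """Per-bin mean predicted probability, observed frequency, and sample count."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # np.digitize against the interior edges maps each probability to a bin index in [0, n_bins - 1].
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    mean_pred, obs_freq, counts = [], [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            mean_pred.append(y_prob[mask].mean())   # average confidence in the bin
            obs_freq.append(y_true[mask].mean())    # fraction of positives in the bin
            counts.append(int(mask.sum()))
    return np.array(mean_pred), np.array(obs_freq), np.array(counts)

# For a well-calibrated model, obs_freq tracks mean_pred along the diagonal.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)   # predicted probabilities
y = rng.binomial(1, p)         # labels drawn with exactly those probabilities
mean_pred, obs_freq, counts = reliability_bins(y, p, n_bins=10)
print(np.round(mean_pred, 2))
print(np.round(obs_freq, 2))
```

A static diagram plots obs_freq against mean_pred for fixed bins over the full dataset, which is the kind of strong aggregation the abstract argues can hide subgroup- and instance-level miscalibration.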
Related papers
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
- Confidence-Aware Multi-Field Model Calibration [39.44356123378625]
Field-aware calibration can adjust model output on different feature field values to satisfy fine-grained advertising demands.
We propose a confidence-aware multi-field calibration method, which adaptively adjusts the calibration intensity based on confidence levels derived from sample statistics.
arXiv Detail & Related papers (2024-02-27T16:24:28Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Calibration tests beyond classification [30.616624345970973]
Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
arXiv Detail & Related papers (2022-10-21T09:49:57Z)
- Variable-Based Calibration for Machine Learning Classifiers [11.9995808096481]
We introduce the notion of variable-based calibration to characterize calibration properties of a model.
We find that models with near-perfect expected calibration error can exhibit significant miscalibration as a function of features of the data.
arXiv Detail & Related papers (2022-09-30T00:49:31Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell$-Expected Calibration Error (ECE); a generic binned ECE computation is sketched after this list.
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
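Several entries above summarize model calibration with the expected calibration error (ECE). As a companion to the reliability-diagram sketch, the following generic binned ECE estimate follows the definition given in the first entry (predicted probability matching observed frequency, conditional on the prediction); it is not T-Cal's de-biased estimator, and the function name and synthetic data are invented for the illustration.

```python
# Generic binned ECE for a binary classifier: the count-weighted average gap
# between mean predicted probability and observed frequency per bin.
# Illustrative only; T-Cal's de-biased estimator is more involved.
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    ece, n = 0.0, len(y_prob)
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

rng = np.random.default_rng(1)
p = rng.uniform(size=5_000)
y = rng.binomial(1, p)
print(expected_calibration_error(y, p))                          # calibrated predictions: near 0
print(expected_calibration_error(y, np.full_like(p, y.mean())))  # constant base-rate predictor: also near 0
```

The second call illustrates the caveat raised in the first related paper: a trivial predictor that always outputs the base rate can score a near-zero ECE, which is why calibration metrics need to be read alongside generalization metrics.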