Stop Measuring Calibration When Humans Disagree
- URL: http://arxiv.org/abs/2210.16133v1
- Date: Fri, 28 Oct 2022 14:01:32 GMT
- Title: Stop Measuring Calibration When Humans Disagree
- Authors: Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernandez
- Abstract summary: We show that measuring calibration to human majority given inherent disagreements is theoretically problematic.
We derive several instance-level measures of calibration that capture key statistical properties of human judgements.
- Score: 25.177984280183402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Calibration is a popular framework to evaluate whether a classifier knows
when it does not know - i.e., its predictive probabilities are a good
indication of how likely a prediction is to be correct. Correctness is commonly
estimated against the human majority class. Recently, calibration to human
majority has been measured on tasks where humans inherently disagree about
which class applies. We show that measuring calibration to human majority given
inherent disagreements is theoretically problematic, demonstrate this
empirically on the ChaosNLI dataset, and derive several instance-level measures
of calibration that capture key statistical properties of human judgements -
class frequency, ranking and entropy.
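To make the instance-level view concrete, the sketch below compares a classifier's predicted distribution with the human judgement distribution for a single item in terms of class frequency, class ranking, and entropy, the three statistical properties named in the abstract. The function and the exact definitions are illustrative assumptions, not the measures derived in the paper.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy of a probability vector, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def instance_level_stats(pred_probs, human_votes):
    """Compare one instance's predicted distribution with its human judgement
    distribution. Illustrative definitions, not the paper's exact measures."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    human_dist = np.asarray(human_votes, dtype=float)
    human_dist = human_dist / human_dist.sum()   # observed class frequencies
    return {
        # gap between the predicted probability of the human-majority class
        # and that class's observed human frequency
        "majority_gap": float(abs(pred_probs[human_dist.argmax()] - human_dist.max())),
        # does the model rank the classes in the same order as the annotators?
        "ranking_match": bool((np.argsort(-pred_probs) == np.argsort(-human_dist)).all()),
        # does the model's uncertainty (entropy) track human disagreement?
        "entropy_gap": float(abs(shannon_entropy(pred_probs) - shannon_entropy(human_dist))),
    }

# Example: an NLI item where 100 annotators split 60/30/10 over
# entailment / neutral / contradiction, as in ChaosNLI-style data.
print(instance_level_stats([0.7, 0.2, 0.1], [60, 30, 10]))
```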
Related papers
- Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training.
We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training.
Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
arXiv Detail & Related papers (2025-01-31T15:03:54Z) - Calibration through the Lens of Interpretability [3.9962751777898955]
Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy.
In this work, we initiate an axiomatic study of the notion of calibration.
We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences.
arXiv Detail & Related papers (2024-12-01T19:28:16Z) - Truthfulness of Calibration Measures [18.21682539787221]
A calibration measure is said to be truthful if the forecaster minimizes expected penalty by predicting the conditional expectation of the next outcome.
This makes truthfulness an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness.
We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.
arXiv Detail & Related papers (2024-07-19T02:07:55Z) - Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Estimating Expected Calibration Errors [1.52292571922932]
Uncertainty in probabilistic predictions is a key concern when models are used to support human decision making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of ECE estimators, and use it to decide which estimator should be used in practice for different settings (a generic binned plug-in estimator is sketched after this list).
arXiv Detail & Related papers (2021-09-08T08:00:23Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
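Several entries above center on estimating the Expected Calibration Error from a finite sample (the debiased plug-in $\ell_2$-ECE behind T-Cal, and the comparison of ECE estimators). The sketch below is a generic plug-in binned estimator with an equal-width or equal-mass binning switch; the function name, defaults, and binning choices are illustrative assumptions, and it is not the exact estimator of any paper in the list.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15, p=1, equal_mass=False):
    """Plug-in binned estimate of the l_p expected calibration error.
    A generic sketch; not the exact estimator of any paper listed above."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if equal_mass:
        # equal-mass bins: each bin holds roughly the same number of samples
        edges = np.quantile(confidences, np.linspace(0.0, 1.0, n_bins + 1))
    else:
        # equal-width bins on [0, 1]
        edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # the last bin is closed on the right so confidence == hi is included
        mask = (confidences >= lo) & ((confidences <= hi) if i == n_bins - 1
                                      else (confidences < hi))
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap ** p
    return ece ** (1.0 / p)

# Example: top-label confidences and whether each prediction was correct.
conf = [0.92, 0.81, 0.67, 0.95, 0.58, 0.73]
hit = [1, 1, 0, 1, 1, 0]
print(binned_ece(conf, hit, n_bins=3))        # equal-width bins, l_1 ECE
print(binned_ece(conf, hit, n_bins=3, p=2))   # l_2 variant
```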
This list is automatically generated from the titles and abstracts of the papers on this site.