Measuring Informativeness Gap of (Mis)Calibrated Predictors
- URL: http://arxiv.org/abs/2507.12094v1
- Date: Wed, 16 Jul 2025 10:01:22 GMT
- Title: Measuring Informativeness Gap of (Mis)Calibrated Predictors
- Authors: Yiding Feng, Wei Tang
- Abstract summary: In many applications, decision-makers must choose between multiple predictive models that may all be miscalibrated. Our framework strictly generalizes U-Calibration [KLST-23] and Calibration Decision Loss [HW-24], which compare a miscalibrated predictor to its calibrated counterpart. Our second contribution is a dual characterization of the informativeness gap, which gives rise to a natural informativeness measure.
- Score: 15.651406777700517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many applications, decision-makers must choose between multiple predictive models that may all be miscalibrated. Which model (i.e., predictor) is more "useful" in downstream decision tasks? To answer this, our first contribution introduces the notion of the informativeness gap between any two predictors, defined as the maximum normalized payoff advantage one predictor offers over the other across all decision-making tasks. Our framework strictly generalizes several existing notions: it subsumes U-Calibration [KLST-23] and Calibration Decision Loss [HW-24], which compare a miscalibrated predictor to its calibrated counterpart, and it recovers Blackwell informativeness [Bla-51, Bla-53] as a special case when both predictors are perfectly calibrated. Our second contribution is a dual characterization of the informativeness gap, which gives rise to a natural informativeness measure that can be viewed as a relaxed variant of the earth mover's distance (EMD) between two prediction distributions. We show that this measure satisfies natural desiderata: it is complete and sound, and it can be estimated sample-efficiently in the prediction-only access setting. Along the way, we also obtain novel combinatorial structural results when applying this measure to perfectly calibrated predictors.
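The dual characterization suggests an EMD-style computation over the two predictors' prediction distributions. Below is a minimal illustrative sketch in the prediction-only access setting, using synthetic predictor outputs for a binary outcome; it computes the plain earth mover's distance with SciPy as a rough proxy, not the paper's relaxed variant or its exact estimator.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical prediction-only samples: each predictor emits a probability
# in [0, 1] for a binary outcome on the same stream of instances.
preds_a = rng.beta(8, 2, size=10_000)   # a sharp (bold) predictor
preds_b = rng.beta(2, 2, size=10_000)   # a more diffuse predictor

# Plain 1-D earth mover's distance between the two prediction distributions.
# The paper's informativeness measure is a *relaxed* variant of this quantity;
# the unrelaxed EMD is used here purely as an illustrative proxy.
gap_proxy = wasserstein_distance(preds_a, preds_b)
print(f"EMD between prediction distributions: {gap_proxy:.4f}")
```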
Related papers
- Quantifying the Reliability of Predictions in Detection Transformers: Object-Level Calibration and Image-Level Uncertainty [6.209833978040362]
In practice, DETR generates hundreds of predictions that far outnumber the actual number of objects present in an image. This raises the question: can we trust and use all of these predictions? We present empirical evidence highlighting how different predictions within the same image play distinct roles, resulting in varying reliability levels.
arXiv Detail & Related papers (2024-12-02T18:34:17Z)
- Reconciling Model Multiplicity for Downstream Decision Making [24.335927243672952]
We show that even when two predictive models approximately agree on their individual predictions almost everywhere, their induced best-response actions can still differ on a substantial portion of the population (a toy illustration follows this entry).
We propose a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction.
arXiv Detail & Related papers (2024-05-30T03:36:46Z)
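To see why near-agreement in predictions need not imply agreement in actions, consider a hypothetical threshold decision rule: a tiny prediction gap around the threshold flips the chosen action. A minimal sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical decision rule: act (1) iff the predicted probability > 0.5.
threshold = 0.5

# Two predictors that agree to within 0.02 on every individual.
p_true = rng.uniform(0.45, 0.55, size=10_000)  # population near the threshold
preds_a = np.clip(p_true - 0.01, 0, 1)
preds_b = np.clip(p_true + 0.01, 0, 1)

actions_a = preds_a > threshold
actions_b = preds_b > threshold

print(f"max prediction gap:              {np.max(np.abs(preds_a - preds_b)):.3f}")
print(f"fraction of actions that differ: {np.mean(actions_a != actions_b):.3f}")
# Predictions differ by at most 0.02, yet ~20% of best-response actions flip.
```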
- Boldness-Recalibration for Binary Event Predictions [0.0]
Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., spread out enough to be informative for decision making.
There is a fundamental tension between calibration and boldness: calibration metrics can look good even when predictions are overly cautious, i.e., non-bold (a toy numeric illustration follows this entry).
This work develops a Bayesian model-selection-based approach to assess calibration, and a strategy for boldness-recalibration.
arXiv Detail & Related papers (2023-05-05T18:14:47Z)
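The calibration-boldness tension has a simple worked illustration: a forecaster that always outputs the base rate is perfectly calibrated yet useless for decisions, while a bold, calibrated forecaster earns positive payoff. A toy sketch with made-up payoffs:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Bold forecaster: emits 0.1 or 0.9, perfectly calibrated by construction.
bold = rng.choice([0.1, 0.9], size=n)
y = (rng.random(n) < bold).astype(int)        # P(y=1 | forecast) = forecast

# Cautious forecaster: always emits the base rate 0.5 -- also perfectly
# calibrated here, but non-bold and uninformative for decisions.
cautious = np.full(n, 0.5)

# Decision task: act iff forecast > 0.5; payoff +1 for acting when y=1,
# -1 for acting when y=0, and 0 for not acting (made-up payoffs).
def payoff(preds):
    act = preds > 0.5
    return np.mean(np.where(act, 2 * y - 1, 0))

print(f"bold payoff:     {payoff(bold):+.3f}")      # ~ +0.40
print(f"cautious payoff: {payoff(cautious):+.3f}")  # 0.00 (never acts)
```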
- Performative Prediction with Bandit Feedback: Learning through Reparameterization [23.039885534575966]
Performative prediction is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model.
We develop a reparameterization that expresses the performative prediction objective as a function of the induced data distribution.
arXiv Detail & Related papers (2023-05-01T21:31:29Z)
- Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm, which has been shown to be effective in characterizing both properties (prediction confidence and dispersity).
We show that the nuclear norm yields more accurate and robust accuracy estimates than existing methods (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-02-02T13:30:48Z)
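The nuclear norm of a model's softmax prediction matrix on unlabeled data is straightforward to compute. A minimal sketch with synthetic prediction matrices; the paper's normalization and calibration to an actual accuracy estimate are omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 2_000, 10  # unlabeled test samples, classes

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical prediction matrices on the same unlabeled data: one model is
# confident and dispersed (peaked rows, balanced classes), one is diffuse.
confident = softmax(5.0 * rng.standard_normal((n, k)))
diffuse = softmax(0.5 * rng.standard_normal((n, k)))

# Nuclear norm (sum of singular values) of each prediction matrix; the paper
# reports that larger values track higher accuracy under distribution shift.
# Comparisons assume matrices of the same shape.
for name, P in [("confident", confident), ("diffuse", diffuse)]:
    print(f"{name} model nuclear norm: {np.linalg.norm(P, ord='nuc'):.1f}")
```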
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We study two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully-, semi-, and weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
- CovarianceNet: Conditional Generative Model for Correct Covariance Prediction in Human Motion Prediction [71.31516599226606]
We present a new method to correctly predict the uncertainty associated with the predicted distribution of future trajectories.
Our approach, CovarianceNet, is based on a Conditional Generative Model with Gaussian latent variables.
arXiv Detail & Related papers (2021-09-07T09:38:24Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively with existing approaches (a usage sketch follows this entry).
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
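NGBoost is available as the open-source ngboost Python package. Below is a minimal univariate sketch on synthetic data, assuming the package is installed; the entry above concerns the multivariate extension, which follows the same fit/pred_dist pattern:

```python
import numpy as np
from ngboost import NGBRegressor
from ngboost.distns import Normal

rng = np.random.default_rng(4)

# Synthetic heteroscedastic data: noise level grows with |x|.
X = rng.uniform(-3, 3, size=(1_000, 1))
y = np.sin(X[:, 0]) + (0.1 + 0.2 * np.abs(X[:, 0])) * rng.standard_normal(1_000)

# Fit a probabilistic regressor; Dist is modular (Normal, LogNormal, ...).
ngb = NGBRegressor(Dist=Normal, n_estimators=200, verbose=False)
ngb.fit(X, y)

# pred_dist returns a full predictive distribution, not just a point estimate.
dist = ngb.pred_dist(X[:5])
print("predicted means:", dist.params["loc"])
print("predicted stds: ", dist.params["scale"])
```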
- Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z)
- Curse of Small Sample Size in Forecasting of the Active Cases in COVID-19 Outbreak [0.0]
During the COVID-19 pandemic, many attempts have been made to predict the number of cases and other future trends of the pandemic.
However, these models fail to reliably predict the medium- and long-term evolution of fundamental features of the outbreak with acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
arXiv Detail & Related papers (2020-11-06T23:13:34Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions (a toy sketch of randomized forecasting follows this entry).
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
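The randomization idea can be illustrated without the training objective: draw a fresh quantile level r for each forecast and report the model's r-quantile, so the forecast itself is a random variable. A toy sketch, assuming a Gaussian predictive model with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical setup: outcomes y ~ N(mu, sigma) and a model that knows the
# true predictive distribution (made-up parameters for illustration).
mu, sigma = 2.0, 0.7
y = rng.normal(mu, sigma, size=n)

# Randomized forecast: draw r ~ Uniform(0, 1) per instance and report the
# r-quantile of the predictive distribution, so the forecast is random.
r = rng.random(n)
forecast = norm.ppf(r, loc=mu, scale=sigma)

# Calibration check: the event {y <= forecast} should occur with probability
# r, so within each bin of r the hit rate should match the bin's level.
hits = y <= forecast
for lo in (0.0, 0.3, 0.6, 0.9):
    mask = (r >= lo) & (r < lo + 0.1)
    print(f"r in [{lo:.1f}, {lo + 0.1:.1f}): hit rate {hits[mask].mean():.3f}"
          f" (expect ~{lo + 0.05:.2f})")
```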
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.