How Global Calibration Strengthens Multiaccuracy
- URL: http://arxiv.org/abs/2504.15206v1
- Date: Mon, 21 Apr 2025 16:22:44 GMT
- Title: How Global Calibration Strengthens Multiaccuracy
- Authors: Sílvia Casacuberta, Parikshit Gopalan, Varun Kanade, Omer Reingold
- Abstract summary: We find that multiaccuracy in itself is rather weak, but that the addition of global calibration boosts its power substantially. We also show that it yields a restricted form of weak agnostic learning, which requires some concept in the class to have correlation greater than $1/2$ with the labels.
- Score: 13.849487128339792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiaccuracy and multicalibration are multigroup fairness notions for prediction that have found numerous applications in learning and computational complexity. They can be achieved from a single learning primitive: weak agnostic learning. Here we investigate the power of multiaccuracy as a learning primitive, both with and without the additional assumption of calibration. We find that multiaccuracy in itself is rather weak, but that the addition of global calibration (this notion is called calibrated multiaccuracy) boosts its power substantially, enough to recover implications that were previously known only assuming the stronger notion of multicalibration. We give evidence that multiaccuracy might not be as powerful as standard weak agnostic learning, by showing that there is no way to post-process a multiaccurate predictor to get a weak learner, even assuming the best hypothesis has correlation $1/2$. Rather, we show that it yields a restricted form of weak agnostic learning, which requires some concept in the class to have correlation greater than $1/2$ with the labels. However, by also requiring the predictor to be calibrated, we recover not just weak, but strong agnostic learning. A similar picture emerges when we consider the derivation of hardcore measures from predictors satisfying multigroup fairness notions. On the one hand, while multiaccuracy only yields hardcore measures of density half the optimal, we show that (a weighted version of) calibrated multiaccuracy achieves optimal density. Our results yield new insights into the complementary roles played by multiaccuracy and calibration in each setting. They shed light on why multiaccuracy and global calibration, although not particularly powerful by themselves, together yield considerably stronger notions.
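Both notions in the abstract can be audited directly on a finite sample. Below is a minimal numpy sketch (the function names, the 0/1 concept encoding, and the binning scheme are illustrative choices, not the paper's): it measures the multiaccuracy violation $\max_{c \in C} |E[c(x)(y - p(x))]|$ against a concept class $C$, and a binned proxy for global calibration of the predictor $p$.

```python
import numpy as np

def multiaccuracy_violation(preds, labels, concepts):
    """Largest correlation between any concept c in the class C and the
    residual y - p(x); multiaccuracy at level alpha means this is <= alpha.
    `concepts` is an (m, n) 0/1 matrix, one row per concept."""
    residual = labels - preds
    return np.max(np.abs(concepts @ residual)) / len(preds)

def calibration_violation(preds, labels, n_bins=10):
    """Largest binned deviation |E[y - p(x) | p(x) in bin]|, a finite-sample
    proxy for global calibration of the single predictor p."""
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    worst = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            worst = max(worst, abs(np.mean(labels[mask] - preds[mask])))
    return worst
```

Calibrated multiaccuracy asks for both returned quantities to be small simultaneously; the abstract's point is that this conjunction recovers consequences previously known only under the stronger notion of multicalibration.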
Related papers
- Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training.
We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training.
Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
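A rough numpy sketch of that split, using the classical binned Brier-score decomposition (a proxy; the paper's refinement metric may differ). Early stopping would track the refinement term and leave calibration to a post-hoc step.

```python
import numpy as np

def brier_decomposition(preds, labels, n_bins=15):
    """Split the Brier score into a binned calibration (reliability) term and
    a refinement remainder; a proxy for the trade-off described above."""
    brier = np.mean((preds - labels) ** 2)
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    calib = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            calib += mask.mean() * (preds[mask].mean() - labels[mask].mean()) ** 2
    return calib, brier - calib  # (calibration error, refinement error)
```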
arXiv Detail & Related papers (2025-01-31T15:03:54Z) - When is Multicalibration Post-Processing Necessary? [12.628103786954487]
Multicalibration is a property of predictors which guarantees meaningful uncertainty estimates.
We conduct the first comprehensive study evaluating the usefulness of multicalibration post-processing.
We distill many independent observations which may be useful for practical and effective applications of multicalibration post-processing.
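For concreteness, here is the standard boosting-style multicalibration post-processing loop as a numpy sketch; it is generic, not the specific implementation evaluated in the paper.

```python
import numpy as np

def multicalibrate(preds, labels, groups, alpha=0.01, n_bins=10, max_iter=1000):
    """Boosting-style multicalibration post-processing (generic sketch):
    while some (group, prediction-bucket) cell has mean residual exceeding
    alpha, shift predictions in that cell by the residual. `groups` is a
    list of boolean masks over the n examples."""
    preds = preds.copy()
    for _ in range(max_iter):
        # Re-bucket by current prediction values each sweep.
        bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
        updated = False
        for g in groups:
            for b in range(n_bins):
                cell = g & (bins == b)
                if not cell.any():
                    continue
                r = np.mean(labels[cell] - preds[cell])
                if abs(r) > alpha:
                    preds[cell] = np.clip(preds[cell] + r, 0.0, 1.0)
                    updated = True
        if not updated:
            break
    return preds
```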
arXiv Detail & Related papers (2024-06-10T17:26:39Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
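A minimal sketch of consistency-based confidence; the two measures shown (agreement rate and normalized entropy) are illustrative stand-ins, not necessarily the paper's exact three.

```python
from collections import Counter
import math

def consistency_confidence(samples):
    """Confidence from repeated sampled answers: agreement rate of the modal
    answer and a normalized-entropy score (both illustrative measures)."""
    counts = Counter(samples)
    n = len(samples)
    answer, top = counts.most_common(1)[0]
    agreement = top / n
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 1.0
    return answer, agreement, 1.0 - entropy / max_entropy
```

For example, `consistency_confidence(['4', '4', '5', '4'])` returns the modal answer `'4'` with agreement `0.75`.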
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - On Computationally Efficient Multi-Class Calibration [9.032290717007065]
Projected smooth calibration gives strong guarantees for all downstream decision makers.
It ensures that, for every subset $T$ of labels, the binary predictor obtained by summing the probabilities assigned to labels in $T$ is close to some perfectly calibrated binary predictor.
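That guarantee suggests a direct plug-in check, sketched below in numpy (binning and names are my own): sum the predicted probabilities over a label subset $T$ and measure the binned calibration error of the induced binary predictor.

```python
import numpy as np

def projected_calibration_error(probs, labels, T, n_bins=10):
    """Binned calibration error of the binary predictor induced by a label
    subset T: q(x) = sum of predicted class probabilities over T, checked
    against the indicator that the true label lies in T."""
    T = list(T)
    q = probs[:, T].sum(axis=1)
    z = np.isin(labels, T).astype(float)
    bins = np.clip((q * n_bins).astype(int), 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(q[mask].mean() - z[mask].mean())
    return err
```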
arXiv Detail & Related papers (2024-02-12T17:25:23Z) - Calibrated Uncertainty Quantification for Operator Learning via Conformal Prediction [95.75771195913046]
We propose a risk-controlling quantile neural operator, a distribution-free, finite-sample functional calibration conformal prediction method.
We provide a theoretical calibration guarantee on the coverage rate, defined as the expected percentage of points on the function domain.
Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results.
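To fix ideas, here is a generic split-conformal sketch for function-valued predictions; the paper's risk-controlling quantile neural operator is more sophisticated, but the coverage bookkeeping is similar in spirit.

```python
import numpy as np

def conformal_band(residuals_cal, alpha=0.1):
    """Split-conformal radius for function-valued predictions.
    `residuals_cal`: (n_cal, n_points) array of |y - y_hat| evaluated on a
    held-out calibration set at n_points locations of the function domain.
    Returns a per-point band radius targeting ~(1 - alpha) coverage of
    domain points in expectation."""
    n = residuals_cal.shape[0]
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample corrected level
    return np.quantile(residuals_cal, min(q, 1.0), axis=0)
```

At test time one would report the band $\hat{y}(x) \pm$ radius at each domain point.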
arXiv Detail & Related papers (2024-02-02T23:43:28Z) - Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods.
The technique can be flexibly incorporated into existing models, improving confidence calibration, classification accuracy, and model robustness.
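One plausible reading of the mechanism (an assumption about the regularizer's shape, not the paper's exact loss) is that confidence should not increase when a modality is removed:

```python
import numpy as np

def cml_penalty(conf_full, conf_subset):
    """Penalize samples whose confidence rises when a modality is dropped
    (conf_full / conf_subset: per-sample confidences with all modalities vs.
    one modality removed). An assumed shape for the regularizer, not the
    paper's exact loss."""
    return np.mean(np.maximum(0.0, conf_subset - conf_full))
```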
arXiv Detail & Related papers (2023-06-02T04:29:57Z) - Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted cross-entropy loss, with the weights differing across branches.
We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy on two challenging datasets.
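A PyTorch sketch of the construction; the head count, the random per-class weights, and the prediction averaging are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadClassifier(nn.Module):
    """Shared backbone with several linear heads; each head minimizes its own
    per-class-weighted cross-entropy, and predictions average the heads'
    softmax outputs. The weighting scheme here is an illustrative choice."""

    def __init__(self, backbone, feat_dim, n_classes, n_heads=3):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n_classes) for _ in range(n_heads))
        # One class-weight vector per head so the branches disagree on which
        # classes to emphasize (placeholder: random weights in [0.5, 1.5)).
        self.register_buffer("class_weights",
                             torch.rand(n_heads, n_classes) + 0.5)

    def forward(self, x):
        feats = self.backbone(x)
        return [head(feats) for head in self.heads]

    def loss(self, logits_list, targets):
        return sum(F.cross_entropy(logits, targets, weight=w)
                   for logits, w in zip(logits_list, self.class_weights))

    @torch.no_grad()
    def predict(self, x):
        probs = [F.softmax(l, dim=-1) for l in self.forward(x)]
        return torch.stack(probs).mean(dim=0)  # averaged, better-calibrated
```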
arXiv Detail & Related papers (2023-03-02T09:32:32Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
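A heavily hedged sketch of what selective scaling could look like: temperature-scale only predictions flagged as likely incorrect. The temperature and, especially, the correctness mask are placeholders; a real system needs a test-time criterion for flagging likely-wrong pixels.

```python
import numpy as np

def selective_scaling(logits, wrong_mask, temperature=2.0):
    """Temperature-scale only the logits flagged as likely mispredicted and
    leave the rest untouched; `wrong_mask` stands in for a correctness
    criterion that a real system would have to supply at test time."""
    scaled = logits.copy()
    scaled[wrong_mask] = logits[wrong_mask] / temperature
    e = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # per-pixel probabilities
```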
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
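A numpy sketch in the spirit of that estimator (T-Cal's exact estimator, binning, and test thresholds differ in detail):

```python
import numpy as np

def debiased_l2_ece(preds, labels, n_bins=15):
    """Binned plug-in estimate of the squared l2-ECE with a variance
    debiasing term; can come out slightly negative on finite samples."""
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    n, total = len(preds), 0.0
    for b in range(n_bins):
        mask = bins == b
        n_b = int(mask.sum())
        if n_b < 2:
            continue
        r = labels[mask] - preds[mask]
        # (mean r)^2 overestimates (E[r])^2 by Var(r)/n_b; subtract it off.
        total += (n_b / n) * (r.mean() ** 2 - r.var(ddof=1) / n_b)
    return total
```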
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Low-Degree Multicalibration [16.99099840073075]
Low-Degree Multicalibration defines a hierarchy of increasingly powerful multi-group fairness notions.
We show that low-degree multicalibration can be significantly more efficient than full multicalibration.
Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees.
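Operationally, degree-$k$ multicalibration tests weight functions that are polynomials of degree at most $k$ in the prediction; degree 0 recovers plain multiaccuracy. A numpy sketch using monomial weights:

```python
import numpy as np

def low_degree_mc_violation(preds, labels, concepts, degree=2):
    """Largest degree-k multicalibration violation over monomial weight
    functions w_j(v) = v^j, j = 0..k, and concepts c (an (m, n) 0/1 matrix).
    Degree 0 recovers plain multiaccuracy."""
    residual = labels - preds
    worst = 0.0
    for j in range(degree + 1):
        weighted = (preds ** j) * residual
        worst = max(worst, np.max(np.abs(concepts @ weighted)) / len(preds))
    return worst
```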
arXiv Detail & Related papers (2022-03-02T17:24:55Z) - Sample Complexity of Uniform Convergence for Multicalibration [43.10452387619829]
We address the multicalibration error and decouple it from the prediction error.
Our work gives sample complexity bounds for uniform convergence guarantees of multicalibration error.
arXiv Detail & Related papers (2020-05-04T18:01:38Z)