When is Multicalibration Post-Processing Necessary?
- URL: http://arxiv.org/abs/2406.06487v2
- Date: Mon, 04 Nov 2024 22:17:58 GMT
- Title: When is Multicalibration Post-Processing Necessary?
- Authors: Dutch Hansen, Siddartha Devic, Preetum Nakkiran, Vatsal Sharan
- Abstract summary: Multicalibration is a property of predictors which guarantees meaningful uncertainty estimates.
We conduct the first comprehensive study evaluating the usefulness of multicalibration post-processing.
We distill many independent observations which may be useful for practical and effective applications of multicalibration post-processing.
- Score: 12.628103786954487
- Abstract: Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates. Multicalibration is a related notion -- originating in algorithmic fairness -- which requires predictors to be simultaneously calibrated over a potentially complex and overlapping collection of protected subpopulations (such as groups defined by ethnicity, race, or income). We conduct the first comprehensive study evaluating the usefulness of multicalibration post-processing across a broad set of tabular, image, and language datasets for models spanning from simple decision trees to 90-million-parameter fine-tuned LLMs. Our findings can be summarized as follows: (1) models which are calibrated out of the box tend to be relatively multicalibrated without any additional post-processing; (2) multicalibration post-processing can help inherently uncalibrated models and large vision and language models; and (3) traditional calibration measures may sometimes provide multicalibration implicitly. More generally, we also distill many independent observations which may be useful for practical and effective applications of multicalibration post-processing in real-world contexts. We also release a Python package implementing multicalibration algorithms, available via `pip install multicalibration`.
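The abstract names the procedure but not the algorithm. As an illustration, here is a minimal sketch of one standard multicalibration post-processing loop (in the spirit of the Hébert-Johnson et al. patching algorithm); the function name and parameters are illustrative, not the API of the released `multicalibration` package:

```python
import numpy as np

def multicalibrate(preds, labels, groups, n_bins=10, alpha=0.01, max_iters=100):
    """Iteratively patch predictions until every (group, bin) cell is
    calibrated to within alpha. `groups` is a list of boolean masks over
    the data, one per subpopulation; masks may overlap."""
    p = np.clip(np.asarray(preds, dtype=float).copy(), 0.0, 1.0)
    labels = np.asarray(labels, dtype=float)
    for _ in range(max_iters):
        updated = False
        for g in groups:
            # Re-bin by the current predicted values on each pass.
            bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
            for b in range(n_bins):
                cell = g & (bins == b)
                if not cell.any():
                    continue
                residual = labels[cell].mean() - p[cell].mean()
                if abs(residual) > alpha:
                    # Shift the cell toward its observed outcome rate.
                    p[cell] = np.clip(p[cell] + residual, 0.0, 1.0)
                    updated = True
        if not updated:
            break
    return p
```

Each pass patches any (group, bin) cell whose average label drifts more than `alpha` from its average prediction; overlapping groups are handled simply by revisiting all of them until no cell is in violation.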
Related papers
- Calibrated Multivariate Regression with Localized PIT Mappings [4.277516034244117]
This paper introduces a novel post-hoc recalibration approach that addresses multivariate calibration for potentially misspecified models.
We present two versions of our approach: one uses K-nearest neighbors, and the other uses normalizing flows.
We demonstrate the effectiveness of our approach on two real data applications: recalibrating a deep neural network's currency exchange rate forecast and improving a regression model for childhood malnutrition in India.
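A toy univariate sketch of the K-nearest-neighbors variant, assuming the fitted model exposes an inverse conditional CDF (`model_cdf_inv` below is a hypothetical callable, and all names are illustrative):

```python
import numpy as np

def localized_pit_quantile(x_new, level, x_cal, pit_cal, model_cdf_inv, k=50):
    """Recalibrated quantile at x_new: warp the nominal level through the
    empirical PIT distribution of the k nearest calibration points, then
    invert the model's conditional CDF at the warped level."""
    # For a well-calibrated model the PIT values F(y|x) are uniform;
    # local deviations from uniformity tell us how to correct the level.
    dists = np.linalg.norm(x_cal - x_new, axis=1)
    neighbors = np.argsort(dists)[:k]
    warped = np.quantile(pit_cal[neighbors], level)
    return model_cdf_inv(warped, x_new)
```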
arXiv Detail & Related papers (2024-09-17T02:41:03Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
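The three consistency measures are not spelled out in the summary; the simplest plausible one is plain agreement frequency, sketched here with illustrative names:

```python
from collections import Counter

def consistency_confidence(samples):
    """Agreement-frequency confidence: the majority answer among sampled
    generations, scored by the fraction of samples that agree with it."""
    answer, freq = Counter(samples).most_common(1)[0]
    return answer, freq / len(samples)

# e.g. five answers sampled from an LLM at nonzero temperature
answer, conf = consistency_confidence(["42", "42", "41", "42", "42"])
print(answer, conf)  # -> 42 0.8
```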
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted cross-entropy loss, with the weights differing across branches.
We show that the resulting averaged predictions achieve excellent calibration without sacrificing accuracy on two challenging datasets.
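A minimal PyTorch sketch of the idea, assuming a shared backbone and per-head class weights; all names and the weighting scheme are illustrative, not the authors' exact recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadClassifier(nn.Module):
    """Shared backbone with several linear classification heads."""
    def __init__(self, backbone, feat_dim, n_classes, n_heads=3):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n_classes) for _ in range(n_heads)])

    def forward(self, x):
        z = self.backbone(x)
        return [head(z) for head in self.heads]

def multi_head_loss(logits_per_head, targets, class_weights_per_head):
    # Each head minimizes cross-entropy under its own class weighting.
    return sum(F.cross_entropy(logits, targets, weight=w)
               for logits, w in zip(logits_per_head, class_weights_per_head))

def predict(logits_per_head):
    # Averaging the heads' softmax outputs acts like a cheap ensemble.
    return torch.stack([l.softmax(dim=-1) for l in logits_per_head]).mean(dim=0)
```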
arXiv Detail & Related papers (2023-03-02T09:32:32Z)
- A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning [63.20009081099896]
We provide a unifying framework for the design and analysis of multicalibrated predictors.
We exploit connections to game dynamics to achieve state-of-the-art guarantees for a diverse set of multicalibration learning problems.
arXiv Detail & Related papers (2023-02-21T18:24:17Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Fair admission risk prediction with proportional multicalibration [0.16249424686052708]
Multicalibration constrains calibration error among flexibly-defined subpopulations.
It is possible for a decision-maker to learn to trust or distrust model predictions for specific groups.
We propose proportional multicalibration, a criterion that constrains the percent calibration error among groups and within prediction bins.
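A sketch of the metric under one natural reading of "percent calibration error" (relative error within each group-bin cell); names and the exact normalization are assumptions, not the paper's definition verbatim:

```python
import numpy as np

def proportional_mc_error(preds, labels, groups, n_bins=10, eps=1e-8):
    """Worst-case relative calibration error over (group, bin) cells:
    |mean(y) - mean(p)| / mean(y). Judging errors relative to the cell's
    outcome rate protects groups with low base rates."""
    preds, labels = np.asarray(preds, dtype=float), np.asarray(labels, dtype=float)
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for g in groups:
        for b in range(n_bins):
            cell = g & (bins == b)
            if not cell.any():
                continue
            rate = labels[cell].mean()
            worst = max(worst, abs(rate - preds[cell].mean()) / (rate + eps))
    return worst
```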
arXiv Detail & Related papers (2022-09-29T08:15:29Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce Modular Conformal Calibration (MCC), a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
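The summary leaves the algorithms abstract; the simplest instance of this kind of modular recalibration is a split-conformal wrapper, sketched here with illustrative names:

```python
import numpy as np

def conformal_interval(point_model, x_cal, y_cal, x_test, alpha=0.1):
    """Split-conformal wrapper: turn any point regressor into a predictor
    of intervals with marginal coverage 1 - alpha."""
    residuals = np.abs(y_cal - point_model(x_cal))
    n = len(y_cal)
    # Finite-sample-corrected quantile of the calibration residuals.
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(residuals, level)
    preds = point_model(x_test)
    return preds - q, preds + q
```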
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
- Low-Degree Multicalibration [16.99099840073075]
Low-Degree Multicalibration defines a hierarchy of increasingly powerful multi-group fairness notions.
We show that low-degree multicalibration can be significantly more efficient than full multicalibration.
Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees.
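As a rough illustration of the degree-k idea, a degree-1 check replaces arbitrary group indicators with linear weight functions; this simplified form is an assumption for illustration, not the paper's exact definition:

```python
import numpy as np

def degree_one_violation(preds, labels, features, n_bins=10, eps=1e-8):
    """Largest within-bin correlation between the residual y - p and any
    standardized feature: linear weight functions standing in for the
    arbitrary group indicators of full multicalibration."""
    feats = (features - features.mean(0)) / (features.std(0) + eps)
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for b in range(n_bins):
        cell = bins == b
        if not cell.any():
            continue
        resid = labels[cell] - preds[cell]
        # E[w(x) * (y - p) * 1{bin}] for each linear weight function w.
        worst = max(worst, np.abs(feats[cell].T @ resid).max() / len(preds))
    return worst
```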
arXiv Detail & Related papers (2022-03-02T17:24:55Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
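A minimal sketch of one such aggregation: weighted averaging followed by a sort so quantile levels never cross; the scheme is illustrative, not the paper's full method:

```python
import numpy as np

def aggregate_quantiles(model_preds, weights=None):
    """Weighted average of several models' quantile predictions, then a
    sort across quantile levels so the aggregate is non-crossing.
    model_preds: array of shape (n_models, n_points, n_levels)."""
    model_preds = np.asarray(model_preds, dtype=float)
    if weights is None:
        weights = np.full(model_preds.shape[0], 1.0 / model_preds.shape[0])
    combined = np.tensordot(weights, model_preds, axes=1)
    return np.sort(combined, axis=-1)  # enforce monotonicity in the levels
```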
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Self-Calibration Supported Robust Projective Structure-from-Motion [80.15392629310507]
We propose a unified Structure-from-Motion (SfM) method, in which the matching process is supported by self-calibration constraints.
We show experimental results demonstrating robust multiview matching and accurate camera calibration by exploiting these constraints.
arXiv Detail & Related papers (2020-07-04T08:47:10Z)
- Quantile Regularization: Towards Implicit Calibration of Regression Models [30.872605139672086]
We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs.
We show that the proposed quantile regularizer significantly improves calibration for regression models trained with approaches such as Dropout VI and Deep Ensembles.
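A rough sketch of such a penalty, assuming PIT values F(y|x) are available; hard binning is used here purely for clarity (it is not differentiable), and the paper's cumulative-KL form differs in detail:

```python
import torch

def quantile_regularizer(pit_values, n_bins=20, eps=1e-8):
    """Penalize non-uniform PIT values F(y|x): KL divergence between
    their binned empirical distribution and Uniform(0, 1)."""
    hist = torch.histc(pit_values, bins=n_bins, min=0.0, max=1.0)
    p = hist / hist.sum()
    u = 1.0 / n_bins  # uniform bin mass
    return (p * ((p + eps) / u).log()).sum()
```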
arXiv Detail & Related papers (2020-02-28T16:53:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content and is not responsible for any consequences of its use.