Fair admission risk prediction with proportional multicalibration
- URL: http://arxiv.org/abs/2209.14613v3
- Date: Thu, 31 Aug 2023 19:57:50 GMT
- Title: Fair admission risk prediction with proportional multicalibration
- Authors: William La Cava, Elle Lett, Guangya Wan
- Abstract summary: Multicalibration constrains calibration error among flexibly-defined subpopulations.
It is possible for a decision-maker to learn to trust or distrust model predictions for specific groups.
We propose proportional multicalibration, a criterion that constrains the percent calibration error among groups and within prediction bins.
- Score: 0.16249424686052708
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Fair calibration is a widely desirable fairness criterion in risk prediction
contexts. One way to measure and achieve fair calibration is with
multicalibration. Multicalibration constrains calibration error among
flexibly-defined subpopulations while maintaining overall calibration. However,
multicalibrated models can exhibit a higher percent calibration error among
groups with lower base rates than groups with higher base rates. As a result,
it is possible for a decision-maker to learn to trust or distrust model
predictions for specific groups. To alleviate this, we propose
\emph{proportional multicalibration} (PMC), a criterion that constrains the percent
calibration error among groups and within prediction bins. We prove that
satisfying proportional multicalibration bounds a model's multicalibration as
well as its \emph{differential calibration}, a fairness criterion that directly
measures how closely a model approximates sufficiency. Therefore,
proportionally calibrated models limit the ability of decision makers to
distinguish between model performance on different patient groups, which may
make the models more trustworthy in practice. We provide an efficient algorithm
for post-processing risk prediction models for proportional multicalibration
and evaluate it empirically. We conduct simulation studies and investigate a
real-world application of PMC post-processing to the prediction of emergency
department patient admissions. We observe that proportional multicalibration is
a promising criterion for controlling simultaneous measures of calibration
fairness of a model over intersectional groups with virtually no cost in terms
of classification performance.
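To make the quantities in the abstract concrete, here is a minimal sketch (not the authors' released implementation) of per-group, per-bin calibration error and its "percent" (proportional) counterpart. The function name calibration_errors, the equal-width binning, and the choice to normalize by the observed event rate in each group-bin are illustrative assumptions; consult the paper for the formal definitions of multicalibration and PMC.

```python
import numpy as np
import pandas as pd

def calibration_errors(y, risk, groups, n_bins=10, eps=1e-8):
    """Illustrative per-(group, bin) calibration summaries.

    y      : binary outcomes (0/1)
    risk   : model risk predictions in [0, 1]
    groups : group label per sample (e.g., an intersectional category)

    For each group and prediction bin, reports:
      - mc_err : absolute calibration error |mean(y) - mean(risk)|,
                 the kind of gap multicalibration constrains
      - pmc_err: the same gap as a fraction of the observed event rate
                 in that group-bin (one reading of "percent calibration
                 error"; see the paper for the exact definition)
    """
    df = pd.DataFrame({"y": y, "risk": risk, "group": groups})
    df["bin"] = np.clip((df["risk"] * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for (g, b), sub in df.groupby(["group", "bin"]):
        obs, pred = sub["y"].mean(), sub["risk"].mean()
        mc_err = abs(obs - pred)
        rows.append({"group": g, "bin": b, "n": len(sub),
                     "mc_err": mc_err,
                     "pmc_err": mc_err / max(obs, eps)})
    return pd.DataFrame(rows)

# Toy usage: the same absolute miscalibration (+2 percentage points) looks
# much larger in proportional terms for the low-base-rate group B.
rng = np.random.default_rng(0)
n = 20000
groups = rng.choice(["A", "B"], size=n)            # B has the lower base rate
base = np.where(groups == "A", 0.30, 0.05)
y = rng.binomial(1, base)
risk = np.clip(base + 0.02 + rng.normal(0, 0.01, n), 0, 1)
print(calibration_errors(y, risk, groups)
      .groupby("group")[["mc_err", "pmc_err"]].mean())
```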
Related papers
- Probabilistic Scores of Classifiers, Calibration is not Enough (arXiv, 2024-08-06)
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications.
In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions.
Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
- Calibrating Large Language Models with Sample Consistency (arXiv, 2024-02-21)
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs. A minimal sketch of this idea appears after this list.
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics (arXiv, 2023-10-31)
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
- Is this model reliable for everyone? Testing for strong calibration (arXiv, 2023-07-28)
In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup.
The task of auditing a model for strong calibration is well-known to be difficult due to the sheer number of potential subgroups.
Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal.
- Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis (arXiv, 2023-07-04)
We propose Cluster-Focal, which first identifies poorly calibrated samples, clusters them into groups, and then applies a group-wise focal loss to mitigate calibration bias.
We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients.
- Calibration of Neural Networks (arXiv, 2023-03-19)
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm (arXiv, 2022-12-22)
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
- Modular Conformal Calibration (arXiv, 2022-06-23)
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
- Should Ensemble Members Be Calibrated? (arXiv, 2021-01-13)
Modern deep neural networks are often observed to be poorly calibrated.
Deep learning approaches make use of large numbers of model parameters.
This paper explores the application of calibration schemes to deep ensembles.
- Individual Calibration with Randomized Forecasting (arXiv, 2020-06-18)
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
- Quantile Regularization: Towards Implicit Calibration of Regression Models (arXiv, 2020-02-28)
We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs.
We show that the proposed quantile regularizer significantly improves calibration for regression models trained using approaches such as Dropout VI and Deep Ensembles.