Does calibration mean what they say it means; or, the reference class problem rises again
- URL: http://arxiv.org/abs/2412.16769v1
- Date: Sat, 21 Dec 2024 20:50:31 GMT
- Title: Does calibration mean what they say it means; or, the reference class problem rises again
- Authors: Lily Hu
- Abstract summary: On the Same Meaning picture, group-calibrated scores "mean the same thing" (on average) across individuals from different groups.
In fact, calibration cannot ensure the kind of consistent score interpretation that the Same Meaning picture implies matters for fairness.
Reflecting on the origins of this error opens a wider lens onto the predominant methodology in algorithmic fairness.
- Score: 0.0
- Abstract: Discussions of statistical criteria for fairness commonly convey the normative significance of calibration within groups by invoking what risk scores "mean." On the Same Meaning picture, group-calibrated scores "mean the same thing" (on average) across individuals from different groups and accordingly guard against disparate treatment of individuals based on group membership. My contention is that calibration guarantees no such thing. Since concrete actual people belong to many groups, calibration cannot ensure the kind of consistent score interpretation that the Same Meaning picture implies matters for fairness, unless calibration is met within every group to which an individual belongs. Alas, only perfect predictors may meet this bar. The Same Meaning picture thus commits a reference class fallacy by inferring from calibration within some group to the "meaning" or evidential value of an individual's score, because they are a member of that group. Furthermore, the reference class answer it presumes is almost surely wrong. I then show that the reference class problem besets not just calibration but all group statistical facts that claim a close connection to fairness. Reflecting on the origins of this error opens a wider lens onto the predominant methodology in algorithmic fairness based on stylized cases.
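The reference class point is easy to make concrete. Below is a small simulation (a toy construction for illustration, not an example from the paper): a constant score of 0.5 is calibrated within each of two overlapping groups taken on its own, yet misstates risk by 0.2 within every intersection that an actual individual belongs to.

```python
# Toy illustration of the abstract's claim (my construction, not the paper's):
# a score calibrated within each marginal group can be miscalibrated within
# every intersection of groups.
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Two binary group memberships, independent and balanced.
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)

# Checkerboard of true outcome rates: 0.3 where A == B, 0.7 otherwise.
# Each marginal group then averages out to exactly 0.5.
p_true = np.where(a == b, 0.3, 0.7)
y = rng.random(n) < p_true

score = np.full(n, 0.5)  # one constant risk score for everyone

def report(mask, label):
    print(f"{label:10s} mean score = {score[mask].mean():.2f}, "
          f"outcome rate = {y[mask].mean():.2f}")

for g in (0, 1):
    report(a == g, f"A={g}")   # calibrated: outcome rate ~0.50
    report(b == g, f"B={g}")   # calibrated: outcome rate ~0.50
for g in (0, 1):
    for h in (0, 1):
        report((a == g) & (b == h), f"A={g},B={h}")  # ~0.30 or ~0.70
```

Per-group calibration only checks that scores match outcome rates group by group; since the 0.3/0.7 checkerboard averages to 0.5 inside each marginal group, the check passes even though the score is wrong for every concrete individual's full reference class.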
Related papers
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep learning based classification models, bias and variance are *aligned* at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z) - A Universal Unbiased Method for Classification from Aggregate Observations [115.20235020903992]
This paper presents a novel universal method for classification from aggregate observations (CFAO), which yields an unbiased estimator of the classification risk for arbitrary losses.
Our proposed method not only guarantees risk consistency, thanks to the unbiased risk estimator, but is also compatible with arbitrary losses.
arXiv Detail & Related papers (2023-06-20T07:22:01Z) - On Fairness and Stability: Is Estimator Variance a Friend or a Foe? [6.751310968561177]
We propose a new family of performance measures based on group-wise parity in variance.
We develop and release an open-source library that reconciles uncertainty quantification techniques with fairness analysis.
arXiv Detail & Related papers (2023-02-09T09:35:36Z) - On the Richness of Calibration [10.482805367361818]
We make explicit the choices involved in designing calibration scores.
We organise these into three grouping choices and a choice concerning the agglomeration of group errors.
In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions.
We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
arXiv Detail & Related papers (2023-02-08T15:19:46Z) - Stop Measuring Calibration When Humans Disagree [25.177984280183402]
We show that measuring calibration to human majority given inherent disagreements is theoretically problematic.
We derive several instance-level measures of calibration that capture key statistical properties of human judgements.
arXiv Detail & Related papers (2022-10-28T14:01:32Z) - Beyond calibration: estimating the grouping loss of modern neural networks [68.8204255655161]
Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution-shift settings.
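The grouping loss admits a simple plug-in sketch. Under the calibration/grouping decomposition of the Brier score, the grouping term is E[(E[Y|X] - E[Y|S])^2], and partitioning the inputs within each score bin yields a lower bound on it. A minimal sketch, assuming squared loss and scores in [0, 1]; the function name, equal-width binning, and the user-supplied partitioning feature are illustrative choices, not the paper's estimator:

```python
# Lower-bound the grouping loss: within each score bin, measure how much
# subgroup outcome rates (split by `feature`) spread around the bin's rate.
# Any coarse partition captures only part of the heterogeneity in E[Y|X],
# hence a lower bound.
import numpy as np

def grouping_loss_lower_bound(scores, y, feature, n_bins=10):
    scores, y, feature = map(np.asarray, (scores, y, feature))
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    total, n = 0.0, len(y)
    for b in range(n_bins):
        in_bin = bins == b
        if not in_bin.any():
            continue
        c = y[in_bin].mean()  # estimate of E[Y | S in bin b]
        for f in np.unique(feature[in_bin]):
            cell = in_bin & (feature == f)
            total += cell.sum() * (y[cell].mean() - c) ** 2
    return total / n
```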
arXiv Detail & Related papers (2022-10-28T07:04:20Z) - Is calibration a fairness requirement? An argument from the point of view of moral philosophy and decision theory [0.0]
We argue that a violation of group calibration may be unfair in some cases but not in others.
This is in line with claims already advanced in the literature that algorithmic fairness should be defined in a way that is sensitive to context.
arXiv Detail & Related papers (2022-05-11T14:03:33Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
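The paper's precise definitions of the LCE and LoRe are not reproduced here; as a rough illustration of what a metric spanning fully global and fully individualized calibration can look like, here is a hypothetical k-nearest-neighbor calibration error, where k = n recovers a single global gap and small k approaches a per-point comparison:

```python
# Hypothetical localized calibration error (an illustration, not the
# paper's LCE): average |mean score - mean outcome| over each point's
# k nearest neighbors in score space (a stand-in for a learned
# representation). Assumes k <= len(scores).
import numpy as np

def knn_calibration_error(scores, y, k):
    scores, y = np.asarray(scores, float), np.asarray(y, float)
    order = np.argsort(scores)
    s, t = scores[order], y[order]
    errs = []
    for i in range(len(s)):
        lo = max(0, min(i - k // 2, len(s) - k))  # window of k neighbors
        window = slice(lo, lo + k)
        errs.append(abs(s[window].mean() - t[window].mean()))
    return float(np.mean(errs))
```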
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification then uniformly improves accuracy on each group.
arXiv Detail & Related papers (2020-10-27T08:51:30Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
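For regression, one standard diagnostic behind such claims is the probability integral transform (PIT): a forecaster whose predicted CDFs are calibrated produces F_x(y) values that are uniform on [0, 1]. The sketch below probes this average notion only, whereas the paper targets the stronger per-input requirement via randomization; `predict_cdf` is a hypothetical model interface:

```python
# PIT check for a probabilistic regressor: if the predicted CDFs are
# calibrated, F_x(y) is uniform on [0, 1]. Generic diagnostic, not the
# paper's training objective.
import numpy as np
from scipy.stats import norm

def pit_values(predict_cdf, X, y):
    return np.array([predict_cdf(x, t) for x, t in zip(X, y)])

# Toy example: a correctly specified Gaussian forecaster.
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
y = 2 * X + rng.normal(size=1000)                  # true noise sd = 1
predict_cdf = lambda x, t: norm.cdf(t, loc=2 * x, scale=1.0)

pit = pit_values(predict_cdf, X, y)
print(np.histogram(pit, bins=5, range=(0, 1))[0])  # roughly equal counts
```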
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences arising from its use.