Matched Pair Calibration for Ranking Fairness
- URL: http://arxiv.org/abs/2306.03775v3
- Date: Thu, 30 Nov 2023 19:22:59 GMT
- Title: Matched Pair Calibration for Ranking Fairness
- Authors: Hannah Korevaar, Chris McConnell, Edmund Tong, Erik Brinkman, Alana
Shine, Misam Abbas, Blossom Metevier, Sam Corbett-Davies, Khalid El-Arini
- Abstract summary: We propose a test of fairness in score-based ranking systems called matched pair calibration.
We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking.
- Score: 2.580183306478581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a test of fairness in score-based ranking systems called matched
pair calibration. Our approach constructs a set of matched item pairs with
minimal confounding differences between subgroups before computing an
appropriate measure of ranking error over the set. The matching step ensures
that we compare subgroup outcomes between identically scored items so that
measured performance differences directly imply unfairness in subgroup-level
exposures. We show how our approach generalizes the fairness intuitions of
calibration from a binary classification setting to ranking and connect our
approach to other proposals for ranking fairness measures. Moreover, our
strategy shows how the logic of marginal outcome tests extends to cases where
the analyst has access to model scores. Lastly, we provide an example of
applying matched pair calibration to a real-world ranking data set to
demonstrate its efficacy in detecting ranking bias.
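The matching step described in the abstract can be sketched in code. The helper below is a hypothetical illustration, not the authors' implementation: it greedily pairs items from two subgroups whose model scores agree within a tolerance, then compares mean outcomes over the matched set. A nonzero gap at equal score levels is the kind of subgroup miscalibration the test is designed to surface.

```python
import numpy as np

def matched_pair_gap(scores, outcomes, groups, tol=1e-6):
    """Pair items from two subgroups with (near-)identical scores and
    return the difference in mean outcomes over the matched pairs.
    Returns None when no pairs can be matched within the tolerance."""
    a = np.flatnonzero(groups == 0)
    b = np.flatnonzero(groups == 1)
    # Greedy matching on sorted scores: pair each group-0 item with the
    # closest-scoring unused group-1 item within the tolerance.
    a = a[np.argsort(scores[a])]
    b = b[np.argsort(scores[b])]
    pairs, j = [], 0
    for i in a:
        while j < len(b) and scores[b[j]] < scores[i] - tol:
            j += 1
        if j < len(b) and abs(scores[b[j]] - scores[i]) <= tol:
            pairs.append((i, b[j]))
            j += 1
    if not pairs:
        return None
    ia, ib = map(np.array, zip(*pairs))
    return outcomes[ia].mean() - outcomes[ib].mean()
```

With identically scored items across the two groups, any gap in outcomes is attributable to the group rather than to score differences, which is the point of the matching construction.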
Related papers
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- On the Richness of Calibration [10.482805367361818]
We make explicit the choices involved in designing calibration scores.
We organise these into three grouping choices and a choice concerning the agglomeration of group errors.
In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions.
We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
arXiv Detail & Related papers (2023-02-08T15:19:46Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
- Repairing Regressors for Fair Binary Classification at Any Decision Threshold [8.322348511450366]
We show that we can increase fair performance across all thresholds at once.
We introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups.
Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity.
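As a rough illustration of the kind of optimal-transport post-processing described above (a minimal one-dimensional sketch under assumed conventions, not the paper's algorithm), each group's score distribution can be pushed toward the groups' Wasserstein barycenter by averaging quantile functions, which equalizes the distributions of scores across protected groups:

```python
import numpy as np

def barycenter_repair(scores, groups, grid=101):
    """Map each group's scores onto the 1-D Wasserstein barycenter of
    the group score distributions by averaging quantile functions."""
    qs = np.linspace(0, 1, grid)
    # Quantile function of each group, then their pointwise average
    # (the quantile function of the 1-D Wasserstein barycenter).
    quantiles = {g: np.quantile(scores[groups == g], qs)
                 for g in np.unique(groups)}
    bary = np.mean(list(quantiles.values()), axis=0)
    repaired = np.empty_like(scores, dtype=float)
    for g in quantiles:
        mask = groups == g
        # Empirical CDF value of each score within its own group ...
        ranks = np.searchsorted(np.sort(scores[mask]), scores[mask],
                                side="right") / mask.sum()
        # ... pushed forward through the barycenter quantile function.
        repaired[mask] = np.interp(ranks, qs, bary)
    return repaired
```

Because every item keeps its within-group rank, the repair preserves each group's internal ordering while aligning the score distributions, so any downstream threshold acts comparably on all groups.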
arXiv Detail & Related papers (2022-03-14T20:53:35Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- Pairwise Fairness for Ordinal Regression [22.838858781036574]
We adapt two fairness notions previously considered in fair ranking and propose a strategy for training a predictor that is approximately fair according to either notion.
Our predictor consists of a threshold model, composed of a scoring function and a set of thresholds.
We show that our strategy allows us to effectively explore the accuracy-vs-fairness trade-off and that it often compares favorably to "unfair" state-of-the-art methods for ordinal regression.
arXiv Detail & Related papers (2021-05-07T10:33:42Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- MatchGAN: A Self-Supervised Semi-Supervised Conditional Generative Adversarial Network [51.84251358009803]
We present a novel self-supervised learning approach for conditional generative adversarial networks (GANs) under a semi-supervised setting.
We perform augmentation by randomly sampling sensible labels from the label space of the few labelled examples available.
Our method surpasses the baseline with only 20% of the labelled examples used to train the baseline.
arXiv Detail & Related papers (2020-06-11T17:14:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.