Properties of Group Fairness Metrics for Rankings
- URL: http://arxiv.org/abs/2212.14351v1
- Date: Thu, 29 Dec 2022 15:50:18 GMT
- Title: Properties of Group Fairness Metrics for Rankings
- Authors: Tobias Schumacher, Marlene Lutz, Sandipan Sikdar, Markus Strohmaier
- Abstract summary: We perform a comparative analysis of existing group fairness metrics developed in the context of fair ranking.
We take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics.
We demonstrate that most of these metrics only satisfy a small subset of the proposed properties.
- Score: 4.479834103607384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, several metrics have been developed for evaluating group
fairness of rankings. Given that these metrics were developed with different
application contexts and ranking algorithms in mind, it is not straightforward
which metric to choose for a given scenario. In this paper, we perform a
comprehensive comparative analysis of existing group fairness metrics developed
in the context of fair ranking. By virtue of their diverse application
contexts, we argue that such a comparative analysis is not straightforward.
Hence, we take an axiomatic approach whereby we design a set of thirteen
properties for group fairness metrics that consider different ranking settings.
A metric can then be selected depending on whether it satisfies all or a subset
of these properties. We apply these properties to eleven existing group
fairness metrics, and through both empirical and theoretical results we
demonstrate that most of these metrics only satisfy a small subset of the
proposed properties. These findings highlight limitations of existing metrics,
and provide insights into how to evaluate and interpret different fairness
metrics in practical deployment. The proposed properties can also assist
practitioners in selecting appropriate metrics for evaluating fairness in a
specific application.
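To make the object of study concrete, below is a minimal sketch of one common family of group fairness metrics for rankings: an exposure-based disparity with logarithmically discounted position weights, together with one example property check. The metric and the property are illustrative assumptions on my part, not the paper's definitions; the paper itself evaluates eleven published metrics against thirteen properties.

```python
import math

def exposure(position):
    """Logarithmic position discount: higher-ranked items get more exposure."""
    return 1.0 / math.log2(position + 1)

def group_exposure_disparity(ranking, groups):
    """Difference in average exposure between groups "A" and "B".

    ranking: list of item ids, best first.
    groups:  dict mapping item id -> "A" or "B".
    Returns 0.0 when average exposure is perfectly balanced.
    """
    totals, counts = {"A": 0.0, "B": 0.0}, {"A": 0, "B": 0}
    for pos, item in enumerate(ranking, start=1):
        g = groups[item]
        totals[g] += exposure(pos)
        counts[g] += 1
    return totals["A"] / counts["A"] - totals["B"] / counts["B"]

ranking = ["a1", "b1", "a2", "b2"]
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(group_exposure_disparity(ranking, groups))  # ~0.22: group A is favored

# One property in the axiomatic spirit: swapping two items of the SAME group
# should leave a group fairness metric unchanged (within-group invariance).
swapped = ["a2", "b1", "a1", "b2"]  # a1 and a2 exchanged
assert math.isclose(group_exposure_disparity(ranking, groups),
                    group_exposure_disparity(swapped, groups))
```

Whether a published metric satisfies invariances of this kind is exactly the sort of question the thirteen proposed properties formalize.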
Related papers
- Ranking evaluation metrics from a group-theoretic perspective [5.333192842860574]
We show instances that result in inconsistent evaluations, a potential source of mistrust in commonly used metrics.
Our analysis sheds light on these ranking evaluation metrics, highlighting that inconsistent evaluations need not be seen as a source of mistrust.
arXiv Detail & Related papers (2024-08-14T09:06:58Z)
- On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations [74.70957445600936]
Multiple metrics have been introduced to measure fairness in various natural language processing tasks.
These metrics fall into two broad categories: 1) extrinsic metrics for evaluating fairness in downstream applications and 2) intrinsic metrics for estimating fairness in upstream language representation models.
arXiv Detail & Related papers (2022-03-25T22:17:43Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
Accumulated prediction sensitivity measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. A minimal sensitivity sketch follows this list.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics [5.74271110290378]
We evaluate a set of distributional inequality metrics originating from economics and their ability to measure disparities in content exposure in the Twitter algorithmic timeline.
We show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users. A Gini-coefficient sketch follows this list.
arXiv Detail & Related papers (2022-02-03T14:41:39Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
- Estimation of Fair Ranking Metrics with Incomplete Judgments [70.37717864975387]
We propose a sampling strategy and estimation technique for four fair ranking metrics.
We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items; an inverse-propensity sketch of this idea follows this list.
arXiv Detail & Related papers (2021-08-11T10:57:00Z)
- Fair Performance Metric Elicitation [29.785862520452955]
We consider the choice of fairness metrics through the lens of metric elicitation.
We propose a novel strategy to elicit group-fair performance metrics for multiclass classification problems.
arXiv Detail & Related papers (2020-06-23T04:03:24Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)
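For the prediction-sensitivity entry above, the underlying quantity is how strongly small input perturbations move a model's prediction. The finite-difference sketch below is my own generic construction under that reading, not the paper's accumulated prediction sensitivity estimator.

```python
import numpy as np

def prediction_sensitivity(score, x, eps=1e-4):
    """Finite-difference sensitivity of a scalar model score at input x.

    score: callable mapping a 1-D feature vector to a scalar in [0, 1].
    x:     1-D numpy array of input features.
    Returns the L2 norm of the approximate gradient of the score at x.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (score(x_plus) - score(x_minus)) / (2 * eps)
    return float(np.linalg.norm(grad))

# Toy model: logistic score dominated by feature 0.
w = np.array([3.0, 0.1, 0.1])
model = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
print(prediction_sensitivity(model, np.array([0.2, -0.5, 1.0])))
```

Accumulating such sensitivities over a dataset, and comparing them across protected groups, gives the flavor of the accumulated metric; the theoretical link to statistical parity is the paper's contribution.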
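The distributional inequality entry relies on standard economic measures such as the Gini coefficient. Here is a self-contained sketch over hypothetical per-user exposure counts (the data and variable names are mine):

```python
import numpy as np

def gini(exposures):
    """Gini coefficient of a non-negative exposure distribution.

    0.0 means exposure is spread perfectly evenly across users;
    values approaching 1.0 mean it is concentrated on very few users.
    """
    x = np.sort(np.asarray(exposures, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    # Standard rank-weighted formula for values sorted in ascending order.
    return float(2 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

print(gini([10, 10, 10, 10]))  # 0.0  -- perfectly equal exposure
print(gini([0, 0, 0, 40]))     # 0.75 -- exposure concentrated on one user
```

Computing such a coefficient over the exposure each user's content receives is one way to quantify how skewed an algorithmic timeline's outcomes are.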
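Finally, for the incomplete-judgments entry, the textbook route to an unbiased estimate from a labeled sample is inverse-propensity (Horvitz-Thompson style) weighting. The sketch below illustrates that general recipe, not the paper's specific estimator for the four fair ranking metrics.

```python
import random

def sampled_metric_estimate(items, contribution, sample_prob):
    """Unbiased estimate of a sum-decomposable metric from sampled labels.

    items:        full list of ranked items.
    contribution: callable giving an item's contribution to the metric;
                  in practice only computable for items sent to annotators.
    sample_prob:  probability with which each item is selected for labeling.
    """
    total = 0.0
    for item in items:
        if random.random() < sample_prob:              # item gets judged
            total += contribution(item) / sample_prob  # reweight by 1/p
    return total

# Toy check: estimate the total relevance of 1,000 items from ~20% labels.
random.seed(0)
items = list(range(1000))
true_total = sum(i % 2 for i in items)  # 500
print(true_total, sampled_metric_estimate(items, lambda i: i % 2, 0.2))
```

Dividing each labeled contribution by its inclusion probability makes the estimator's expectation equal the fully judged metric, which is the sense in which such estimators are unbiased.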