Related papers: Estimation of Fair Ranking Metrics with Incomplete Judgments

URL: http://arxiv.org/abs/2108.05152v1
Date: Wed, 11 Aug 2021 10:57:00 GMT
Title: Estimation of Fair Ranking Metrics with Incomplete Judgments
Authors: Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz
Abstract summary: We propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items.
Score: 70.37717864975387
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There is increasing attention to evaluating the fairness of search system ranking decisions. Fair ranking metrics often consider the membership of items in particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large-scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.

Related papers

Quantifying Query Fairness Under Unawareness [82.33181164973365]
We introduce a robust fairness estimator based on quantification that effectively handles multiple sensitive attributes beyond binary classifications. Our method outperforms existing baselines across various sensitive attributes and is the first to establish a reliable protocol for measuring fairness under unawareness.
arXiv Detail & Related papers (2025-06-04T16:31:44Z)
Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics [0.0]
We show the significance of sample-size bias in classification metrics. This revelation challenges the efficacy of these metrics in assessing bias with high resolution. We propose a model-agnostic assessment and correction technique.
arXiv Detail & Related papers (2025-05-06T22:02:53Z)
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy [52.261323452286554]
We introduce a method for contextual metric meta-evaluation by comparing the local metric accuracy of evaluation metrics. Across translation, speech recognition, and ranking tasks, we demonstrate that the local metric accuracies vary both in absolute value and relative effectiveness as we shift across evaluation contexts.
arXiv Detail & Related papers (2025-03-25T16:42:25Z)
Ranking evaluation metrics from a group-theoretic perspective [5.333192842860574]
We show instances resulting in inconsistent evaluations, sources of potential mistrust in commonly used metrics. Our analysis sheds light on ranking evaluation metrics, highlighting that inconsistent evaluations should not be seen as a source of mistrust.
arXiv Detail & Related papers (2024-08-14T09:06:58Z)
Measuring Fairness in Large-Scale Recommendation Systems with Missing Labels [8.921669180278274]
In large-scale recommendation systems, the vast array of items makes it infeasible to obtain accurate user preferences for each product, resulting in a common issue of missing labels. Previous methods often treat samples with missing labels as negative, which can yield estimates that deviate significantly from the ground-truth fairness metrics. We propose a novel method that employs a small amount of randomized traffic to estimate fairness metrics accurately.
arXiv Detail & Related papers (2024-06-07T20:14:13Z)
Practical Bias Mitigation through Proxy Sensitive Attribute Label Generation [0.0]
We propose a two-stage approach of unsupervised embedding generation followed by clustering to obtain proxy-sensitive labels. The efficacy of our work relies on the assumption that bias propagates through non-sensitive attributes that are correlated to the sensitive attributes. Experimental results demonstrate that bias mitigation using existing algorithms such as Fair Mixup and Adversarial Debiasing yields comparable results on derived proxy labels.
arXiv Detail & Related papers (2023-12-26T10:54:15Z)
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to universal adversarial perturbations (UAPs). Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
Properties of Group Fairness Metrics for Rankings [4.479834103607384]
We perform a comparative analysis of existing group fairness metrics developed in the context of fair ranking. We take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics. We demonstrate that most of these metrics only satisfy a small subset of the proposed properties.
arXiv Detail & Related papers (2022-12-29T15:50:18Z)
Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
Accumulated Prediction Sensitivity measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics [5.74271110290378]
We evaluate a set of metrics originating from economics, distributional inequality metrics, and their ability to measure disparities in content exposure in the Twitter algorithmic timeline. We show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users.
arXiv Detail & Related papers (2022-02-03T14:41:39Z)
Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes. We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data. There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups. We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods to evaluate open-domain generative dialogue systems. Due to the lack of systematic comparison, it is not clear which kind of metric is more effective. We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.