Estimation of Fair Ranking Metrics with Incomplete Judgments
- URL: http://arxiv.org/abs/2108.05152v1
- Date: Wed, 11 Aug 2021 10:57:00 GMT
- Title: Estimation of Fair Ranking Metrics with Incomplete Judgments
- Authors: Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben
Carterette, Emine Yılmaz
- Abstract summary: We propose a sampling strategy and estimation technique for four fair ranking metrics.
We formulate a robust, unbiased estimator that can operate even with a very limited number of labeled items.
- Score: 70.37717864975387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is increasing attention to evaluating the fairness of search
system ranking decisions. These metrics often consider the membership of items
in particular groups, typically identified using protected attributes such as
gender or ethnicity. To date, these metrics have assumed the availability and
completeness of the items' protected attribute labels. However, the protected
attributes of individuals are rarely present, limiting the application of fair
ranking metrics in large-scale systems. To address this problem, we propose a
sampling strategy and estimation technique for four fair ranking metrics. We
formulate a robust, unbiased estimator that can operate even with a very
limited number of labeled items. We evaluate our approach using both simulated
and real-world data. Our experimental results demonstrate that our method can
estimate this family of fair ranking metrics and provides a robust, reliable
alternative to exhaustive or random data annotation.
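The abstract does not spell out the estimator itself, but the standard recipe for unbiased estimation from a sampled subset of labels is inverse-probability weighting. The following is a minimal Python sketch of a Horvitz-Thompson-style estimate of the exposure a group receives in a ranking, assuming the inclusion probability of each labeled item is known from the sampling design; the logarithmic position discount and all names here are illustrative, not the authors' exact formulation.

```python
import numpy as np

def exposure(rank):
    """Position bias at a 1-based rank, using a DCG-style log discount."""
    return 1.0 / np.log2(rank + 1)

def ht_group_exposure(ranks, sampled_groups, sample_probs, group):
    """Horvitz-Thompson estimate of the total exposure received by `group`.

    ranks          -- 1-based rank of every item in the ranking
    sampled_groups -- dict item_index -> group label (labeled items only)
    sample_probs   -- dict item_index -> probability that item was labeled
    """
    total = 0.0
    for i, g in sampled_groups.items():
        if g == group:
            # Reweighting by the inverse inclusion probability makes the
            # estimate unbiased under the known sampling design.
            total += exposure(ranks[i]) / sample_probs[i]
    return total

# Toy usage: 10 ranked items, 4 of them labeled with inclusion probability 0.4.
ranks = np.arange(1, 11)
labeled = {0: "A", 3: "B", 5: "A", 8: "B"}
probs = {i: 0.4 for i in labeled}
print(ht_group_exposure(ranks, labeled, probs, "A"))
```

Because each labeled item is reweighted by the inverse of its inclusion probability, the expectation of the estimate equals the fully labeled value even when only a handful of items carry labels.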
Related papers
- Ranking evaluation metrics from a group-theoretic perspective [5.333192842860574]
We exhibit instances that result in inconsistent evaluations, a potential source of mistrust in commonly used metrics.
Our analysis sheds light on ranking evaluation metrics and argues that such inconsistencies should not, by themselves, be taken as grounds for mistrust.
arXiv Detail & Related papers (2024-08-14T09:06:58Z)
- Measuring Fairness in Large-Scale Recommendation Systems with Missing Labels [8.921669180278274]
In large-scale recommendation systems, the vast array of items makes it infeasible to obtain accurate user preferences for each product, resulting in a common issue of missing labels.
Previous methods often treat samples with missing labels as negative, which can deviate significantly from the ground-truth fairness metrics.
We propose a novel method that uses a small amount of randomized traffic to estimate fairness metrics accurately, as sketched below.
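As a rough illustration of why a small randomized labeled slice is preferable to imputing negatives, the synthetic sketch below compares a per-group positive-rate gap computed three ways; the data, group rates, and 2% labeling budget are all hypothetical, not the paper's experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: two groups with true positive rates 0.30 and 0.25,
# of which only a small randomized slice (2%) ever gets labeled.
n = 100_000
group = rng.integers(0, 2, size=n)
true_label = rng.random(n) < np.where(group == 0, 0.30, 0.25)
observed = rng.random(n) < 0.02

# Biased: treat every unlabeled sample as negative.
biased = [np.where(observed, true_label, False)[group == g].mean() for g in (0, 1)]

# Unbiased: average only over the randomized labeled slice.
unbiased = [true_label[observed & (group == g)].mean() for g in (0, 1)]

print("gap, missing-as-negative:", biased[0] - biased[1])
print("gap, randomized slice   :", unbiased[0] - unbiased[1])
print("gap, ground truth       :",
      true_label[group == 0].mean() - true_label[group == 1].mean())
```

The missing-as-negative estimate shrinks both rates (and hence their gap) by roughly the labeling rate, while the randomized-slice estimate is unbiased for each group's true rate.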
arXiv Detail & Related papers (2024-06-07T20:14:13Z)
- Practical Bias Mitigation through Proxy Sensitive Attribute Label Generation [0.0]
We propose a two-stage approach of unsupervised embedding generation followed by clustering to obtain proxy-sensitive labels.
The efficacy of our work relies on the assumption that bias propagates through non-sensitive attributes that are correlated to the sensitive attributes.
Experimental results demonstrate that existing bias-mitigation algorithms, such as Fair Mixup and Adversarial Debiasing, yield comparable results when applied to the derived proxy labels.
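A minimal sketch of the two-stage recipe as summarized, using synthetic embeddings and scikit-learn's KMeans; treating cluster ids as proxy-sensitive labels is the only step shown, and the embedding dimensionality, data, and cluster count are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Stage 1 (stand-in): embeddings of non-sensitive attributes. The working
# assumption is that correlation with the unobserved sensitive attribute
# leaves a recoverable cluster structure in this space.
X = np.vstack([rng.normal(-1.0, 1.0, size=(500, 16)),
               rng.normal(+1.0, 1.0, size=(500, 16))])

# Stage 2: cluster the embeddings; cluster ids serve as proxy-sensitive labels
# that a bias-mitigation method (e.g., Fair Mixup, Adversarial Debiasing)
# can consume in place of true sensitive labels.
proxy_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(proxy_labels))
```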
arXiv Detail & Related papers (2023-12-26T10:54:15Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework that induces different responses to universal adversarial perturbations (UAPs) from normal versus adversarial samples.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- Properties of Group Fairness Metrics for Rankings [4.479834103607384]
We perform a comparative analysis of existing group fairness metrics developed in the context of fair ranking.
We take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics.
We demonstrate that most of these metrics satisfy only a small subset of the proposed properties.
arXiv Detail & Related papers (2022-12-29T15:50:18Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
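The paper defines a specific accumulated prediction sensitivity; as a rough finite-difference illustration of the underlying quantity, the sketch below sums a model's prediction change under a small perturbation of each input feature. The toy logistic model and the unweighted sum are assumptions, not the paper's weighted formulation.

```python
import numpy as np

def accumulated_prediction_sensitivity(predict, x, eps=1e-3):
    """Sum over features of |d prediction / d feature|, approximated by
    finite differences around the input x."""
    base = predict(x)
    total = 0.0
    for j in range(x.shape[0]):
        xp = x.copy()
        xp[j] += eps
        total += abs(predict(xp) - base) / eps
    return total

# Toy model: a logistic score over four features.
w = np.array([0.5, -1.0, 2.0, 0.0])

def model(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

print(accumulated_prediction_sensitivity(model, np.zeros(4)))  # ~0.875
```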
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics [5.74271110290378]
We evaluate a set of metrics originating in economics, known as distributional inequality metrics, and their ability to measure disparities in content exposure in the Twitter algorithmic timeline.
We show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users.
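Among the distributional inequality metrics borrowed from economics, the Gini coefficient is the most common; a short sketch of applying it to per-user exposure follows. The two exposure distributions are synthetic stand-ins, not Twitter data.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative array: 0 means perfectly equal
    exposure across users, values near 1 mean exposure is concentrated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Closed form for sorted data: G = 2*sum(i * x_i) / (n * sum(x)) - (n+1)/n
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

# Hypothetical per-user exposure produced by two recommendation algorithms.
uniform_exposure = np.full(1000, 1.0)
skewed_exposure = np.random.default_rng(3).pareto(1.5, size=1000)
print(gini(uniform_exposure))  # 0.0
print(gini(skewed_exposure))   # well above 0, indicating skewed outcomes
```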
arXiv Detail & Related papers (2022-02-03T14:41:39Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
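Quantification methods estimate group prevalence directly rather than recovering per-item sensitive labels. A compact example is Adjusted Classify & Count, a standard quantification baseline that corrects the raw classifier-based count using error rates measured on validation data; it illustrates the family of approaches the paper studies rather than its specific proposal.

```python
import numpy as np

def adjusted_classify_and_count(predictions, tpr, fpr):
    """Adjusted Classify & Count (ACC): estimate positive-class (e.g.,
    protected-group) prevalence from hard classifier outputs, inverting
    p_observed = tpr * p + fpr * (1 - p)."""
    cc = np.mean(predictions)          # raw classify-and-count
    prevalence = (cc - fpr) / (tpr - fpr)
    return float(np.clip(prevalence, 0.0, 1.0))

# Toy example: a noisy group classifier (tpr=0.8, fpr=0.1) applied to a
# population whose true protected-group prevalence is 0.3.
rng = np.random.default_rng(4)
truth = rng.random(100_000) < 0.3
preds = np.where(truth, rng.random(truth.size) < 0.8,
                        rng.random(truth.size) < 0.1)
print(adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1))  # close to 0.3
```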
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn, from labeled data, a scoring function that ranks positive individuals higher than negative ones.
There are rising concerns about whether the learned scoring function can cause systematic disparities across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
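The summary does not detail the post-processing framework; as a generic illustration of post-hoc adjustment in bipartite ranking, the sketch below adds a constant score offset to one group and shows how the offset trades top-k group parity against the original score order. The groups, offset values, and data are all hypothetical.

```python
import numpy as np

def post_hoc_group_shift(scores, groups, target_group, shift):
    """Re-rank by adding a constant offset to the target group's scores.
    Larger shifts move the group up the ranking (more parity at the top)
    at some cost in utility, since the original score order is perturbed."""
    adjusted = scores.copy()
    adjusted[groups == target_group] += shift
    return np.argsort(-adjusted)  # item indices, best-ranked first

rng = np.random.default_rng(5)
scores = np.concatenate([rng.normal(0.2, 1.0, 500), rng.normal(0.0, 1.0, 500)])
groups = np.array(["A"] * 500 + ["B"] * 500)
for shift in (0.0, 0.2):
    top = post_hoc_group_shift(scores, groups, "B", shift)[:100]
    print(f"shift={shift}: share of group B in top 100 =",
          (groups[top] == "B").mean())
```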
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods for evaluating open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metrics are more effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)