Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance
- URL: http://arxiv.org/abs/2405.18276v1
- Date: Tue, 28 May 2024 15:25:04 GMT
- Title: Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance
- Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma
- Abstract summary: Relevance and fairness are two major objectives of recommender systems (RSs).
Recent work proposes measures of RS fairness that are either independent of relevance (fairness-only) or conditioned on relevance (joint measures).
We collect all joint evaluation measures of RS relevance and fairness, and ask: How much do they agree with each other?
We empirically study for the first time the behaviour of these measures across 4 real-world datasets and 4 recommenders.
- Abstract: Relevance and fairness are two major objectives of recommender systems (RSs). Recent work proposes measures of RS fairness that are either independent of relevance (fairness-only) or conditioned on relevance (joint measures). While fairness-only measures have been studied extensively, we look into whether joint measures can be trusted. We collect all joint evaluation measures of RS relevance and fairness, and ask: How much do they agree with each other? To what extent do they agree with relevance/fairness measures? How sensitive are they to changes in rank position, or to increasingly fair and relevant recommendations? We empirically study for the first time the behaviour of these measures across 4 real-world datasets and 4 recommenders. We find that most of these measures: i) correlate weakly with one another and even contradict each other at times; ii) are less sensitive to rank position changes than relevance- and fairness-only measures, meaning that they are less granular than traditional RS measures; and iii) tend to compress scores at the low end of their range, meaning that they are not very expressive. We counter the above limitations with a set of guidelines on the appropriate usage of such measures, i.e., they should be used with caution due to their tendency to contradict each other and their very small empirical range.
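The agreement analysis the abstract describes can be sketched with a rank correlation between the scores two measures assign to the same set of recommender runs. The sketch below is illustrative only: the score values are made up, not taken from the paper, and a plain Kendall's tau stands in for the paper's full correlation methodology.

```python
# Minimal sketch: do two evaluation measures rank the same recommender runs
# the same way? Kendall's tau near 1 means agreement; near 0 or negative
# means the measures disagree on which system is "better".
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation between two equal-length score lists (no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical scores that two joint measures assign to four recommender runs.
measure_a = [0.12, 0.08, 0.15, 0.10]
measure_b = [0.40, 0.45, 0.38, 0.50]
print(kendall_tau(measure_a, measure_b))  # negative: the measures contradict each other
```

In a study like this one, each list would hold one score per (dataset, recommender) pair, and the correlation would be computed for every pair of measures.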
Related papers
- Joint Evaluation of Fairness and Relevance in Recommender Systems with Pareto Frontier
We present a new approach for jointly evaluating fairness and relevance in recommender systems (RSs)
Our approach is modular and intuitive as it can be computed with existing measures.
Experiments with 4 RS models, 3 re-ranking strategies, and 6 datasets show that existing metrics have inconsistent associations with our solution.
arXiv Detail & Related papers (2025-02-17T15:33:28Z)
- Standardized Interpretable Fairness Measures for Continuous Risk Scores
We propose a standardized version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance.
Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points.
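For intuition, a Wasserstein-based group disparity can be sketched as the 1-Wasserstein distance between the empirical score distributions of two groups. This is not the authors' standardized measure, only an illustration: for equal-size samples, the 1-D distance reduces to the mean absolute difference of the sorted scores, and the group scores below are invented.

```python
# Minimal sketch: 1-Wasserstein distance between two groups' continuous risk
# scores. For equal-size empirical samples this equals the mean absolute
# difference between the sorted score lists.
def wasserstein_1d(scores_a, scores_b):
    assert len(scores_a) == len(scores_b), "sketch assumes equal group sizes"
    pairs = zip(sorted(scores_a), sorted(scores_b))
    return sum(abs(a - b) for a, b in pairs) / len(scores_a)

# Hypothetical risk scores for two demographic groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.1, 0.3, 0.5, 0.7]
print(wasserstein_1d(group_a, group_b))  # 0.1: group_a's scores sit 0.1 higher
```

Because the distance is in the units of the score itself, it directly quantifies how far one group's score distribution must be shifted to match the other's, which is what makes such measures interpretable.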
arXiv Detail & Related papers (2023-08-22T12:01:49Z)
- Trustworthy Social Bias Measurement
In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling.
We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures.
We demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.
arXiv Detail & Related papers (2022-12-20T18:45:12Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
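Statistical parity, the group-fairness notion this metric is linked to, can be sketched as the difference in positive-prediction rates between two groups. The predictions and group labels below are invented for illustration.

```python
# Minimal sketch of statistical parity: a classifier satisfies it when
# P(prediction = 1) is the same for every group; the difference in rates
# quantifies the violation.
def statistical_parity_diff(preds, groups):
    """Difference in positive-prediction rate between group 0 and group 1."""
    def rate(g):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(members) / len(members)
    return rate(0) - rate(1)

preds  = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical binary classifier outputs
groups = [0, 0, 0, 0, 1, 1, 1, 1]   # protected-attribute group per instance
print(statistical_parity_diff(preds, groups))  # 0.5: group 0 is favoured
```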
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Cascaded Debiasing: Studying the Cumulative Effect of Multiple Fairness-Enhancing Interventions
This paper investigates the cumulative effect of multiple fairness enhancing interventions at different stages of the machine learning (ML) pipeline.
Applying multiple interventions results in better fairness and lower utility than individual interventions on aggregate.
On the downside, fairness-enhancing interventions can negatively impact different population groups, especially the privileged group.
arXiv Detail & Related papers (2022-02-08T09:20:58Z)
- On Quantitative Evaluations of Counterfactuals
This paper consolidates work on evaluating visual counterfactual examples through an analysis and experiments.
We find that while most metrics behave as intended for sufficiently simple datasets, some fail to tell the difference between good and bad counterfactuals when the complexity increases.
We propose two new metrics, the Label Variation Score and the Oracle score, which are both less vulnerable to such tiny changes.
arXiv Detail & Related papers (2021-10-30T05:00:36Z)
- On the Choice of Fairness: Finding Representative Fairness Metrics for a Given Context
Various notions of fairness have been defined, though choosing an appropriate metric is cumbersome.
Trade-offs and impossibility theorems make such selection even more complicated and controversial.
We propose a framework that automatically discovers the correlations and trade-offs between different pairs of measures for a given context.
arXiv Detail & Related papers (2021-09-13T04:17:38Z)
- Gradual (In)Compatibility of Fairness Criteria
Impossibility results show that important fairness measures cannot be satisfied at the same time under reasonable assumptions.
This paper explores whether we can satisfy and/or improve these fairness measures simultaneously to a certain degree.
arXiv Detail & Related papers (2021-09-09T16:37:30Z)
- Balancing Accuracy and Fairness for Interactive Recommendation with Reinforcement Learning
Fairness in recommendation has attracted increasing attention due to the bias and discrimination that traditional recommenders can introduce.
We propose a reinforcement learning-based framework, FairRec, to dynamically maintain a long-term balance between accuracy and fairness in interactive recommender systems (IRS).
Extensive experiments validate that FairRec can improve fairness, while preserving good recommendation quality.
arXiv Detail & Related papers (2021-06-25T02:02:51Z)
- Fairness in machine learning: against false positive rate equality as a measure of fairness
Two popular fairness measures are calibration and equality of false positive rate.
I give an ethical framework for thinking about these measures and argue that false positive rate equality does not track anything about fairness.
arXiv Detail & Related papers (2020-07-06T17:03:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.