Joint Evaluation of Fairness and Relevance in Recommender Systems with Pareto Frontier
- URL: http://arxiv.org/abs/2502.11921v1
- Date: Mon, 17 Feb 2025 15:33:28 GMT
- Title: Joint Evaluation of Fairness and Relevance in Recommender Systems with Pareto Frontier
- Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma
- Abstract summary: We present a new approach for jointly evaluating fairness and relevance in recommender systems (RSs).
Our approach is modular and intuitive as it can be computed with existing measures.
Experiments with 4 RS models, 3 re-ranking strategies, and 6 datasets show that existing metrics have inconsistent associations with our solution.
- Abstract: Fairness and relevance are two important aspects of recommender systems (RSs). Typically, they are evaluated either (i) separately by individual measures of fairness and relevance, or (ii) jointly using a single measure that accounts for fairness with respect to relevance. However, approach (i) often does not provide a reliable joint estimate of the goodness of the models, as it has two different best models: one for fairness and another for relevance. Approach (ii) is also problematic because these measures tend to be ad-hoc and do not relate well to traditional relevance measures, like NDCG. Motivated by this, we present a new approach for jointly evaluating fairness and relevance in RSs: Distance to Pareto Frontier (DPFR). Given some user-item interaction data, we compute their Pareto frontier for a pair of existing relevance and fairness measures, and then use the distance from the frontier as a measure of the jointly achievable fairness and relevance. Our approach is modular and intuitive as it can be computed with existing measures. Experiments with 4 RS models, 3 re-ranking strategies, and 6 datasets show that existing metrics have inconsistent associations with our Pareto-optimal solution, making DPFR a more robust and theoretically well-founded joint measure for assessing fairness and relevance. Our code: https://github.com/theresiavr/DPFR-recsys-evaluation
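The core DPFR idea (measure the distance from a model's operating point to the Pareto frontier of jointly achievable relevance and fairness) can be sketched as follows. This is a rough illustration, not the authors' implementation: the candidate points are invented, and in the paper the frontier is computed from user-item interaction data for a chosen pair of existing measures.

```python
from math import dist

def pareto_frontier(points):
    """Keep points not dominated by any other point
    (both coordinates are higher-is-better)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

def dpfr(model_point, frontier):
    """Distance to the nearest frontier point: lower means the model
    is closer to the best jointly achievable fairness and relevance."""
    return min(dist(model_point, f) for f in frontier)

# Hypothetical (relevance, fairness) pairs, e.g. (NDCG, an exposure-fairness
# measure), both normalized to [0, 1].
candidates = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.8), (0.3, 0.5), (0.6, 0.4)]
frontier = pareto_frontier(candidates)   # keeps (0.9,0.2), (0.7,0.6), (0.5,0.8)
score = dpfr((0.6, 0.4), frontier)       # Euclidean distance to (0.7, 0.6)
```

Because DPFR only needs a pair of existing measures and their achievable frontier, it is modular: swapping in a different relevance or fairness measure changes the frontier but not the evaluation recipe.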
Related papers
- Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method [60.364834418531366]
We propose five new evaluation metrics that comprehensively and accurately assess the performance of RRSs.
We formulate RRSs from a causal perspective, modeling recommendations as bilateral interventions.
We introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics.
arXiv Detail & Related papers (2024-08-19T07:21:02Z)
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR)
A series of analyses show that TKPR is compatible with existing ranking-based measures.
We also establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance [12.013380880264439]
Relevance and fairness are two major objectives of recommender systems (RSs).
Recent work proposes measures of RS fairness that are either independent of relevance (fairness-only) or conditioned on relevance (joint measures).
We collect all joint evaluation measures of RS relevance and fairness, and ask: How much do they agree with each other?
We empirically study for the first time the behaviour of these measures across 4 real-world datasets and 4 recommenders.
arXiv Detail & Related papers (2024-05-28T15:25:04Z)
- Intersectional Two-sided Fairness in Recommendation [41.96733939002468]
We propose a novel approach called Intersectional Two-sided Fairness Recommendation (ITFR)
Our method utilizes a sharpness-aware loss to perceive disadvantaged groups, and then uses collaborative loss balance to develop consistent distinguishing abilities for different intersectional groups.
Our proposed approach effectively alleviates the intersectional two-sided unfairness and consistently outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2024-02-05T08:56:24Z)
- Standardized Interpretable Fairness Measures for Continuous Risk Scores [4.192037827105842]
We propose a standardized version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance.
Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points.
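The Wasserstein-distance idea behind such measures can be illustrated in one dimension. The sketch below computes only the raw W1 distance between two equal-size empirical score samples, not the paper's standardized measure; the group scores are invented for illustration.

```python
def wasserstein_1d(scores_a, scores_b):
    """W1 distance between two equal-size empirical score samples:
    the mean absolute gap between quantile-aligned (sorted) scores."""
    assert len(scores_a) == len(scores_b), "equal-size sketch only"
    return sum(abs(a - b)
               for a, b in zip(sorted(scores_a), sorted(scores_b))) / len(scores_a)

# Hypothetical continuous risk scores in [0, 1] for two demographic groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.3, 0.5, 0.7, 0.9]
disparity = wasserstein_1d(group_a, group_b)  # every quantile shifted by 0.1
```

A value of 0 means the two groups' score distributions coincide; larger values quantify the strength of the group disparity, which is what makes the measure comparable across models or datasets once standardized.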
arXiv Detail & Related papers (2023-08-22T12:01:49Z)
- C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation [68.59356746305255]
We propose a novel model-agnostic approach to measure the turn-level interaction between the system and the user.
Our approach significantly improves the correlation with human judgment compared with existing evaluation systems.
arXiv Detail & Related papers (2023-06-27T06:58:03Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Simpson's Paradox in Recommender Fairness: Reconciling differences between per-user and aggregated evaluations [16.053419956606557]
We argue that two notions of fairness in ranking and recommender systems can lead to opposite conclusions.
We reconcile these notions and show that the tension is due to differences in distributions of users where items are relevant.
Based on this new understanding, practitioners might be interested in either notion, but may face challenges with the per-user metric.
arXiv Detail & Related papers (2022-10-14T12:43:32Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
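The conventional pairwise-ranking baseline this summary refers to is typically a BPR-style loss over a sampled negative item. A minimal sketch with hypothetical scores (this is the baseline being contrasted, not the paper's density-estimation method):

```python
from math import exp, log

def bpr_loss(pos_score, neg_score):
    """BPR-style pairwise loss: -log sigmoid(s_pos - s_neg).
    Small when the observed (positive) item outranks the sampled negative."""
    return -log(1.0 / (1.0 + exp(-(pos_score - neg_score))))

# A correctly ordered pair incurs less loss than a reversed one.
good = bpr_loss(2.0, 0.5)
bad = bpr_loss(0.5, 2.0)
```

In practice the negative item is drawn by a sampler over unobserved items, which is exactly the one-class difficulty the summary mentions: unobserved does not necessarily mean disliked.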
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.