Related papers: CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System

CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System

URL: http://arxiv.org/abs/2403.05668v3
Date: Thu, 20 Feb 2025 19:30:53 GMT
Title: CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System
Authors: Yashar Deldjoo, Tommaso di Noia,
Abstract summary: This work takes a critical stance on previous studies concerning fairness evaluation in Large Language Model (LLM)-based recommender systems.<n>We introduce CFaiRLLM, an enhanced evaluation framework that not only incorporates true preference alignment but also rigorously examines intersectional fairness.<n>To validate the efficacy of CFaiRLLM, we conducted extensive experiments using MovieLens and LastFM.
Score: 16.84754752395103
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work takes a critical stance on previous studies concerning fairness evaluation in Large Language Model (LLM)-based recommender systems, which have primarily assessed consumer fairness by comparing recommendation lists generated with and without sensitive user attributes. Such approaches implicitly treat discrepancies in recommended items as biases, overlooking whether these changes might stem from genuine personalization aligned with the true preferences of users. Moreover, these earlier studies typically address single sensitive attributes in isolation, neglecting the complex interplay of intersectional identities. In response to these shortcomings, we introduce CFaiRLLM, an enhanced evaluation framework that not only incorporates true preference alignment but also rigorously examines intersectional fairness by considering overlapping sensitive attributes. Additionally, CFaiRLLM introduces diverse user profile sampling strategies-random, top-rated, and recency-focused-to better understand the impact of profile generation fed to LLMs in light of inherent token limitations in these systems. Given that fairness depends on accurately understanding users' tastes and preferences, these strategies provide a more realistic assessment of fairness within RecLLMs. To validate the efficacy of CFaiRLLM, we conducted extensive experiments using MovieLens and LastFM datasets, applying various sampling strategies and sensitive attribute configurations. The evaluation metrics include both item similarity measures and true preference alignment considering both hit and ranking (Jaccard Similarity and PRAG), thereby conducting a multifaceted analysis of recommendation fairness.

Related papers

Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs [7.197702136906138]
We propose an uncertainty-aware fairness metric, UCerF, to enable a fine-grained evaluation of model fairness.<n> observing data size, diversity, and clarity issues in current datasets, we introduce a new gender-occupation fairness evaluation dataset.<n>We establish a benchmark, using our metric and dataset, and apply it to evaluate the behavior of ten open-source AI systems.
arXiv Detail & Related papers (2025-05-29T20:45:18Z)
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation [57.380464382910375]
We show that the choice of feedback protocol can significantly affect evaluation reliability and induce systematic biases. In particular, we show that pairwise evaluation protocols are more vulnerable to distracted evaluation.
arXiv Detail & Related papers (2025-04-20T19:05:59Z)
Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes. We find that the majority of disagreements are in opposition with standard reward modeling approaches. We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset. We show that our approach consistently boosts DPO by a considerable margin. Our method not only maximizes the utility of preference data but also mitigates the issue of unlearning, demonstrating its broad effectiveness beyond mere data expansion.
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
Improved Diversity-Promoting Collaborative Metric Learning for Recommendation [127.08043409083687]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems. This paper focuses on a challenging scenario where a user has multiple categories of interests. We propose a novel method called textitDiversity-Promoting Collaborative Metric Learning (DPCML)
arXiv Detail & Related papers (2024-09-02T07:44:48Z)
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments [41.25558612970942]
We show that large language models (LLMs) exhibit preference biases and worrying sensitivity to prompt designs. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO.
arXiv Detail & Related papers (2024-06-17T09:48:53Z)
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System [9.470545149911072]
This paper proposes a normative framework to benchmark consumer fairness in LLM-powered recommender systems. We argue that this gap can lead to arbitrary conclusions about fairness. Experiments on the MovieLens dataset on consumer fairness reveal fairness deviations in age-based recommendations.
arXiv Detail & Related papers (2024-05-03T16:25:27Z)
A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems [13.89038866451741]
We propose CP-FairRank, an optimization-based re-ranking algorithm that seamlessly integrates fairness constraints from both the consumer and producer side. We demonstrate that our proposed strategy is able to improve both consumer and producer fairness without compromising overall recommendation quality.
arXiv Detail & Related papers (2024-02-01T10:42:05Z)
Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency [9.882829614199453]
This paper explores the biases in ChatGPT-based recommender systems, focusing on provider fairness (item-side fairness) In the first experiment, we assess seven distinct prompt scenarios on top-K recommendation accuracy and fairness. Embedding fairness into system roles, such as "act as a fair recommender," proved more effective than fairness directives within prompts.
arXiv Detail & Related papers (2024-01-19T08:09:20Z)
Marginal Debiased Network for Fair Visual Recognition [59.05212866862219]
We propose a novel marginal debiased network (MDN) to learn debiased representations. Our MDN can achieve a remarkable performance on under-represented samples.
arXiv Detail & Related papers (2024-01-04T08:57:09Z)
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation [52.62492168507781]
We propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM) This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations.
arXiv Detail & Related papers (2023-05-12T16:54:36Z)
Fairness via Adversarial Attribute Neighbourhood Robust Learning [49.93775302674591]
We propose a principled underlineRobust underlineAdversarial underlineAttribute underlineNeighbourhood (RAAN) loss to debias the classification head.
arXiv Detail & Related papers (2022-10-12T23:39:28Z)
Opportunistic Multi-aspect Fairness through Personalized Re-ranking [5.8562079474220665]
We present a re-ranking approach to fairness-aware recommendation that learns individual preferences across multiple fairness dimensions. We show that our opportunistic and metric-agnostic approach achieves a better trade-off between accuracy and fairness than prior re-ranking approaches.
arXiv Detail & Related papers (2020-05-21T04:25:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.