Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large
Language Model Recommendation
- URL: http://arxiv.org/abs/2305.07609v3
- Date: Tue, 17 Oct 2023 13:29:54 GMT
- Title: Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large
Language Model Recommendation
- Authors: Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan
He
- Abstract summary: We propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM).
This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes.
By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations.
- Score: 52.62492168507781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable achievements of Large Language Models (LLMs) have led to the
emergence of a novel recommendation paradigm -- Recommendation via LLM
(RecLLM). Nevertheless, it is important to note that LLMs may contain social
prejudices, and therefore, the fairness of recommendations made by RecLLM
requires further investigation. To avoid the potential risks of RecLLM, it is
imperative to evaluate the fairness of RecLLM with respect to various sensitive
attributes on the user side. Due to the differences between the RecLLM paradigm
and the traditional recommendation paradigm, it is problematic to directly use
the fairness benchmark of traditional recommendation. To address the dilemma,
we propose a novel benchmark called Fairness of Recommendation via LLM
(FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset
that accounts for eight sensitive attributes in two recommendation scenarios:
music and movies. By utilizing our FaiRLLM benchmark, we conducted an
evaluation of ChatGPT and discovered that it still exhibits unfairness to some
sensitive attributes when generating recommendations. Our code and dataset can
be found at https://github.com/jizhi-zhang/FaiRLLM.
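As a rough illustration of what user-side fairness evaluation can look like, the sketch below compares a recommendation list produced for a neutral request with lists produced when a sensitive attribute value is stated, then summarizes how unevenly the lists shift. This is a minimal sketch under assumptions, not the benchmark's own code: the function names, the choice of Jaccard similarity, and the sample data are hypothetical and are not taken from the abstract.

```python
from statistics import pstdev


def jaccard(a, b):
    """Jaccard similarity between two recommendation lists, ignoring order."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def sensitivity_gap(neutral_recs, recs_by_value):
    """Compare each attribute-specific list with the neutral list.

    Returns the per-value similarities, their range (max - min), and their
    population standard deviation. A larger range or deviation means the
    recommendations shift more for some attribute values than for others.
    """
    sims = {value: jaccard(neutral_recs, recs)
            for value, recs in recs_by_value.items()}
    scores = list(sims.values())
    return sims, max(scores) - min(scores), pstdev(scores)


# Hypothetical outputs; in practice these would come from prompting the model.
neutral = ["A", "B", "C", "D"]
by_value = {
    "value_1": ["A", "B", "C", "D"],  # identical to the neutral list
    "value_2": ["A", "B", "X", "Y"],  # half of the items replaced
}
sims, gap, spread = sensitivity_gap(neutral, by_value)
```

A gap and spread of zero would mean every attribute value receives recommendations equally close to the neutral list; which similarity measure and aggregation are appropriate depends on the benchmark's actual definitions.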
Related papers
- Learning Recommender Systems with Soft Target: A Decoupled Perspective [49.83787742587449]
We propose a novel decoupled soft label optimization framework to consider the objectives as two aspects by leveraging soft labels.
We present a sensible soft-label generation algorithm that models a label propagation algorithm to explore users' latent interests in unobserved feedback via neighbors.
arXiv Detail & Related papers (2024-10-09T04:20:15Z)
- LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints [86.59857711385833]
We introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions.
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline.
Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback.
arXiv Detail & Related papers (2024-10-09T01:25:10Z)
- Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders [31.116716790604116]
Large language models (LLMs) have been garnering increasing attention in the recommendation community.
Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve state-of-the-art performance in sequential recommendation.
This study provides theoretical justification for the superiority of the cross-entropy loss.
arXiv Detail & Related papers (2024-08-26T12:52:02Z)
- On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives.
Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
arXiv Detail & Related papers (2024-06-13T15:16:11Z)
- A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System [9.470545149911072]
This paper proposes a normative framework to benchmark consumer fairness in LLM-powered recommender systems.
We argue that this gap can lead to arbitrary conclusions about fairness.
Experiments on the MovieLens dataset on consumer fairness reveal fairness deviations in age-based recommendations.
arXiv Detail & Related papers (2024-05-03T16:25:27Z)
- Federated Recommendation via Hybrid Retrieval Augmented Generation [16.228589300933262]
Federated Recommendation (FR) enables privacy-preserving recommendations.
Large Language Models (LLMs) as recommenders have proven effective across various recommendation scenarios.
We propose GPT-FedRec, a federated recommendation framework leveraging ChatGPT and a novel hybrid Retrieval Augmented Generation (RAG) mechanism.
arXiv Detail & Related papers (2024-03-07T06:38:41Z)
- LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)
- UP5: Unbiased Foundation Model for Fairness-aware Recommendation [45.47673627667594]
There is a growing concern that Large Language Models might inadvertently perpetuate societal stereotypes, resulting in unfair recommendations.
This paper focuses on user-side fairness for LLM-based recommendation where the users may require a recommender system to be fair on sensitive features such as gender or age.
We introduce a novel Counterfactually-Fair-Prompt (CFP) method towards Unbiased Foundation mOdels (UFO) for fairness-aware LLM-based recommendation.
arXiv Detail & Related papers (2023-05-20T04:32:59Z)