Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large
Language Model Recommendation
- URL: http://arxiv.org/abs/2305.07609v3
- Date: Tue, 17 Oct 2023 13:29:54 GMT
- Title: Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large
Language Model Recommendation
- Authors: Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan
He
- Abstract summary: We propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM)
This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes.
By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations.
- Score: 52.62492168507781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable achievements of Large Language Models (LLMs) have led to the
emergence of a novel recommendation paradigm -- Recommendation via LLM
(RecLLM). Nevertheless, it is important to note that LLMs may contain social
prejudices, and therefore, the fairness of recommendations made by RecLLM
requires further investigation. To avoid the potential risks of RecLLM, it is
imperative to evaluate the fairness of RecLLM with respect to various sensitive
attributes on the user side. Due to the differences between the RecLLM paradigm
and the traditional recommendation paradigm, it is problematic to directly use
the fairness benchmark of traditional recommendation. To address the dilemma,
we propose a novel benchmark called Fairness of Recommendation via LLM
(FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset
that accounts for eight sensitive attributes1 in two recommendation scenarios:
music and movies. By utilizing our FaiRLLM benchmark, we conducted an
evaluation of ChatGPT and discovered that it still exhibits unfairness to some
sensitive attributes when generating recommendations. Our code and dataset can
be found at https://github.com/jizhi-zhang/FaiRLLM.
Related papers
- Do Reviews Matter for Recommendations in the Era of Large Language Models? [8.772803183525284]
With the advent of large language models (LLMs), the landscape of recommender systems is undergoing a significant transformation.<n>Traditionally, user reviews have served as a critical source of rich, contextual information for enhancing recommendation quality.<n>This paper provides a systematic investigation of the evolving role of text reviews in recommendation by comparing deep learning methods and LLM approaches.
arXiv Detail & Related papers (2025-12-15T04:46:48Z) - Do LLM-judges Align with Human Relevance in Cranfield-style Recommender Evaluation? [40.49875426230813]
This paper investigates whether Large Language Models (LLMs) can serve as reliable automatic judges to address scalability challenges.<n>Using the ML-32M-ext Cranfield-style movie recommendation collection, we first examine the limitations of existing evaluation methodologies.<n>We find that incorporating richer item metadata and longer user histories improves alignment, and that LLM-judge yields high agreement with human-based rankings.
arXiv Detail & Related papers (2025-11-28T16:10:39Z) - Towards Comprehensible Recommendation with Large Language Model Fine-tuning [41.218487308635126]
We propose a novel Content Understanding from a Collaborative Perspective framework (CURec) for recommendation systems.<n>Curec generates collaborative-aligned content features for more comprehensive recommendations.<n>Experiments on public benchmarks demonstrate the superiority of CURec over existing methods.
arXiv Detail & Related papers (2025-08-11T03:55:31Z) - $\ ext{R}^2\ ext{ec}$: Towards Large Recommender Models with Reasoning [50.291998724376654]
We propose name, a unified large recommender model with intrinsic reasoning capabilities.<n> RecPO is a corresponding reinforcement learning framework that optimize name both the reasoning and recommendation capabilities simultaneously in a single policy update.<n> Experiments on three datasets with various baselines verify the effectiveness of name, showing relative improvements of 68.67% in Hit@5 and 45.21% in NDCG@20.
arXiv Detail & Related papers (2025-05-22T17:55:43Z) - DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs)<n>We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space.<n> Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z) - Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation [96.18720164390699]
This paper explores the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems.<n>Our results show that LLM-generated annotations enhance out-of-domain retrieval performance and improve RAG outcomes compared to models trained solely on human annotations or downstream QA metrics.
arXiv Detail & Related papers (2025-04-07T16:05:52Z) - Learning Recommender Systems with Soft Target: A Decoupled Perspective [49.83787742587449]
We propose a novel decoupled soft label optimization framework to consider the objectives as two aspects by leveraging soft labels.
We present a sensible soft-label generation algorithm that models a label propagation algorithm to explore users' latent interests in unobserved feedback via neighbors.
arXiv Detail & Related papers (2024-10-09T04:20:15Z) - LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints [86.59857711385833]
We introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions.
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline.
Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback.
arXiv Detail & Related papers (2024-10-09T01:25:10Z) - Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders [31.116716790604116]
Large language models (LLMs) have been garnering increasing attention in the recommendation community.
Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve state-of-the-art' performance in sequential recommendation.
This study provides theoretical justification for the superiority of the cross-entropy loss.
arXiv Detail & Related papers (2024-08-26T12:52:02Z) - On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives.
Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
arXiv Detail & Related papers (2024-06-13T15:16:11Z) - A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System [9.470545149911072]
This paper proposes a normative framework to benchmark consumer fairness in LLM-powered recommender systems.
We argue that this gap can lead to arbitrary conclusions about fairness.
Experiments on the MovieLens dataset on consumer fairness reveal fairness deviations in age-based recommendations.
arXiv Detail & Related papers (2024-05-03T16:25:27Z) - CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System [16.84754752395103]
This work takes a critical stance on previous studies concerning fairness evaluation in Large Language Model (LLM)-based recommender systems.
We introduce CFaiRLLM, an enhanced evaluation framework that not only incorporates true preference alignment but also rigorously examines intersectional fairness.
To validate the efficacy of CFaiRLLM, we conducted extensive experiments using MovieLens and LastFM.
arXiv Detail & Related papers (2024-03-08T20:44:59Z) - Federated Recommendation via Hybrid Retrieval Augmented Generation [16.228589300933262]
Federated Recommendation (FR) enables privacy-preserving recommendations.
Large Language Models (LLMs) as recommenders have proven effective across various recommendation scenarios.
We propose GPT-FedRec, a federated recommendation framework leveraging ChatGPT and a novel hybrid Retrieval Augmented Generation (RAG) mechanism.
arXiv Detail & Related papers (2024-03-07T06:38:41Z) - LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z) - UP5: Unbiased Foundation Model for Fairness-aware Recommendation [45.47673627667594]
A growing concern that Large Language Models might inadvertently perpetuate societal stereotypes, resulting in unfair recommendations.
This paper focuses on user-side fairness for LLM-based recommendation where the users may require a recommender system to be fair on sensitive features such as gender or age.
We introduce a novel Counterfactually-Fair-Prompt (CFP) method towards Unbiased Foundation mOdels (UFO) for fairness-aware LLM-based recommendation.
arXiv Detail & Related papers (2023-05-20T04:32:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.