Related papers: Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations

URL: http://arxiv.org/abs/2601.01997v1
Date: Mon, 05 Jan 2026 10:56:01 GMT
Title: Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations
Authors: Dario Di Palma, Giovanni Maria Biancofiore, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia
Abstract summary: ChatGPT has emerged as a versatile tool, demonstrating capabilities across diverse domains. This study investigates the recommendations provided by ChatGPT-3.5 and ChatGPT-4 by assessing their capabilities in terms of diversity, novelty, and popularity bias.
Score: 13.261017248837822
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: ChatGPT has emerged as a versatile tool, demonstrating capabilities across diverse domains. Given these successes, the Recommender Systems (RSs) community has begun investigating its applications within recommendation scenarios primarily focusing on accuracy. While the integration of ChatGPT into RSs has garnered significant attention, a comprehensive analysis of its performance across various dimensions remains largely unexplored. Specifically, the capabilities of providing diverse and novel recommendations or exploring potential biases such as popularity bias have not been thoroughly examined. As the use of these models continues to expand, understanding these aspects is crucial for enhancing user satisfaction and achieving long-term personalization. This study investigates the recommendations provided by ChatGPT-3.5 and ChatGPT-4 by assessing ChatGPT's capabilities in terms of diversity, novelty, and popularity bias. We evaluate these models on three distinct datasets and assess their performance in Top-N recommendation and cold-start scenarios. The findings reveal that ChatGPT-4 matches or surpasses traditional recommenders, demonstrating the ability to balance novelty and diversity in recommendations. Furthermore, in the cold-start scenario, ChatGPT models exhibit superior performance in both accuracy and novelty, suggesting they can be particularly beneficial for new users. This research highlights the strengths and limitations of ChatGPT's recommendations, offering new perspectives on the capacity of these models to provide recommendations beyond accuracy-focused metrics.
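
The abstract names three beyond-accuracy dimensions (diversity, novelty, popularity bias) without defining them. The sketch below shows one common way such quantities are computed for a single Top-N list. It is not the paper's evaluation code: the function names, the toy data, and the specific formulas (mean pairwise Jaccard distance over genre sets, mean self-information, mean item popularity) are assumptions chosen for illustration, and the sketch assumes every recommended item has at least one recorded interaction.

```python
import math
from collections import Counter
from itertools import combinations


def item_popularity(interactions):
    """Fraction of users who interacted with each item (hypothetical helper)."""
    n_users = len(interactions)
    counts = Counter(item for items in interactions.values() for item in set(items))
    return {item: count / n_users for item, count in counts.items()}


def novelty(rec_list, popularity):
    """Mean self-information -log2(p) of the recommended items.

    Higher values mean the list contains less popular items.
    """
    return sum(-math.log2(popularity[i]) for i in rec_list) / len(rec_list)


def intra_list_diversity(rec_list, genres):
    """Mean pairwise Jaccard distance between the genre sets of the items."""
    def distance(a, b):
        ga, gb = genres[a], genres[b]
        return 1 - len(ga & gb) / len(ga | gb)

    pairs = list(combinations(rec_list, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def average_popularity(rec_list, popularity):
    """Mean popularity of the recommended items; higher suggests popularity bias."""
    return sum(popularity[i] for i in rec_list) / len(rec_list)


# Toy data, invented for this example only.
interactions = {
    "u1": ["a", "b"],
    "u2": ["a", "c"],
    "u3": ["a", "b"],
    "u4": ["a", "d"],
}
genres = {
    "a": {"action"},
    "b": {"action", "comedy"},
    "c": {"drama"},
    "d": {"comedy"},
}
recs = ["b", "c", "d"]

pop = item_popularity(interactions)
print("novelty:", novelty(recs, pop))
print("diversity:", intra_list_diversity(recs, genres))
print("average popularity:", average_popularity(recs, pop))
```

In practice these per-list values would be averaged over all users and compared against baseline recommenders; which item features and which popularity definition to use depends on the dataset.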

Related papers

Tree of Preferences for Diversified Recommendation [54.183647833064136]
We study diversified recommendation from a data-bias perspective. Inspired by the outstanding performance of large language models (LLMs) in zero-shot inference leveraging world knowledge, we propose a novel approach.
arXiv Detail & Related papers (2025-12-24T04:13:17Z)
Using ChatGPT to Score Essays and Short-Form Constructed Responses [0.0]
The investigation focused on various prediction models, including linear regression, random forest, gradient boost, and boost. ChatGPT's performance was evaluated against human raters using quadratic weighted kappa (QWK) metrics. The study concludes that ChatGPT can complement human scoring but requires additional development to be reliable for high-stakes assessments.
arXiv Detail & Related papers (2024-08-18T16:51:28Z)
ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback [1.3654846342364308]
Large Language Models (LLMs) like ChatGPT have gained popularity due to their ease of use and their ability to adapt dynamically to various tasks while responding to feedback. We build a rigorous pipeline around ChatGPT to simulate how a user might realistically probe the model for recommendations. We explore the effect of popularity bias in ChatGPT's recommendations, and compare its performance to baseline models.
arXiv Detail & Related papers (2024-01-07T23:17:42Z)
Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study [26.17177931611486]
We present a preliminary case study investigating the recommendation capabilities of GPT-4V(ision), a recently released LMM by OpenAI. We employ a series of qualitative test samples spanning multiple domains to assess the quality of GPT-4V's responses within recommendation scenarios. We have also identified some limitations in using GPT-4V for recommendations, including a tendency to provide similar responses when given similar inputs.
arXiv Detail & Related papers (2023-11-07T18:39:10Z)
Evaluating ChatGPT as a Recommender System: A Rigorous Approach [12.458752059072706]
We propose a robust evaluation pipeline to assess ChatGPT's ability as an RS and post-process ChatGPT recommendations. We analyze the model's functionality in three settings: the Top-N Recommendation, the cold-start recommendation, and the re-ranking of a list of recommendations.
arXiv Detail & Related papers (2023-09-07T10:13:09Z)
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences [6.821378903525802]
ChatGPT has consistently demonstrated a remarkable level of accuracy and reliability in terms of content evaluation. A test set consisting of prompts is created, covering a wide range of use cases, and five models are utilized to generate corresponding responses. Results on the test set show that ChatGPT's ranking preferences are consistent with human preferences to a certain extent.
arXiv Detail & Related papers (2023-03-14T03:13:02Z)
Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments. We hope our preliminary study could prompt the emergence of a general-purpose reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
Prompted Opinion Summarization with GPT-3.5 [115.95460650578678]
We show that GPT-3.5 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity.
arXiv Detail & Related papers (2022-11-29T04:06:21Z)
Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting. The SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations. Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, and reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process. Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z)
Latent Unexpected Recommendations [89.2011481379093]
We propose to model unexpectedness in the latent space of user and item embeddings, which allows us to capture hidden and complex relations between new recommendations and historic purchases. In addition, we develop a novel Latent Closure (LC) method to construct a hybrid utility function and provide unexpected recommendations based on the proposed model.
arXiv Detail & Related papers (2020-07-27T02:39:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.