Rethinking the Evaluation for Conversational Recommendation in the Era
of Large Language Models
- URL: http://arxiv.org/abs/2305.13112v2
- Date: Fri, 3 Nov 2023 02:49:46 GMT
- Title: Rethinking the Evaluation for Conversational Recommendation in the Era
of Large Language Models
- Authors: Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen
- Abstract summary: The recent success of large language models (LLMs) has shown great potential for developing more powerful conversational recommender systems (CRSs).
In this paper, we investigate the use of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs, named iEvaLM, which harnesses LLM-based user simulators.
- Score: 115.7508325840751
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of large language models (LLMs) has shown great potential
to develop more powerful conversational recommender systems (CRSs), which rely
on natural language conversations to satisfy user needs. In this paper, we
investigate the use of ChatGPT for conversational recommendation, revealing
the inadequacy of the existing evaluation protocol: it may over-emphasize
matching against ground-truth items or utterances generated by human
annotators, while neglecting the interactive nature of a capable CRS. To
overcome this limitation, we further propose an interactive Evaluation
approach based on LLMs, named iEvaLM, which harnesses LLM-based user
simulators. Our evaluation approach can simulate various interaction scenarios
between users and systems. Through the experiments on two publicly available
CRS datasets, we demonstrate notable improvements compared to the prevailing
evaluation protocol. Furthermore, we emphasize the evaluation of
explainability, and ChatGPT showcases persuasive explanation generation for its
recommendations. Our study contributes to a deeper comprehension of the
untapped potential of LLMs for CRSs and provides a more flexible and
easy-to-use evaluation framework for future research endeavors. The codes and
data are publicly available at https://github.com/RUCAIBox/iEvaLM-CRS.
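To make the interactive protocol concrete, here is a minimal Python sketch of the kind of evaluation loop the abstract describes: an LLM-based user simulator converses with the CRS under test until the target item is recommended or a turn budget runs out. All names here (simulate_dialogue, crs_respond, user_simulate) and the naive hit check are assumptions made for this illustration, not the released iEvaLM API; see the repository linked above for the actual implementation.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": str}

def simulate_dialogue(
    crs_respond: Callable[[List[Message]], str],         # CRS under evaluation
    user_simulate: Callable[[List[Message], str], str],  # LLM simulator; knows the target item
    target_item: str,
    max_turns: int = 5,
) -> Dict:
    """Run one simulated conversation; report success and the turn it occurred on."""
    history: List[Message] = []
    for turn in range(1, max_turns + 1):
        # The simulator expresses preferences related to (but never names) the target.
        user_utt = user_simulate(history, target_item)
        history.append({"role": "user", "content": user_utt})

        # The CRS replies with a recommendation or a clarifying question.
        crs_utt = crs_respond(history)
        history.append({"role": "assistant", "content": crs_utt})

        # Naive hit check, sufficient for this sketch.
        if target_item.lower() in crs_utt.lower():
            return {"success": True, "turns": turn, "history": history}
    return {"success": False, "turns": max_turns, "history": history}

# Toy example with rule-based stand-ins for the two LLM calls.
if __name__ == "__main__":
    def toy_user(history: List[Message], target: str) -> str:
        return "I'm in the mood for a mind-bending sci-fi classic from 1999."

    def toy_crs(history: List[Message]) -> str:
        return "You might enjoy The Matrix. Would you like something similar?"

    result = simulate_dialogue(toy_crs, toy_user, target_item="The Matrix")
    print(result["success"], result["turns"])
```

Success rate and average turns can then be aggregated over many such simulated dialogues; the key design point is that an LLM simulator, rather than a fixed annotated transcript, supplies the user side of the conversation.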
Related papers
- Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems [15.481944998961847]
PEPPER is an evaluation protocol with target-free user simulators constructed from real-user interaction histories and reviews.
PEPPER enables realistic user-CRS dialogues without falling into simplistic guessing games.
PEPPER presents detailed measures for comprehensively evaluating the preference elicitation capabilities of CRSs.
arXiv Detail & Related papers (2024-11-25T07:36:20Z)
- A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
Large Language Models (LLMs) have marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z)
- How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models in constructing user simulators for Conversational Recommender Systems.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, which employs a straightforward strategy to guide the conversation topic toward the target items.
arXiv Detail & Related papers (2024-03-25T04:21:06Z)
- Parameter-Efficient Conversational Recommender System as a Language Processing Task [52.47087212618396]
Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation.
Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items.
In this paper, we represent items in natural language and formulate CRS as a natural language processing task.
arXiv Detail & Related papers (2024-01-25T14:07:34Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach, User-Guided Response Optimization (UGRO), which combines an LLM with a smaller task-oriented dialogue model.
This approach uses the LLM as an annotation-free user simulator to assess dialogue responses, pairing it with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- Improving Conversational Recommendation Systems via Counterfactual Data Simulation [73.4526400381668]
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations.
Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data.
We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z)
- Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System [11.404192885921498]
Chat-Rec is a new paradigm for building conversational recommender systems.
Chat-Rec is effective in learning user preferences and establishing connections between users and products.
In experiments, Chat-Rec effectively improves top-k recommendation results and performs better on the zero-shot rating prediction task.
arXiv Detail & Related papers (2023-03-25T17:37:43Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Leveraging Historical Interaction Data for Improving Conversational Recommender System [105.90963882850265]
We propose a novel pre-training approach to integrate item- and attribute-based preference sequences.
Experiment results on two real-world datasets have demonstrated the effectiveness of our approach.
arXiv Detail & Related papers (2020-08-19T03:43:50Z)