Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
- URL: http://arxiv.org/abs/2404.03304v3
- Date: Mon, 6 May 2024 12:44:34 GMT
- Title: Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
- Authors: Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua
- Abstract summary: We propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors.
Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons of current CRS models.
Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS.
- Score: 68.68418801681965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors. We conceptualise three key characteristics in representing such factors and further divide them into six primary abilities. To implement Concept, we adopt an LLM-based user simulator and evaluator with scoring rubrics that are tailored for each primary ability. Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons of current CRS models. Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS, thereby setting the foundation for CRS improvement.
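The core of the protocol, as the abstract describes it, is an LLM-based user simulator paired with an LLM-based evaluator that scores each primary ability against its own rubric. The sketch below illustrates that general loop; the ability names, rubric wording, and the `chat_completion` helper are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of an LLM simulator/evaluator loop in the spirit of Concept.
# Ability names, rubric texts, and chat_completion() are assumptions,
# not the paper's actual implementation.
from typing import Callable, Dict, List

# Hypothetical primary abilities, each paired with a scoring rubric.
RUBRICS: Dict[str, str] = {
    "recommendation_accuracy": "Score 1-5: do recommendations match the stated preferences?",
    "proactive_clarification": "Score 1-5: does the system ask useful clarifying questions?",
    "user_effort": "Score 1-5 (5 = least effort): how much work must the user do?",
}

def simulate_dialogue(crs_reply: Callable[[List[dict]], str],
                      chat_completion: Callable[[str], str],
                      persona: str, turns: int = 5) -> List[dict]:
    """Roll out a conversation between an LLM user simulator and the CRS."""
    history: List[dict] = []
    for _ in range(turns):
        user_msg = chat_completion(
            f"You are a user seeking a recommendation. Persona: {persona}\n"
            f"Dialogue so far: {history}\nReply with your next utterance only.")
        history.append({"role": "user", "content": user_msg})
        history.append({"role": "assistant", "content": crs_reply(history)})
    return history

def evaluate_dialogue(history: List[dict],
                      chat_completion: Callable[[str], str]) -> Dict[str, int]:
    """Score the finished dialogue once per ability, using that ability's rubric."""
    scores: Dict[str, int] = {}
    for ability, rubric in RUBRICS.items():
        verdict = chat_completion(
            f"Rubric for {ability}: {rubric}\nDialogue: {history}\n"
            f"Answer with a single integer score.")
        scores[ability] = int(verdict.strip())
    return scores
```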
Related papers
- Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method [60.364834418531366]
We propose five new evaluation metrics that comprehensively and accurately assess the performance of RRS.
We formulate RRS from a causal perspective, modeling recommendations as bilateral interventions.
We introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics.
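The reranking idea can be made concrete: in a reciprocal setting a match requires acceptance from both sides, so one natural objective is to rerank candidates by the product of the two directional acceptance probabilities. This product objective is an illustrative assumption, not the paper's exact causal formulation.

```python
# Sketch of bilateral reranking for a reciprocal recommender system.
# The product-of-probabilities score is an illustrative assumption;
# the paper's own causal formulation is more involved.
from typing import Callable, List, Tuple

def rerank_reciprocal(user: str,
                      candidates: List[str],
                      p_accept: Callable[[str, str], float],
                      k: int = 10) -> List[Tuple[str, float]]:
    """Rank candidates by estimated probability of a mutual match.

    p_accept(a, b) estimates the probability that `a` accepts `b`.
    A match needs both directions, so we score by their product.
    """
    scored = [(c, p_accept(user, c) * p_accept(c, user)) for c in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```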
arXiv Detail & Related papers (2024-08-19T07:21:02Z)
- Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain [15.179413273734761]
This study investigates the impact of prompt guidance (PG) and recommendation domain (RD) on the overall user experience of the system.
The findings reveal that PG can substantially enhance the system's explainability, adaptability, perceived ease of use, and transparency.
arXiv Detail & Related papers (2024-05-22T11:49:40Z)
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns within a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- Towards Explainable Conversational Recommender Systems [44.26020239452129]
Explanations in recommender systems have demonstrated benefits in helping the user understand the rationality of the recommendations.
In the conversational environment, multiple contextualized explanations need to be generated.
We propose ten evaluation perspectives based on concepts from conventional recommender systems together with the characteristics of conversational recommender systems.
arXiv Detail & Related papers (2023-05-27T07:36:08Z)
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential for developing more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose iEvaLM, an interactive evaluation approach that harnesses LLM-based user simulators.
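An iEvaLM-style interactive evaluation can be pictured as a success-at-turn-budget check over simulated sessions: an LLM simulator pursues a hidden target item, and a session counts as a hit if the CRS recommends that item before the budget runs out. The sketch below is an assumed reading of that setup, not the paper's released code.

```python
# Sketch of interactive evaluation with an LLM user simulator pursuing a
# hidden target item. simulate_user() and the CRS interface are assumptions,
# not the paper's released code.
from typing import Callable, List

def interactive_success_rate(targets: List[str],
                             crs_recommend: Callable[[List[str]], List[str]],
                             simulate_user: Callable[[str, List[str]], str],
                             max_turns: int = 5) -> float:
    """Fraction of simulated sessions where the target item gets recommended."""
    hits = 0
    for target in targets:
        dialogue: List[str] = []
        for _ in range(max_turns):
            dialogue.append(simulate_user(target, dialogue))  # user turn
            recs = crs_recommend(dialogue)                    # system turn
            dialogue.append("Recommended: " + ", ".join(recs))
            if target in recs:
                hits += 1
                break
    return hits / len(targets)
```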
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
- Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples [16.49836195831763]
We propose an adversarial evaluation scheme including four scenarios in two categories.
We generate adversarial examples to evaluate the robustness of these systems in the face of different input data.
Our results show that none of these systems is robust or reliable in the face of adversarial examples.
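One way to read "adversarial examples against the input data" is as utterance-level perturbations whose recommendations are compared against the clean run. The perturbation below (random character drops) is an illustrative stand-in for the paper's four scenarios, not their actual attack set.

```python
# Sketch of robustness checking via perturbed user utterances.
# add_typos() is an illustrative stand-in for the paper's four scenarios.
import random
from typing import Callable, List

def add_typos(text: str, rate: float = 0.05) -> str:
    """Randomly drop characters to simulate noisy user input."""
    return "".join(ch for ch in text if random.random() > rate)

def robustness_drop(utterances: List[str],
                    crs_recommend: Callable[[str], List[str]],
                    perturb: Callable[[str], str] = add_typos,
                    k: int = 10) -> float:
    """Average loss of overlap between clean and perturbed top-k lists."""
    drops = []
    for u in utterances:
        clean = set(crs_recommend(u)[:k])
        noisy = set(crs_recommend(perturb(u))[:k])
        drops.append(1.0 - len(clean & noisy) / max(len(clean), 1))
    return sum(drops) / len(drops)
```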
arXiv Detail & Related papers (2023-03-09T20:51:18Z)
- KECRS: Towards Knowledge-Enriched Conversational Recommendation System [50.0292306485452]
Chit-chat-based conversational recommendation systems (CRSs) provide item recommendations to users through natural language interactions.
External knowledge graphs (KGs) have been introduced into chit-chat-based CRS.
We propose the Knowledge-Enriched Conversational Recommendation System (KECRS).
Experimental results on a large-scale dataset demonstrate that KECRS outperforms state-of-the-art chit-chat-based CRS.
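The summary does not give KECRS's architecture, but the general pattern of injecting KG signal into a chit-chat CRS can be sketched as: embed the entities mentioned in the dialogue, enrich them with their KG neighbours, and score items against the pooled user representation. Everything below is that generic pattern, not the KECRS model itself.

```python
# Generic sketch of KG-enriched recommendation in a chit-chat CRS.
# This shows the common pattern (pool mentioned entities + KG neighbours);
# it is NOT the actual KECRS architecture.
import numpy as np
from typing import Dict, List

def user_representation(mentioned: List[str],
                        kg_neighbors: Dict[str, List[str]],
                        entity_emb: Dict[str, np.ndarray],
                        alpha: float = 0.5) -> np.ndarray:
    """Pool embeddings of mentioned entities, mixed with their KG neighbours."""
    vecs = []
    for e in mentioned:
        own = entity_emb[e]
        nbrs = [entity_emb[n] for n in kg_neighbors.get(e, []) if n in entity_emb]
        nbr = np.mean(nbrs, axis=0) if nbrs else own
        vecs.append(alpha * own + (1 - alpha) * nbr)
    return np.mean(vecs, axis=0)

def rank_items(user_vec: np.ndarray,
               item_emb: Dict[str, np.ndarray], k: int = 10) -> List[str]:
    """Score items by dot product with the pooled user representation."""
    scores = {item: float(v @ user_vec) for item, v in item_emb.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```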
arXiv Detail & Related papers (2021-05-18T03:52:06Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
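Feedback-weighted learning via importance sampling can be sketched as reweighting the log-likelihood of the deployed model's own outputs by binary user feedback, with an importance-sampling correction for the probability under which each output was originally sampled. The clipping and the exact weighting details below are assumptions, not the paper's precise recipe.

```python
# Sketch of feedback-weighted learning: reweight the log-likelihood of
# deployed-model outputs by binary user feedback, with an importance-
# sampling correction for the logged sampling probability.
# Clipping and normalisation details are assumptions, not the paper's recipe.
import torch

def feedback_weighted_loss(log_probs: torch.Tensor,     # log p_new(y|x)
                           logged_probs: torch.Tensor,  # p_old(y|x) at deploy time
                           feedback: torch.Tensor,      # 1 = positive, 0 = negative
                           clip: float = 10.0) -> torch.Tensor:
    """Importance-weighted NLL over logged interactions with binary feedback."""
    weights = (feedback / logged_probs.clamp_min(1e-8)).clamp_max(clip)
    return -(weights * log_probs).mean()
```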
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.