Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples
- URL: http://arxiv.org/abs/2303.05575v1
- Date: Thu, 9 Mar 2023 20:51:18 GMT
- Title: Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples
- Authors: Ali Montazeralghaem and James Allan
- Abstract summary: We propose an adversarial evaluation scheme including four scenarios in two categories.
We generate adversarial examples to evaluate the robustness of these systems in the face of different input data.
Our results show that none of these systems is robust or reliable when faced with the adversarial examples.
- Score: 16.49836195831763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational recommender systems (CRSs) are improving rapidly according to
the standard recommendation accuracy metrics. However, it is essential to make
sure that these systems are robust when interacting with users, including both
regular users and malicious users who want to attack the system by feeding it
modified input data. In this paper, we propose an adversarial evaluation scheme
consisting of four scenarios in two categories and automatically generate
adversarial examples to evaluate the robustness of these systems in the face of
different input data. By executing these adversarial examples, we can compare
how well different conversational recommender systems satisfy the user's
preferences. We evaluate three CRSs with the proposed adversarial examples
on two datasets. Our results show that none of these systems is robust or
reliable when faced with the adversarial examples.
Related papers
- A Unified Causal Framework for Auditing Recommender Systems for Ethical Concerns [40.793466500324904]
We view recommender system auditing from a causal lens and provide a general recipe for defining auditing metrics.
Under this general causal auditing framework, we categorize existing auditing metrics and identify gaps in them.
We propose two classes of such metrics: future- and past-reachability and stability, which measure the ability of a user to influence their own and other users' recommendations.
(2024-09-20T04:37:36Z)
- Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method [60.364834418531366]
We propose five new evaluation metrics that comprehensively and accurately assess the performance of reciprocal recommender systems (RRS).
We formulate RRS from a causal perspective, modeling recommendations as bilateral interventions.
We introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics.
(2024-08-19T07:21:02Z)
- System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes [80.97898201876592]
We propose a generative model in which past content interactions impact the arrival rates of users based on a self-exciting Hawkes process.
We show analytically that, given samples, it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility.
(2024-05-29T18:19:37Z)
- User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations [96.45414741693119]
We present a user-controllable recommender system that seamlessly integrates explainability and controllability.
By providing both retrospective and prospective explanations through counterfactual reasoning, users can customize their control over the system.
(2023-08-02T01:13:36Z)
- Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to reveal how vulnerable task-oriented dialogue (TOD) systems are in realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
(2023-05-23T09:24:53Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
(2022-04-21T15:52:14Z)
- Membership Inference Attacks Against Recommender Systems [33.66394989281801]
We make the first attempt at quantifying the privacy leakage of recommender systems through the lens of membership inference.
Our attack operates at the user level rather than the data-sample level.
A shadow recommender is established to derive the labeled training data for training the attack model.
(2021-09-16T15:19:19Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect of exploiting interactions with real users to improve conversational systems after deployment.
(2020-11-01T19:50:34Z)
- A Robust Reputation-based Group Ranking System and its Resistance to Bribery [8.300507994596416]
We propose a new reputation-based ranking system utilizing multipartite rating networks.
We study its resistance to bribery and how to design optimal bribing strategies.
(2020-04-13T22:28:29Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods for evaluating open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metric is most effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
(2020-04-06T04:36:33Z)