Evaluating the Robustness of Conversational Recommender Systems by
Adversarial Examples
- URL: http://arxiv.org/abs/2303.05575v1
- Date: Thu, 9 Mar 2023 20:51:18 GMT
- Title: Evaluating the Robustness of Conversational Recommender Systems by
Adversarial Examples
- Authors: Ali Montazeralghaem and James Allan
- Abstract summary: We propose an adversarial evaluation scheme including four scenarios in two categories.
We generate adversarial examples to evaluate the robustness of these systems in the face of different input data.
Our results show that none of these systems is robust or reliable against adversarial examples.
- Score: 16.49836195831763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational recommender systems (CRSs) are improving rapidly, according to
the standard recommendation accuracy metrics. However, it is also essential to ensure that these systems are robust when interacting with users, including both regular users and malicious users who attack the system by feeding it modified input data. In this paper, we propose an adversarial evaluation scheme
including four scenarios in two categories and automatically generate
adversarial examples to evaluate the robustness of these systems in the face of
different input data. By executing these adversarial examples we can compare
the ability of different conversational recommender systems to satisfy the
user's preferences. We evaluate three CRSs with the proposed adversarial examples
on two datasets. Our results show that none of these systems is robust or
reliable against the adversarial examples.
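The listing does not include any code for the paper's evaluation scheme. Purely as a loose illustration of the general idea of feeding a recommender modified input and measuring how its output changes, here is a minimal sketch; the perturbation (random character swaps) and every function name are hypothetical and are not the paper's four scenarios:

```python
import random

def perturb_utterance(utterance, swap_rate=0.2, seed=0):
    """Produce a simple adversarial variant of a user utterance by
    swapping adjacent characters inside random words (a crude typo
    attack; illustrative only, not the paper's actual scheme)."""
    rng = random.Random(seed)
    perturbed = []
    for w in utterance.split():
        if len(w) > 3 and rng.random() < swap_rate:
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap chars i, i+1
        perturbed.append(w)
    return " ".join(perturbed)

def robustness_gap(recommend, utterance):
    """Compare recommendations on clean vs. perturbed input; a robust
    system should return (nearly) the same items for both."""
    clean = recommend(utterance)
    attacked = recommend(perturb_utterance(utterance))
    overlap = len(set(clean) & set(attacked)) / max(len(clean), 1)
    return 1.0 - overlap  # 0.0 means the recommendations did not change
```

A gap near 0 means the system's recommendations are insensitive to this particular perturbation; the paper's finding is that real CRSs fail such checks under its scenarios.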
Related papers
- System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes [80.97898201876592]
We propose a generative model in which past content interactions impact the arrival rates of users based on a self-exciting Hawkes process.
We show analytically that given samples it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility.
arXiv Detail & Related papers (2024-05-29T18:19:37Z)
- User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations [96.45414741693119]
We present a user-controllable recommender system that seamlessly integrates explainability and controllability.
By providing both retrospective and prospective explanations through counterfactual reasoning, users can customize their control over the system.
arXiv Detail & Related papers (2023-08-02T01:13:36Z)
- Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable task-oriented dialogue (TOD) systems are to realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z)
- Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
- To Recommend or Not? A Model-Based Comparison of Item-Matching Processes [7.636113901205644]
Recommender systems are central to modern online platforms, but a popular concern is that they may be pulling society in dangerous directions.
We take a model-based approach to this challenge, introducing a dichotomy of process models that we can compare.
Our key finding is that the recommender and organic models result in dramatically different outcomes at both the individual and societal level.
arXiv Detail & Related papers (2021-10-21T20:37:56Z)
- Membership Inference Attacks Against Recommender Systems [33.66394989281801]
We make the first attempt on quantifying the privacy leakage of recommender systems through the lens of membership inference.
Our attack is on the user-level but not on the data sample-level.
A shadow recommender is established to derive the labeled training data for training the attack model.
arXiv Detail & Related papers (2021-09-16T15:19:19Z)
- Correcting the User Feedback-Loop Bias for Recommendation Systems [34.44834423714441]
We propose a systematic and dynamic way to correct user feedback-loop bias in recommendation systems.
Our method includes a deep-learning component to learn each user's dynamic rating history embedding.
We empirically validated the existence of such user feedback-loop bias in real world recommendation systems.
arXiv Detail & Related papers (2021-09-13T15:02:55Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- A Robust Reputation-based Group Ranking System and its Resistance to Bribery [8.300507994596416]
We propose a new reputation-based ranking system built on multipartite rating networks.
We study its resistance to bribery and how to design optimal bribing strategies.
arXiv Detail & Related papers (2020-04-13T22:28:29Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods for evaluating open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metric is more effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)
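None of the entries above ship code in this list. Purely as a hypothetical sketch of what a minimal learning-based evaluation metric (the kind of approach the PONE entry describes) could look like, here a crude lexical-overlap feature stands in for learned representations, and a one-feature logistic regression is fit to binary human judgments; every name here is illustrative, not from the paper:

```python
import math

def overlap_feature(context, response):
    """Crude lexical feature: fraction of response tokens that also
    appear in the context (a stand-in for learned representations)."""
    ctx = set(context.lower().split())
    toks = response.lower().split()
    if not toks:
        return 0.0
    return sum(t in ctx for t in toks) / len(toks)

def fit_metric(pairs, labels, lr=0.5, epochs=200):
    """Fit a one-feature logistic regression so that the metric's
    scores track binary human judgments (labels in {0, 1})."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for (c, r), y in zip(pairs, labels):
            x = overlap_feature(c, r)
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x  # gradient step on the log-loss
            b += lr * (y - p)
    return lambda c, r: 1.0 / (1.0 + math.exp(-(w * overlap_feature(c, r) + b)))
```

A real learned metric would replace the overlap feature with contextual embeddings, but the training loop (fit a scorer to human labels, then score new responses) has this shape.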
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.