Related papers: Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples

URL: http://arxiv.org/abs/2303.05575v1
Date: Thu, 9 Mar 2023 20:51:18 GMT
Title: Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples
Authors: Ali Montazeralghaem and James Allan
Abstract summary: We propose an adversarial evaluation scheme including four scenarios in two categories. We generate adversarial examples to evaluate the robustness of these systems in the face of different input data. Our results show that none of these systems are robust and reliable to the adversarial examples.
Score: 16.49836195831763
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conversational recommender systems (CRSs) are improving rapidly, according to the standard recommendation accuracy metrics. However, it is essential to make sure that these systems are robust in interacting with users including regular and malicious users who want to attack the system by feeding the system modified input data. In this paper, we propose an adversarial evaluation scheme including four scenarios in two categories and automatically generate adversarial examples to evaluate the robustness of these systems in the face of different input data. By executing these adversarial examples we can compare the ability of different conversational recommender systems to satisfy the user's preferences. We evaluate three CRSs by the proposed adversarial examples on two datasets. Our results show that none of these systems are robust and reliable to the adversarial examples.

Related papers

A Unified Causal Framework for Auditing Recommender Systems for Ethical Concerns [40.793466500324904]
We view recommender system auditing from a causal lens and provide a general recipe for defining auditing metrics. Under this general causal auditing framework, we categorize existing auditing metrics and identify gaps in them. We propose two classes of such metrics: future- and past-reachability and stability, which measure the ability of a user to influence their own and other users' recommendations.
arXiv Detail & Related papers (2024-09-20T04:37:36Z)
Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method [60.364834418531366]
We propose five new evaluation metrics that comprehensively and accurately assess the performance of RRS. We formulate the RRS from a causal perspective, formulating recommendations as bilateral interventions. We introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics.
arXiv Detail & Related papers (2024-08-19T07:21:02Z)
System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes [80.97898201876592]
We propose a generative model in which past content interactions impact the arrival rates of users based on a self-exciting Hawkes process. We show analytically that given samples it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility.
arXiv Detail & Related papers (2024-05-29T18:19:37Z)
User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations [96.45414741693119]
We present a user-controllable recommender system that seamlessly integrates explainability and controllability. By providing both retrospective and prospective explanations through counterfactual reasoning, users can customize their control over the system.
arXiv Detail & Related papers (2023-08-02T01:13:36Z)
Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable TOD systems are against realistic scenarios. Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system. We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z)
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations. We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
Membership Inference Attacks Against Recommender Systems [33.66394989281801]
We make the first attempt on quantifying the privacy leakage of recommender systems through the lens of membership inference. Our attack is on the user-level but not on the data sample-level. A shadow recommender is established to derive the labeled training data for training the attack model.
arXiv Detail & Related papers (2021-09-16T15:19:19Z)
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
A Robust Reputation-based Group Ranking System and its Resistance to Bribery [8.300507994596416]
We propose a new reputation-based ranking system, utilizing multipartite rating subnetworks. We study its resistance to bribery and how to design optimal bribing strategies.
arXiv Detail & Related papers (2020-04-13T22:28:29Z)
PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods to evaluate the open-domain generative dialogue systems. Due to the lack of systematic comparison, it is not clear which kind of metrics are more effective. We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.