Personalized Chatbot Trustworthiness Ratings
- URL: http://arxiv.org/abs/2005.10067v2
- Date: Sat, 10 Oct 2020 01:13:36 GMT
- Title: Personalized Chatbot Trustworthiness Ratings
- Authors: Biplav Srivastava and Francesca Rossi and Sheema Usmani and Mariana Bernagozzi
- Abstract summary: We envision a personalized rating methodology for chatbots that relies on separate rating modules for each issue.
The method is independent of the specific trust issues and is parametric to the aggregation procedure.
- Score: 19.537492400265577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversation agents, commonly referred to as chatbots, are increasingly
deployed in many domains to allow people to have a natural interaction while
trying to solve a specific problem. Given their widespread use, it is important
to provide users with methods and tools to increase their awareness of
various properties of the chatbots, including non-functional properties that
users may consider important in order to trust a specific chatbot. For example,
users may want to use chatbots that are not biased, that do not use abusive
language, that do not leak information to other users, and that respond in a
style which is appropriate for the user's cognitive level.
In this paper, we address the setting where a chatbot cannot be modified, its
training data cannot be accessed, and yet a neutral party wants to assess and
communicate its trustworthiness to a user, tailored to the user's priorities
over the various trust issues. Such a rating can help users choose among
alternative chatbots, developers test their systems, business leaders price
their offering, and regulators set policies. We envision a personalized rating
methodology for chatbots that relies on separate rating modules for each issue,
and users' detected priority orderings among the relevant trust issues, to
generate an aggregate personalized rating for the trustworthiness of a chatbot.
The method is independent of the specific trust issues and is parametric to the
aggregation procedure, thereby allowing for seamless generalization. We
illustrate its general use, integrate it with a live chatbot, and evaluate it
on four dialog datasets and representative user profiles, validated with user
surveys.
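
To make the envisioned methodology concrete, here is a minimal sketch of the aggregation step: per-issue ratings (e.g., from separate bias and abusive-language rating modules) are combined using weights derived from a user's priority ordering. The issue names, the rank-based weighting, and the weighted-average aggregator are illustrative assumptions; the paper's method is parametric to the aggregation procedure, so any aggregator could be plugged in.

```python
# Minimal sketch (not the paper's exact procedure): aggregate per-issue
# trust ratings into a single personalized score using weights derived
# from the user's detected priority ordering.

from typing import Dict, List

def rank_weights(priority_order: List[str]) -> Dict[str, float]:
    """Turn a priority ordering (most important first) into normalized
    weights, giving higher weight to higher-priority issues."""
    n = len(priority_order)
    raw = {issue: n - i for i, issue in enumerate(priority_order)}
    total = sum(raw.values())
    return {issue: w / total for issue, w in raw.items()}

def personalized_rating(issue_ratings: Dict[str, float],
                        priority_order: List[str]) -> float:
    """Weighted-average aggregation of per-issue ratings in [0, 1]."""
    weights = rank_weights(priority_order)
    return sum(weights[i] * issue_ratings[i] for i in priority_order)

# Example: ratings produced by independent per-issue rating modules.
ratings = {"bias": 0.8, "abusive_language": 0.95, "information_leakage": 0.6}
user_priorities = ["information_leakage", "bias", "abusive_language"]
print(round(personalized_rating(ratings, user_priorities), 3))  # 0.725
```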
Related papers
- First-Person Fairness in Chatbots [13.787745105316043]
We study "first-person fairness," which means fairness toward the user.
This includes providing high-quality responses to all users regardless of their identity or background.
We propose a scalable, privacy-preserving method for evaluating one aspect of first-person fairness.
arXiv Detail & Related papers (2024-10-16T17:59:47Z)
- Are LLM-based methods good enough for detecting unfair terms of service? [67.49487557224415]
Large language models (LLMs) are good at parsing long text-based documents.
We build a dataset consisting of 12 questions applied individually to a set of privacy policies.
Some open-source models achieve higher accuracy than some commercial models.
arXiv Detail & Related papers (2024-08-24T09:26:59Z)
- Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems [11.427175278545517]
This paper reviews current practices for testing chatbots.
It identifies gaps as open problems in pursuit of user trust.
It outlines a path forward to mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society.
arXiv Detail & Related papers (2023-09-09T22:40:30Z)
- Rewarding Chatbots for Real-World Engagement with Millions of Users [1.2583983802175422]
This work investigates the development of social chatbots that prioritize user engagement to enhance retention.
The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses.
A/B testing on groups of 10,000 new daily chat users on the Chai Research platform shows that this approach increases the mean conversation length (MCL) by up to 70%.
Future work aims to use the reward model to realise a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
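
As an illustration of how such a reward model might be used at serving time, the sketch below scores N sampled candidate responses and returns the best one only if it clears a rejection threshold. The function names, candidate count, and threshold are hypothetical and not taken from the paper.

```python
# Illustrative best-of-N rejection sketch (not the Chai Research
# implementation). `generate_candidates` and `reward_model` are
# hypothetical stand-ins for the platform's components.

from typing import Callable, List, Optional

def best_of_n(context: str,
              generate_candidates: Callable[[str, int], List[str]],
              reward_model: Callable[[str, str], float],
              n: int = 8,
              min_score: float = 0.5) -> Optional[str]:
    """Sample n candidate replies, score each with the reward model,
    and return the highest-scoring reply above the rejection threshold."""
    candidates = generate_candidates(context, n)
    scored = [(reward_model(context, c), c) for c in candidates]
    score, best = max(scored, key=lambda sc: sc[0])
    return best if score >= min_score else None
```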
arXiv Detail & Related papers (2023-03-10T18:53:52Z)
- Understanding How People Rate Their Conversations [73.17730062864314]
We conduct a study to better understand how people rate their interactions with conversational agents.
We focus on agreeableness and extraversion as variables that may explain variation in ratings.
arXiv Detail & Related papers (2022-06-01T00:45:32Z)
- What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation [73.03318027164605]
We propose to use information that can be automatically extracted from the next user utterance as a proxy to measure the quality of the previous system response.
Our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
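
A rough sketch of this idea, assuming a placeholder sentiment classifier: each system turn is scored by the sentiment of the user's next utterance, which serves as an automatic proxy for response quality. The paper's actual model is learned from spoken and written open-domain dialog corpora.

```python
# Sketch only: use the sentiment of the next user utterance as a proxy
# for the quality of the preceding system response. `sentiment_score`
# is a hypothetical classifier returning a value in [-1, 1].

from typing import Callable, Dict, List

def score_system_turns(dialog: List[Dict[str, str]],
                       sentiment_score: Callable[[str], float]) -> List[float]:
    """dialog is a list of {"speaker": "system"|"user", "text": ...} turns.
    Each system turn followed by a user turn is scored by the sentiment
    of that follow-up, giving one proxy quality score per system turn."""
    scores = []
    for i, turn in enumerate(dialog[:-1]):
        nxt = dialog[i + 1]
        if turn["speaker"] == "system" and nxt["speaker"] == "user":
            scores.append(sentiment_score(nxt["text"]))
    return scores
```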
arXiv Detail & Related papers (2022-03-25T22:09:52Z)
- A Deep Learning Approach to Integrate Human-Level Understanding in a Chatbot [0.4632366780742501]
Unlike humans, chatbots can serve multiple customers at a time, are available 24/7, and reply in a fraction of a second.
We performed sentiment analysis, emotion detection, intent classification and named-entity recognition using deep learning to develop chatbots with humanistic understanding and intelligence.
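
As a hedged illustration only (not the paper's models), the sketch below wires together off-the-shelf Hugging Face pipelines for the four analyses mentioned; the model choices and candidate intent labels are assumptions made for the sketch.

```python
# Illustrative NLU stack, not the paper's implementation: sentiment,
# emotion, intent, and named-entity analyses via Hugging Face pipelines.
# Model choices and intent labels are assumptions for this sketch.

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")
intent = pipeline("zero-shot-classification")
ner = pipeline("ner", aggregation_strategy="simple")

def analyze_utterance(text: str) -> dict:
    """Run sentiment, emotion, intent, and entity analyses on one user turn."""
    return {
        "sentiment": sentiment(text)[0],
        "emotion": emotion(text)[0],
        "intent": intent(text, candidate_labels=["complaint", "question",
                                                 "purchase", "small_talk"]),
        "entities": ner(text),
    }

print(analyze_utterance("My order from Acme never arrived and I'm upset."))
```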
arXiv Detail & Related papers (2021-12-31T22:26:41Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examined our framework in three experimental setups and evaluated the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Pchatbot: A Large-Scale Dataset for Personalized Chatbot [49.16746174238548]
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively.
To adapt the raw dataset to dialogue systems, we carefully normalize it via processes such as anonymization.
Pchatbot is significantly larger than existing Chinese dialogue datasets, which may benefit data-driven models.
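
A toy sketch of the kind of normalization and anonymization step described, assuming simple regex rules for URLs, user mentions, and long digit strings; the actual Pchatbot pipeline is more elaborate.

```python
# Toy normalization sketch, not the Pchatbot pipeline: strip URLs and
# replace @-mentions and long digit strings with placeholder tokens.

import re

def anonymize(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\S+", "[USER]", text)      # mask user mentions
    text = re.sub(r"\d{6,}", "[NUM]", text)     # mask long ID/phone digits
    return re.sub(r"\s+", " ", text).strip()

print(anonymize("@alice check https://example.com my order id is 1234567890"))
# -> "[USER] check my order id is [NUM]"
```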
arXiv Detail & Related papers (2020-09-28T12:49:07Z)
- Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation [69.03658685761538]
Open Domain dialog system evaluation is one of the most important challenges in dialog research.
We propose an automatic evaluation model CMADE that automatically cleans self-reported user ratings as it trains on them.
Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task.
arXiv Detail & Related papers (2020-05-21T15:14:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.