Evaluator for Emotionally Consistent Chatbots
- URL: http://arxiv.org/abs/2112.01616v1
- Date: Thu, 2 Dec 2021 21:47:29 GMT
- Title: Evaluator for Emotionally Consistent Chatbots
- Authors: Chenxiao Liu, Guanzhi Deng, Tao Ji, Difei Tang, Silai Zheng
- Abstract summary: The most recent work only evaluates on the aspects of context coherence, language fluency, response diversity, or logical self-consistency between dialogues.
This work proposes training an evaluator to determine the emotional consistency of chatbots.
- Score: 2.8348950186890467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One challenge for evaluating current sequence- or dialogue-level chatbots,
such as Empathetic Open-domain Conversation Models, is to determine whether the
chatbot performs in an emotionally consistent way. The most recent work only
evaluates on the aspects of context coherence, language fluency, response
diversity, or logical self-consistency between dialogues. This work proposes
training an evaluator to determine the emotional consistency of chatbots.
Related papers
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots [8.763670548363443]
A new attack, toxicbot, is developed to generate toxic responses in a multi-turn conversation.
toxicbot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue.
arXiv Detail & Related papers (2023-07-14T03:58:42Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner.
We propose a method based on a transformer pretrained language model (T5).
We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework whereby several empathetic chatbots are based on understanding users' implied feelings and replying empathetically for multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were finetuned by deep reinforcement learning.
To respond in an empathetic way, we develop a simulating agent, a Conceptual Human Model, as an aid for CheerBots in training, taking into account future changes in the user's emotional state to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z)
- Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency [28.255324166852535]
We propose the Addressing Inquiries about History (AIH) framework for the consistency evaluation.
At the conversation stage, AIH attempts to address appropriate inquiries about the dialogue history to induce the chatbots to redeclare the historical facts or opinions.
At the contradiction recognition stage, we can either employ human judges or a natural language inference (NLI) model to recognize whether the answers to the inquiries are contradictory with history.
arXiv Detail & Related papers (2021-06-04T03:04:13Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It is capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework included a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics to demonstrate flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate for each entity in a conversation whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest.
arXiv Detail & Related papers (2020-10-05T16:37:52Z)
- If I Hear You Correctly: Building and Evaluating Interview Chatbots with Active Listening Skills [4.395837214164745]
It is challenging to build effective interview chatbots that can handle user free-text responses to open-ended questions.
We are investigating the feasibility and effectiveness of using publicly available, practical AI technologies to build effective interview chatbots.
arXiv Detail & Related papers (2020-02-05T16:52:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.