Addressing Inquiries about History: An Efficient and Practical Framework
for Evaluating Open-domain Chatbot Consistency
- URL: http://arxiv.org/abs/2106.02228v1
- Date: Fri, 4 Jun 2021 03:04:13 GMT
- Title: Addressing Inquiries about History: An Efficient and Practical Framework
for Evaluating Open-domain Chatbot Consistency
- Authors: Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie Zhou
- Abstract summary: We propose the Addressing Inquiries about History (AIH) framework for consistency evaluation.
At the conversation stage, AIH poses appropriate inquiries about the dialogue history to induce the chatbot to restate historical facts or opinions.
At the contradiction recognition stage, either human judges or a natural language inference (NLI) model can be employed to recognize whether the answers to the inquiries contradict the dialogue history.
- Score: 28.255324166852535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A good open-domain chatbot should avoid presenting contradictory responses
about facts or opinions within a conversational session, a property known as its
consistency capacity. However, evaluating the consistency capacity of a chatbot is still
challenging. Employing human judges to interact with chatbots purposely to probe this
capacity is costly and inefficient, and the results are hard to free from subjective bias.
In this paper, we propose Addressing Inquiries about History (AIH), an efficient and
practical framework for consistency evaluation. At the conversation stage, AIH poses
appropriate inquiries about the dialogue history to induce the chatbot to restate
historical facts or opinions. The conversations are carried out between chatbots, which
is more efficient than human-bot interaction and also alleviates subjective bias. In this
way, we rapidly obtain a dialogue session containing responses with a high likelihood of
contradiction. At the contradiction recognition stage, either human judges or a natural
language inference (NLI) model can be employed to recognize whether the answers to the
inquiries contradict the dialogue history. Finally, chatbots can be ranked according to
the contradiction statistics. Experiments on open-domain chatbots show that our approach
efficiently and reliably assesses the consistency capacity of chatbots and achieves a
high ranking correlation with human evaluation. We release the framework at
https://github.com/ictnlp/AIH and hope it helps improve the consistency capacity of
chatbots.
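To make the two stages concrete, below is a minimal Python sketch of the evaluation loop. It is not the released implementation: the ChatBot interface, the make_inquiry heuristic, the single-session loop, and the choice of roberta-large-mnli as the NLI backend are assumptions made for illustration only; the actual prompts and models are in the repository linked above.

```python
# Minimal sketch of the AIH two-stage loop, under the assumptions stated above.
from typing import Callable, Dict, List, Tuple

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A chatbot is modeled here as a function from the dialogue history (a list of
# utterances) to its next utterance. This interface is hypothetical.
ChatBot = Callable[[List[str]], str]

# Stage 2 backend: an off-the-shelf NLI model. For roberta-large-mnli the
# labels are CONTRADICTION / NEUTRAL / ENTAILMENT.
NLI_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(NLI_NAME)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_NAME).eval()


def is_contradiction(premise: str, hypothesis: str) -> bool:
    """Return True if the NLI model labels the (premise, hypothesis) pair a contradiction."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    label = nli_model.config.id2label[int(logits.argmax(dim=-1))]
    return label.lower() == "contradiction"


def make_inquiry(past_utterance: str) -> str:
    """Hypothetical inquiry generator: ask the bot to restate a past fact or opinion."""
    return f'Earlier you said: "{past_utterance}". Do you still hold that view?'


def aih_session(bot: ChatBot, partner: ChatBot, opener: str, turns: int = 6) -> Tuple[int, int]:
    """Stage 1 (conversation): the bot under test talks with a partner bot while the
    framework inserts inquiries about the bot's own history.
    Stage 2 (recognition): count answers that contradict the original statement."""
    history: List[str] = [opener]
    checks = contradictions = 0
    for _ in range(turns):
        statement = bot(history)                 # utterance whose consistency is tested
        history.append(statement)
        history.append(make_inquiry(statement))  # address an inquiry about the history
        answer = bot(history)                    # the bot re-declares the fact/opinion
        history.append(answer)
        checks += 1
        contradictions += int(is_contradiction(statement, answer))
        history.append(partner(history))         # the partner bot keeps the dialogue going
    return contradictions, checks


def rank_bots(bots: Dict[str, ChatBot], partner: ChatBot, opener: str) -> List[Tuple[str, float]]:
    """Rank candidate bots by contradiction rate (lower means more consistent)."""
    rates = {}
    for name, bot in bots.items():
        c, n = aih_session(bot, partner, opener)
        rates[name] = c / max(n, 1)
    return sorted(rates.items(), key=lambda kv: kv[1])
```

In practice the contradiction statistics would be aggregated over many sessions and openers per bot, and the resulting ranking can be compared against a human ranking with a rank-correlation measure such as Spearman's rho.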
Related papers
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format in which each participant sends only one message per turn.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- CDConv: A Benchmark for Contradiction Detection in Chinese Conversations [74.78715797366395]
We propose a benchmark for Contradiction Detection in Chinese Conversations, namely CDConv.
It contains 12K multi-turn conversations annotated with three typical contradiction categories: Intra-sentence Contradiction, Role Confusion, and History Contradiction.
arXiv Detail & Related papers (2022-10-16T11:37:09Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- A Deep Learning Approach to Integrate Human-Level Understanding in a Chatbot [0.4632366780742501]
Unlike humans, chatbots can serve multiple customers at a time, are available 24/7, and reply within a fraction of a second.
We performed sentiment analysis, emotion detection, intent classification and named-entity recognition using deep learning to develop chatbots with humanistic understanding and intelligence.
arXiv Detail & Related papers (2021-12-31T22:26:41Z)
- Evaluator for Emotionally Consistent Chatbots [2.8348950186890467]
Most recent work evaluates only aspects such as context coherence, language fluency, response diversity, or logical self-consistency between dialogues.
This work proposes training an evaluator to determine the emotional consistency of chatbots.
arXiv Detail & Related papers (2021-12-02T21:47:29Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework in which several empathetic chatbots understand users' implied feelings and reply empathetically over multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and are fine-tuned with deep reinforcement learning.
To respond in an empathetic way, we develop a simulating agent, a Conceptual Human Model, that aids CheerBots during training by considering future changes in the user's emotional state in order to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate for each entity in a conversation whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest.
arXiv Detail & Related papers (2020-10-05T16:37:52Z)
- Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z)
- If I Hear You Correctly: Building and Evaluating Interview Chatbots with Active Listening Skills [4.395837214164745]
It is challenging to build effective interview chatbots that can handle user free-text responses to open-ended questions.
We are investigating the feasibility and effectiveness of using publicly available, practical AI technologies to build effective interview chatbots.
arXiv Detail & Related papers (2020-02-05T16:52:52Z)