Generate, Evaluate, and Select: A Dialogue System with a Response
Evaluator for Diversity-Aware Response Generation
- URL: http://arxiv.org/abs/2206.04937v1
- Date: Fri, 10 Jun 2022 08:22:22 GMT
- Title: Generate, Evaluate, and Select: A Dialogue System with a Response
Evaluator for Diversity-Aware Response Generation
- Authors: Ryoma Sakaeda, Daisuke Kawahara
- Abstract summary: We aim to overcome the lack of diversity in responses of current dialogue systems.
We propose a generator-evaluator model in which an evaluator scores multiple responses produced by a response generator and selects the best one.
We conduct human evaluations to compare the output of the proposed system with that of a baseline system.
- Score: 9.247397520986999
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We aim to overcome the lack of diversity in the responses of current
dialogue systems and to develop a dialogue system that is engaging as a
conversational partner. We propose a generator-evaluator model in which a
response generator produces multiple candidate responses and an evaluator
selects the best one. Generating multiple candidates is what yields the
diversity. We conduct human evaluations comparing the output of the proposed
system with that of a baseline system. The proposed system's responses were
often judged better than the baseline's, indicating the effectiveness of the
proposed method.
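As a concrete illustration of the generate-evaluate-select pipeline, here is a minimal sketch assuming a HuggingFace causal LM as the generator; the model name and the toy evaluator are placeholders, not the authors' models.

```python
# Minimal sketch of generate-evaluate-select (illustrative; the model name
# and the toy evaluator are placeholders, not the authors' implementation).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "microsoft/DialoGPT-medium"  # stand-in dialogue generator
tokenizer = AutoTokenizer.from_pretrained(MODEL)
generator = AutoModelForCausalLM.from_pretrained(MODEL)

def generate_candidates(context: str, k: int = 8) -> list[str]:
    """Sample k responses for one dialogue context to obtain diversity."""
    inputs = tokenizer(context + tokenizer.eos_token, return_tensors="pt")
    outputs = generator.generate(
        **inputs,
        do_sample=True,              # sampling rather than greedy decoding
        top_p=0.9,
        num_return_sequences=k,
        max_new_tokens=40,
        pad_token_id=tokenizer.eos_token_id,
    )
    new_tokens = outputs[:, inputs["input_ids"].shape[-1]:]  # drop the context
    return [tokenizer.decode(t, skip_special_tokens=True) for t in new_tokens]

def evaluate(context: str, response: str) -> float:
    """Placeholder evaluator; the paper trains a dedicated response
    evaluator, and any (context, response) scorer can be plugged in."""
    return len(set(response.lower().split()))  # toy proxy: lexical variety

def respond(context: str) -> str:
    candidates = generate_candidates(context)
    return max(candidates, key=lambda r: evaluate(context, r))

print(respond("Hello! How was your weekend?"))
```

Sampling with `num_return_sequences` produces the diverse candidate pool; the evaluator then acts purely as a selector, so generator and evaluator can be trained and swapped independently.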
Related papers
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded
Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
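By way of illustration, a re-scoring pass in the spirit of PICK might rank already-generated candidates by a faithfulness proxy against the grounding knowledge; the overlap scorer and the 0.7/0.3 weights below are illustrative assumptions, not the paper's scoring functions.

```python
# Toy candidate re-scoring (illustrative, not PICK's actual scorers):
# rank generated candidates by overlap with the grounding knowledge,
# lightly mixed with relevance to the dialogue history.
def token_overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also appear in b (toy faithfulness proxy)."""
    strip = lambda w: w.strip(".,?!").lower()
    a_tok = {strip(w) for w in a.split()}
    b_tok = {strip(w) for w in b.split()}
    return len(a_tok & b_tok) / max(len(a_tok), 1)

def rescore(candidates: list[str], knowledge: str, history: str) -> str:
    # 0.7 / 0.3 weights are arbitrary illustration values.
    score = lambda c: 0.7 * token_overlap(c, knowledge) + 0.3 * token_overlap(c, history)
    return max(candidates, key=score)

best = rescore(
    candidates=["The Eiffel Tower is in Berlin.", "The Eiffel Tower is in Paris."],
    knowledge="The Eiffel Tower is located in Paris, France.",
    history="Where is the Eiffel Tower?",
)
print(best)  # selects the candidate better supported by the knowledge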
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- EM Pre-training for Multi-party Dialogue Response Generation [86.25289241604199]
In multi-party dialogues, the addressee of a response utterance should be specified before the response is generated, but addressee labels are scarce in existing corpora.
We propose an Expectation-Maximization (EM) approach that treats addressee labels as latent variables, alternating expectation steps that infer addressee labels with maximization steps that optimize the response generation model.
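A schematic EM skeleton for latent addressee labels might look as follows; `model.score`, `model.train_step`, and the `dialogue` fields are hypothetical interfaces for illustration, not the paper's code.

```python
# Schematic EM loop with addressee labels treated as latent variables.
# model.score, model.train_step, dialogue.history, and dialogue.reply are
# hypothetical interfaces used only for illustration.
def e_step(model, dialogue) -> list[float]:
    """Expectation: a soft distribution over which prior utterance
    the reply addresses, under the current model."""
    scores = [model.score(addressee=utt, reply=dialogue.reply)
              for utt in dialogue.history]
    total = sum(scores) or 1.0
    return [s / total for s in scores]

def m_step(model, dialogue, posterior: list[float]) -> None:
    """Maximization: update the generator on each candidate addressee,
    weighted by its posterior probability."""
    for utt, weight in zip(dialogue.history, posterior):
        model.train_step(context=utt, target=dialogue.reply, weight=weight)

def em_pretrain(model, corpus, iterations: int = 3) -> None:
    for _ in range(iterations):
        for dialogue in corpus:
            m_step(model, dialogue, e_step(model, dialogue))
```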
arXiv Detail & Related papers (2023-05-21T09:22:41Z)
- Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called 'likelihood trap': responses that are highly probable under the model are often dull or repetitive.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-11-07T15:59:49Z)
- Measuring and Improving Semantic Diversity of Dialogue Generation [21.59385143783728]
We introduce a new automatic evaluation metric to measure the semantic diversity of generated responses.
We show that our proposed metric captures human judgments on response diversity better than existing lexical-level diversity metrics.
We also propose a simple yet effective learning method that improves the semantic diversity of generated responses.
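For intuition, one way to operationalize semantic (rather than lexical) diversity is the average pairwise cosine distance between sentence embeddings of the responses; the library and encoder below are illustrative choices, not the paper's metric.

```python
# Illustrative semantic-diversity score: mean pairwise cosine distance
# between sentence embeddings (not the paper's exact metric).
from itertools import combinations
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in sentence encoder

def semantic_diversity(responses: list[str]) -> float:
    embeddings = encoder.encode(responses)
    sims = cosine_similarity(embeddings)
    pairs = list(combinations(range(len(responses)), 2))
    return sum(1.0 - sims[i, j] for i, j in pairs) / len(pairs)

# Paraphrases score low; semantically distinct responses score high.
print(semantic_diversity(["I love hiking.", "Hiking is great.", "Taxes are due soon."]))
```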
arXiv Detail & Related papers (2022-10-11T18:36:54Z)
- Assessing Dialogue Systems with Distribution Distances [48.61159795472962]
We propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations.
Experiments on several dialogue corpora show that our proposed metrics correlate better with human judgments than existing metrics.
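One member of this family of distribution-wise distances is a Fréchet-style distance between Gaussians fit to embeddings of generated and real conversations; the sketch below is a generic instantiation with random stand-in embeddings, not the paper's exact measures.

```python
# Generic Frechet-style distance between two embedding distributions
# (an illustrative instantiation of a distribution-wise distance).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(gen_emb: np.ndarray, real_emb: np.ndarray) -> float:
    mu_g, mu_r = gen_emb.mean(axis=0), real_emb.mean(axis=0)
    cov_g = np.cov(gen_emb, rowvar=False)
    cov_r = np.cov(real_emb, rowvar=False)
    covmean = sqrtm(cov_g @ cov_r)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_g - mu_r
    return float(diff @ diff + np.trace(cov_g + cov_r - 2.0 * covmean))

rng = np.random.default_rng(0)
gen = rng.normal(0.0, 1.0, size=(200, 16))   # stand-in system embeddings
real = rng.normal(0.5, 1.0, size=(200, 16))  # stand-in human embeddings
print(frechet_distance(gen, real))
```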
arXiv Detail & Related papers (2021-05-06T10:30:13Z)
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation [114.48767388174218]
This paper presents an empirical analysis on different types of dialog systems composed of different modules in different settings.
Our results show that a pipeline dialog system trained with fine-grained supervision signals at the component level often outperforms systems that use joint or end-to-end models trained on coarse-grained labels.
arXiv Detail & Related papers (2020-05-15T05:20:06Z)
- Weakly-Supervised Neural Response Selection from an Ensemble of Task-Specialised Dialogue Agents [11.21333474984984]
We model the problem of selecting the best response from a set of responses generated by a heterogeneous set of dialogue agents.
The proposed method is trained to predict a coherent set of responses within a single conversation, considering its own predictions via a curriculum training mechanism.
arXiv Detail & Related papers (2020-05-06T18:40:26Z)
- Evaluating Dialogue Generation Systems via Response Selection [42.56640173047927]
We propose a method to construct response selection test sets with well-chosen false candidates.
We demonstrate that evaluating systems via response selection with the test sets developed by our method correlates more strongly with human evaluation.
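In this evaluation scheme, a system is scored by how often it ranks the ground-truth response above the false candidates; a minimal sketch, with an arbitrary toy scorer, follows.

```python
# Minimal sketch of evaluation via response selection: accuracy of ranking
# the ground-truth response above the false candidates.
def selection_accuracy(system_score, test_set) -> float:
    """test_set: iterable of (context, true_response, false_candidates)."""
    hits = total = 0
    for context, truth, distractors in test_set:
        candidates = [truth] + list(distractors)
        best = max(candidates, key=lambda r: system_score(context, r))
        hits += int(best == truth)
        total += 1
    return hits / total

# Toy usage with an arbitrary length-based scorer (illustrative only).
toy_set = [("How are you?", "I'm fine, thanks.", ["Potato.", "42."])]
print(selection_accuracy(lambda c, r: len(r), toy_set))
```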
arXiv Detail & Related papers (2020-04-29T16:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.