GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating
Open-Domain Dialogue Systems
- URL: http://arxiv.org/abs/2010.03994v1
- Date: Thu, 8 Oct 2020 14:07:32 GMT
- Title: GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating
Open-Domain Dialogue Systems
- Authors: Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang
- Abstract summary: We propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.
Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence.
Experimental results show that our GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models.
- Score: 133.13117064357425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically evaluating dialogue coherence is a challenging yet
essential capability for developing high-quality open-domain dialogue systems.
However, current evaluation metrics consider only surface features or
utterance-level semantics, without explicitly considering the fine-grained
topic transition dynamics of dialogue flows. We posit that the graph structure
formed by the topics in a dialogue can accurately capture the underlying
communication logic, offering a more natural basis for a convincing metric.
Capitalizing on the topic-level dialogue graph, we propose a new evaluation
metric GRADE, which stands for Graph-enhanced Representations for Automatic
Dialogue Evaluation. Specifically, GRADE incorporates both coarse-grained
utterance-level contextualized representations and fine-grained topic-level
graph representations to evaluate dialogue coherence. The graph representations
are obtained by reasoning over topic-level dialogue graphs enhanced with the
evidence from a commonsense graph, including k-hop neighboring representations
and hop-attention weights. Experimental results show that our GRADE
significantly outperforms other state-of-the-art metrics on measuring diverse
dialogue models in terms of the Pearson and Spearman correlations with human
judgements. In addition, we release a new large-scale human evaluation benchmark to
facilitate future research on automatic metrics.
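The two-level combination described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: GRADE uses BERT-based contextualized encoders and graph attention over a commonsense graph (ConceptNet), whereas here a bag-of-words cosine stands in for the utterance-level encoder, an exponentially decaying per-hop weight stands in for hop-attention, and `alpha`, the topic graph, and all function names are illustrative assumptions:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse bag-of-words vectors (Counters)."""
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def k_hop_representation(graph, start_topics, k, decay=0.5):
    """Aggregate topic nodes reachable within k hops of the start topics.
    Hop-attention is mimicked by a fixed exponentially decaying weight per hop."""
    weights = Counter()
    frontier = set(start_topics)
    seen = set(frontier)
    for hop in range(k + 1):
        w = decay ** hop
        for topic in frontier:
            weights[topic] += w
        frontier = {n for t in frontier for n in graph.get(t, []) if n not in seen}
        seen |= frontier
    return weights

def grade_style_score(context, response, topic_graph,
                      context_topics, response_topics, k=2, alpha=0.5):
    """Blend a coarse utterance-level similarity with a fine-grained
    topic-graph similarity, in the spirit of the metric described above."""
    utt_sim = cosine(Counter(context.split()), Counter(response.split()))
    g_ctx = k_hop_representation(topic_graph, context_topics, k)
    g_resp = k_hop_representation(topic_graph, response_topics, k)
    return alpha * utt_sim + (1 - alpha) * cosine(g_ctx, g_resp)
```

With a tiny hand-built topic graph, a response whose topics lie near the context's topics scores higher than one whose topics are disconnected, which is the intuition behind reasoning over the topic-level dialogue graph.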
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction [2.3590037806133024]
This paper presents a novel conversational model for human-robot interaction that rests upon a graph-based representation of the dialogue state.
The neural conversational model employed to respond to user utterances relies on a simple but effective graph-to-text mechanism.
The proposed approach is empirically evaluated through a user study with a humanoid robot.
arXiv Detail & Related papers (2023-11-03T15:44:28Z)
- GraphWOZ: Dialogue Management with Conversational Knowledge Graphs [2.938377447673471]
We present a new approach to dialogue management using conversational knowledge graphs as core representation of the dialogue state.
We introduce a new dataset, GraphWOZ, which comprises Wizard-of-Oz dialogues in which human participants interact with a robot acting as a receptionist.
arXiv Detail & Related papers (2022-11-23T10:53:21Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It is capable not only of performing turn-level evaluation but also of holistically considering the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
- Discovering Dialog Structure Graph for Open-Domain Dialog Generation [51.29286279366361]
We conduct unsupervised discovery of dialog structure from chitchat corpora.
We then leverage it to facilitate dialog generation in downstream systems.
We present a Discrete Variational Auto-Encoder with Graph Neural Network (DVAE-GNN), to discover a unified human-readable dialog structure.
arXiv Detail & Related papers (2020-12-31T10:58:37Z)
- Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks [21.409522845011907]
Dialogue relation extraction (DRE) aims to detect the relation between two entities mentioned in a multi-party dialogue.
We present a graph attention network-based method for DRE where a graph contains meaningfully connected speaker, entity, entity-type, and utterance nodes.
We empirically show that this graph-based approach effectively captures the relations between different entity pairs in a dialogue, outperforming state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-10T18:51:48Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.