FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
- URL: http://arxiv.org/abs/2202.06633v1
- Date: Mon, 14 Feb 2022 11:37:20 GMT
- Title: FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
- Authors: Jianqiao Zhao, Yanyang Li, Wanyu Du, Yangfeng Ji, Dong Yu, Michael R. Lyu, Liwei Wang
- Abstract summary: We propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows (sequences of segment acts) for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
- Score: 63.116280145770006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress in open-domain dialogue evaluation, how to develop automatic metrics remains an open problem. We explore the potential of dialogue evaluation featuring dialog act information, which has rarely been explicitly modeled in previous methods. However, because dialog acts are generally defined at the utterance level, they are coarse-grained: an utterance can contain multiple segments serving different functions. Hence, we propose segment act, an extension of dialog act from the utterance level to the segment level, and crowdsource a large-scale dataset for it. To utilize segment act flows (sequences of segment acts) for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval. This framework provides a reference-free approach to dialog evaluation by finding pseudo-references. Extensive experiments against strong baselines on three benchmark datasets demonstrate the effectiveness and other desirable characteristics of FlowEval, pointing out a potential path toward better dialogue evaluation.
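The consensus idea in the abstract can be illustrated with a minimal sketch: retrieve the human dialogues whose segment act flows are closest to the candidate's, treat them as pseudo-references, and score the candidate by its distance to that consensus. Everything below (the corpus layout, the act labels, and the edit-distance-style `flow_distance`) is an illustrative assumption, not the paper's actual implementation.

```python
# Minimal sketch of consensus-based, reference-free evaluation over
# segment act flows. All names here are illustrative assumptions,
# not FlowEval's actual API.
from difflib import SequenceMatcher

def flow_distance(flow_a, flow_b):
    """Distance between two segment act flows (sequences of act labels),
    approximated here as 1 minus a matching-subsequence ratio."""
    return 1.0 - SequenceMatcher(None, flow_a, flow_b).ratio()

def evaluate_dialogue(candidate_flow, corpus_flows, k=3):
    """Score a candidate dialogue without gold references: find the k
    nearest human dialogues by flow distance (the pseudo-references)
    and report the mean distance to them. Lower is better."""
    distances = sorted(flow_distance(candidate_flow, f) for f in corpus_flows)
    nearest = distances[:k]  # distances to the k nearest pseudo-references
    return sum(nearest) / len(nearest)

# Toy usage: acts are segment-level labels such as 'question' or 'statement'.
corpus = [
    ["greeting", "question", "statement", "question"],
    ["greeting", "statement", "question", "statement"],
    ["question", "statement", "statement", "thanking"],
]
candidate = ["greeting", "question", "statement", "statement"]
print(evaluate_dialogue(candidate, corpus, k=2))
```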
Related papers
- DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment [38.26039323208791]
We release a large-scale dialogue quality assessment dataset (DiQAD) for automatically assessing open-domain dialogue quality.
Specifically, we establish assessment criteria based on dimensions that conform to human judgements of dialogue quality.
We also annotate, based on these criteria, large-scale dialogues conducted between real users, yielding around 100,000 annotated dialogues.
arXiv Detail & Related papers (2023-10-25T03:04:57Z)
- Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs [19.43845920149182]
We introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA).
DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment.
We argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
arXiv Detail & Related papers (2023-06-06T19:43:29Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task; a sketch of a standard segmentation metric (Pk) follows this list.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
- Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Improving Multi-Party Dialogue Discourse Parsing via Domain Integration [25.805553277418813]
Multi-party conversations are implicitly organized by semantic-level correlations across interactive turns.
Dialogue discourse analysis can be applied to predict the dependency structure and relations between elementary discourse units.
Existing corpora with dialogue discourse annotation are collected from specific domains with limited sample sizes.
arXiv Detail & Related papers (2021-10-09T09:36:22Z)
- DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings [33.89889949577356]
We propose DialogueCSE, a dialogue-based contrastive learning approach to sentence embeddings; a sketch of a generic contrastive objective follows this list.
We evaluate our model on three multi-turn dialogue datasets: the Microsoft Dialogue Corpus, the Jing Dong Dialogue Corpus, and the E-commerce Dialogue Corpus.
arXiv Detail & Related papers (2021-09-26T13:25:41Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that the method significantly outperforms state-of-the-art methods by 38.6% in terms of joint belief accuracy on MultiWOZ 2.1 (a sketch of joint accuracy computation follows this list).
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
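As noted in the SuperDialseg entry above, dialogue segmentation is commonly scored with the Pk metric (Beeferman et al., 1999). The sketch below is a minimal, self-contained Pk implementation over per-utterance segment labels; it is illustrative only and not the official SuperDialseg benchmark code.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation metric (Beeferman et al., 1999).
    `reference` and `hypothesis` are lists of segment labels, one per
    utterance (e.g., [0, 0, 1, 1, 2]). A sliding window of size k checks
    whether the utterances at its two edges fall in the same segment;
    Pk is the rate of disagreement between reference and hypothesis.
    Lower is better."""
    if k is None:
        # Conventional default: half the mean reference segment length.
        n_segments = len(set(reference))
        k = max(2, round(len(reference) / n_segments / 2))
    errors = 0
    windows = len(reference) - k
    for i in range(windows):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / windows

# Toy usage: 8 utterances, hypothesis places the boundary one turn early.
ref = [0, 0, 0, 0, 1, 1, 1, 1]
hyp = [0, 0, 0, 1, 1, 1, 1, 1]
print(pk(ref, hyp))  # 2/6 ~ 0.33
```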
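For the DialogueCSE entry, the general shape of a contrastive objective over sentence embeddings can be sketched with a standard InfoNCE loss. DialogueCSE derives its training pairs from dialogue context rather than generic augmentation, so treat this only as the generic pattern it builds on, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors, positives, temperature=0.05):
    """Generic InfoNCE loss: each anchor embedding should be closest to
    its own positive among all positives in the batch.
    anchors, positives: (batch, dim) tensors."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.T / temperature  # scaled cosine similarities
    labels = torch.arange(anchors.size(0))        # i-th positive matches i-th anchor
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings.
a = torch.randn(4, 128)
p = torch.randn(4, 128)
print(info_nce_loss(a, p).item())
```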
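Finally, for the dialogue state tracking entry: joint belief (goal) accuracy counts a turn as correct only when every predicted slot-value pair matches the gold state. A minimal sketch, assuming each turn's state is represented as a plain dict (the slot names below are made-up examples in the MultiWOZ style):

```python
def joint_accuracy(gold_states, pred_states):
    """Joint belief/goal accuracy: a turn counts as correct only if the
    predicted state matches the gold state on every slot.
    Each state is a dict mapping slot names to values."""
    correct = sum(g == p for g, p in zip(gold_states, pred_states))
    return correct / len(gold_states)

# Toy usage over three turns.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "friday"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # wrong value -> turn incorrect
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "friday"},
]
print(joint_accuracy(gold, pred))  # 2/3
```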
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and accepts no responsibility for any consequences of its use.