Evaluating Open-Domain Dialogues in Latent Space with Next Sentence
Prediction and Mutual Information
- URL: http://arxiv.org/abs/2305.16967v3
- Date: Sat, 10 Jun 2023 13:23:41 GMT
- Authors: Kun Zhao, Bohao Yang, Chenghua Lin, Wenge Rong, Aline Villavicencio
and Xiaohui Cui
- Abstract summary: We propose a novel learning-based automatic evaluation metric (CMN) for open-domain dialogues.
We augment Conditional Variational Autoencoders (CVAEs) with a Next Sentence Prediction (NSP) objective and employ Mutual Information (MI) to model the semantic similarity of text in the latent space.
Experimental results on two open-domain dialogue datasets demonstrate the superiority of our method compared with a wide range of baselines.
- Score: 18.859159491548006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The long-standing one-to-many issue of open-domain dialogues poses
significant challenges for automatic evaluation methods, i.e., there may be
multiple suitable responses that differ in semantics for a given
conversational context. To tackle this challenge, we propose a novel
learning-based automatic evaluation metric (CMN), which can robustly evaluate
open-domain dialogues by augmenting Conditional Variational Autoencoders
(CVAEs) with a Next Sentence Prediction (NSP) objective and employing Mutual
Information (MI) to model the semantic similarity of text in the latent space.
Experimental results on two open-domain dialogue datasets demonstrate the
superiority of our method compared with a wide range of baselines, especially
in handling responses that are semantically distant from the gold reference
responses.
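The abstract describes scoring a response by how close its latent representation is to the context's latent representation, with CVAE posteriors and an MI-style similarity. The paper's exact formulation is not given here, so the following is only a toy sketch under assumed simplifications: each posterior is a diagonal Gaussian (mean, variance) as a recognition network might produce, and similarity is approximated by a symmetric negative-KL proxy rather than a learned MI estimator. All names (`kl_diag_gaussians`, `latent_similarity_score`) are hypothetical.

```python
import math

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def latent_similarity_score(ctx_post, resp_post):
    """Symmetric negative-KL proxy: higher when the context-conditioned and
    response-conditioned latent posteriors overlap more."""
    kl_pq = kl_diag_gaussians(*ctx_post, *resp_post)
    kl_qp = kl_diag_gaussians(*resp_post, *ctx_post)
    return -0.5 * (kl_pq + kl_qp)

# Toy posteriors, each a (mean vector, variance vector) pair.
ctx = ([0.0, 1.0], [1.0, 1.0])
close_resp = ([0.1, 0.9], [1.0, 1.0])   # semantically close to the context
far_resp = ([3.0, -2.0], [1.0, 1.0])    # semantically distant

assert latent_similarity_score(ctx, close_resp) > latent_similarity_score(ctx, far_resp)
```

A KL-based proxy is used here only because it is closed-form for Gaussians; the paper's MI-based objective would be learned jointly with the CVAE rather than computed analytically.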
Related papers
- SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation [23.203761925540736]
We propose a novel framework, SLIDE (Small and Large Integrated for Dialogue Evaluation).
Our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and SLIDE exhibits better correlation with human evaluators.
arXiv Detail & Related papers (2024-05-24T20:32:49Z)
- Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues [34.78482218571574]
We propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference.
Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.
arXiv Detail & Related papers (2022-10-30T13:26:49Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows [63.116280145770006]
We propose the segment act, an extension of the dialog act from the utterance level to the segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows, i.e., sequences of segment acts, for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
arXiv Detail & Related papers (2022-02-14T11:37:20Z)
- User Response and Sentiment Prediction for Automatic Dialogue Evaluation [69.11124655437902]
We propose to use the sentiment of the next user utterance for turn- or dialog-level evaluation.
Experiments show that our model outperforms existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
arXiv Detail & Related papers (2021-11-16T22:19:17Z)
- DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings [33.89889949577356]
We propose DialogueCSE, a dialogue-based contrastive learning approach to tackle this issue.
We evaluate our model on three multi-turn dialogue datasets: the Microsoft Dialogue Corpus, the Jing Dong Dialogue Corpus, and the E-commerce Dialogue Corpus.
arXiv Detail & Related papers (2021-09-26T13:25:41Z)
- Semantic-Enhanced Explainable Finetuning for Open-Domain Dialogues [33.50099424582726]
We propose to combine pretrained language models with the modular dialogue paradigm for open-domain dialogue modeling.
Our method, semantic-enhanced finetuning, instantiates conversation understanding, planning, and response generation as a language model finetuning task.
arXiv Detail & Related papers (2021-06-06T09:03:41Z)
- Meta Dialogue Policy Learning [58.045067703675095]
We propose the Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains.
We decompose the state and action representation space into feature subspaces corresponding to these low-level components.
In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency.
arXiv Detail & Related papers (2020-06-03T23:53:06Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
- Non-Autoregressive Dialog State Tracking [122.2328875457225]
We propose a novel framework for Non-Autoregressive Dialog State Tracking (NADST).
NADST can factor in potential dependencies among domains and slots to optimize the models towards better prediction of dialogue states as a complete set rather than separate slots.
Our results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus.
arXiv Detail & Related papers (2020-02-19T06:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.