Speaker Sensitive Response Evaluation Model
- URL: http://arxiv.org/abs/2006.07015v1
- Date: Fri, 12 Jun 2020 08:59:10 GMT
- Title: Speaker Sensitive Response Evaluation Model
- Authors: JinYeong Bak, Alice Oh
- Abstract summary: We propose an automatic evaluation model based on the similarity of the generated response with the conversational context.
We learn the model parameters from an unlabeled conversation corpus.
We show that our model can be applied to movie dialogues without any additional training.
- Score: 17.381658875470638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic evaluation of open-domain dialogue response generation is very
challenging because there are many appropriate responses for a given context.
Existing evaluation models merely compare the generated response with the
ground truth response and rate many of the appropriate responses as
inappropriate if they deviate from the ground truth. One approach to resolve
this problem is to consider the similarity of the generated response with the
conversational context. In this paper, we propose an automatic evaluation model
based on that idea and learn the model parameters from an unlabeled
conversation corpus. Our approach considers the speakers in defining the
different levels of similar context. We use a Twitter conversation corpus that
contains many speakers and conversations to test our evaluation model.
Experiments show that our model outperforms the other existing evaluation
metrics in terms of high correlation with human annotation scores. We also show
that our model trained on Twitter can be applied to movie dialogues without any
additional training. We provide our code and the learned parameters so that
they can be used for automatic evaluation of dialogue response generation
models.
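The abstract describes the core idea only at a high level: score a generated response by its similarity to the conversational context rather than to a single ground-truth reply. As a rough, hypothetical illustration of that reference-free scoring idea (not the authors' learned, speaker-sensitive model, which is trained on an unlabeled conversation corpus), the sketch below substitutes scikit-learn TF-IDF vectors and cosine similarity for the learned representations; all names and the choice of TF-IDF are assumptions for illustration.

```python
# Minimal sketch (not the paper's model): score a generated response by its
# similarity to the conversational context instead of to a ground-truth reply.
# TF-IDF + cosine similarity stand in for the learned, speaker-sensitive
# representations described in the paper, which this sketch omits.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def context_similarity_score(context_turns, generated_response):
    """Return a crude [0, 1] score: cosine similarity between the
    generated response and the concatenated conversational context."""
    context = " ".join(context_turns)
    vectorizer = TfidfVectorizer().fit([context, generated_response])
    vectors = vectorizer.transform([context, generated_response])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


if __name__ == "__main__":
    context = [
        "Did you watch the game last night?",
        "Yes, it went to overtime!",
    ]
    # A context-relevant response should score higher than an off-topic one.
    print(context_similarity_score(context, "That overtime finish was unbelievable."))
    print(context_similarity_score(context, "I prefer pasta with extra cheese."))
```

In the paper itself, the comparison is made against multiple levels of similar context defined with speaker information; the sketch above only conveys why context similarity, unlike ground-truth matching, does not penalize appropriate responses that deviate from the reference reply.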
Related papers
- Automatic Evaluation of Speaker Similarity [0.0]
We introduce a new automatic evaluation method for speaker similarity assessment, consistent with human perceptual scores.
Our experiments show that we can train a model to predict speaker similarity MUSHRA scores from speaker embeddings with 0.96 accuracy and significant correlation up to 0.78 Pearson score at the utterance level.
arXiv Detail & Related papers (2022-07-01T11:23:16Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation [73.03318027164605]
We propose to use information that can be automatically extracted from the next user utterance as a proxy to measure the quality of the previous system response.
Our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
arXiv Detail & Related papers (2022-03-25T22:09:52Z)
- User Response and Sentiment Prediction for Automatic Dialogue Evaluation [69.11124655437902]
We propose to use the sentiment of the next user utterance for turn or dialog level evaluation.
Experiments show that our model outperforms existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
arXiv Detail & Related papers (2021-11-16T22:19:17Z)
- Generating Dialogue Responses from a Semantic Latent Space [75.18449428414736]
We propose an alternative to the end-to-end classification on vocabulary.
We learn the pair relationship between the prompts and responses as a regression task on a latent space.
Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
arXiv Detail & Related papers (2020-10-04T19:06:16Z)
- The Adapter-Bot: All-In-One Controllable Conversational Model [66.48164003532484]
We propose a dialogue model that uses a fixed backbone model such as DialoGPT and triggers on-demand dialogue skills via different adapters.
Depending on the skills, the model is able to process multiple knowledge types, such as text, tables, and empathetic responses.
We evaluate our model using automatic evaluation by comparing it with existing state-of-the-art conversational models.
arXiv Detail & Related papers (2020-08-28T10:59:31Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
- Designing Precise and Robust Dialogue Response Evaluators [35.137244385158034]
We propose to build a reference-free evaluator and exploit the power of semi-supervised training and pretrained language models.
Experimental results demonstrate that the proposed evaluator achieves a strong correlation (> 0.6) with human judgement.
arXiv Detail & Related papers (2020-04-10T04:59:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.