What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
- URL: http://arxiv.org/abs/2203.13927v1
- Date: Fri, 25 Mar 2022 22:09:52 GMT
- Title: What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
- Authors: Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu,
Dilek Hakkani-Tur
- Abstract summary: We propose to use information that can be automatically extracted from the next user utterance as a proxy to measure the quality of the previous system response.
Our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
- Score: 73.03318027164605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate automatic evaluation metrics for open-domain dialogs are in high
demand. Existing model-based metrics for system response evaluation are trained
on human annotated data, which is cumbersome to collect. In this work, we
propose to use information that can be automatically extracted from the next
user utterance, such as its sentiment or whether the user explicitly ends the
conversation, as a proxy to measure the quality of the previous system
response. This allows us to train on a massive set of dialogs with weak
supervision, without requiring manual system turn quality annotations.
Experiments show that our model is comparable to models trained on human
annotated data. Furthermore, our model generalizes across both spoken and
written open-domain dialog corpora collected from real and paid users.
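As a rough illustration of the weak-supervision idea described in the abstract, the sketch below labels a system turn using signals taken from the next user utterance: its sentiment, or whether the user explicitly ends the conversation. This is a minimal sketch, not the authors' implementation; the Hugging Face `sentiment-analysis` pipeline, the `label_system_turns` helper, and the closing-phrase list are illustrative assumptions introduced here.

```python
# Minimal sketch of weak supervision from the NEXT user utterance.
# Assumptions: any off-the-shelf binary sentiment classifier works as the
# signal extractor, and an explicit conversation-ending phrase counts as a
# negative signal. This is illustrative, not the paper's implementation.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default binary sentiment model

CLOSING_PHRASES = ("stop", "goodbye", "i'm done", "end the conversation")  # illustrative only


def weak_label(next_user_utterance: str) -> int:
    """Return 1 (good previous system turn) or 0 (bad) from the next user utterance."""
    text = next_user_utterance.lower()
    # Explicitly ending the conversation is treated as a negative signal.
    if any(phrase in text for phrase in CLOSING_PHRASES):
        return 0
    result = sentiment(next_user_utterance)[0]
    return 1 if result["label"] == "POSITIVE" else 0


def label_system_turns(dialog: list[str]) -> list[tuple[str, int]]:
    """Pair each system turn with a weak quality label from the user turn that follows it.

    Assumes turns alternate: user, system, user, system, ...
    """
    labeled = []
    for i in range(1, len(dialog) - 1, 2):  # system turns sit at odd indices
        labeled.append((dialog[i], weak_label(dialog[i + 1])))
    return labeled


if __name__ == "__main__":
    dialog = [
        "Hi, can you recommend a movie?",      # user
        "Sure! Have you seen Arrival?",        # system
        "Oh nice, I love sci-fi, thanks!",     # user -> positive signal
        "You could also try a cooking show.",  # system
        "That makes no sense, goodbye.",       # user -> conversation-ending, negative
    ]
    print(label_system_turns(dialog))
```

Labels produced this way require no manual turn-quality annotation, which is what allows training on large dialog collections with only weak supervision.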
Related papers
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- User Response and Sentiment Prediction for Automatic Dialogue Evaluation [69.11124655437902]
We propose to use the sentiment of the next user utterance for turn or dialog level evaluation.
Experiments show that our model outperforms existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
arXiv Detail & Related papers (2021-11-16T22:19:17Z)
- How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics [47.20761880464552]
Generative dialogue modeling is widely seen as a language modeling task.
The task demands that an agent have a complex natural language understanding of its input text to carry out a meaningful interaction with a user.
The automatic metrics used evaluate the quality of the generated text as a proxy for the agent's holistic interaction.
arXiv Detail & Related papers (2020-08-24T13:28:35Z)
- Speaker Sensitive Response Evaluation Model [17.381658875470638]
We propose an automatic evaluation model based on the similarity of the generated response with the conversational context.
We learn the model parameters from an unlabeled conversation corpus.
We show that our model can be applied to movie dialogues without any additional training.
arXiv Detail & Related papers (2020-06-12T08:59:10Z)
- Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation [69.03658685761538]
Open-domain dialog system evaluation is one of the most important challenges in dialog research.
We propose an automatic evaluation model CMADE that automatically cleans self-reported user ratings as it trains on them.
Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task.
arXiv Detail & Related papers (2020-05-21T15:14:49Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)