Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations
- URL: http://arxiv.org/abs/2010.02495v2
- Date: Thu, 8 Oct 2020 21:10:47 GMT
- Title: Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations
- Authors: Praveen Kumar Bodigutla, Aditya Tiwari, Josep Valls Vargas, Lazaros Polymenakos, Spyros Matsoukas
- Abstract summary: Current automated methods to estimate turn and dialogue level user satisfaction employ hand-crafted features.
We propose a novel user satisfaction estimation approach which minimizes an adaptive multi-task loss function.
The BiLSTM based deep neural net model automatically weighs each turn's contribution towards the estimated dialogue-level rating.
- Score: 6.129731338249762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue level quality estimation is vital for optimizing data driven
dialogue management. Current automated methods to estimate turn and dialogue
level user satisfaction employ hand-crafted features and rely on complex
annotation schemes, which reduce the generalizability of the trained models. We
propose a novel user satisfaction estimation approach which minimizes an
adaptive multi-task loss function in order to jointly predict turn-level
Response Quality labels provided by experts and explicit dialogue-level ratings
provided by end users. The proposed BiLSTM based deep neural net model
automatically weighs each turn's contribution towards the estimated
dialogue-level rating, implicitly encodes temporal dependencies, and removes
the need to hand-craft features.
On dialogues sampled from 28 Alexa domains, two dialogue systems and three
user groups, the joint dialogue-level satisfaction estimation model achieved up
to an absolute 27% (0.43->0.70) and 7% (0.63->0.70) improvement in linear
correlation performance over baseline deep neural net and benchmark Gradient
boosting regression models, respectively.
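To make the architecture above concrete, here is a minimal PyTorch sketch of a joint turn- and dialogue-level estimator in the spirit of the abstract: a BiLSTM encodes the turn sequence, a turn head predicts turn-level Response Quality scores, an attention layer learns how much each turn contributes to the dialogue-level rating, and an uncertainty-weighted multi-task loss combines the two objectives. All names, dimensions, and the specific uncertainty weighting are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn


class JointSatisfactionEstimator(nn.Module):
    """Sketch of a BiLSTM model that jointly predicts turn-level Response
    Quality scores and a dialogue-level satisfaction rating."""

    def __init__(self, turn_feat_dim=128, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(turn_feat_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.turn_head = nn.Linear(2 * hidden_dim, 1)      # turn-level RQ score
        self.attn = nn.Linear(2 * hidden_dim, 1)           # learns each turn's weight
        self.dialogue_head = nn.Linear(2 * hidden_dim, 1)  # dialogue-level rating
        # Learnable task uncertainties for an adaptive multi-task loss
        self.log_var_turn = nn.Parameter(torch.zeros(()))
        self.log_var_dial = nn.Parameter(torch.zeros(()))

    def forward(self, turn_features):                      # (batch, n_turns, feat)
        h, _ = self.encoder(turn_features)                 # (batch, n_turns, 2*hidden)
        turn_scores = self.turn_head(h).squeeze(-1)        # (batch, n_turns)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)
        pooled = (weights.unsqueeze(-1) * h).sum(dim=1)    # weighted turn summary
        dialogue_score = self.dialogue_head(pooled).squeeze(-1)
        return turn_scores, dialogue_score

    def loss(self, turn_scores, dialogue_score, turn_labels, dialogue_labels):
        # Uncertainty-weighted sum of the two task losses; one common way to
        # adapt multi-task weights, the paper's exact loss may differ.
        l_turn = nn.functional.mse_loss(turn_scores, turn_labels)
        l_dial = nn.functional.mse_loss(dialogue_score, dialogue_labels)
        return (torch.exp(-self.log_var_turn) * l_turn + self.log_var_turn
                + torch.exp(-self.log_var_dial) * l_dial + self.log_var_dial)
```

In this sketch the attention weights play the role of automatically weighing each turn's contribution, and the BiLSTM supplies the implicit temporal encoding mentioned in the abstract.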
Related papers
- CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems [60.27663010453209]
We leverage large language models (LLMs) to generate satisfaction-aware counterfactual dialogues.
We gather human annotations to ensure the reliability of the generated samples.
Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems.
arXiv Detail & Related papers (2024-03-27T23:45:31Z)
- Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs [19.43845920149182]
We introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA).
DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment.
We argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
arXiv Detail & Related papers (2023-06-06T19:43:29Z)
- FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation [58.46761798403072]
We propose a dialogue-level metric that consists of three sub-metrics with each targeting a specific dimension.
The sub-metrics are trained with novel self-supervised objectives and exhibit strong correlations with human judgment for their respective dimensions.
Compared to the existing state-of-the-art metric, the combined metrics achieve around 16% relative improvement on average.
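As a toy illustration of combining dimension-specific sub-metrics into one dialogue-level score, here is a short sketch; the dimension names and the simple averaging are assumptions, since the summary does not specify the exact combination method.

```python
from statistics import mean


def combined_dialogue_score(dialogue, sub_metrics):
    """Combine per-dimension sub-metric scores (e.g. {"coherence": fn, ...})
    into a single dialogue-level score by averaging."""
    per_dimension = {name: scorer(dialogue) for name, scorer in sub_metrics.items()}
    return per_dimension, mean(per_dimension.values())
```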
arXiv Detail & Related papers (2022-10-25T08:26:03Z)
- What Went Wrong? Explaining Overall Dialogue Quality through Utterance-Level Impacts [15.018259942339448]
This paper presents a novel approach to automated analysis of conversation logs that learns the relationship between user-system interactions and overall dialogue quality.
Our approach learns the impact of each interaction from the overall user rating without utterance-level annotation.
Experiments show that the automated analysis from our model agrees with expert judgments, making this work the first to show that such weakly-supervised learning of utterance-level quality prediction is highly achievable.
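A minimal sketch of the weak-supervision idea: only the overall user rating is observed, and the model decomposes it into per-utterance contributions that can later be read off as impact scores. The encoder input, dimensions, and additive decomposition are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


class UtteranceImpactModel(nn.Module):
    """Predicts a dialogue rating as a sum of per-utterance impact scores,
    trained only against the dialogue-level rating (weak supervision)."""

    def __init__(self, utt_dim=256, hidden=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(utt_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, utt_embeddings):                     # (batch, n_utts, utt_dim)
        impacts = self.scorer(utt_embeddings).squeeze(-1)  # per-utterance impact
        rating = impacts.sum(dim=1)                        # only this is supervised
        return impacts, rating


# Training minimizes e.g. nn.MSELoss()(rating, user_rating); the per-utterance
# `impacts` are then inspected post hoc to explain what went wrong.
```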
arXiv Detail & Related papers (2021-10-31T19:12:29Z)
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue [17.663449579168297]
We simulate a dialogue in which an agent and a user model (built similarly to the agent, with a supervised learning objective) interact with each other.
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses.
Empirical studies on two benchmarks indicate that our model significantly improves response quality and leads to successful conversations.
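The exploration-exploitation step could look like the following hedged sketch, where an epsilon-greedy rule picks among the Top-K ranked candidates; the candidate ranking, reward estimates, and epsilon value are illustrative stand-ins for the paper's components.

```python
import random


def select_response(ranked_candidates, estimated_reward, k=5, epsilon=0.1):
    """Epsilon-greedy selection among the Top-K diverse candidate responses."""
    top_k = ranked_candidates[:k]
    if random.random() < epsilon:
        return random.choice(top_k)          # explore: any Top-K candidate
    return max(top_k, key=estimated_reward)  # exploit: highest estimated reward
```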
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It not only performs turn-level evaluation, but also holistically considers the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
- Turn-level Dialog Evaluation with Dialog-level Weak Signals for Bot-Human Hybrid Customer Service Systems [0.0]
We developed a machine learning approach that quantifies multiple aspects of the success or value of Customer Service contacts at any time during the interaction.
We show how it improves Amazon customer service quality in several applications.
arXiv Detail & Related papers (2020-10-25T19:36:23Z)
- Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process then transfers knowledge from a teacher model trained on high-quality paired data to a model trained on the augmented dialogue pairs.
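A hedged sketch of the ranking-based filtering step: any matching model that scores a (post, response) pair can serve as the ranker here, and the threshold is an arbitrary illustration rather than a value from the paper.

```python
def filter_augmented_pairs(candidate_pairs, ranker, threshold=0.5):
    """Keep only augmented (post, response) pairs that the ranking module
    scores above a quality threshold."""
    return [(post, response) for post, response in candidate_pairs
            if ranker(post, response) >= threshold]
```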
arXiv Detail & Related papers (2020-09-20T13:06:38Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
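A simplified stand-in for the idea of scoring a response without a reference reply: pool hidden states from a pre-trained encoder into latent representations and compare the response to its context. The actual metric is a trained model, so the cosine-similarity scoring and the choice of bert-base-uncased below are assumptions made purely for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


def embed(text):
    """Mean-pooled hidden states as a latent representation of an utterance."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state       # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)


def unreferenced_score(context, response):
    """Score a response against its context; no gold reference reply needed."""
    return torch.cosine_similarity(embed(context), embed(response), dim=0).item()
```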
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
- Modeling Long Context for Task-Oriented Dialogue State Generation [51.044300192906995]
We propose a multi-task learning model with a simple yet effective utterance tagging technique and a bidirectional language model.
Our approach addresses the problem that baseline performance drops significantly when the input dialogue context sequence is long.
In our experiments, our proposed model achieves a 7.03% relative improvement over the baseline, establishing a new state-of-the-art joint goal accuracy of 52.04% on the MultiWOZ 2.0 dataset.
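A minimal sketch of an utterance tagging step for long dialogue contexts: each utterance gets a speaker tag before the context is concatenated and fed to the model. The [USR]/[SYS] tokens are placeholders, not necessarily the paper's tagging scheme.

```python
def tag_dialogue_context(turns):
    """Prepend a speaker tag to each utterance before concatenating the context."""
    tagged = []
    for speaker, utterance in turns:
        tag = "[USR]" if speaker == "user" else "[SYS]"
        tagged.append(f"{tag} {utterance}")
    return " ".join(tagged)


# Example:
# tag_dialogue_context([("user", "book a table for two"),
#                       ("system", "what time works for you?")])
```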
arXiv Detail & Related papers (2020-04-29T11:02:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.