An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems
- URL: http://arxiv.org/abs/2401.04867v2
- Date: Tue, 23 Jan 2024 06:48:45 GMT
- Title: An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems
- Authors: Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
- Abstract summary: We investigate the relationship between user behaviors and subjective evaluation scores in social dialogue tasks.
The results reveal that in dialogue tasks where user utterances are primary, such as attentive listening and job interview, indicators such as the number of utterances and words play a significant role in evaluation.
- Score: 26.003947740875482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Establishing evaluation schemes for spoken dialogue systems is important, but
it can also be challenging. While subjective evaluations are commonly used in
user experiments, objective evaluations are necessary for research comparison
and reproducibility. To address this issue, we propose a framework for
indirectly but objectively evaluating systems based on users' behaviors. In
this paper, to this end, we investigate the relationship between user behaviors
and subjective evaluation scores in social dialogue tasks: attentive listening,
job interview, and first-meeting conversation. The results reveal that in
dialogue tasks where user utterances are primary, such as attentive listening
and job interview, indicators like the number of utterances and words play a
significant role in evaluation. Observing disfluency can also serve as an
indicator in formal tasks, such as job interview. On the other hand, in
dialogue tasks with high interactivity, such as first-meeting conversation,
behaviors related to turn-taking, like average switch pause length, become more
important. These findings suggest that selecting appropriate user behaviors can
provide valuable insights for objective evaluation in each social dialogue
task.
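
To make the proposed framework concrete, the sketch below computes a few of the objective user-behavior indicators named in the abstract (number of user utterances, number of words, and average switch pause length) and correlates them with subjective evaluation scores. The transcript layout and field names are illustrative assumptions, not the authors' actual data format.

```python
# Minimal sketch: correlate objective user-behavior indicators with
# subjective evaluation scores. The data layout below is hypothetical.
from scipy.stats import spearmanr

def user_indicators(dialogue):
    """Compute simple behavior indicators for the user side of one dialogue.

    `dialogue` is assumed to be a list of turns of the form
    {"speaker": "user"|"system", "words": [...], "start": sec, "end": sec}.
    """
    user_turns = [t for t in dialogue if t["speaker"] == "user"]
    n_utterances = len(user_turns)
    n_words = sum(len(t["words"]) for t in user_turns)

    # Average switch pause: silence between a system turn ending and the
    # next user turn starting (a rough proxy for turn-taking behavior).
    pauses = []
    for prev, cur in zip(dialogue, dialogue[1:]):
        if prev["speaker"] == "system" and cur["speaker"] == "user":
            pauses.append(max(0.0, cur["start"] - prev["end"]))
    avg_switch_pause = sum(pauses) / len(pauses) if pauses else 0.0

    return {"n_utterances": n_utterances,
            "n_words": n_words,
            "avg_switch_pause": avg_switch_pause}

def correlate(dialogues, subjective_scores):
    """Spearman correlation between each indicator and the subjective scores."""
    indicators = [user_indicators(d) for d in dialogues]
    results = {}
    for name in indicators[0]:
        values = [ind[name] for ind in indicators]
        rho, p = spearmanr(values, subjective_scores)
        results[name] = (rho, p)
    return results
```

On real data, the abstract's finding would correspond to utterance and word counts correlating most strongly for attentive listening and job interview, with the switch-pause indicator mattering more for first-meeting conversation.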
Related papers
- Joint Learning of Context and Feedback Embeddings in Spoken Dialogue [3.8673630752805446]
We investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective.
Our results show that the model outperforms humans given the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
arXiv Detail & Related papers (2024-06-11T14:22:37Z)
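
For the Joint Learning of Context and Feedback Embeddings entry above, a contrastive objective of this kind is commonly realized as an in-batch InfoNCE-style loss over paired (context, feedback) embeddings. The following is a generic sketch under that assumption, not the authors' actual model or training code; the encoders that produce the embeddings are omitted.

```python
# Hypothetical sketch of an in-batch contrastive (InfoNCE-style) loss for
# placing dialogue-context and feedback-response embeddings in one space.
import torch
import torch.nn.functional as F

def contrastive_loss(ctx_emb, fb_emb, temperature=0.07):
    """ctx_emb, fb_emb: (batch, dim) embeddings of matched pairs.

    Row i of ctx_emb and row i of fb_emb form a positive pair; all other
    combinations in the batch serve as negatives.
    """
    ctx = F.normalize(ctx_emb, dim=-1)
    fb = F.normalize(fb_emb, dim=-1)
    logits = ctx @ fb.t() / temperature             # (batch, batch) similarities
    targets = torch.arange(ctx.size(0), device=ctx.device)
    # Symmetric loss: context -> feedback and feedback -> context retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

At inference time, ranking candidate feedback responses by cosine similarity to a context embedding corresponds to the ranking task mentioned in the summary.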
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- Rethinking Response Evaluation from Interlocutor's Eye for Open-Domain Dialogue Systems [14.98159964397052]
We analyzed what features are needed in an automatic response evaluator from the interlocutor's perspective.
The first experiment on the Hazumi dataset revealed that interlocutor awareness plays a critical role in making automatic response evaluation correlate with the interlocutor's judgments.
The second experiment using massive conversations on X (formerly Twitter) confirmed that dialogue continuity prediction can train an interlocutor-aware response evaluator without human feedback.
arXiv Detail & Related papers (2024-01-04T13:15:41Z)
- Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs [19.43845920149182]
We introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA).
DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment.
We argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
arXiv Detail & Related papers (2023-06-06T19:43:29Z)
- Dialogue Evaluation with Offline Reinforcement Learning [2.580163308334609]
Task-oriented dialogue systems aim to fulfill user goals through natural language interactions.
They are ideally evaluated with human users, which is infeasible at every iteration of the development phase.
We propose the use of offline reinforcement learning for dialogue evaluation based on a static corpus.
arXiv Detail & Related papers (2022-09-02T08:32:52Z)
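
For the Dialogue Evaluation with Offline Reinforcement Learning entry above, one deliberately simplified reading of the idea is to fit a value function on a static corpus of logged dialogues and use its predictions as a quality score, avoiding live user trials. The feature representation, reward definition, and linear model below are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical sketch: Monte Carlo value regression over logged dialogues,
# used as an offline proxy for dialogue quality. Not the paper's method.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for each step of one dialogue."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def fit_value_function(dialogues, gamma=0.99):
    """dialogues: list of (states, rewards) with states of shape (T, d).

    Fits a linear value function V(s) = w @ s by least squares against the
    Monte Carlo returns observed in the static corpus.
    """
    X = np.concatenate([states for states, _ in dialogues])
    y = np.concatenate([discounted_returns(r, gamma) for _, r in dialogues])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def score_dialogue(w, states):
    """Use the learned value of the first state as an offline quality score."""
    return float(states[0] @ w)
```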
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z)
- User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems [65.88679683468143]
We propose a novel framework, namely USDA, to incorporate the sequential dynamics of dialogue acts for predicting user satisfaction.
USDA incorporates the sequential transitions of both content and act features in the dialogue to predict user satisfaction.
Experimental results on four benchmark goal-oriented dialogue datasets show that the proposed method substantially and consistently outperforms existing methods on user satisfaction estimation (USE).
arXiv Detail & Related papers (2022-02-07T02:50:07Z)
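
For the USDA entry above, a bare-bones version of sequential dialogue act modeling is a recurrent encoder over per-turn act and content features followed by a satisfaction classifier. The architecture and dimensions below are placeholders for illustration; the actual USDA model is more elaborate.

```python
# Hypothetical sketch: a recurrent model over per-turn dialogue-act and
# content features that predicts a user-satisfaction label. This only
# illustrates the sequential idea, not the USDA architecture itself.
import torch
import torch.nn as nn

class SatisfactionEstimator(nn.Module):
    def __init__(self, n_acts=20, act_dim=32, content_dim=768,
                 hidden_dim=128, n_classes=5):
        super().__init__()
        self.act_emb = nn.Embedding(n_acts, act_dim)
        self.encoder = nn.GRU(act_dim + content_dim, hidden_dim,
                              batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, act_ids, content_feats):
        """act_ids: (batch, turns) int64; content_feats: (batch, turns, content_dim)."""
        turns = torch.cat([self.act_emb(act_ids), content_feats], dim=-1)
        _, h_n = self.encoder(turns)             # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n.squeeze(0))   # satisfaction logits
```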
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It not only performs turn-level evaluation but also holistically considers the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)