Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
- URL: http://arxiv.org/abs/2407.06479v1
- Date: Tue, 9 Jul 2024 00:56:59 GMT
- Title: Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
- Authors: Rena Gao, Carsten Roever, Jey Han Lau
- Abstract summary: We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language speakers.
Our framework collects dialogue-level interactivity labels and micro-level span features.
We study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models.
- Score: 22.56326809612278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Given our annotated data, we study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models. Our results demonstrate that certain micro-level features, such as reference words (e.g., she, her, he), strongly correlate with interactivity quality, revealing new insights about the interaction between higher-level dialogue quality and lower-level linguistic signals. Our framework also provides a means to assess ESL communication, which is useful for language assessment.
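To make the modelling setup concrete, here is a minimal sketch (not the authors' code) of the core idea: predict a dialogue-level interactivity label from counts of micro-level span features. The feature names and toy data below are illustrative assumptions.

```python
# Sketch: relate micro-level feature counts to a dialogue-level
# interactivity label with a simple linear model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: per-dialogue counts of hypothetical micro-level features,
# e.g. [backchannels, reference words, code-switches].
X = np.array([
    [5, 12, 0],
    [1,  2, 3],
    [7, 15, 1],
    [0,  1, 4],
])
# Dialogue-level interactivity quality (1 = high, 0 = low), also illustrative.
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Coefficient magnitudes hint at which micro-level features correlate
# with higher-level interactivity quality, mirroring the paper's analysis.
print(dict(zip(["backchannel", "reference_word", "code_switch"], model.coef_[0])))
```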
Related papers
- CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues [21.34138535130589]
We develop CNIMA, a Chinese-as-a-second-language labelled dataset with 10K dialogues.
We annotate CNIMA using an evaluation framework that assesses micro-level features.
We propose an approach to automate the evaluation and find strong performance.
arXiv Detail & Related papers (2024-08-29T13:28:52Z) - Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
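As a rough illustration of how such an evaluation might probe a model (the checkpoint, prompt format, and example are assumptions, not the benchmark's protocol):

```python
# Sketch: ask a local LLM for the implication of an emphasized word.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

dialogue = "A: I thought the report was due Friday.\nB: It was due *Thursday*."
prompt = (
    f"{dialogue}\n"
    "Question: What does B imply by emphasizing 'Thursday'?\nAnswer:"
)
# A real evaluation would compare the generation against the
# emphasis-annotated reference implication in the benchmark.
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```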
arXiv Detail & Related papers (2024-06-16T20:41:44Z) - Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z) - Toward More Accurate and Generalizable Evaluation Metrics for
Task-Oriented Dialogs [19.43845920149182]
We introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA).
DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment.
We argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
arXiv Detail & Related papers (2023-06-06T19:43:29Z) - FCC: Fusing Conversation History and Candidate Provenance for Contextual
Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
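A minimal sketch of the multi-channel fusion idea, assuming stand-in vector encoders and dimensions (the actual model learns text encoders over each channel):

```python
# Sketch: encode conversation history and candidate provenance separately,
# then fuse the two vectors to score a candidate response.
import torch
import torch.nn as nn

class FusionRanker(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.history_enc = nn.Linear(dim, dim)     # stand-in encoders; a real
        self.provenance_enc = nn.Linear(dim, dim)  # system uses text encoders
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, history_vec, provenance_vec):
        fused = torch.cat(
            [torch.relu(self.history_enc(history_vec)),
             torch.relu(self.provenance_enc(provenance_vec))], dim=-1)
        return self.scorer(fused)  # higher score = better candidate

ranker = FusionRanker()
score = ranker(torch.randn(1, 64), torch.randn(1, 64))
print(score.item())
```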
arXiv Detail & Related papers (2023-03-31T23:58:28Z) - What Helps Transformers Recognize Conversational Structure? Importance
of Context, Punctuation, and Labels in Dialog Act Recognition [41.1669799542627]
We apply two pre-trained transformer models to structure a conversational transcript as a sequence of dialog acts.
We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes.
A detailed analysis reveals the specific segmentation patterns that emerge when that broader context is absent.
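A minimal sketch of how a preceding turn can be packed into a transformer input as a second segment (the checkpoint and utterances are illustrative; this is not the paper's exact setup):

```python
# Sketch: encode a target utterance with vs. without its preceding turn.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
prev, target = "Could you pass the salt?", "Sure."
# Without context the model sees only the short target utterance...
enc_no_ctx = tok(target, return_tensors="pt")
# ...with context, the preceding turn enters the same input as a second
# segment (separated by [SEP]), which helps disambiguate dialog acts.
enc_ctx = tok(prev, target, return_tensors="pt")
print(enc_no_ctx["input_ids"].shape, enc_ctx["input_ids"].shape)
```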
arXiv Detail & Related papers (2021-07-05T21:56:00Z) - DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It is capable of not only performing turn-level evaluation but also holistically considering the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z) - GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating
Open-Domain Dialogue Systems [133.13117064357425]
We propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.
Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence.
Experimental results show that our GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models.
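A toy sketch of combining the two granularities (the embeddings, topic keywords, and fusion weights are assumptions; GRADE's actual topic graph is grounded in ConceptNet-style relations):

```python
# Sketch: fuse a coarse utterance-level similarity with a fine
# topic-level graph signal into one coherence score.
import networkx as nx
import numpy as np

# Coarse signal: cosine similarity between (stand-in) context and response vectors.
ctx_vec, resp_vec = np.random.rand(64), np.random.rand(64)
coarse = ctx_vec @ resp_vec / (np.linalg.norm(ctx_vec) * np.linalg.norm(resp_vec))

# Fine signal: closeness of context and response topics in a keyword graph.
g = nx.Graph([("weather", "rain"), ("rain", "umbrella"), ("umbrella", "shop")])
hops = nx.shortest_path_length(g, "weather", "umbrella")
fine = 1.0 / (1 + hops)  # nearer topics -> more coherent

coherence = 0.5 * coarse + 0.5 * fine  # toy fusion of the two granularities
print(round(float(coherence), 3))
```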
arXiv Detail & Related papers (2020-10-08T14:07:32Z) - Exploring Recurrent, Memory and Attention Based Architectures for
Scoring Interactional Aspects of Human-Machine Text Dialog [9.209192502526285]
This paper builds on previous work in this direction to investigate multiple neural architectures.
We conduct experiments on a conversational database of text dialogs from human learners interacting with a cloud-based dialog system.
We find that fusion of multiple architectures performs competently on our automated scoring task relative to expert inter-rater agreements.
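A minimal sketch of score-level fusion, with placeholder networks standing in for the recurrent, memory, and attention models the paper explores:

```python
# Sketch: average the interactional-quality scores predicted by
# several (stand-in) architectures for one dialog encoding.
import torch
import torch.nn as nn

dim = 32
models = [
    nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1)),  # placeholder net
    nn.Sequential(nn.Linear(dim, 1)),                                # placeholder net
]
dialog_vec = torch.randn(1, dim)  # stand-in encoding of a text dialog
scores = torch.stack([m(dialog_vec) for m in models])
print(scores.mean().item())  # fused score for the dialog
```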
arXiv Detail & Related papers (2020-05-20T03:23:00Z) - Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
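A minimal sketch of one way to make inputs speaker-aware, in the spirit of that extension (the token names and transformation are illustrative assumptions, not the paper's exact method):

```python
# Sketch: mark the two argument speakers with special tokens so a
# model can track who said what when extracting their relation.
def speaker_aware(dialogue, subj, obj):
    marked = []
    for speaker, utterance in dialogue:
        tag = "[SUBJ]" if speaker == subj else "[OBJ]" if speaker == obj else speaker
        marked.append(f"{tag}: {utterance}")
    return " ".join(marked)

dialogue = [("Speaker 1", "How is your mom doing?"),
            ("Speaker 2", "She is much better, thanks.")]
print(speaker_aware(dialogue, subj="Speaker 1", obj="Speaker 2"))
```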
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.