Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
- URL: http://arxiv.org/abs/2407.06479v1
- Date: Tue, 9 Jul 2024 00:56:59 GMT
- Title: Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
- Authors: Rena Gao, Carsten Roever, Jey Han Lau
- Abstract summary: We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language speakers.
Our framework collects dialogue-level interactivity labels and micro-level span features.
We study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models.
- Score: 22.56326809612278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Given our annotated data, we study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models. Our results demonstrate that certain micro-level features, such as reference words (e.g., she, her, he), strongly correlate with interactivity quality, revealing new insights about the interaction between higher-level dialogue quality and lower-level linguistic signals. Our framework also provides a means to assess ESL communication, which is useful for language assessment.
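To make the modelling setup concrete, here is a minimal sketch (not the authors' code) of the core idea: predict a dialogue-level interactivity label from counts of micro-level span features. The feature names and toy data below are illustrative assumptions.

```python
# Sketch: relate micro-level feature counts to a dialogue-level
# interactivity label with a simple linear model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: per-dialogue counts of hypothetical micro-level features,
# e.g. [backchannels, reference words, code-switches].
X = np.array([
    [5, 12, 0],
    [1,  2, 3],
    [7, 15, 1],
    [0,  1, 4],
])
# Dialogue-level interactivity quality (1 = high, 0 = low), also illustrative.
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Coefficient magnitudes hint at which micro-level features correlate
# with higher-level interactivity quality, mirroring the paper's analysis.
print(dict(zip(["backchannel", "reference_word", "code_switch"], model.coef_[0])))
```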
Related papers
- CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues [21.34138535130589]
We develop CNIMA, a Chinese-as-a-second-language labelled dataset with 10K dialogues.
We annotate CNIMA using an evaluation framework that assesses micro-level features.
We propose an approach to automate the evaluation and find strong performance.
arXiv Detail & Related papers (2024-08-29T13:28:52Z) - Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
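As a rough illustration of how such an evaluation might probe a model (the checkpoint, prompt format, and example are assumptions, not the benchmark's protocol):

```python
# Sketch: ask a local LLM for the implication of an emphasized word.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

dialogue = "A: I thought the report was due Friday.\nB: It was due *Thursday*."
prompt = (
    f"{dialogue}\n"
    "Question: What does B imply by emphasizing 'Thursday'?\nAnswer:"
)
# A real evaluation would compare the generation against the
# emphasis-annotated reference implication in the benchmark.
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```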
arXiv Detail & Related papers (2024-06-16T20:41:44Z) - Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z) - Toward More Accurate and Generalizable Evaluation Metrics for
Task-Oriented Dialogs [19.43845920149182]
We introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA).
DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment.
We argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
arXiv Detail & Related papers (2023-06-06T19:43:29Z) - FCC: Fusing Conversation History and Candidate Provenance for Contextual
Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
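A minimal sketch of the multi-channel fusion idea, assuming stand-in vector encoders and dimensions (the actual model learns text encoders over each channel):

```python
# Sketch: encode conversation history and candidate provenance separately,
# then fuse the two vectors to score a candidate response.
import torch
import torch.nn as nn

class FusionRanker(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.history_enc = nn.Linear(dim, dim)     # stand-in encoders; a real
        self.provenance_enc = nn.Linear(dim, dim)  # system uses text encoders
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, history_vec, provenance_vec):
        fused = torch.cat(
            [torch.relu(self.history_enc(history_vec)),
             torch.relu(self.provenance_enc(provenance_vec))], dim=-1)
        return self.scorer(fused)  # higher score = better candidate

ranker = FusionRanker()
score = ranker(torch.randn(1, 64), torch.randn(1, 64))
print(score.item())
```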
arXiv Detail & Related papers (2023-03-31T23:58:28Z) - What Helps Transformers Recognize Conversational Structure? Importance
of Context, Punctuation, and Labels in Dialog Act Recognition [41.1669799542627]
We apply two pre-trained transformer models to structure a conversational transcript as a sequence of dialog acts.
We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes.
A detailed analysis reveals the specific segmentation patterns that emerge when that broader context is absent.
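A minimal sketch of how a preceding turn can be packed into a transformer input as a second segment (the checkpoint and utterances are illustrative; this is not the paper's exact setup):

```python
# Sketch: encode a target utterance with vs. without its preceding turn.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
prev, target = "Could you pass the salt?", "Sure."
# Without context the model sees only the short target utterance...
enc_no_ctx = tok(target, return_tensors="pt")
# ...with context, the preceding turn enters the same input as a second
# segment (separated by [SEP]), which helps disambiguate dialog acts.
enc_ctx = tok(prev, target, return_tensors="pt")
print(enc_no_ctx["input_ids"].shape, enc_ctx["input_ids"].shape)
```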
arXiv Detail & Related papers (2021-07-05T21:56:00Z) - DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It is capable of not only performing turn-level evaluation but also holistically considering the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z) - GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating
Open-Domain Dialogue Systems [133.13117064357425]
We propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.
Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence.
Experimental results show that our GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models.
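A toy sketch of combining the two granularities (the embeddings, topic keywords, and fusion weights are assumptions; GRADE's actual topic graph is grounded in ConceptNet-style relations):

```python
# Sketch: fuse a coarse utterance-level similarity with a fine
# topic-level graph signal into one coherence score.
import networkx as nx
import numpy as np

# Coarse signal: cosine similarity between (stand-in) context and response vectors.
ctx_vec, resp_vec = np.random.rand(64), np.random.rand(64)
coarse = ctx_vec @ resp_vec / (np.linalg.norm(ctx_vec) * np.linalg.norm(resp_vec))

# Fine signal: closeness of context and response topics in a keyword graph.
g = nx.Graph([("weather", "rain"), ("rain", "umbrella"), ("umbrella", "shop")])
hops = nx.shortest_path_length(g, "weather", "umbrella")
fine = 1.0 / (1 + hops)  # nearer topics -> more coherent

coherence = 0.5 * coarse + 0.5 * fine  # toy fusion of the two granularities
print(round(float(coherence), 3))
```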
arXiv Detail & Related papers (2020-10-08T14:07:32Z) - Exploring Recurrent, Memory and Attention Based Architectures for
Scoring Interactional Aspects of Human-Machine Text Dialog [9.209192502526285]
This paper builds on previous work in this direction to investigate multiple neural architectures.
We conduct experiments on a conversational database of text dialogs from human learners interacting with a cloud-based dialog system.
We find that fusion of multiple architectures performs competently on our automated scoring task relative to expert inter-rater agreements.
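A minimal sketch of score-level fusion, with placeholder networks standing in for the recurrent, memory, and attention models the paper explores:

```python
# Sketch: average the interactional-quality scores predicted by
# several (stand-in) architectures for one dialog encoding.
import torch
import torch.nn as nn

dim = 32
models = [
    nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1)),  # placeholder net
    nn.Sequential(nn.Linear(dim, 1)),                                # placeholder net
]
dialog_vec = torch.randn(1, dim)  # stand-in encoding of a text dialog
scores = torch.stack([m(dialog_vec) for m in models])
print(scores.mean().item())  # fused score for the dialog
```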
arXiv Detail & Related papers (2020-05-20T03:23:00Z) - Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
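A minimal sketch of one way to make inputs speaker-aware, in the spirit of that extension (the token names and transformation are illustrative assumptions, not the paper's exact method):

```python
# Sketch: mark the two argument speakers with special tokens so a
# model can track who said what when extracting their relation.
def speaker_aware(dialogue, subj, obj):
    marked = []
    for speaker, utterance in dialogue:
        tag = "[SUBJ]" if speaker == subj else "[OBJ]" if speaker == obj else speaker
        marked.append(f"{tag}: {utterance}")
    return " ".join(marked)

dialogue = [("Speaker 1", "How is your mom doing?"),
            ("Speaker 2", "She is much better, thanks.")]
print(speaker_aware(dialogue, subj="Speaker 1", obj="Speaker 2"))
```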
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.