Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence
- URL: http://arxiv.org/abs/2409.07064v1
- Date: Wed, 11 Sep 2024 07:24:07 GMT
- Title: Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence
- Authors: Jiun-Ting Li, Bi-Cheng Yan, Tien-Hong Lo, Yi-Cheng Wang, Yung-Chang Hsu, Berlin Chen
- Abstract summary: ASAC aims to evaluate the overall speaking proficiency of an L2 speaker in a setting where an interlocutor interacts with one or more candidates.
We propose a hierarchical graph model that aptly incorporates both broad inter-response interactions and nuanced semantic information.
Extensive experimental results on the NICT-JLE benchmark dataset suggest that our proposed modeling approach can yield considerable improvements in prediction accuracy.
- Score: 11.217656140423207
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates. Although prior ASAC approaches have shown promising performance on their respective datasets, there is still a dearth of research specifically focused on incorporating the coherence of the logical flow within a conversation into the grading model. To address this critical challenge, we propose a hierarchical graph model that aptly incorporates both broad inter-response interactions (e.g., discourse relations) and nuanced semantic information (e.g., semantic words and speaker intents), which is subsequently fused with contextual information for the final prediction. Extensive experimental results on the NICT-JLE benchmark dataset suggest that our proposed modeling approach can yield considerable improvements in prediction accuracy with respect to various assessment metrics, as compared to some strong baselines. This also sheds light on the importance of investigating coherence-related facets of spoken responses in ASAC.
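To make the modeling idea concrete, below is a minimal, hypothetical sketch of the kind of architecture the abstract describes: spoken responses are treated as graph nodes, a few rounds of message passing capture inter-response interactions, and the pooled graph summary is fused with a conversation-level context vector to predict a proficiency score. This is not the authors' released implementation; the class names, dimensions, and the dense adjacency used as a stand-in for discourse relations are illustrative assumptions.
```python
# Hypothetical sketch (not the authors' code): a graph-style scorer for ASAC.
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """One round of message passing over response nodes (mean aggregation)."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (num_responses, dim); adj: (num_responses, num_responses)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        messages = adj @ nodes / deg                     # average neighbour features
        return torch.relu(self.proj(messages) + nodes)   # residual update


class GraphCoherenceScorer(nn.Module):
    """Fuses a graph summary of the responses with a contextual vector."""

    def __init__(self, dim: int = 256, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(SimpleGraphLayer(dim) for _ in range(num_layers))
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, response_embs, adj, context_emb):
        h = response_embs
        for layer in self.layers:
            h = layer(h, adj)
        graph_summary = h.mean(dim=0)                    # pool response nodes
        fused = torch.cat([graph_summary, context_emb])  # late fusion with context
        return self.head(fused)                          # predicted proficiency score


if __name__ == "__main__":
    n, d = 6, 256                          # 6 responses in one conversation
    scorer = GraphCoherenceScorer(dim=d)
    responses = torch.randn(n, d)          # e.g., pooled acoustic/text features per response
    adjacency = torch.ones(n, n)           # illustrative stand-in for discourse relations
    context = torch.randn(d)               # conversation-level context vector
    print(scorer(responses, adjacency, context).item())
```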
Related papers
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling [12.153618111267514]
We put forward disparate conversation history fusion methods for language modeling in automatic speech recognition.
A novel audio-fusion mechanism is introduced, which manages to fuse and utilize the acoustic embeddings of a current utterance and the semantic content of its corresponding conversation history.
To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM.
arXiv Detail & Related papers (2021-11-05T09:07:23Z)
- Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [30.57631206882462]
The MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores based on a test speech signal as input.
We show that the MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI) scores when tested on both noisy and enhanced speech utterances.
arXiv Detail & Related papers (2021-11-03T17:30:43Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings [33.89889949577356]
We propose DialogueCSE, a dialogue-based contrastive learning approach to learning sentence embeddings from multi-turn dialogues.
We evaluate our model on three multi-turn dialogue datasets: the Microsoft Dialogue Corpus, the Jing Dong Dialogue Corpus, and the E-commerce Dialogue Corpus.
arXiv Detail & Related papers (2021-09-26T13:25:41Z)
- Self-supervised Dialogue Learning for Spoken Conversational Question Answering [29.545937716796082]
In spoken conversational question answering (SCQA), the answer to the corresponding question is generated by retrieving and then analyzing a fixed spoken document, including multi-part conversations.
We introduce a self-supervised learning approach, including incoherence discrimination, insertion detection, and question prediction, to explicitly capture the coreference resolution and dialogue coherence.
Our proposed method provides more coherent, meaningful, and appropriate responses, yielding superior performance gains compared to the original pre-trained language models.
arXiv Detail & Related papers (2021-06-04T00:09:38Z)
- I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling [104.09033240889106]
We introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues.
We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with the typical unstructured approach.
arXiv Detail & Related papers (2020-12-24T18:47:49Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)