Enhancing conversational quality in language learning chatbots: An
evaluation of GPT4 for ASR error correction
- URL: http://arxiv.org/abs/2307.09744v1
- Date: Wed, 19 Jul 2023 04:25:21 GMT
- Title: Enhancing conversational quality in language learning chatbots: An
evaluation of GPT4 for ASR error correction
- Authors: Long Mai and Julie Carson-Berndsen
- Abstract summary: This paper explores the use of GPT4 for ASR error correction in conversational settings.
We find that transcriptions corrected by GPT4 lead to higher conversation quality, despite an increase in WER.
- Score: 20.465220855548292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of natural language processing (NLP) technologies into
educational applications has shown promising results, particularly in the
language learning domain. Recently, many spoken open-domain chatbots have been
used as speaking partners, helping language learners improve their language
skills. However, one of the significant challenges is the high word-error-rate
(WER) when recognizing non-native/non-fluent speech, which interrupts
conversation flow and leads to disappointment for learners. This paper explores
the use of GPT4 for ASR error correction in conversational settings. In
addition to WER, we propose to use semantic textual similarity (STS) and next
response sensibility (NRS) metrics to evaluate the impact of error correction
models on the quality of the conversation. We find that transcriptions
corrected by GPT4 lead to higher conversation quality, despite an increase in
WER. GPT4 also outperforms standard error correction methods without the need
for in-domain training data.
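The abstract's central trade-off (higher conversation quality despite increased WER) hinges on how WER is computed. As a point of reference, WER is the word-level Levenshtein distance between the reference transcript and the ASR hypothesis, normalized by reference length. A minimal sketch (not the paper's evaluation code; the paper may use a library implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,       # deletion
                d[i][j - 1] + 1,       # insertion
                d[i - 1][j - 1] + cost # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A correction that paraphrases (e.g. replacing a disfluent phrase with a fluent but differently worded one) raises this count even when the meaning is preserved, which is why the paper supplements WER with semantic similarity (STS) and next-response-sensibility (NRS) metrics.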
Related papers
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking [17.96115263146684]
We introduce a simple yet effective data augmentation method to improve robustness of Dialogue State Tracking model.
Our method generates sufficient error patterns on keywords, leading to improved accuracy in noised and low-accuracy ASR environments.
arXiv Detail & Related papers (2024-09-10T07:06:40Z)
- Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- ChatGPT-4 as a Tool for Reviewing Academic Books in Spanish [1.0052074659955383]
ChatGPT-4 is an artificial intelligence language model developed by OpenAI.
This study evaluates the potential of ChatGPT-4 as an editing tool for Spanish literary and academic books.
arXiv Detail & Related papers (2023-09-20T11:44:45Z)
- Does Correction Remain A Problem For Large Language Models? [63.24433996856764]
This paper investigates the role of correction in the context of large language models by conducting two experiments.
The first experiment focuses on correction as a standalone task, employing few-shot learning techniques with GPT-like models for error correction.
The second experiment explores the notion of correction as a preparatory task for other NLP tasks, examining whether large language models can tolerate and perform adequately on texts containing certain levels of noise or errors.
arXiv Detail & Related papers (2023-08-03T14:09:31Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation [41.94480044074273]
ChatGPT is a large-scale language model based on the advanced GPT-3.5 architecture.
We design zero-shot chain-of-thought (CoT) and few-shot CoT settings using in-context learning for ChatGPT.
Our evaluation involves assessing ChatGPT's performance on five official test sets in three different languages, along with three document-level GEC test sets in English.
arXiv Detail & Related papers (2023-04-04T12:33:40Z)
- Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.