Related papers: Can Generative Large Language Models Perform ASR Error Correction?

URL: http://arxiv.org/abs/2307.04172v2
Date: Fri, 29 Sep 2023 07:32:03 GMT
Title: Can Generative Large Language Models Perform ASR Error Correction?
Authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
Abstract summary: Generative large language models (LLMs) have been applied to a wide range of natural language processing tasks. In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures.
Score: 16.246481696611117
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: ASR error correction is an interesting option for post-processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive and the model is tuned to a specific ASR system. Recently, generative large language models (LLMs) have been applied to a wide range of natural language processing tasks, as they can operate in a zero-shot or few-shot fashion. In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction. Based on the ASR N-best output, we propose both an unconstrained approach and a constrained approach, in which a member of the N-best list is selected. Additionally, zero-shot and 1-shot settings are evaluated. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures, transducer and attention-encoder-decoder based, and across multiple test sets.
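
The difference between the unconstrained and constrained settings described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the function names `build_prompt` and `correct`, and the fallback to the top hypothesis are all assumptions made for illustration, and `llm` stands for any hypothetical callable that maps a prompt string to a reply string.

```python
import re
from typing import Callable, List


def build_prompt(nbest: List[str], constrained: bool) -> str:
    """Format an N-best list as a correction prompt (wording is illustrative)."""
    options = "\n".join(f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1))
    if constrained:
        task = "Reply with the number of the most likely transcription."
    else:
        task = "Write the most likely transcription; it need not match any option."
    return f"ASR hypotheses:\n{options}\n{task}"


def correct(nbest: List[str], llm: Callable[[str], str], constrained: bool) -> str:
    """Return a corrected transcription from a hypothetical LLM callable."""
    reply = llm(build_prompt(nbest, constrained)).strip()
    if not constrained:
        # Unconstrained: accept free-form text, falling back to the top
        # hypothesis if the reply is empty (an assumption of this sketch).
        return reply or nbest[0]
    # Constrained: the result must be a member of the N-best list.
    match = re.search(r"\d+", reply)
    index = int(match.group()) if match else 1
    if not 1 <= index <= len(nbest):
        index = 1  # unusable reply: keep the ASR system's top hypothesis
    return nbest[index - 1]
```

In the constrained mode the result is always one of the original hypotheses, so the model cannot introduce words the ASR system never proposed. The unconstrained mode allows new text, which can fix errors shared by every hypothesis but also risks introducing new ones.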

Related papers

ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction [20.04650481108717]
This paper studies ASR error correction in the Chinese language. To the best of our knowledge, it is the first Chinese ASR error correction benchmark. Inspired by the recent advances in large language models (LLMs), we investigate how to harness the power of LLMs to correct ASR errors.
arXiv Detail & Related papers (2024-12-04T06:52:10Z)
ASR Error Correction using Large Language Models [4.75940708384553]
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions. This work investigates the use of large language models (LLMs) for error correction across diverse scenarios.
arXiv Detail & Related papers (2024-09-14T23:33:38Z)
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction [34.32834323898953]
Generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances. We also introduce a new multi-pass augmented generative error correction (MPA GER) by integrating multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merging them.
arXiv Detail & Related papers (2024-08-29T00:18:12Z)
Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs). To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator. The generator tries to generate realistic enough output to fool the discriminator. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
arXiv Detail & Related papers (2023-12-05T09:44:45Z)
Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system. We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model. Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
arXiv Detail & Related papers (2022-02-02T17:32:59Z)
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment. FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model. It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules. The errors of the ASR system can seriously downgrade the performance of the NLP modules. Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
arXiv Detail & Related papers (2021-03-25T05:15:43Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU). We show that the error rates of off-the-shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.