Can Generative Large Language Models Perform ASR Error Correction?
- URL: http://arxiv.org/abs/2307.04172v2
- Date: Fri, 29 Sep 2023 07:32:03 GMT
- Title: Can Generative Large Language Models Perform ASR Error Correction?
- Authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
- Abstract summary: Generative large language models (LLMs) have been applied to a wide range of natural language processing tasks.
In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction.
Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures.
- Score: 16.246481696611117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ASR error correction is an interesting option for post-processing speech
recognition system outputs. These error correction models are usually trained
in a supervised fashion using the decoding results of a target ASR system. This
approach can be computationally intensive and the model is tuned to a specific
ASR system. Recently, generative large language models (LLMs) have been applied
to a wide range of natural language processing tasks, as they can operate in a
zero-shot or few-shot fashion. In this paper we investigate using ChatGPT, a
generative LLM, for ASR error correction. Based on the ASR N-best output, we
propose both unconstrained and constrained approaches, where the constrained
approach selects a member of the N-best list. Additionally, zero-shot and 1-shot settings are evaluated.
Experiments show that this generative LLM approach can yield performance gains
for two different state-of-the-art ASR architectures, transducer and
attention-encoder-decoder based, and multiple test sets.
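The difference between the unconstrained and constrained settings described in the abstract can be illustrated with a minimal zero-shot sketch. Everything here is an assumption made for illustration: the prompt wording, the function names, and the fallback to the top hypothesis are hypothetical and are not taken from the paper. The `llm` argument is any callable that maps a prompt string to a reply string, so no particular API is implied.

```python
import re
from typing import Callable, List


def _format_nbest(nbest: List[str]) -> str:
    # Number the hypotheses from 1, best-scoring first (assumed ordering).
    return "\n".join(f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1))


def build_unconstrained_prompt(nbest: List[str]) -> str:
    # Hypothetical prompt: the model may write any corrected transcription.
    return (
        "Below are N-best hypotheses from a speech recogniser, best first.\n"
        + _format_nbest(nbest)
        + "\nWrite the most likely correct transcription."
    )


def build_constrained_prompt(nbest: List[str]) -> str:
    # Hypothetical prompt: the model must pick one member of the list.
    return (
        "Below are N-best hypotheses from a speech recogniser, best first.\n"
        + _format_nbest(nbest)
        + "\nReply with only the number of the most likely correct hypothesis."
    )


def select_from_nbest(nbest: List[str], reply: str) -> str:
    # Parse the first integer in the reply; fall back to the top hypothesis
    # if the reply contains no usable index (an assumed safety policy).
    match = re.search(r"\d+", reply)
    if match:
        index = int(match.group())
        if 1 <= index <= len(nbest):
            return nbest[index - 1]
    return nbest[0]


def correct(nbest: List[str], llm: Callable[[str], str], constrained: bool = True) -> str:
    if not nbest:
        raise ValueError("N-best list must contain at least one hypothesis")
    if constrained:
        return select_from_nbest(nbest, llm(build_constrained_prompt(nbest)))
    reply = llm(build_unconstrained_prompt(nbest)).strip()
    # An empty reply falls back to the top ASR hypothesis.
    return reply or nbest[0]
```

In the constrained case the output is guaranteed to be one of the ASR hypotheses, while the unconstrained case can produce words that appear in none of them. A 1-shot variant would prepend one worked example to either prompt.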
Related papers
- Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition [21.516152600963775]
Denoising LM (DLM) is a scaled error correction model trained with vast amounts of synthetic data.
DLM achieves 1.5% word error rate (WER) on test-clean and 3.3% WER on test-other on Librispeech.
(2024-05-24T05:05:12Z)
- Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator.
The generator tries to generate realistic enough output to fool the discriminator.
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
(2023-12-05T09:44:45Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
(2023-10-17T14:49:48Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
(2023-09-27T14:44:10Z)
- Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model.
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
(2022-02-02T17:32:59Z)
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
(2021-11-16T03:00:29Z)
- Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation [0.0]
Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing.
This paper proposes a scheme to improve existing LM-based ASR error detection systems.
(2021-08-04T02:11:37Z)
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
(2021-05-09T05:35:36Z)
- An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules.
The errors of the ASR system can seriously downgrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
(2021-03-25T05:15:43Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and the LU systems that follow it can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
(2020-01-28T22:09:25Z)