Generative error correction for code-switching speech recognition using
large language models
- URL: http://arxiv.org/abs/2310.13013v1
- Date: Tue, 17 Oct 2023 14:49:48 GMT
- Title: Generative error correction for code-switching speech recognition using
large language models
- Authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco
Siniscalchi, Eng Siong Chng
- Abstract summary: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and N-best hypothesis lists generated by ASR systems to address the CS problem.
- Score: 49.06203730433107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more
languages within the same sentence. Despite recent advances in automatic
speech recognition (ASR), CS-ASR remains a challenging task owing to the
grammatical complexity of the phenomenon and the scarcity of dedicated
training corpora. In this work, we propose to leverage large language
models (LLMs) and lists of hypotheses generated by ASR systems to address the CS
problem. Specifically, we first employ multiple well-trained ASR models for
N-best hypotheses generation, with the aim of increasing the diversity and
informativeness of the hypothesis set. Next, we utilize the LLMs to
learn the hypotheses-to-transcription (H2T) mapping by adding a trainable
low-rank adapter. This generative error correction (GER) method directly
predicts the accurate transcription from the LLM's expert linguistic
knowledge and the N-best hypotheses, marking a paradigm shift from
traditional language model rescoring and error correction techniques.
Experimental evidence demonstrates that GER significantly enhances CS-ASR
accuracy, in terms of reduced mixed error rate (MER). Furthermore, LLMs show
remarkable data efficiency for H2T learning, offering a potential solution to
the data scarcity problem of CS-ASR in low-resource languages.
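The abstract describes the method but not its implementation, so the following is a minimal sketch of how H2T fine-tuning with a trainable low-rank adapter could look, using the Hugging Face transformers and peft libraries. The base model name, prompt template, target modules, and LoRA hyperparameters are illustrative assumptions rather than values taken from the paper.

```python
# Hedged sketch: hypotheses-to-transcription (H2T) learning with a trainable
# low-rank adapter (LoRA) on a frozen LLM, as outlined in the abstract.
# Model name, prompt format, and hyperparameters are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LLM works

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Freeze the LLM and train only the low-rank adapter weights.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed: attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable

def build_h2t_prompt(nbest: list[str]) -> str:
    """Serialize an N-best list (pooled from multiple ASR models) into a
    prompt; the adapter is trained to generate the true transcription
    after the final line."""
    lines = [f"hypothesis {i}: {hyp}" for i, hyp in enumerate(nbest, 1)]
    return "\n".join(lines) + "\ntranscription:"
```

Training would then proceed as standard causal language modeling over prompt-plus-transcription pairs, with the loss applied to the transcription tokens.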
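The mixed error rate (MER) reported above generalizes word error rate to code-switched text by treating each Mandarin character and each English word as one token before computing edit distance. A self-contained sketch, assuming Mandarin-English code-switching (the tokenization regex is a deliberate simplification):

```python
import re

def mixed_tokens(text: str) -> list[str]:
    """Tokenize CS text: each CJK character is one token, each Latin word
    (with apostrophes) is one token; everything else is ignored."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over token sequences (one-row DP)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur_row = [i]
        for j, h in enumerate(hyp, 1):
            cur_row.append(min(
                prev_row[j] + 1,             # deletion
                cur_row[j - 1] + 1,          # insertion
                prev_row[j - 1] + (r != h),  # substitution (0 if match)
            ))
        prev_row = cur_row
    return prev_row[-1]

def mixed_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = mixed_tokens(reference), mixed_tokens(hypothesis)
    return edit_distance(ref, hyp) / max(len(ref), 1)

# One substitution among four mixed tokens -> MER = 0.25
print(mixed_error_rate("我想喝coffee", "我要喝coffee"))
```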
Related papers
- Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction [34.32834323898953]
Generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors.
This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances.
We also introduce a new multi-pass augmented generative error correction (MPA GER) by integrating multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merging them.
arXiv Detail & Related papers (2024-08-29T00:18:12Z)
- Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models [41.997517537042434]
Large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this paper, we propose ClozeGER, a new paradigm for ASR generative error correction.
arXiv Detail & Related papers (2024-05-16T12:05:45Z)
- Large Language Models are Efficient Learners of Noise-Robust Speech Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z) - Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator.
The generator tries to generate realistic enough output to fool the discriminator.
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
arXiv Detail & Related papers (2023-12-05T09:44:45Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, the generative capability of LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Boosting Chinese ASR Error Correction with Dynamic Error Scaling
Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
arXiv Detail & Related papers (2023-08-07T09:19:59Z) - Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.