Towards Contextual Spelling Correction for Customization of End-to-end
Speech Recognition Systems
- URL: http://arxiv.org/abs/2203.00888v1
- Date: Wed, 2 Mar 2022 06:00:48 GMT
- Title: Towards Contextual Spelling Correction for Customization of End-to-end
Speech Recognition Systems
- Authors: Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao,
Hosam Khalil
- Abstract summary: We introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system.
We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model.
Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over ASR system and outperforms traditional biasing methods.
- Score: 27.483603895258437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual biasing is an important and challenging task for end-to-end
automatic speech recognition (ASR) systems, which aims to achieve better
recognition performance by biasing the ASR system to particular context phrases
such as person names, music list, proper nouns, etc. Existing methods mainly
include contextual LM biasing and adding bias encoder into end-to-end ASR
models. In this work, we introduce a novel approach to do contextual biasing by
adding a contextual spelling correction model on top of the end-to-end ASR
system. We incorporate contextual information into a sequence-to-sequence
spelling correction model with a shared context encoder. Our proposed model
includes two different mechanisms: autoregressive (AR) and non-autoregressive
(NAR). We propose filtering algorithms to handle large-size context lists, and
performance balancing mechanisms to control the biasing degree of the model. We
demonstrate the proposed model is a general biasing solution which is
domain-insensitive and can be adopted in different scenarios. Experiments show
that the proposed method achieves as much as 51% relative word error rate (WER)
reduction over ASR system and outperforms traditional biasing methods. Compared
to the AR solution, the proposed NAR model reduces model size by 43.2% and
speeds up inference by 2.1 times.
Related papers
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z) - Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR
Customization [66.22007368434633]
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR)
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulate non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR
Error Correction [0.9502148118198473]
We propose PATCorrect, a novel non-autoregressive (NAR) approach to reduce word error rate (WER)
We demonstrate that PATCorrect consistently outperforms state-of-the-art NAR method on English corpus across different upstream ASR systems.
arXiv Detail & Related papers (2023-02-10T04:05:24Z) - End-to-end contextual asr based on posterior distribution adaptation for
hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of traditional speech recognition system into a single model.
Although it simplifies ASR system, it introduces contextual ASR drawback: the E2E model has worse performance on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to attention based encoder decoder (AED) model to improve its ability of recognizing the contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z) - Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model.
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
arXiv Detail & Related papers (2022-02-02T17:32:59Z) - A Light-weight contextual spelling correction model for customizing
transducer-based speech recognition systems [42.05399301143457]
We introduce a light-weight contextual spelling correction model to correct context-related recognition errors.
Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction.
The model also shows excellent performance for out-of-vocabulary terms not seen during training.
arXiv Detail & Related papers (2021-08-17T08:14:37Z) - FastCorrect: Fast Error Correction with Edit Alignment for Automatic
Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU)
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.