Improving Contextual Spelling Correction by External Acoustics Attention
and Semantic Aware Data Augmentation
- URL: http://arxiv.org/abs/2302.11192v1
- Date: Wed, 22 Feb 2023 08:00:08 GMT
- Authors: Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Sheng Zhao
- Abstract summary: We propose an improved non-autoregressive spelling correction model for contextual biasing in E2E neural transducer-based ASR systems.
We incorporate acoustic information through an external attention, together with the text hypotheses, into CSC to better distinguish the target phrase from dissimilar or irrelevant phrases.
Experiments show that the improved method outperforms the baseline ASR+Biasing system by as much as 20.3% relative name recall gain.
- Score: 31.408074817254732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We previously proposed contextual spelling correction (CSC) to correct the
output of end-to-end (E2E) automatic speech recognition (ASR) models with
contextual information such as names and places. Although CSC has achieved
reasonable improvement on the biasing problem, two drawbacks still limit further
accuracy gains. First, because of the limited information in text-only
hypotheses and the weak performance of the ASR model on rare domains, the CSC
model may fail to correct phrases with similar pronunciation, or may mishandle
anti-context cases in which none of the biasing phrases appears in the
utterance. Second, there is a discrepancy between the training and inference of
CSC: the bias list in training is randomly selected, whereas at inference the
other phrases may be far more similar to the ground-truth phrase. To address
these limitations, in this paper we propose an improved non-autoregressive
(NAR) spelling correction model for contextual biasing in E2E neural
transducer-based ASR systems, improving the previous CSC model from two
perspectives. First, we incorporate acoustic information through an external
attention, alongside the text hypotheses, into CSC to better distinguish the
target phrase from dissimilar or irrelevant phrases. Second, we design a
semantic-aware data augmentation schema in the training phase to reduce the
mismatch between training and inference and further boost biasing accuracy.
Experiments show that the improved method outperforms the baseline ASR+Biasing
system by as much as 20.3% relative name recall gain and achieves stable
improvement over the previous CSC method across different bias-list name
coverage ratios.
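The semantic-aware data augmentation idea above, building training bias lists that contain distractors similar to the ground-truth phrase rather than a purely random selection, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function name, the `hard_ratio` parameter, and the use of `difflib` character-level similarity as a stand-in for the paper's similarity measure are all hypothetical.

```python
import difflib
import random

def semantic_aware_bias_list(ground_truth, phrase_pool, list_size,
                             hard_ratio=0.5, seed=0):
    """Build a training bias list that mixes phrases similar to the
    ground-truth phrase (hard distractors) with random ones, instead of
    sampling the whole list uniformly at random."""
    rng = random.Random(seed)
    candidates = [p for p in phrase_pool if p != ground_truth]
    # Rank candidates by surface similarity to the ground truth; a real
    # system would use a phonetic or semantic similarity measure here.
    ranked = sorted(
        candidates,
        key=lambda p: difflib.SequenceMatcher(None, ground_truth, p).ratio(),
        reverse=True,
    )
    n_hard = int(list_size * hard_ratio)
    hard = ranked[:n_hard]                      # most similar distractors
    rest = ranked[n_hard:]
    n_easy = max(0, min(list_size - n_hard - 1, len(rest)))
    easy = rng.sample(rest, n_easy)             # random filler distractors
    bias_list = [ground_truth] + hard + easy    # target phrase must appear
    rng.shuffle(bias_list)
    return bias_list
```

Sampling distractors this way exposes the model at training time to the kind of confusable competing phrases it will face at inference, which is the mismatch the augmentation schema is meant to close.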
Related papers
- Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices [8.77712061194924]
We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models.
Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations.
We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities.
arXiv Detail & Related papers (2024-09-24T21:42:25Z)
- Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction [40.11364098789309]
Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora.
Two data augmentation methods are widely adopted: (1) Random Replacement guided by confusion sets and (2) OCR/ASR-based Generation that simulates character misuse.
arXiv Detail & Related papers (2024-07-22T09:26:35Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train a Confidence Estimation Model (CEM).
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z)
- Grammatical Error Correction via Mixed-Grained Weighted Training [68.94921674855621]
Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts.
MainGEC designs token-level and sentence-level training weights based on inherent discrepancies in accuracy and potential diversity of data annotation.
arXiv Detail & Related papers (2023-11-23T08:34:37Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system.
We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model.
Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
- End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to attention based encoder decoder (AED) model to improve its ability of recognizing the contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models [51.744357472072416]
We propose a method, which continually identifies the weak spots of a model to generate more valuable training instances.
Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models.
arXiv Detail & Related papers (2021-05-31T09:17:33Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.