Related papers: SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

URL: http://arxiv.org/abs/2306.02317v1
Date: Sun, 4 Jun 2023 10:00:12 GMT
Title: SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
Authors: Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg
Abstract summary: Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition. We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings. Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
Score: 76.87664008338317
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition (ASR) quality given user vocabulary. To deal with large user vocabularies, most of these models include candidate retrieval mechanisms, usually based on minimum edit distance between fragments of ASR hypothesis and user phrases. However, the edit-distance approach is slow, non-trainable, and may have low recall as it relies only on common letters. We propose: 1) a novel algorithm for candidate retrieval, based on misspelled n-gram mappings, which gives up to 90% recall with just the top 10 candidates on Spoken Wikipedia; 2) a non-autoregressive neural model based on BERT architecture, where the initial transcript and ten candidates are combined into one input. The experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
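The retrieval idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the mapping table, its weights, the scoring rule, and all names (`MAPPINGS`, `build_index`, `retrieve`) are hypothetical and chosen only to show how misspelled character n-grams in an ASR hypothesis could vote for user phrases without computing edit distances.

```python
from collections import defaultdict


def char_ngrams(text, max_n=3):
    """Yield every character n-gram of length 1..max_n (spaces become '_')."""
    s = text.replace(" ", "_")
    for n in range(1, max_n + 1):
        for i in range(len(s) - n + 1):
            yield s[i:i + n]


def build_index(phrases, mappings, max_n=3):
    """Map each possible misspelled n-gram to the user phrases it may come from."""
    index = defaultdict(lambda: defaultdict(float))
    for phrase in phrases:
        for gram in set(char_ngrams(phrase, max_n)):
            # Only n-grams with a known mapping are indexed (an assumption of this sketch).
            for misspelled, weight in mappings.get(gram, {}).items():
                index[misspelled][phrase] = max(index[misspelled][phrase], weight)
    return index


def retrieve(hypothesis, index, top_k=10, max_n=3):
    """Return up to top_k user phrases ranked by summed n-gram mapping weights."""
    scores = defaultdict(float)
    for gram in set(char_ngrams(hypothesis, max_n)):
        for phrase, weight in index.get(gram, {}).items():
            scores[phrase] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Hypothetical mappings: correct n-gram -> {misspelled n-gram: made-up weight}.
MAPPINGS = {
    "ph": {"f": 0.7},
    "oe": {"ee": 0.8},
    "x": {"ks": 0.9},
    "k": {"c": 0.5},
}
USER_PHRASES = ["phoebe", "karl marx"]
INDEX = build_index(USER_PHRASES, MAPPINGS)

print(retrieve("feebee said hello", INDEX))
print(retrieve("carl marks", INDEX))
```

In the paper, the top ten retrieved candidates are then combined with the initial transcript into a single input for the BERT-based model; that step is not shown here.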

Related papers

Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices [8.77712061194924]
We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models. Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations. We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities.
arXiv Detail & Related papers (2024-09-24T21:42:25Z)
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation [27.057810339120664]
We propose two techniques to improve context-aware ASR models. On LibriSpeech, our techniques together reduce the rare word error rate by 60% and 25% relatively compared to no biasing and shallow fusion. On SPGISpeech and a real-world dataset ConEC, our techniques also yield good improvements over the baselines.
arXiv Detail & Related papers (2024-07-14T19:32:33Z)
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization [66.22007368434633]
We present the first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR). The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. Given a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
arXiv Detail & Related papers (2022-09-02T19:30:16Z)
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator [19.372248692745167]
Contextual knowledge is essential for reducing speech recognition errors on high-valued long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words.
arXiv Detail & Related papers (2022-05-18T16:40:50Z)
Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition [92.12910821300034]
We propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline.
arXiv Detail & Related papers (2021-09-29T13:48:03Z)
Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models. Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.