Related papers: Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition

Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition

URL: http://arxiv.org/abs/2506.18703v1
Date: Mon, 23 Jun 2025 14:42:03 GMT
Title: Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition
Authors: Christian Huber, Alexander Waibel,
Abstract summary: We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words.<n>We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
Score: 56.972851337263755
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principal open vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, acronyms, or domain-specific special words. To address this problem, many context biasing methods have been proposed; however, for words with a pronunciation-orthography mismatch, these methods may still struggle. We propose a method which allows corrections of substitution errors to improve the recognition accuracy of such challenging words. Users can add corrections on the fly during inference. We show that with this method we get a relative improvement in biased word error rate of up to 11\%, while maintaining a competitive overall word error rate.

Related papers

WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing [5.50485371072671]
We propose a method to improve recognition accuracy of rare words in CTC-based models without additional training or text-to-speech systems.<n>For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches.<n>In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
arXiv Detail & Related papers (2025-06-02T02:30:26Z)
Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora.<n>We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.<n>This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search [44.94458898538114]
This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list. The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.
arXiv Detail & Related papers (2024-01-19T01:36:07Z)
Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition.<n>We use a memory-enhanced ASR model from the literature to decode new words from the slides.<n>We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail explicit contextual language model. We derive the bias score for every word in the system vocabulary from the training corpus. We show significant improvement in recognition accuracy when the relevant context is available.
arXiv Detail & Related papers (2020-05-04T17:29:59Z)
Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out of vocabulary words (OOV) is typical for any speech recognition system. One of the popular approach to cover OOVs is to use subword units rather then words. In this paper we explore different existing methods of this solution on both graph construction and search method levels.
arXiv Detail & Related papers (2020-03-19T21:24:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.