WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
- URL: http://arxiv.org/abs/2506.01263v1
- Date: Mon, 02 Jun 2025 02:30:26 GMT
- Title: WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
- Authors: Yu Nakagome, Michael Hentschel
- Abstract summary: We propose a method to improve recognition accuracy of rare words in CTC-based models without additional training or text-to-speech systems. For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches. In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
- Score: 5.50485371072671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advances in end-to-end speech recognition methods, the output tends to be biased to the training data's vocabulary, resulting in inaccurate recognition of proper nouns and other unknown terms. To address this issue, we propose a method to improve recognition accuracy of such rare words in CTC-based models without additional training or text-to-speech systems. Specifically, keyword spotting is performed using acoustic features of intermediate layers during inference, and a bias is applied to the subsequent layers of the acoustic model for detected keywords. For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches, allowing flexible handling of words that are difficult to match strictly. Since this method does not require retraining of existing models, it can be easily applied to even large-scale models. In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
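As a rough illustration of the detection step, the wildcard match can be written as a small dynamic program: score the pattern `* k1 ... kn *` against the intermediate-layer CTC log-posteriors, letting each wildcard absorb frames at that frame's best score so that only the keyword span is matched strictly. The NumPy sketch below is our own reading of this idea, not the authors' code; the function name, the Viterbi (best-path) scoring, and the random-input demo are all assumptions, and the inter-layer biasing step itself is model-specific and omitted.

```python
import numpy as np

def wctc_keyword_score(log_probs: np.ndarray, keyword: list[int], blank: int = 0) -> float:
    """Viterbi score of the pattern `* k1 ... kn *` under CTC (illustrative).

    `log_probs` is a (T, V) matrix of frame-wise log-posteriors from an
    intermediate encoder layer. The wildcard `*` scores every frame it
    absorbs at that frame's maximum log-probability, so only the keyword
    span itself is matched strictly.
    """
    T, _ = log_probs.shape
    wild = log_probs.max(axis=1)

    # CTC-extended keyword: blank-separated labels, e.g. [_, k1, _, k2, _]
    ext = [blank]
    for k in keyword:
        ext.extend([k, blank])
    S = len(ext)

    # States: 0 = leading wildcard, 1..S = ext[s-1], S+1 = trailing wildcard.
    dp = np.full(S + 2, -np.inf)
    dp[0] = wild[0]                   # start inside the leading wildcard
    dp[1] = log_probs[0, ext[0]]      # or on the pre-keyword blank
    dp[2] = log_probs[0, ext[1]]      # or directly on the first keyword label

    for t in range(1, T):
        new = np.full(S + 2, -np.inf)
        new[0] = dp[0] + wild[t]      # wildcard self-loop
        for s in range(1, S + 1):
            lbl = ext[s - 1]
            best = max(dp[s], dp[s - 1])
            if s == 2:
                best = max(best, dp[0])          # enter the keyword from the wildcard
            # CTC skip: jump over the blank between two distinct labels
            if s >= 3 and lbl != blank and lbl != ext[s - 3]:
                best = max(best, dp[s - 2])
            new[s] = best + log_probs[t, lbl]
        # Trailing wildcard: self-loop, or exit from the last label / final blank.
        new[S + 1] = max(dp[S + 1], dp[S], dp[S - 1]) + wild[t]
        dp = new

    # End on the last keyword label, the final blank, or the trailing wildcard.
    return float(max(dp[S - 1], dp[S], dp[S + 1]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(50, 32))
    lp = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    print(wctc_keyword_score(lp, keyword=[5, 12, 7]))
```

A detection would typically be declared by thresholding a length-normalized version of this score, which then triggers the bias on the subsequent encoder layers.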
Related papers
- Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition [56.972851337263755]
We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words. We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
arXiv Detail & Related papers (2025-06-23T14:42:03Z)
- An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition [10.234673954430221]
We study how altering the context list to contain words with different frequency distributions affects model performance.
A series of experiments conducted on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list and pairing them with our balanced objective yields the best performance.
arXiv Detail & Related papers (2024-09-10T12:52:36Z)
- InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions [5.50485371072671]
Our method improves the recognition accuracy of misrecognized target keywords by substituting intermediate CTC predictions with corrected labels.
Experiments conducted in Japanese demonstrated that our method successfully improved the F1 score for unknown words.
arXiv Detail & Related papers (2024-06-21T06:25:10Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z) - An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z) - Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition. We use a memory-enhanced ASR model from the literature to decode new words from the slides. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z) - Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z) - Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z) - Frequency-Aware Contrastive Learning for Neural Machine Translation [24.336356651877388]
Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems.
Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective.
We propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words.
arXiv Detail & Related papers (2021-12-29T10:10:10Z) - Spell my name: keyword boosted speech recognition [25.931897154065663]
Uncommon words such as names and technical terminology are important for understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and also on internal data of real-world conversations (a toy sketch of the keyword-boosting idea appears after this list).
arXiv Detail & Related papers (2021-10-06T14:16:57Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when there is not enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)