Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function
- URL: http://arxiv.org/abs/2509.09197v1
- Date: Thu, 11 Sep 2025 07:12:17 GMT
- Title: Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function
- Authors: Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng
- Abstract summary: We propose a keyword-aware loss function that focuses on biased words when training biasing modules.
By adapting Whisper to 10 hours of synthetic data, our method reduced the word error rate on the NSC Part 2 test set from 29.71% to 11.81%.
- Score: 39.43946600820569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rare word recognition can be improved by adapting ASR models to synthetic data that includes these words. Further improvements can be achieved through contextual biasing, which adds a trainable biasing module to the model architecture to prioritize rare words. While training the module on synthetic rare-word data is more effective than using non-rare-word data, it can lead to overfitting due to artifacts in the synthetic audio. To address this, we enhance the TCPGen-based contextual biasing approach and propose a keyword-aware loss function that additionally focuses on biased words when training biasing modules. This loss includes a masked cross-entropy term for biased word prediction and a binary classification term for detecting biased word positions. These two terms complement each other in supporting the decoding of biased words during inference. By adapting Whisper to 10 hours of synthetic data, our method reduced the word error rate on the NSC Part 2 test set from 29.71% to 11.81%.
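The abstract gives no reference implementation; the sketch below shows one way the two loss terms could be combined in PyTorch. The bias mask, the binary position head behind `position_logits`, and the `alpha`/`beta` weights are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def keyword_aware_loss(logits, targets, bias_mask, position_logits,
                       alpha=1.0, beta=1.0):
    """Hypothetical sketch of the keyword-aware loss described above.

    logits:          (B, T, V) decoder output logits
    targets:         (B, T)    gold token ids
    bias_mask:       (B, T)    1.0 where the target token belongs to a biased word
    position_logits: (B, T)    logits from an assumed binary head that
                               predicts whether each position is a biased word
    """
    B, T, V = logits.shape

    # Standard cross-entropy over all tokens (the usual adaptation objective).
    ce_all = F.cross_entropy(
        logits.reshape(-1, V), targets.reshape(-1), reduction="none"
    ).reshape(B, T)

    # Term 1: masked cross-entropy that only counts biased-word positions.
    masked_ce = (ce_all * bias_mask).sum() / bias_mask.sum().clamp(min=1.0)

    # Term 2: binary classification of biased-word positions.
    position_bce = F.binary_cross_entropy_with_logits(position_logits, bias_mask)

    # Total: base loss plus the two keyword-aware terms; alpha/beta are
    # illustrative weights, not values from the paper.
    return ce_all.mean() + alpha * masked_ce + beta * position_bce
```

Per the abstract, the two terms act complementarily at inference: the masked term sharpens prediction of the biased words themselves, while the position term could flag where they are likely to occur.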
Related papers
- Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper [33.50962290311746]
Deepfake speech utterances can be forged by replacing one or more words in a bona fide utterance with semantically different words.
We develop a method that fine-tunes a pre-trained Whisper model to detect synthetic words while transcribing the input utterance.
arXiv Detail & Related papers (2026-02-26T06:17:56Z) - Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition [56.972851337263755]
We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words.
We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
arXiv Detail & Related papers (2025-06-23T14:42:03Z) - WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing [5.50485371072671]
We propose a method to improve the recognition accuracy of rare words in CTC-based models without additional training or text-to-speech systems.
For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches.
In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
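No implementation accompanies this summary; as a simplified sketch of the wildcard idea (not the paper's actual method, which also includes inter-layer biasing), the code below scores a keyword pattern `* keyword *` against frame-level CTC log-posteriors with a Viterbi-style dynamic program in which the wildcard states absorb frames at no cost. CTC blank handling is omitted for brevity.

```python
import numpy as np

def wildcard_keyword_score(log_probs, keyword_ids):
    """Score the pattern '* k1 ... kn *' against CTC frame posteriors.

    log_probs:   (T, V) array of frame-level log-probabilities.
    keyword_ids: token ids k1..kn of the keyword.
    Returns the best Viterbi log-score; higher means a likelier match.
    """
    T = log_probs.shape[0]
    n = len(keyword_ids)
    NEG = -1e30
    # States: 0 = leading wildcard, 1..n = keyword tokens, n+1 = trailing wildcard.
    dp = np.full(n + 2, NEG)
    dp[0] = 0.0  # start inside the leading wildcard
    for t in range(T):
        new = np.full(n + 2, NEG)
        for s in range(n + 2):
            # Wildcard states consume the frame for free; keyword states
            # pay the frame's log-probability for their token.
            emit = 0.0 if s in (0, n + 1) else log_probs[t, keyword_ids[s - 1]]
            stay = dp[s]                        # repeat the current state
            move = dp[s - 1] if s > 0 else NEG  # advance from the previous state
            new[s] = max(stay, move) + emit
        dp = new
    return dp[n + 1]
```

A detection decision could then threshold this score (e.g., normalized by keyword length) before any biasing is applied.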
arXiv Detail & Related papers (2025-06-02T02:30:26Z) - Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation [27.057810339120664]
We propose two techniques to improve context-aware ASR models.
On LibriSpeech, our techniques together reduce the rare word error rate by 60% and 25% relative to no biasing and shallow fusion, respectively.
On SPGISpeech and a real-world dataset ConEC, our techniques also yield good improvements over the baselines.
arXiv Detail & Related papers (2024-07-14T19:32:33Z) - CB-Conformer: Contextual biasing Conformer for biased word recognition [33.28780163232423]
We introduce a Contextual Biasing Module and a Self-Adaptive Language Model into the vanilla Conformer.
Our proposed method brings a 15.34% character error rate reduction, a 14.13% biased word recall increase, and a 6.80% biased word F1-score increase compared with the base Conformer.
arXiv Detail & Related papers (2023-04-19T12:26:04Z) - Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset.
We show results for shallow fusion contextual biasing applied to two different decoding algorithms.
We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
arXiv Detail & Related papers (2022-09-02T19:30:16Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics-preserving rates while changing the fewest words compared with existing methods.
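As a toy sketch of this family of word-substitution attacks (assumed mechanics, not the BU-SPO algorithm itself), the snippet below greedily swaps single words for candidates that most reduce a classifier's confidence in its original prediction, stopping once the label flips so that as few words as possible are changed. `SYNONYMS` and `prob_of_original` are illustrative stand-ins.

```python
# Illustrative synonym table; a real attack would use embeddings or a thesaurus.
SYNONYMS = {"good": ["fine", "decent"], "movie": ["film", "picture"]}

def greedy_substitution_attack(tokens, prob_of_original, max_changes=3):
    """tokens: list of words; prob_of_original: callable returning the
    classifier's confidence in the original label for a token list."""
    tokens = list(tokens)
    for _ in range(max_changes):
        best = None  # (confidence, position, candidate)
        for i, tok in enumerate(tokens):
            for cand in SYNONYMS.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                p = prob_of_original(trial)
                if best is None or p < best[0]:
                    best = (p, i, cand)
        if best is None:
            break  # nothing left to substitute
        p, i, cand = best
        tokens[i] = cand  # apply the most damaging single-word swap
        if p < 0.5:
            break  # prediction flipped with the fewest changes so far
    return tokens
```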
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z) - Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z) - Taking Notes on the Fly Helps BERT Pre-training [94.43953312613577]
Taking Notes on the Fly (TNF) takes notes for rare words on the fly during pre-training, to help the model understand them the next time they occur.
TNF provides better data utilization, since cross-sentence information is employed to compensate for the limited semantic context that rare words carry within individual sentences.
arXiv Detail & Related papers (2020-08-04T11:25:09Z)
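The note-taking mechanism lends itself to a compact illustration; the class below is a minimal sketch under assumed mechanics (an exponential-moving-average note per rare word, blended into the word's input embedding on later occurrences), not the authors' implementation.

```python
import torch

class RareWordNotes:
    """Minimal sketch of on-the-fly note taking for rare words."""

    def __init__(self, hidden_dim, momentum=0.9, mix=0.5):
        self.notes = {}           # rare word -> note vector
        self.hidden_dim = hidden_dim
        self.momentum = momentum  # EMA weight when updating a note
        self.mix = mix            # fraction of the note blended into the embedding

    def enrich(self, word, embedding):
        """Blend the stored note into the word's input embedding, if any."""
        note = self.notes.get(word)
        if note is None:
            return embedding
        return (1 - self.mix) * embedding + self.mix * note

    def update(self, word, context_vector):
        """Refresh the note with the word's current contextual representation."""
        old = self.notes.get(word, torch.zeros(self.hidden_dim))
        self.notes[word] = (self.momentum * old
                            + (1 - self.momentum) * context_vector.detach())
```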