CB-Conformer: Contextual biasing Conformer for biased word recognition
- URL: http://arxiv.org/abs/2304.09607v2
- Date: Tue, 25 Apr 2023 07:45:40 GMT
- Title: CB-Conformer: Contextual biasing Conformer for biased word recognition
- Authors: Yaoxun Xu and Baiji Liu and Qiaochu Huang and Xingchen Song and
Zhiyong Wu and Shiyin Kang and Helen Meng
- Abstract summary: We introduce the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
Our proposed method brings a 15.34% character error rate reduction, a 14.13% biased word recall increase, and a 6.80% biased word F1-score increase compared with the base Conformer.
- Score: 33.28780163232423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the mismatch between the source and target domains, how to better
utilize biased-word information to improve the performance of an automatic
speech recognition model in the target domain has become an active research topic.
Previous approaches either decode with a fixed external language model or
introduce a sizeable biasing module, which leads to poor adaptability and slow
inference. In this work, we propose CB-Conformer to improve biased word
recognition by introducing the Contextual Biasing Module and the Self-Adaptive
Language Model to vanilla Conformer. The Contextual Biasing Module combines
audio fragments and contextual information, and contains only 0.2% of the
original Conformer's parameters. The Self-Adaptive Language Model modifies the internal
weights of biased words based on their recall and precision, resulting in a
greater focus on biased words and more successful integration with the
automatic speech recognition model than the standard fixed language model. In
addition, we construct and release an open-source Mandarin biased-word dataset
based on WenetSpeech. Experiments indicate that our proposed method brings a
15.34% character error rate reduction, a 14.13% biased word recall increase,
and a 6.80% biased word F1-score increase compared with the base Conformer.
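As a rough illustration of the two components named above, the sketch below shows (i) a lightweight cross-attention block through which acoustic frames attend to embeddings of the bias phrases and (ii) a per-word re-weighting driven by recall and precision. This is a minimal reading of the abstract, not the released CB-Conformer code: the layer sizes, module names, and the exact update rule are assumptions.

```python
# Illustrative sketch only -- not the released CB-Conformer implementation.
# Shapes, module names, and the weight-update rule are assumptions based on
# the abstract, which only states that the biasing module combines audio
# fragments with contextual information and that the language model
# re-weights biased words by their recall and precision.
import torch
import torch.nn as nn


class ContextualBiasingModule(nn.Module):
    """Cross-attention from acoustic frames to bias-phrase embeddings (assumed design)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # A single small cross-attention layer keeps the added parameter
        # count small relative to the full Conformer encoder.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, audio_feats: torch.Tensor, context_embs: torch.Tensor) -> torch.Tensor:
        # audio_feats:  (batch, frames, d_model)  Conformer encoder output
        # context_embs: (batch, phrases, d_model) embeddings of the bias phrases
        biased, _ = self.cross_attn(audio_feats, context_embs, context_embs)
        # Residual combination of acoustic and contextual information.
        return audio_feats + self.out_proj(biased)


def self_adaptive_weights(recall: torch.Tensor, precision: torch.Tensor,
                          base: float = 1.0, scale: float = 1.0) -> torch.Tensor:
    """Hypothetical per-word LM weight: boost biased words that are still recognized poorly."""
    f1 = 2 * recall * precision / (recall + precision + 1e-8)
    return base + scale * (1.0 - f1)  # low F1 -> larger biasing weight
```

In this reading, the weights returned by `self_adaptive_weights` would rescale the language model's scores for each biased word between passes; the abstract does not specify the actual update rule or where it is applied.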
Related papers
- Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model [0.0]
OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains.
We propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters.
arXiv Detail & Related papers (2024-10-24T01:58:11Z)
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition [9.03519622415822]
This study introduces a Cross-lingual Contextual Biasing (XCB) module.
We augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a language-specific loss.
Experimental results conducted on our in-house code-switching dataset have validated the efficacy of our approach.
arXiv Detail & Related papers (2024-08-20T04:00:19Z)
- Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network [14.115294331065318]
We introduce a contextual phrase prediction network for an attention-based deep bias method.
This network predicts context phrases in utterances using contextual embeddings and calculates a bias loss to assist in training the contextualized model.
Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models.
arXiv Detail & Related papers (2023-05-21T16:08:04Z)
- Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition [14.744220870243932]
We propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing.
We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context.
Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% - 9.26% relative WER improvement on different biasing list sizes.
arXiv Detail & Related papers (2023-05-09T08:51:44Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail an explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus (see the sketch after this list).
We show significant improvement in recognition accuracy when the relevant context is available.
arXiv Detail & Related papers (2020-05-04T17:29:59Z)
- RNN-Transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use language identities to bias the model to predict code-switching (CS) points.
This encourages the model to learn language identity information directly from the transcription, so no additional language identification (LID) model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
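For the corpus-derived bias scores mentioned in the "Fast and Robust Unsupervised Contextual Biasing" entry above, one minimal sketch is shown below. The log-ratio formulation, the add-alpha smoothing, and the function name `bias_scores` are assumptions for illustration; the summary only states that a bias score is derived for every vocabulary word from the training corpus.

```python
# Illustrative sketch of corpus-derived word bias scores (assumed formulation);
# the referenced paper's exact scoring function is not given in the summary above.
import math
from collections import Counter


def bias_scores(general_corpus, context_corpus, alpha=1.0):
    """Score each vocabulary word by how much more likely it is in the
    context-relevant text than in the general training corpus."""
    gen = Counter(general_corpus)
    ctx = Counter(context_corpus)
    vocab = set(gen) | set(ctx)
    n_gen = sum(gen.values()) + alpha * len(vocab)  # add-alpha smoothing
    n_ctx = sum(ctx.values()) + alpha * len(vocab)
    return {
        w: math.log((ctx[w] + alpha) / n_ctx) - math.log((gen[w] + alpha) / n_gen)
        for w in vocab
    }


# Example: words frequent in the context text but rare in general text get positive scores.
scores = bias_scores(
    "the meeting starts at noon".split(),
    "conformer biasing wenetspeech conformer".split(),
)
```

Positive scores could then be added to the recognizer's word scores during decoding when the relevant context is active; how the actual system applies its scores is not described in the summary.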
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.