Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
- URL: http://arxiv.org/abs/2310.00178v1
- Date: Fri, 29 Sep 2023 22:50:10 GMT
- Title: Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
- Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe
Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng,
Ding Zhao, Tara Sainath, Pedro Moreno Mengibar
- Abstract summary: Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
- Score: 45.42075576656938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual biasing refers to the problem of biasing automatic speech
recognition (ASR) systems towards rare entities that are relevant to the
specific user or application scenarios. We propose algorithms for contextual
biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During
beam search, we boost the score of a token extension if it extends matching
into a set of biasing phrases. Our method simulates the classical approaches
often implemented in the weighted finite state transducer (WFST) framework, but
avoids the FST language altogether, with careful considerations on memory
footprint and efficiency on tensor processing units (TPUs) by vectorization.
Without introducing additional model parameters, our method achieves
significant word error rate (WER) reductions on biasing test sets by itself,
and yields further performance gain when combined with a model-based biasing
method.
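The core idea above, boosting a token extension when it deepens a match into a biasing phrase, can be illustrated with a minimal, hypothetical Python sketch of KMP matching over token sequences. The `biasing_bonus` helper and the flat `boost` weight are illustrative assumptions, not the paper's actual method: the authors' implementation is vectorized for TPUs and follows the WFST-style bookkeeping (including what happens when a partial match fails), which this sketch does not attempt to reproduce.

```python
def kmp_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def advance(pattern, fail, state, token):
    """Advance a KMP match state (number of tokens matched so far)
    by one token, falling back via the failure table on mismatch."""
    if state == len(pattern):        # full match: continue from longest border
        state = fail[state - 1]
    while state > 0 and pattern[state] != token:
        state = fail[state - 1]
    if pattern[state] == token:
        state += 1
    return state

def biasing_bonus(phrases, states, token, boost=2.0):
    """Hypothetical per-step score bonus for a beam-search hypothesis:
    reward a candidate token if it extends matching into any biasing
    phrase. phrases is a list of (pattern, failure_table) pairs; states
    holds one KMP match state per phrase. Returns (bonus, new_states)."""
    new_states = []
    bonus = 0.0
    for (pattern, fail), s in zip(phrases, states):
        ns = advance(pattern, fail, s, token)
        new_states.append(ns)
        if ns > s:                   # this token deepened the match
            bonus = max(bonus, boost)
    return bonus, new_states
```

In a beam search, each hypothesis would carry one match state per biasing phrase, and `biasing_bonus` would be added to the model's log-probability for the candidate token before pruning; the failure table keeps the per-step update linear in the number of phrases rather than restarting matching from scratch.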
Related papers
- Contextualized Automatic Speech Recognition with Attention-Based Bias
Phrase Boosted Beam Search [44.94458898538114]
This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list.
The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.
arXiv Detail & Related papers (2024-01-19T01:36:07Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization [66.22007368434633]
We present the first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR).
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z)
- Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition [14.744220870243932]
We propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing.
We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context.
Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% - 9.26% relative WER improvement on different biasing list sizes.
arXiv Detail & Related papers (2023-05-09T08:51:44Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative derivation based on a bracketing grammar whose tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Space-Efficient Representation of Entity-centric Query Language Models [8.712427362992237]
We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time.
We obtain a 10% relative word error rate improvement on long tail entity queries compared to when a similarly-sized n-gram model is used.
arXiv Detail & Related papers (2022-06-29T19:59:50Z)
- Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system.
We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model.
Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the baseline ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
- End-to-end Contextual ASR Based on Posterior Distribution Adaptation for Hybrid CTC/Attention System [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to attention based encoder decoder (AED) model to improve its ability of recognizing the contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.