Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
- URL: http://arxiv.org/abs/2310.00178v1
- Date: Fri, 29 Sep 2023 22:50:10 GMT
- Title: Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
- Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe
Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng,
Ding Zhao, Tara Sainath, Pedro Moreno Mengibar
- Abstract summary: Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
- Score: 45.42075576656938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual biasing refers to the problem of biasing automatic speech
recognition (ASR) systems towards rare entities that are relevant to the
specific user or application scenarios. We propose algorithms for contextual
biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During
beam search, we boost the score of a token extension if it extends matching
into a set of biasing phrases. Our method simulates the classical approaches
often implemented in the weighted finite state transducer (WFST) framework, but
avoids the FST language altogether, with careful considerations on memory
footprint and efficiency on tensor processing units (TPUs) by vectorization.
Without introducing additional model parameters, our method achieves
significant word error rate (WER) reductions on biasing test sets by itself,
and yields further performance gain when combined with a model-based biasing
method.
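The matching step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed per-token bonus `delta`, that a bonus earned on a partial match is retracted when the match fails, and that matching restarts from scratch after a full match. These are illustrative choices, not details confirmed by the abstract, and the names `failure_function`, `extend_match`, and `biasing_bonus` are hypothetical. The sketch is plain, unvectorized Python and does not reproduce the TPU-oriented implementation the paper refers to.

```python
from typing import List, Sequence, Tuple


def failure_function(pattern: Sequence[int]) -> List[int]:
    """Classic KMP failure table: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def extend_match(pattern: Sequence[int], fail: List[int], state: int,
                 token: int, delta: float = 1.0) -> Tuple[int, float]:
    """Advance the match by one token and return (new_state, bonus).

    `state` is the number of pattern tokens matched so far. The bonus is
    delta times the change in match length, so it is negative when a
    partial match falls back (assumption: earlier bonuses are retracted).
    The pattern is assumed to be non-empty.
    """
    old = state
    while state > 0 and token != pattern[state]:
        state = fail[state - 1]
    if token == pattern[state]:
        state += 1
    bonus = delta * (state - old)
    if state == len(pattern):
        # Full match: the accumulated bonus is kept. Restarting from zero
        # is a simplification that ignores overlapping occurrences.
        state = 0
    return state, bonus


def biasing_bonus(phrases: Sequence[Sequence[int]], tokens: Sequence[int],
                  delta: float = 1.0) -> float:
    """Total bonus a token sequence collects over a set of biasing phrases."""
    total = 0.0
    for phrase in phrases:
        fail = failure_function(phrase)
        state = 0
        for token in tokens:
            state, bonus = extend_match(phrase, fail, state, token, delta)
            total += bonus
    return total
```

In a beam search, `extend_match` would be called once per candidate token extension, with the returned bonus added to the hypothesis score and the new state carried along with the hypothesis.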
Related papers
- Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval [18.333752341467083]
The biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries.
This work proposes an approximation to cross-attention scoring based on vector quantization.
We show that retrieval based shortlisting allows the system to efficiently leverage biasing catalogues of several thousands of entries.
arXiv Detail & Related papers (2024-11-01T15:28:03Z)
- LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR [3.841280537264271]
We propose a light on-the-fly method to improve automatic speech recognition performance.
We combine a bias list of named entities with a word-level n-gram language model with the shallow fusion approach based on the Aho-Corasick string matching algorithm.
We achieve up to 21.6% relative improvement in the general word error rate with no practical difference in the inverse real-time factor.
arXiv Detail & Related papers (2024-09-20T13:53:37Z)
- Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications [5.266869303483375]
The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR).
We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER.
We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation.
arXiv Detail & Related papers (2024-08-28T08:14:51Z)
- Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search [44.94458898538114]
This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list.
The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.
arXiv Detail & Related papers (2024-01-19T01:36:07Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization [66.22007368434633]
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR).
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative derivation based on a bracketing grammar whose tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system.
We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model.
Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
- End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a drawback for contextual ASR: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.