Contextualized Automatic Speech Recognition with Attention-Based Bias
Phrase Boosted Beam Search
- URL: http://arxiv.org/abs/2401.10449v1
- Date: Fri, 19 Jan 2024 01:36:07 GMT
- Title: Contextualized Automatic Speech Recognition with Attention-Based Bias
Phrase Boosted Beam Search
- Authors: Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji
Watanabe
- Abstract summary: This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list.
The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.
- Score: 44.94458898538114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end (E2E) automatic speech recognition (ASR) methods exhibit
remarkable performance. However, since the performance of such methods is
intrinsically linked to the context present in the training data, E2E-ASR
methods do not perform as desired for unseen user contexts (e.g., technical
terms, personal names, and playlists). Thus, E2E-ASR methods must be easily
contextualized by the user or developer. This paper proposes an attention-based
contextual biasing method that can be customized using an editable phrase list
(referred to as a bias list). The proposed method can be trained effectively by
combining a bias phrase index loss and special tokens to detect the bias
phrases in the input speech data. In addition, to improve the contextualization
performance during inference further, we propose a bias phrase boosted (BPB)
beam search algorithm based on the bias phrase index probability. Experimental
results demonstrate that the proposed method consistently improves the word
error rate and the character error rate of the target phrases in the bias list
on both the Librispeech-960 (English) and our in-house (Japanese) dataset,
respectively.
Related papers
- XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition [9.03519622415822]
This study introduces a Cross-lingual Contextual Biasing(XCB) module.
We augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a language-specific loss.
Experimental results conducted on our in-house code-switching dataset have validated the efficacy of our approach.
arXiv Detail & Related papers (2024-08-20T04:00:19Z) - Contextualized Automatic Speech Recognition with Dynamic Vocabulary [41.892863381787684]
This paper proposes a dynamic vocabulary where bias tokens can be added during inference.
Each entry in a bias list is represented as a single token, unlike a sequence of existing subword tokens.
Experimental results demonstrate that the proposed method improves the bias phrase WER on English and Japanese datasets by 3.1 -- 4.9 points.
arXiv Detail & Related papers (2024-05-22T05:03:39Z) - Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm [45.42075576656938]
Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
arXiv Detail & Related papers (2023-09-29T22:50:10Z) - Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR
Customization [66.22007368434633]
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR)
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulate non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z) - End-to-end contextual asr based on posterior distribution adaptation for
hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of traditional speech recognition system into a single model.
Although it simplifies ASR system, it introduces contextual ASR drawback: the E2E model has worse performance on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to attention based encoder decoder (AED) model to improve its ability of recognizing the contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-05-17T19:22:25Z) - Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus.
We show significant improvement in recognition accuracy when the relevant context is available.
arXiv Detail & Related papers (2020-05-04T17:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.