Spell my name: keyword boosted speech recognition
- URL: http://arxiv.org/abs/2110.02791v1
- Date: Wed, 6 Oct 2021 14:16:57 GMT
- Title: Spell my name: keyword boosted speech recognition
- Authors: Namkyu Jung, Geonmin Kim, Joon Son Chung
- Abstract summary: Uncommon words such as names and technical terminology are important to understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and also on internal data of real-world conversations.
- Score: 25.931897154065663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognition of uncommon words such as names and technical terminology is
important to understanding conversations in context. However, the ability to
recognise such words remains a challenge in modern automatic speech recognition
(ASR) systems.
In this paper, we propose a simple but powerful ASR decoding method that can
better recognise these uncommon keywords, which in turn enables better
readability of the results. The method boosts the probabilities of given
keywords in a beam search based on acoustic model predictions. The method does
not require any training in advance.
We demonstrate the effectiveness of our method on the LibriSpeech test sets
and also on internal data of real-world conversations. Our method significantly
boosts keyword accuracy on the test sets while maintaining the accuracy of the
other words, as well as providing significant qualitative improvements.
This method is applicable to other tasks such as machine translation, or
wherever unseen and difficult keywords need to be recognised in beam search.
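The core idea, boosting the scores of keyword hypotheses inside a beam search, can be sketched as follows. This is a toy word-level illustration, not the authors' implementation: the function name, the per-step log-probability dictionaries, and the flat additive bonus are all assumptions made for clarity (the paper works on subword hypotheses over acoustic model predictions).

```python
def keyword_boosted_beam_search(step_logprobs, keywords, beam_size=2, boost=1.0):
    """Beam search in which any token matching a given keyword receives an
    additive log-probability bonus, so the search can recover keywords that
    the acoustic scores alone would narrowly miss (illustrative sketch)."""
    beams = [([], 0.0)]  # list of (hypothesis, cumulative score)
    for logprobs in step_logprobs:  # one dict of token -> log-prob per step
        candidates = []
        for words, score in beams:
            for w, lp in logprobs.items():
                bonus = boost if w in keywords else 0.0
                candidates.append((words + [w], score + lp + bonus))
        # keep the top-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Hypothetical acoustic scores: the third step slightly prefers the wrong word.
steps = [
    {"spell": -0.1, "smell": -0.2},
    {"my": -0.1, "me": -0.5},
    {"name": -0.4, "nave": -0.3},
]
print(keyword_boosted_beam_search(steps, keywords={"name"}))   # boosted
print(keyword_boosted_beam_search(steps, keywords=set()))      # unboosted
```

With the keyword "name" boosted, the search recovers it despite the lower acoustic score; without boosting, the acoustically preferred "nave" wins. Because no training is involved, the keyword list can be swapped at decode time, which matches the paper's claim that the method requires no training in advance.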
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions [5.50485371072671]
Our method improves the recognition accuracy of misrecognized target keywords by substituting intermediate CTC predictions with corrected labels.
Experiments conducted in Japanese demonstrated that our method successfully improved the F1 score for unknown words.
arXiv Detail & Related papers (2024-06-21T06:25:10Z)
- Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z)
- PWESuite: Phonetic Word Embeddings and Tasks They Facilitate [37.09948594297879]
We develop three methods that use articulatory features to build phonetically informed word embeddings.
We also contribute a task suite to fairly evaluate past, current, and future methods.
arXiv Detail & Related papers (2023-04-05T16:03:42Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
Through this mechanism, the system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Methods perform well on images with words within vocabulary but generalize poorly to images with words outside vocabulary.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate whether entity types can be inferred from context alone. We find that, while people likewise cannot infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore different existing methods of this solution at both the graph construction and search method levels.
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
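The subword approach to OOVs mentioned above can be illustrated with a greedy longest-match segmentation. This is a hedged sketch only: the function name and the toy subword inventory are assumptions, and real systems typically use trained units such as BPE rather than this simple scheme.

```python
def segment_oov(word, subword_inventory):
    """Cover an out-of-vocabulary word with in-vocabulary subword units by
    greedy longest-match, falling back to single characters when no longer
    unit matches (an illustrative stand-in for trained subword models)."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            # take the longest known unit; a single character always matches
            if piece in subword_inventory or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

# A hypothetical inventory covering an unseen word via two known units.
print(segment_oov("librispeech", {"libri", "speech", "lib", "ri"}))
```

Because every word decomposes into known units, the recognizer's search graph never encounters a truly unknown token, which is the motivation for the subword-level methods the paper surveys.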
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.