Short-Term Word-Learning in a Dynamically Changing Environment
- URL: http://arxiv.org/abs/2203.15404v1
- Date: Tue, 29 Mar 2022 10:05:39 GMT
- Title: Short-Term Word-Learning in a Dynamically Changing Environment
- Authors: Christian Huber, Rishu Kumar, Ondřej Bojar, Alexander Waibel
- Abstract summary: We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
- Score: 63.025297637716534
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural sequence-to-sequence automatic speech recognition (ASR) systems are in
principle open vocabulary systems, when using appropriate modeling units. In
practice, however, they often fail to recognize words not seen during training,
e.g., named entities, numbers or technical terms. To alleviate this problem,
Huber et al. proposed to supplement an end-to-end ASR system with a word/phrase
memory and a mechanism to access this memory to recognize the words and phrases
correctly. In this paper we study (a) methods to acquire important words for
this memory dynamically and (b) the trade-off between improvement in
recognition accuracy of new words and the potential danger of false alarms for
those added words. We demonstrate significant improvements in the detection
rate of new words with only a minor increase in false alarms (F1 score 0.30
$\rightarrow$ 0.80), when using an appropriate number of new words. In
addition, we show that important keywords can be extracted from supporting
documents and used effectively.
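The pipeline described above (a word/phrase memory filled from supporting documents, evaluated by the trade-off between new-word detection and false alarms) can be pictured with a small sketch. The keyword-scoring rule, the background_freq table and the detection criterion below are illustrative assumptions, not the authors' exact pipeline or evaluation protocol.
```python
# Illustrative sketch only: how a word/phrase memory could be filled from a
# supporting document and how new-word detection could be scored with F1.
# extract_keywords, background_freq and the detection rule are assumptions.
from collections import Counter
import re


def extract_keywords(document, background_freq, top_k=20):
    """Rank words that are frequent in the document but rare in background text."""
    words = re.findall(r"[a-zA-Z][a-zA-Z'-]+", document.lower())
    doc_freq = Counter(words)
    scores = {w: c / (1.0 + background_freq.get(w, 0)) for w, c in doc_freq.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def new_word_f1(memory_words, reference, hypothesis):
    """F1 over memory words: recall over those actually spoken, precision
    penalising false alarms (memory words hypothesised but never spoken)."""
    ref, hyp = set(reference.lower().split()), set(hypothesis.lower().split())
    spoken = [w for w in memory_words if w in ref]
    detected = [w for w in spoken if w in hyp]
    false_alarms = [w for w in memory_words if w in hyp and w not in ref]
    if not detected:
        return 0.0
    precision = len(detected) / (len(detected) + len(false_alarms))
    recall = len(detected) / len(spoken)
    return 2 * precision * recall / (precision + recall)
```
Intuitively, such a score reflects the trade-off reported in the abstract: a well-chosen number of memory words raises recall on new words without flooding the hypotheses with false alarms.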
Related papers
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach to recognize new words.
We use a memory-enhanced Automatic Speech Recognition model from previous work.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z) - Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
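A minimal sketch of the keyword-conditioned normalization idea, assuming a PyTorch-style module; the layer sizes and the scale/shift parameterization are illustrative, not the AdaKWS implementation.
```python
# Assumed architecture sketch (not the AdaKWS code): a keyword embedding from a
# text encoder predicts per-channel scale and shift that modulate
# instance-normalized acoustic features.
import torch
import torch.nn as nn


class KeywordConditionedNorm(nn.Module):
    def __init__(self, feat_dim, text_dim):
        super().__init__()
        self.norm = nn.InstanceNorm1d(feat_dim, affine=False)
        self.to_scale_shift = nn.Linear(text_dim, 2 * feat_dim)

    def forward(self, acoustic, keyword_emb):
        # acoustic: (batch, feat_dim, time), keyword_emb: (batch, text_dim)
        scale, shift = self.to_scale_shift(keyword_emb).chunk(2, dim=-1)
        normed = self.norm(acoustic)
        return normed * (1.0 + scale.unsqueeze(-1)) + shift.unsqueeze(-1)
```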
arXiv Detail & Related papers (2023-09-13T13:49:42Z) - SpellMapper: A non-autoregressive neural spellchecker for ASR
customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
arXiv Detail & Related papers (2023-06-04T10:00:12Z) - Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End
Speech Recognition [21.61242091927018]
Out-Of-Vocabulary words, such as trending words and new named entities, pose problems to modern ASR systems.
We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words.
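A hedged sketch of the loss-rescaling idea: target tokens that belong to (TTS-generated) OOV words receive a larger weight in the training loss. The per-token weighting scheme and the fixed factor are assumptions, not the paper's exact formulation.
```python
# Hedged sketch of loss rescaling for OOV emphasis; the per-token weighting and
# the factor of 3.0 are assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def oov_weighted_ce(logits, targets, oov_mask, oov_weight=3.0):
    """logits: (batch, seq, vocab); targets: (batch, seq); oov_mask: (batch, seq)
    bool marking target tokens that belong to (e.g. TTS-generated) OOV words."""
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    weights = torch.ones_like(per_token)
    weights[oov_mask] = oov_weight
    return (per_token * weights).sum() / weights.sum()
```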
arXiv Detail & Related papers (2023-02-20T02:21:30Z) - Spell my name: keyword boosted speech recognition [25.931897154065663]
Uncommon words such as names and technical terminology are important for understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and also on internal data from real-world conversations.
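A simplified sketch of keyword boosting during decoding: hypotheses that complete one of the given keywords receive an additive score bonus in the beam search over acoustic-model predictions. The boosting rule and bonus value are illustrative, not the paper's exact method.
```python
# Simplified sketch of keyword boosting in beam search; the additive bonus for
# completed keywords is an illustrative rule, not the paper's exact method.
import heapq


def boosted_beam_search(step_logprobs, keywords, beam=4, bonus=2.0):
    """step_logprobs: per time step, a dict {token: log-probability} from the
    acoustic model. Hypotheses that end in a given keyword get a score bonus."""
    beams = [("", 0.0)]
    for dist in step_logprobs:
        candidates = []
        for text, score in beams:
            for token, lp in dist.items():
                new_text = (text + " " + token).strip()
                new_score = score + lp
                if any(new_text.endswith(k) for k in keywords):
                    new_score += bonus  # boost hypotheses containing a keyword
                candidates.append((new_text, new_score))
        beams = heapq.nlargest(beam, candidates, key=lambda c: c[1])
    return beams[0][0]
```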
arXiv Detail & Related papers (2021-10-06T14:16:57Z) - Instant One-Shot Word-Learning for Context-Specific Neural
Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z) - Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-05-17T19:22:25Z) - Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to cover OOVs is to use subword units rather than words.
In this paper we explore different existing methods of this solution at both the graph-construction and search-method levels.
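A toy illustration of why subword units help with OOV words: a word absent from the lexicon can still be emitted if it decomposes into in-vocabulary subword units. The greedy longest-match segmentation and the tiny unit inventory below are assumptions for illustration only.
```python
# Toy illustration only: a made-up subword inventory showing how an OOV word
# can still be represented by in-vocabulary subword units.
def segment(word, subwords):
    """Greedy longest-match segmentation; uncovered characters are marked with '*'."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                units.append(word[i:j])
                i = j
                break
        else:
            units.append(word[i] + "*")
            i += 1
    return units


# e.g. segment("covid", {"co", "vid", "id", "c"}) -> ["co", "vid"]
```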
arXiv Detail & Related papers (2020-03-19T21:24:45Z)