Related papers: Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

URL: http://arxiv.org/abs/2003.09024v1
Date: Thu, 19 Mar 2020 21:24:45 GMT
Title: Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems
Authors: Nikolay Malkovsky, Vladimir Bataev, Dmitrii Sviridkin, Natalia Kizhaeva, Aleksandr Laptev, Ildar Valiev, Oleg Petrov
Abstract summary: The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system. One popular approach to covering OOVs is to use subword units rather than words. In this paper we explore different existing methods for this approach at both the graph construction and search method levels.
Score: 54.49880724137688
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system. Hybrid systems are usually constructed to recognize a fixed set of words and can rarely include all the words that will be encountered while the system is in use. One popular approach to covering OOVs is to use subword units rather than words. Such a system can potentially recognize any previously unseen word if the word can be constructed from the available subword units, but it can also recognize non-existent words. The other popular approach is to modify the HMM part of the system so that it can be easily and effectively expanded with a custom set of words we want to add to the system. In this paper we explore different existing methods for this approach at both the graph construction and search method levels. We also present novel vocabulary expansion techniques that solve some common internal subroutine problems in recognition graph processing.
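The subword idea in the abstract can be illustrated with a minimal sketch. It is not the paper's graph construction or search method; the unit inventory `UNITS` and the helper `segment` are hypothetical names chosen for illustration. The sketch only checks whether a word can be assembled from a fixed set of subword units, which shows both sides of the trade-off: an unseen word becomes reachable, and so does a non-existent one.

```python
from typing import List, Optional, Set


def segment(word: str, units: Set[str]) -> Optional[List[str]]:
    """Return one segmentation of `word` into `units`, or None if none exists."""
    # best[i] holds a segmentation of word[:i], or None if word[:i] cannot be covered.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            prefix = best[start]
            if prefix is not None and piece in units:
                best[end] = prefix + [piece]
                break  # keep the first segmentation found for this prefix
    return best[len(word)]


# Toy inventory (an assumption for illustration, not taken from the paper).
UNITS = {"re", "cog", "ni", "tion", "un", "seen", "s"}

covered = segment("recognition", UNITS)   # a word built from known units
invented = segment("cogseen", UNITS)      # a non-existent word is also reachable
missing = segment("xyz", UNITS)           # no units cover this string
```

A real system would apply the same constraint inside the decoding graph rather than on finished strings, and would score competing segmentations instead of taking the first one found.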

Related papers

Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition [56.972851337263755]
We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words. We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
arXiv Detail & Related papers (2025-06-23T14:42:03Z)
Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora. We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling. This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
Spell my name: keyword boosted speech recognition [25.931897154065663]
Uncommon words such as names and technical terminology are important to understanding conversations in context. We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords. The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions. We demonstrate the effectiveness of our method on the LibriSpeech test sets and also internal data of real-world conversations.
arXiv Detail & Related papers (2021-10-06T14:16:57Z)
A Comparison of Methods for OOV-word Recognition on a New Public Dataset [0.0]
We propose using the CommonVoice dataset to create test sets for languages with a high out-of-vocabulary ratio. We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs. We propose a new method for modifying a subword-based language model so as to better recognize OOV-words.
arXiv Detail & Related papers (2021-07-16T19:39:30Z)
LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models. This is achieved by combining contextual information with knowledge from structured lexical resources. Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change. We show that our method can be used for the detection of semantic change with any alignment method. We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance. We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images. Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
SubICap: Towards Subword-informed Image Captioning [37.42085521950802]
We decompose words into smaller constituent units, 'subwords', and represent captions as a sequence of subwords instead of words. Our captioning system improves various metric scores, with a training vocabulary size approximately 90% less than the baseline.
arXiv Detail & Related papers (2020-12-24T06:10:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.