Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
- URL: http://arxiv.org/abs/2005.13827v2
- Date: Thu, 10 Sep 2020 12:33:57 GMT
- Title: Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
- Authors: Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo
- Abstract summary: In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system.
Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity.
In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition.
- Score: 17.492336084190658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words
not observed when training the speech recognition system. Using subword
language models (LMs) in the first-pass recognition makes it possible to
recognize the OOV words, but even the subword n-gram LMs suffer from data
sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems
but are not suitable for first-pass recognition as such. One way to solve this
is to approximate the RNNLMs by back-off n-gram models. In this paper, we
propose to interpolate the conventional n-gram models and the RNNLM
approximation for better OOV recognition. Furthermore, we develop a new RNNLM
approximation method suitable for subword units: It produces variable-order
n-grams to include long-span approximations and also considers n-grams that
were not originally observed in the training corpus. To evaluate these models
on OOVs, we set up Arabic and Finnish Keyword Search tasks concentrating only on
OOV words. On these tasks, interpolating the baseline RNNLM approximation and a
conventional LM outperforms the conventional LM in terms of the Maximum Term
Weighted Value for single-character subwords. Moreover, replacing the baseline
approximation with the proposed method achieves the best performance on both
multi- and single-character subwords.
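As a rough illustration of the two quantities the abstract leans on, the sketch below mixes the probabilities that a conventional subword n-gram LM and an n-gram approximation of an RNNLM assign to the same subword event, and evaluates the Term-Weighted Value underlying the MTWV metric. The function names, the default interpolation weight, the toy subword segmentation, and the beta value of 999.9 (the conventional NIST KWS setting) are illustrative assumptions, not the paper's configuration; in the paper the interpolation is applied to whole LMs used in first-pass recognition rather than to individual probabilities.

```python
import math


def interpolate_lm(p_ngram: float, p_rnnlm_approx: float, lam: float = 0.5) -> float:
    """Linearly interpolate two subword LM probabilities for the same event.

    p_ngram:        probability from the conventional back-off n-gram LM
    p_rnnlm_approx: probability from the n-gram approximation of the RNNLM
    lam:            interpolation weight (0.5 is a placeholder, not a tuned value)
    """
    return lam * p_ngram + (1.0 - lam) * p_rnnlm_approx


def twv(p_miss: float, p_fa: float, beta: float = 999.9) -> float:
    """Term-Weighted Value at a single decision threshold (NIST KWS convention);
    MTWV is the maximum of this value over all thresholds."""
    return 1.0 - (p_miss + beta * p_fa)


# Toy example: score the next unit of a hypothetical OOV keyword segmented
# into subwords ("tieto", "kone"); the probabilities below are made up.
p_combined = interpolate_lm(p_ngram=1e-4, p_rnnlm_approx=3e-4)
print(f"interpolated probability: {p_combined:.1e} (log10 {math.log10(p_combined):.2f})")
print(f"TWV at an example operating point: {twv(p_miss=0.55, p_fa=2e-4):.3f}")
```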
Related papers
- Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z)
- Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers [36.60689278751483]
We investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training.
Experimental results on Librispeech show that using the word-level LM in training outperforms the phoneme-level LM.
Our results reveal the pivotal importance of the hypothesis space quality in sequence discriminative training.
arXiv Detail & Related papers (2023-10-11T09:53:17Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can exploit their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- Why do Nearest Neighbor Language Models Work? [93.71050438413121]
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context.
Retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore.
arXiv Detail & Related papers (2023-01-07T11:12:36Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be improved substantially further if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram and neural network LMs.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
- A Comparison of Methods for OOV-word Recognition on a New Public Dataset [0.0]
We propose using the CommonVoice dataset to create test sets for languages with a high out-of-vocabulary ratio.
We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs.
We propose a new method for modifying a subword-based language model so as to better recognize OOV-words.
arXiv Detail & Related papers (2021-07-16T19:39:30Z)
- Deep learning models for representing out-of-vocabulary words [1.4502611532302039]
We present a performance evaluation of deep learning models for representing out-of-vocabulary (OOV) words.
Although the best technique for handling OOV words is different for each task, Comick, a deep learning method that infers the embedding based on the context and the morphological structure of the OOV word, obtained promising results.
arXiv Detail & Related papers (2020-07-14T19:31:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.