Context-based out-of-vocabulary word recovery for ASR systems in Indian languages
- URL: http://arxiv.org/abs/2206.04305v1
- Date: Thu, 9 Jun 2022 06:51:31 GMT
- Title: Context-based out-of-vocabulary word recovery for ASR systems in Indian languages
- Authors: Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga
- Abstract summary: We propose a post-processing technique to improve the performance of context-based OOV recovery.
The effectiveness of the proposed cost function is evaluated at both word-level and sentence-level.
- Score: 5.930734371401316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting and recovering out-of-vocabulary (OOV) words remains challenging for Automatic Speech Recognition (ASR) systems. Many existing methods model OOV words by modifying the acoustic and language models and integrating context words into the models. Training such complex models requires a large amount of data containing the context words, additional training time, and a larger model. By contrast, post-processing the ASR transcription to recover context-based OOV words has not been explored much. In this work, we propose a post-processing technique to improve the performance of context-based OOV recovery. We create an acoustically boosted language model with a phone-level sub-graph built from an OOV word list, and we propose two methods to determine a suitable cost function for retrieving OOV words based on context. The cost function is defined using phonetic and acoustic knowledge to match and recover the correct context words in the decoded output. The effectiveness of the proposed cost function is evaluated at both the word level and the sentence level. The evaluation results show that this approach can recover, on average, 50% of context-based OOV words across multiple categories.
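To make the phonetic matching concrete, the sketch below scores candidate OOV words against a decoded word by normalized phone-level edit distance. It illustrates the general idea only, not the paper's exact cost function: the toy phone lexicon is invented, and a real system would take phone sequences from a grapheme-to-phoneme front end.
```python
# Minimal sketch of phone-level OOV matching (not the paper's exact cost
# function). The toy lexicon stands in for a real grapheme-to-phoneme
# front end; the cost is the normalized phone-level edit distance.

def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution
    return dp[-1]

def phone_cost(decoded_phones, oov_phones):
    """Normalized phone edit distance in [0, 1]; lower is a better match."""
    denom = max(len(decoded_phones), len(oov_phones)) or 1
    return edit_distance(decoded_phones, oov_phones) / denom

def recover_oov(decoded_phones, oov_lexicon, threshold=0.4):
    """Return the best-matching OOV word if its cost clears the threshold."""
    best_word, best_cost = None, 1.0
    for word, phones in oov_lexicon.items():
        cost = phone_cost(decoded_phones, phones)
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word if best_cost <= threshold else None

# Invented phone sequences for illustration only.
oov_lexicon = {"chennai": ["ch", "e", "n", "n", "ai"],
               "mysuru": ["m", "ai", "s", "u", "r", "u"]}
print(recover_oov(["ch", "e", "n", "ai"], oov_lexicon))  # -> chennai
```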
Related papers
- Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation [27.057810339120664]
We propose two techniques to improve context-aware ASR models.
On LibriSpeech, our techniques together reduce the rare word error rate by 60% and 25% relative to no biasing and shallow fusion, respectively.
On SPGISpeech and a real-world dataset ConEC, our techniques also yield good improvements over the baselines.
arXiv Detail & Related papers (2024-07-14T19:32:33Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
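A minimal sketch of how granularity changes what a dense retriever indexes, assuming the sentence-transformers library; the hand-written propositions stand in for an automatic propositionizer.
```python
# Sketch: score a query against a whole passage versus proposition-level
# units. Assumes the sentence-transformers package; the "propositions"
# below are hand-written stand-ins for an automatic propositionizer.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passage = ("The Pisa tower began leaning during construction in the 12th "
           "century. It was stabilized by engineering work in the 1990s.")
propositions = [
    "The Pisa tower began leaning during construction in the 12th century.",
    "The Pisa tower was stabilized by engineering work in the 1990s.",
]

query = "When was the Pisa tower stabilized?"
q_emb = model.encode(query, convert_to_tensor=True)

for unit in [passage] + propositions:
    score = util.cos_sim(q_emb, model.encode(unit, convert_to_tensor=True))
    print(f"{score.item():.3f}  {unit[:60]}")
```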
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
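The score-combination idea behind rescoring can be sketched in a few lines; the hypotheses and log-probabilities below are invented placeholders, and the paper itself rescores full lattices rather than N-best lists.
```python
# Toy rescoring sketch: combine an acoustic score with a semantic/LM score
# for each hypothesis and re-rank. The paper rescores lattices with
# HMM-GMM and DNN models; the numbers here are illustrative placeholders.
def rescore(nbest, lm_weight=0.6):
    # nbest: list of (hypothesis, acoustic_log_prob, semantic_log_prob)
    return max(nbest, key=lambda h: h[1] + lm_weight * h[2])

nbest = [
    ("recognize speech", -12.1, -4.2),
    ("wreck a nice beach", -11.8, -9.7),
]
print(rescore(nbest)[0])  # -> "recognize speech"
```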
arXiv Detail & Related papers (2023-10-14T23:16:05Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
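A minimal sketch of the N-best correction setup: the prompt template and the `complete` placeholder are assumptions for illustration, not the benchmark's actual interface.
```python
# Sketch of N-best error correction with an LLM. `complete` is a
# placeholder for whatever chat/completion API is available; the prompt
# format is an assumption, not the HyPoradise baseline's exact template.
def build_prompt(nbest):
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return ("The following are N-best hypotheses from a speech recognizer.\n"
            f"{hyps}\n"
            "Output the single most likely intended transcription.")

nbest = ["i red the book yesterday",
         "i read the book yesterday",
         "i read a book yesterday"]
prompt = build_prompt(nbest)
# transcription = complete(prompt)  # call your LLM of choice here
print(prompt)
```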
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End
Speech Recognition [21.61242091927018]
Out-Of-Vocabulary words, such as trending words and new named entities, pose problems to modern ASR systems.
We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words.
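As a rough illustration of the loss-rescaling half of this idea, the PyTorch sketch below boosts the cross-entropy loss at token positions marked as OOV; the boost factor and mask construction are invented for the example, not taken from the paper.
```python
# Sketch of loss rescaling for OOV tokens (PyTorch). Token positions that
# belong to OOV words get their cross-entropy loss multiplied by a boost
# factor; the boost value and masking scheme are illustrative assumptions.
import torch
import torch.nn.functional as F

def rescaled_loss(logits, targets, oov_mask, boost=2.0):
    # logits: (batch, seq, vocab); targets, oov_mask: (batch, seq)
    per_token = F.cross_entropy(logits.transpose(1, 2), targets,
                                reduction="none")
    weights = 1.0 + (boost - 1.0) * oov_mask.float()  # boost OOV positions
    return (per_token * weights).mean()

logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
oov_mask = torch.zeros(2, 5, dtype=torch.bool)
oov_mask[0, 2] = True  # pretend this position is an OOV token
print(rescaled_loss(logits, targets, oov_mask))
```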
arXiv Detail & Related papers (2023-02-20T02:21:30Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native automated speech scoring (ASS), called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- A Comparison of Methods for OOV-word Recognition on a New Public Dataset [0.0]
We propose using the CommonVoice dataset to create test sets for languages with a high out-of-vocabulary ratio.
We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs.
We propose a new method for modifying a subword-based language model so as to better recognize OOV-words.
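The out-of-vocabulary ratio that motivates the test-set construction is straightforward to compute; a minimal sketch with toy data:
```python
# Sketch: measuring the OOV ratio of a test set against a training
# vocabulary, the quantity used to select high-OOV test sets.
def oov_ratio(train_words, test_words):
    vocab = set(train_words)
    oov = sum(1 for w in test_words if w not in vocab)
    return oov / max(len(test_words), 1)

train = "the cat sat on the mat".split()
test = "the dog sat on the rug".split()
print(f"OOV ratio: {oov_ratio(train, test):.2f}")  # 2 of 6 words -> 0.33
```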
arXiv Detail & Related papers (2021-07-16T19:39:30Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer achieved a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
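A minimal sketch of subword sampling in this spirit, using sentencepiece (which implements BPE-dropout for BPE models, with alpha acting as the merge-dropout probability); the model file name is a placeholder.
```python
# Sketch of dynamic acoustic unit augmentation via subword sampling.
# Assumes a trained sentencepiece model saved as "m.model" (placeholder).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")
for _ in range(3):
    # enable_sampling draws a different segmentation each call, so the
    # model sees varied subword targets for the same utterance
    print(sp.encode("merhaba", out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```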
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
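A minimal sketch of a compositional output layer, with mean-pooled character embeddings standing in for the paper's architecture:
```python
# Sketch of a compositional output embedding: each word's output vector is
# built from its character embeddings instead of a fixed word table, so
# model size no longer grows with the training vocabulary. Mean pooling
# over characters is a simplification of the paper's layer.
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    def __init__(self, n_chars=128, dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, dim)

    def word_vector(self, word):
        ids = torch.tensor([ord(c) % 128 for c in word])
        return self.char_emb(ids).mean(dim=0)

    def logits(self, hidden, candidate_words):
        # hidden: (dim,) decoder state; scores any candidate set, even
        # words never seen in training
        table = torch.stack([self.word_vector(w) for w in candidate_words])
        return table @ hidden

layer = CompositionalOutput()
hidden = torch.randn(64)
print(layer.logits(hidden, ["cat", "cats", "catamaran"]))
```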
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Deep learning models for representing out-of-vocabulary words [1.4502611532302039]
We present a performance evaluation of deep learning models for representing out-of-vocabulary (OOV) words.
Although the best technique for handling OOV words is different for each task, Comick, a deep learning method that infers the embedding based on the context and the morphological structure of the OOV word, obtained promising results.
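A much-simplified sketch of the Comick idea (mean pooling stands in for the original recurrent networks; the toy embedding table is invented):
```python
# Sketch of inferring an OOV embedding from context plus morphology, in
# the spirit of Comick. Mean pooling replaces the original recurrent
# networks, and the embedding table here is a toy stand-in.
import torch
import torch.nn.functional as F

dim = 8
emb = {w: torch.randn(dim) for w in ["the", "river", "flows", "near"]}

def char_features(word):
    # crude morphology stand-in: mean of character-code one-hot vectors
    ids = torch.tensor([ord(c) % dim for c in word])
    return F.one_hot(ids, dim).float().mean(dim=0)

def oov_embedding(left_ctx, right_ctx, word, w=0.5):
    # blend the average embedding of surrounding words with a
    # character-level representation of the OOV word itself
    ctx = torch.stack([emb[t] for t in left_ctx + right_ctx]).mean(dim=0)
    return w * ctx + (1 - w) * char_features(word)

vec = oov_embedding(["the", "river"], ["flows", "near"], "Kaveri")
print(vec.shape)  # torch.Size([8])
```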
arXiv Detail & Related papers (2020-07-14T19:31:25Z)