Improving Speech Recognition for Indic Languages using Language Model
- URL: http://arxiv.org/abs/2203.16595v1
- Date: Wed, 30 Mar 2022 18:22:12 GMT
- Title: Improving Speech Recognition for Indic Languages using Language Model
- Authors: Ankur Dhuriya, Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah,
Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan
- Abstract summary: We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages.
We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models trained on text derived from a variety of sources.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the effect of applying a language model (LM) on the output of
Automatic Speech Recognition (ASR) systems for Indic languages. We fine-tune
wav2vec 2.0 models for 18 Indic languages and adjust the results with
language models trained on text derived from a variety of sources. Our findings
demonstrate that the average Character Error Rate (CER) decreases by over 28%
and the average Word Error Rate (WER) decreases by about 36% after decoding
with an LM. We show that a large LM may not provide a substantial improvement
compared to a diverse one. We also demonstrate that high-quality transcriptions
can be obtained on domain-specific data without retraining the ASR model, and
we present results on the biomedical domain.
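The pipeline described in the abstract — an acoustic model's hypotheses rescored with an external LM and evaluated by CER and WER — can be sketched in plain Python. The paper does not specify its decoder, so the shallow-fusion form of the combined score, the weights `alpha` and `beta`, and the toy candidate scores below are illustrative assumptions rather than the authors' configuration:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via a single-row dynamic program."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n]

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def shallow_fusion_score(am_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Combined hypothesis score: acoustic log-prob plus a weighted LM
    log-prob and a word-count bonus (alpha/beta values are assumptions)."""
    return am_logp + alpha * lm_logp + beta * n_words

# Toy rescoring example with made-up (text, AM log-prob, LM log-prob) triples:
candidates = [
    ("the cat sat on the mat", -12.0, -8.0),
    ("the cat sad on the mat", -11.5, -15.0),  # acoustically likelier, worse LM
]
best = max(candidates,
           key=lambda c: shallow_fusion_score(c[1], c[2], len(c[0].split())))
print(best[0])  # the LM steers decoding toward the fluent hypothesis
```

In practice `alpha` and `beta` are tuned on a development set, and the CER/WER reductions reported above are averages of exactly these two metrics computed before and after LM decoding.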
Related papers
- Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem [4.830018386227]
This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline.
We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of information retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora.
arXiv Detail & Related papers (2024-06-21T20:02:22Z)
- A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between unseen language pairs in training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to navigate the ZST task, making them suffer from the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z)
- Improving Multilingual Neural Machine Translation System for Indic Languages [0.0]
We propose a multilingual neural machine translation (MNMT) system to address the issues related to low-resource language translation.
A state-of-the-art transformer architecture is used to realize the proposed model.
Experiments over a substantial amount of data demonstrate its superiority over conventional models.
arXiv Detail & Related papers (2022-09-27T09:51:56Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? [0.0]
We propose a new method for calculating error rates in Automatic Speech Recognition (ASR).
This new metric is for languages that contain half characters and where the same character can be written in different forms.
We implement our methodology for Hindi, one of the main languages in the Indic context.
arXiv Detail & Related papers (2022-03-30T18:32:08Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- CLSRIL-23: Cross Lingual Speech Representations for Indic Languages [0.0]
CLSRIL-23 is a self-supervised learning based model which learns cross-lingual speech representations from raw audio across 23 Indic languages.
It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations.
We compare the language-wise loss during pretraining to study the effects of monolingual and multilingual pretraining.
arXiv Detail & Related papers (2021-07-15T15:42:43Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate acoustic model (AM) and language model (LM).
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model toward predicting the code-switching (CS) points.
This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.