Data and knowledge-driven approaches for multilingual training to
improve the performance of speech recognition systems of Indian languages
- URL: http://arxiv.org/abs/2201.09494v1
- Date: Mon, 24 Jan 2022 07:17:17 GMT
- Title: Data and knowledge-driven approaches for multilingual training to
improve the performance of speech recognition systems of Indian languages
- Authors: A. Madhavaraj, Ramakrishnan Angarai Ganesan
- Abstract summary: We propose data- and knowledge-driven approaches for multilingual training of an automatic speech recognition (ASR) system for a target language.
In phone/senone mapping, a deep neural network (DNN) learns to map senones or phones from one language to the others.
In the other approach, we model the acoustic information for all the languages simultaneously.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose data- and knowledge-driven approaches for multilingual
training of an automatic speech recognition (ASR) system for a target language
by pooling speech data from multiple source languages. Exploiting the acoustic
similarities between Indian languages, we implement two approaches. In
phone/senone mapping, a deep neural network (DNN) learns to map senones or
phones from one language to the others, and the transcriptions of the source
languages are modified so that they can be used along with the target-language
data to train and fine-tune the target-language ASR system. In the other
approach, we model the acoustic information for all the languages
simultaneously by training a multitask DNN (MTDNN) to predict the senones of
each language in different output layers. The cross-entropy loss and the
weight-update procedure are modified such that, if a feature vector belongs to
a particular language, only the shared layers and the output layer responsible
for predicting the senone classes of that language are updated during
training. In the low-resource setting (LRS), 40 hours of transcribed data each
for Tamil, Telugu and Gujarati are used for training. The DNN-based senone
mapping technique gives relative improvements in word error rate (WER) of
9.66%, 7.2% and 15.21% over the baseline system for Tamil, Gujarati and
Telugu, respectively. In the medium-resource setting (MRS), 160, 275 and 135
hours of data for Tamil, Kannada and Hindi are used, where the same technique
gives larger relative improvements of 13.94%, 10.28% and 27.24% for Tamil,
Kannada and Hindi, respectively. The MTDNN with senone-mapping-based training
gives higher relative WER improvements of 15.0%, 17.54% and 16.06% in the LRS
for Tamil, Gujarati and Telugu, respectively, whereas in the MRS we see
improvements of 21.24%, 21.05% and 30.17% for Tamil, Kannada and Hindi,
respectively.
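The two training schemes lend themselves to short sketches. First, the
phone/senone mapping approach: the abstract does not spell out how the mapping
network is supervised, so the sketch below assumes frame-level pairs of source-
and target-language senone labels (e.g. from a cross-lingual alignment) are
available; PyTorch is assumed, and the inventory sizes and layer widths are
placeholders rather than the paper's configuration.

```python
# Sketch of the senone-mapping idea: a small DNN maps source-language senone
# identities to target-language senones; pooled source data is then relabeled.
# Assumptions: PyTorch; SRC/TGT inventory sizes and layer widths are
# illustrative; the supervision for the mapper is assumed, not from the paper.
import torch
import torch.nn as nn

SRC, TGT = 3000, 3000  # placeholder senone inventory sizes

mapper = nn.Sequential(
    nn.Embedding(SRC, 256),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, TGT),  # logits over target-language senones
)

def relabel(src_senone_ids):
    """Replace source-language senone labels with the most likely
    target-language senones, so the relabeled source data can be pooled
    with target data to train/fine-tune the target-language ASR system."""
    with torch.no_grad():
        return mapper(src_senone_ids).argmax(dim=-1)
```

The MTDNN approach is more fully specified: shared hidden layers feed one
senone output layer per language, and the loss is computed only on the head of
the language the current frames belong to, so backpropagation updates the
shared layers and that head while the other heads receive no gradient. (The
relative WER improvements quoted above are presumably
(WER_baseline - WER_system) / WER_baseline x 100.) A minimal sketch, with all
sizes again illustrative:

```python
class MTDNN(nn.Module):
    """Shared hidden layers plus one senone output layer per language."""
    def __init__(self, feat_dim=440, hidden=1024,
                 senones_per_lang=(3000, 3200, 2800)):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(nn.Linear(hidden, n)
                                   for n in senones_per_lang)

    def forward(self, x, lang_id):
        return self.heads[lang_id](self.shared(x))

model = MTDNN()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_step(feats, senone_targets, lang_id):
    # Loss is taken only on the head matching the frames' language, so
    # backprop touches the shared layers and that head; the other heads
    # get no gradient and plain SGD leaves them unchanged.
    opt.zero_grad()
    loss = loss_fn(model(feats, lang_id), senone_targets)
    loss.backward()
    opt.step()
    return loss.item()

# e.g. one batch of 32 frames from language 0:
# train_step(torch.randn(32, 440), torch.randint(0, 3000, (32,)), lang_id=0)
```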
Related papers
- Predicting positive transfer for improved low-resource speech
recognition using acoustic pseudo-tokens [31.83988006684616]
We show that supplementing the target language with data from a similar, higher-resource 'donor' language can help.
For example, continued pre-training on only 10 hours of low-resource Punjabi supplemented with 60 hours of donor Hindi is almost as good as continued pre-training on 70 hours of Punjabi.
arXiv Detail & Related papers (2024-02-03T23:54:03Z)
- cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in
Under-resourced Languages [0.0]
This paper describes our system for detecting homophobia/transphobia in social media comments, developed as part of the shared task at LT-EDI-2024.
We took a transformer-based approach to develop our multiclass classification model for ten language conditions.
We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language.
arXiv Detail & Related papers (2024-01-28T21:58:04Z)
- Adapting Multilingual Speech Representation Model for a New,
Underresourced Language through Multilingual Fine-tuning and Continued
Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to address the code-switching problem, achieving WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for
Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual
Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- CLSRIL-23: Cross Lingual Speech Representations for Indic Languages [0.0]
CLSRIL-23 is a self-supervised learning based model which learns cross-lingual speech representations from raw audio across 23 Indic languages.
It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations (a sketch of this objective appears after this list).
We compare the language-wise losses during pretraining to study the effects of monolingual versus multilingual pretraining.
arXiv Detail & Related papers (2021-07-15T15:42:43Z)
- Multilingual and code-switching ASR challenges for low resource Indian
languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
- Facebook AI's WMT20 News Translation Task Submission [69.92594751788403]
This paper describes Facebook AI's submission to the WMT20 shared news translation task.
We focus on the low resource setting and participate in two language pairs, Tamil -> English and Inuktitut -> English.
We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.
arXiv Detail & Related papers (2020-11-16T21:49:00Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching problem.
We use the language identities to bias the model to predict the code-switching (CS) points.
This encourages the model to learn the language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
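Two of the entries above (CLSRIL-23 and the wav2vec 2.0 adaptation paper)
hinge on wav2vec 2.0's contrastive objective over masked latent
representations; as promised in the CLSRIL-23 entry, here is a minimal sketch.
Assumptions: PyTorch; the temperature is a placeholder, and distractors are
taken to be all other masked time steps of the same utterance, a
simplification of wav2vec 2.0's sampled distractors.

```python
# Sketch of a wav2vec 2.0-style contrastive loss over masked latents.
import torch
import torch.nn.functional as F

def contrastive_loss(context, quantized, temperature=0.1):
    """context, quantized: [T, D] vectors at the masked time steps.
    For step t the positive is quantized[t]; every other step's quantized
    vector serves as a distractor (wav2vec 2.0 samples K of them instead)."""
    sim = F.cosine_similarity(context.unsqueeze(1),
                              quantized.unsqueeze(0), dim=-1) / temperature
    targets = torch.arange(sim.size(0))  # positives lie on the diagonal
    return F.cross_entropy(sim, targets)

# Toy usage: 50 masked steps with 256-dim representations.
loss = contrastive_loss(torch.randn(50, 256), torch.randn(50, 256))
```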