Multilingual and code-switching ASR challenges for low resource Indian
languages
- URL: http://arxiv.org/abs/2104.00235v1
- Date: Thu, 1 Apr 2021 03:37:01 GMT
- Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa
Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi
Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali,
Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul
Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram and Basil Abraham
- Abstract summary: We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
- Score: 59.2906853285309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been increasing interest in multilingual automatic
speech recognition (ASR), where a single speech recognition system caters to
multiple low-resource languages by taking advantage of the small amounts of
labeled corpora available in those languages. With multilingualism becoming
common in today's world,
there has been increasing interest in code-switching ASR as well. In
code-switching, multiple languages are freely interchanged within a single
sentence or between sentences. The success of low-resource multilingual and
code-switching ASR often depends on the variety of languages in terms of their
acoustics and linguistic characteristics, as well as on the amount of data
available and on how carefully these factors are considered in building the ASR
system. In this
challenge, we would like to focus on building multilingual and code-switching
ASR systems through two different subtasks related to a total of seven Indian
languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali.
For this purpose, we provide a total of ~600 hours of transcribed speech data,
comprising train and test sets, in these languages including two code-switched
language pairs, Hindi-English and Bengali-English. We also provide a baseline
recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of
multilingual and code-switching subtasks, respectively.
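The baseline results above are reported as word error rate (WER). As a quick illustration of how that metric works, here is a minimal, generic sketch of WER computed via word-level Levenshtein distance; it is not the challenge's official scoring script.

```python
# Minimal WER: edit distance over words divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion against a 6-word reference: 1/6 ≈ 16.67% WER.
print(round(100 * wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```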
Related papers
- DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set [0.0]
The Common Label Set (CLS) maps graphemes of various languages that have similar sounds to common labels.
Since Indian languages are mostly phonetic, building a transliteration system to convert from native script to CLS is straightforward.
We propose a novel architecture called Multilingual-Decoder-Decoder for building multilingual systems.
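The CLS idea can be sketched with a toy mapping: graphemes from different Indic scripts that share roughly the same sound collapse to one common label. The grapheme-to-label pairs below are illustrative assumptions, not the paper's actual label set.

```python
# Toy Common Label Set: script-specific graphemes with similar sounds map
# to one shared label. These few mappings are illustrative only.
CLS_MAP = {
    "क": "ka",  # Devanagari (Hindi/Marathi)
    "ক": "ka",  # Bengali
    "க": "ka",  # Tamil
    "म": "ma",  # Devanagari
    "ম": "ma",  # Bengali
}

def to_cls(text: str) -> list[str]:
    """Transliterate native-script graphemes to common labels; pass unknowns through."""
    return [CLS_MAP.get(ch, ch) for ch in text]

print(to_cls("कम"))  # Devanagari -> ['ka', 'ma']
print(to_cls("কম"))  # Bengali    -> ['ka', 'ma'], the same common labels
```

With such a mapping, transcripts from multiple scripts share one output vocabulary, which is what lets a single multilingual model train across languages.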
arXiv Detail & Related papers (2022-10-30T04:01:26Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to address the code-switched setting and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide between high- and low-resource languages.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
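One way to read this approach: represent each language by an aggregate acoustic embedding and rank candidate transfer sources by similarity to the target. The sketch below uses cosine similarity over mean embeddings as an illustrative assumption; the paper's actual similarity measure and embeddings may differ.

```python
# Toy transfer-pair selection: pick the source language whose mean acoustic
# embedding is most similar (by cosine) to the target language's embedding.
# Embeddings here are placeholders; real ones would come from a speech model.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_transfer_source(target_emb: np.ndarray,
                         source_embs: dict[str, np.ndarray]) -> str:
    """Return the source language closest to the target in embedding space."""
    return max(source_embs, key=lambda lang: cosine(target_emb, source_embs[lang]))

target = np.array([1.0, 0.0])
sources = {"hi": np.array([0.9, 0.1]), "ta": np.array([0.0, 1.0])}
print(best_transfer_source(target, sources))  # hi
```

Because only one mean vector per language is compared, this kind of ranking scales to hundreds of candidate languages without pairwise model training.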
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Dual Script E2E framework for Multilingual and Code-Switching ASR [4.697788649564087]
We train multilingual and code-switching ASR systems for Indian languages.
Inspired by results in text-to-speech synthesis, we use an in-house rule-based common label set (CLS) representation.
We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021.
arXiv Detail & Related papers (2021-06-02T18:08:27Z)
- Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification [2.064612766965483]
We perform spoken LID on three Indian languages code-mixed with English.
This task was organized by the Microsoft research team as a spoken LID challenge.
arXiv Detail & Related papers (2020-10-14T14:37:03Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.