Building Robust and Scalable Multilingual ASR for Indian Languages
- URL: http://arxiv.org/abs/2511.15418v1
- Date: Wed, 19 Nov 2025 13:17:16 GMT
- Title: Building Robust and Scalable Multilingual ASR for Indian Languages
- Authors: Arjun Gangwar, Kaousheik Jayakumar, S. Umesh
- Abstract summary: This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR to predict the language and dialect of each utterance among 8 languages across 33 dialects.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR to predict the language and dialect of each utterance among 8 languages across 33 dialects. We participated in Track 1 and Track 2, which restrict the use of additional data and require multilingual systems built from scratch. We present a novel training approach using a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as the intermediate representation, which improved performance over the baseline (in the CLS space). We also discuss methods for retaining the gains obtained in the phonemic space when converting back to the corresponding grapheme representations. Our systems beat the baseline in 3 languages (Track 2) in terms of WER/CER and achieved the highest language ID and dialect ID accuracy among all participating teams (Track 2).
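The WER and CER figures cited in this abstract and in the baseline recipes below are edit-distance metrics between hypothesis and reference transcripts. A minimal sketch of how they are computed (function names are illustrative, not from the paper):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via a single-row dynamic program.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # min of deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(reference, hypothesis):
    # Word Error Rate: edits over word tokens / reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: same computation over characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For agglutinative Indic languages with long compound words, CER is often reported alongside WER, since a single wrong character inflates WER disproportionately.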
Related papers
- Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese. We propose an approach to enhance SER performance in languages with scarce SER resources by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID).
We also propose a similar technique for the code-switched setting and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling [52.99188200886738]
BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
arXiv Detail & Related papers (2021-06-05T03:38:42Z)
- Dual Script E2E framework for Multilingual and Code-Switching ASR [4.697788649564087]
We train multilingual and code-switching ASR systems for Indian languages.
Inspired by results in text-to-speech synthesis, we use an in-house rule-based common label set (CLS) representation.
We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021.
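The common-label-set idea behind both this paper and the SPRING Lab systems above maps graphemes from different Indic scripts that denote the same sound onto one shared label, so a single model can be trained across scripts. A hypothetical sketch (the actual in-house rules are not public; the three mappings below are illustrative only):

```python
# Hypothetical CLS mapping: graphemes from two Indic scripts that
# represent the same sound collapse onto one shared label.
# The real rule set is in-house; these pairs are illustrative only.
CLS_MAP = {
    "क": "ka", "ক": "ka",  # Devanagari / Bengali 'ka'
    "म": "ma", "ম": "ma",  # Devanagari / Bengali 'ma'
    "ल": "la", "ল": "la",  # Devanagari / Bengali 'la'
}

def to_cls(text):
    # Map each grapheme to its common label; pass unknowns through.
    return [CLS_MAP.get(ch, ch) for ch in text]
```

Under such a mapping, the Hindi word कमल and the Bengali word কমল (both "kamal") collapse to the same label sequence, which is what lets pooled multilingual training share acoustic statistics across scripts; converting predictions back to script-specific graphemes is the lossy step the SPRING Lab abstract discusses.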
arXiv Detail & Related papers (2021-06-02T18:08:27Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
- Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification [2.064612766965483]
We perform spoken LID on three Indian languages code-mixed with English.
This task was organized by Microsoft Research as a spoken LID challenge.
arXiv Detail & Related papers (2020-10-14T14:37:03Z)
- Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [19.09014345299161]
We introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification.
The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India.
arXiv Detail & Related papers (2020-07-08T05:00:25Z)
- Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages [19.569525304938033]
Two approaches are considered for under-resourced, code-switched speech in five South African languages.
The first constructs four separate bilingual automatic speech recognisers corresponding to four different language pairs.
The second uses a single, unified, five-lingual ASR system that represents all the languages.
arXiv Detail & Related papers (2020-03-06T11:08:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.