Dual Script E2E framework for Multilingual and Code-Switching ASR
- URL: http://arxiv.org/abs/2106.01400v1
- Date: Wed, 2 Jun 2021 18:08:27 GMT
- Title: Dual Script E2E framework for Multilingual and Code-Switching ASR
- Authors: Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Arun Kumar A,
Ashish Seth, Lodagala Durga Prasad, Saish Jaiswal, Anusha Prakash, Hema
Murthy
- Abstract summary: We train multilingual and code-switching ASR systems for Indian languages.
Inspired by results in text-to-speech synthesis, we use an in-house rule-based common label set (CLS) representation.
We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: India is home to multiple languages, and training automatic speech
recognition (ASR) systems for these languages is challenging. Over time, each
language has adopted words from other languages, such as English, leading to
code-mixing. Most Indian languages also have their own unique scripts, which
poses a major obstacle to training multilingual and code-switching ASR
systems.
Inspired by results in text-to-speech synthesis, in this work, we use an
in-house rule-based phoneme-level common label set (CLS) representation to
train multilingual and code-switching ASR for Indian languages. We propose two
end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on
the CLS representation, and we use a novel data-driven back-end to recover the
native language script. In the second system, we propose a modification to the
E2E model, wherein the CLS representation and the native language characters
are used simultaneously for training. We show our results on the multilingual
and code-switching tasks of the Indic ASR Challenge 2021. Our best results
achieve approximately 6% and 5% improvements in word error rate over the baseline
system for the multilingual and code-switching tasks, respectively, on the
challenge development data.
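The abstract describes CLS only at a high level: graphemes from different scripts that share a sound collapse to one common label. The toy sketch below illustrates that idea, along with the reason why going back from CLS to a native script is harder than going forward. It assumes a hypothetical table covering a handful of consonants from Devanagari, Bengali and Tamil. The table, the names `to_cls`, `build_inverse` and `to_native`, and the Unicode-block filter are all illustrative assumptions. They are not the paper's CLS inventory, which also has to handle vowels and conjuncts. `to_native` is a naive rule-based stand-in, not the data-driven back-end the authors propose.

```python
from collections import defaultdict

# Hypothetical toy table, NOT the paper's CLS inventory: a few consonants from
# three scripts that share a sound are mapped to one common Latin label.
# Vowel signs, inherent vowels, viramas and conjuncts are deliberately ignored.
NATIVE_TO_CLS = {
    "क": "k", "ग": "g", "म": "m", "न": "n",  # Devanagari (Hindi)
    "ক": "k", "গ": "g", "ম": "m", "ন": "n",  # Bengali
    "க": "k", "ம": "m", "ந": "n",            # Tamil (no separate "g" letter)
}

# Unicode block of each script, used to select the target script on the way back.
SCRIPT_RANGES = {
    "hindi": (0x0900, 0x097F),    # Devanagari block
    "bengali": (0x0980, 0x09FF),  # Bengali block
    "tamil": (0x0B80, 0x0BFF),    # Tamil block
}


def to_cls(text: str) -> list[str]:
    """Forward (rule-based) direction: native graphemes -> common labels."""
    return [NATIVE_TO_CLS[ch] for ch in text]


def build_inverse(table: dict[str, str]) -> dict[str, set[str]]:
    """Group native graphemes by the common label they map to."""
    inverse: dict[str, set[str]] = defaultdict(set)
    for native, label in table.items():
        inverse[label].add(native)
    return dict(inverse)


INVERSE = build_inverse(NATIVE_TO_CLS)


def to_native(labels: list[str], language: str) -> str:
    """Naive inverse: keep only candidates inside the target script's block."""
    lo, hi = SCRIPT_RANGES[language]
    out = []
    for label in labels:
        candidates = [c for c in INVERSE.get(label, set()) if lo <= ord(c) <= hi]
        if len(candidates) != 1:
            raise ValueError(f"label {label!r} has no unique {language} grapheme")
        out.append(candidates[0])
    return "".join(out)


if __name__ == "__main__":
    print(to_cls("कम"), to_cls("கம"))       # ['k', 'm'] ['k', 'm']
    print(to_native(["k", "m"], "bengali"))  # কম
```

Different scripts share one label sequence, so a single acoustic model can be trained across languages. The reverse direction needs extra information, such as the target language. Even then, the reverse direction can fail: here, `to_native(["g"], "tamil")` raises, because Tamil writes both sounds with க. Gaps like this suggest why a purely rule-based inverse can fall short.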
Related papers
- DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set
Common Label Set (CLS) maps graphemes of various languages with similar sounds to common labels.
Since Indian languages are mostly phonetic, building a transliteration to convert from native script to CLS is easy.
We propose a novel architecture called Multilingual-Decoder-Decoder for building multilingual systems.
(2022-10-30T04:01:26Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are transcribed.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
(2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
(2022-06-05T04:03:12Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to solve the code-switched problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
(2022-03-30T18:09:28Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
(2022-01-28T14:39:29Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
(2022-01-26T22:12:55Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
(2021-04-01T03:37:01Z)
- Transformer-Transducers for Code-Switched Speech Recognition
We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
(2020-11-30T17:27:41Z)
- Streaming End-to-End Bilingual ASR Systems with Joint Language Identification
We introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification.
The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India.
(2020-07-08T05:00:25Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
(2020-05-16T22:28:09Z)
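Nearly every result above, including the Dual Script paper's own headline numbers, is quoted as word error rate (WER). The sketch below shows how WER is conventionally computed: it is the word-level Levenshtein distance (substitutions, deletions and insertions, each costing 1), summed over a corpus and divided by the total number of reference words. It also shows the two common ways an "improvement" is reported, as absolute WER points or as relative reduction. The listing does not say which of these applies to the abstract's 6% and 5% figures. The function names are hypothetical and are not taken from any listed paper's code. The example numbers are made up for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Word-level Levenshtein distance and reference length.

    Substitutions, deletions and insertions each cost 1.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus WER in percent: total word errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return 100.0 * errors / words


def improvement(baseline_wer: float, system_wer: float) -> tuple[float, float]:
    """(absolute reduction in WER points, relative reduction in percent)."""
    absolute = baseline_wer - system_wer
    return absolute, 100.0 * absolute / baseline_wer


if __name__ == "__main__":
    # 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
    print(corpus_wer([("the cat sat on the mat", "the cat sit on mat")]))  # about 33.33
    print(improvement(30.0, 24.0))  # (6.0, 20.0) for these made-up WERs
```

Pooling errors over the whole corpus, rather than averaging per-utterance WERs, keeps long utterances from being under-weighted. This is the usual convention for challenge scoring.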