A two-stage transliteration approach to improve performance of a multilingual ASR
- URL: http://arxiv.org/abs/2410.14709v1
- Date: Wed, 09 Oct 2024 05:30:33 GMT
- Title: A two-stage transliteration approach to improve performance of a multilingual ASR
- Authors: Rohit Kumar
- Abstract summary: This paper presents an approach to building a language-agnostic end-to-end model trained on a grapheme set.
We performed experiments with an end-to-end multilingual speech recognition system for two Indic languages.
- Score: 1.9511556030544333
- Abstract: End-to-end Automatic Speech Recognition (ASR) systems are rapidly becoming the state of the art over other modeling methods. Several techniques have been introduced to improve their ability to handle multiple languages. However, because writing scripts vary across languages, acoustically similar units do not always decode to an appropriate grapheme in the target language. This restricts the scalability and adaptability of the model when dealing with multiple languages in code-mixing scenarios. This paper presents an approach to building a language-agnostic end-to-end model trained on a grapheme set obtained by projecting the multilingual grapheme data onto the script of a more generic target language. This approach spares the acoustic model from being retrained to span a larger grapheme space and can easily be extended to multiple languages. A two-stage transliteration process realizes this approach and is shown to minimize speech-class confusion. We performed experiments with an end-to-end multilingual speech recognition system for two Indic languages, namely Nepali and Telugu, whose original grapheme spaces are projected onto the Devanagari script. In the transliterated space we achieved relative reductions of 20% in Word Error Rate (WER) and 24% in Character Error Rate (CER) over language-dependent modeling methods.
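The abstract leaves the two transliteration stages unspecified, but one natural decomposition is: first map the source-script text to a script-neutral romanization, then render that romanization in the generic target script (Devanagari). The sketch below illustrates this under stated assumptions using the open-source indic-transliteration package; the IAST intermediate and the helper names project_to_devanagari and relative_reduction are assumptions for illustration, not the authors' pipeline. The second helper simply shows how a reported relative WER/CER reduction is computed.

```python
# A minimal sketch of a two-stage grapheme projection into Devanagari, assuming
# the open-source `indic-transliteration` package (pip install indic-transliteration).
# The IAST romanization as the intermediate stage and both helper names are
# illustrative assumptions; the paper does not specify its two stages in the abstract.
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate


def project_to_devanagari(text: str, source_scheme: str) -> str:
    # Stage 1: source script -> a script-neutral romanization (IAST).
    romanized = transliterate(text, source_scheme, sanscript.IAST)
    # Stage 2: romanization -> the generic target script (Devanagari).
    return transliterate(romanized, sanscript.IAST, sanscript.DEVANAGARI)


def relative_reduction(baseline: float, system: float) -> float:
    # "20% relative WER reduction" means (baseline - system) / baseline * 100.
    return (baseline - system) / baseline * 100.0


if __name__ == "__main__":
    print(project_to_devanagari("తెలుగు", sanscript.TELUGU))  # Telugu -> Devanagari
    print(relative_reduction(25.0, 20.0))  # -> 20.0 (% relative improvement)
```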
Related papers
- LangSAMP: Language-Script Aware Multilingual Pretraining [48.16511046793275]
Recent multilingual pretrained language models (mPLMs) often avoid using language embeddings.
LangSAMP incorporates both language and script embeddings to enhance representation learning.
We apply LangSAMP to the continual pretraining of XLM-R on a highly multilingual corpus covering more than 500 languages.
arXiv Detail & Related papers (2024-09-26T18:29:10Z) - Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
Transfer performance is often hindered when a low-resource target language is written in a different script from the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z) - Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
With limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Distilling a Pretrained Language Model to a Multilingual ASR Model [3.4012007729454816]
We distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model.
We show the superiority of our method on 20 low-resource languages of the CommonVoice dataset with less than 100 hours of speech data.
arXiv Detail & Related papers (2022-06-25T12:36:11Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique for the code-switched setting and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z) - Language-agnostic Multilingual Modeling [23.06484126933893]
We build a language-agnostic multilingual ASR system that transforms all languages to one writing system through a many-to-one transliteration transducer (a minimal sketch of this idea appears after this list).
We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10% relative reduction in Word Error Rate (WER) over a language-dependent multilingual model.
arXiv Detail & Related papers (2020-04-20T18:57:43Z) - Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date.
We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% while using less than one-fifth of the training parameters of other word-embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)
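The "many-to-one transliteration transducer" in the Language-agnostic Multilingual Modeling entry above maps characters from several source scripts onto a single target script. Below is a minimal sketch of that idea as a plain character table in Python; the tiny Telugu/Kannada-to-Devanagari pairs and the helper name are illustrative assumptions, not the cited paper's actual transducer, which would also handle vowel signs, conjuncts, and context-dependent rules.

```python
# Illustrative sketch of a many-to-one transliteration mapping: characters from
# several source scripts all project onto one target script (Devanagari here).
# The character table is a tiny, assumed subset for demonstration only.
MANY_TO_ONE = {
    # Telugu -> Devanagari
    "క": "क", "మ": "म", "ల": "ल",
    # Kannada -> Devanagari (the *same* targets: many-to-one)
    "ಕ": "क", "ಮ": "म", "ಲ": "ल",
}


def transliterate_many_to_one(text: str) -> str:
    # Characters outside the table pass through unchanged.
    return "".join(MANY_TO_ONE.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Both source-script strings land in the same Devanagari space,
    # so a single acoustic model can serve both languages.
    print(transliterate_many_to_one("కమల"))  # Telugu input  -> कमल
    print(transliterate_many_to_one("ಕಮಲ"))  # Kannada input -> कमल (same output)
```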