Towards One Model to Rule All: Multilingual Strategy for Dialectal
Code-Switching Arabic ASR
- URL: http://arxiv.org/abs/2105.14779v1
- Date: Mon, 31 May 2021 08:20:38 GMT
- Authors: Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali
- Abstract summary: We design a large multilingual end-to-end ASR using self-attention based conformer architecture.
We trained the system using Arabic (Ar), English (En) and French (Fr) languages.
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
- Score: 11.363966269198064
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the advent of globalization, there is an increasing demand for
multilingual automatic speech recognition (ASR), handling language and
dialectal variation of spoken content. Recent studies show its efficacy over
monolingual systems. In this study, we design a large multilingual end-to-end
ASR using self-attention based conformer architecture. We trained the system
using Arabic (Ar), English (En) and French (Fr) languages. We evaluate the
system performance handling: (i) monolingual (Ar, En and Fr); (ii)
multi-dialectal (Modern Standard Arabic, along with dialectal variation such as
Egyptian and Moroccan); (iii) code-switching -- cross-lingual (Ar-En/Fr) and
dialectal (MSA-Egyptian dialect) test cases, and compare with current
state-of-the-art systems. Furthermore, we investigate the influence of
different embedding/character representations including character vs
word-piece; shared vs distinct input symbol per language. Our findings
demonstrate the strength of such a model by outperforming state-of-the-art
monolingual dialectal Arabic and code-switching Arabic ASR.
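The abstract compares shared vs distinct input symbols per language. A toy vocabulary builder can illustrate the difference; this is a hypothetical sketch for intuition only, not the authors' tokenization pipeline:

```python
def build_vocab(corpus_by_lang, shared=True):
    """Build a character vocabulary from {lang: [sentences]}.

    shared=True  -> one symbol per character, regardless of language.
    shared=False -> each character is tagged with its language, so 'a'
                    in English and 'a' in French become distinct symbols.
    """
    vocab = set()
    for lang, sentences in corpus_by_lang.items():
        for sent in sentences:
            for ch in sent:
                vocab.add(ch if shared else f"{ch}@{lang}")
    return sorted(vocab)

corpus = {"en": ["cat"], "fr": ["chat"]}

shared_vocab = build_vocab(corpus, shared=True)     # {'a','c','h','t'}: 4 symbols
distinct_vocab = build_vocab(corpus, shared=False)  # 3 en-tagged + 4 fr-tagged: 7 symbols
print(len(shared_vocab), len(distinct_vocab))
```

Distinct per-language symbol sets grow the output vocabulary roughly linearly with the number of languages, at the cost of no longer sharing grapheme statistics across them; a shared set keeps the softmax small but forces one symbol to cover language-specific pronunciations.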
Related papers
- ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs [1.6381055567716192]
This paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems.
We focus on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic.
We present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma.
arXiv Detail & Related papers (2024-06-26T07:19:51Z)
- ALDi: Quantifying the Arabic Level of Dialectness of Text [17.37857915257019]
We argue that Arabic speakers perceive a spectrum of dialectness, which we operationalize at the sentence level as the Arabic Level of Dialectness (ALDi).
We provide a detailed analysis of AOC-ALDi and show that a model trained on it can effectively identify levels of dialectness on a range of other corpora.
arXiv Detail & Related papers (2023-10-20T18:07:39Z)
- VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System [16.420831300734697]
VoxArabica is a system for dialect identification (DID) and automatic speech recognition (ASR) of Arabic.
We train a wide range of models such as HuBERT (DID), Whisper, and XLS-R (ASR) in a supervised setting for Arabic DID and ASR tasks.
We finetune our ASR models on MSA, Egyptian, Moroccan, and mixed data.
We integrate these models into a single web interface with diverse features such as audio recording, file upload, model selection, and the option to raise flags for incorrect outputs.
arXiv Detail & Related papers (2023-10-17T08:33:02Z)
- Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) [3.57486761615991]
Unified representations consistently achieve better cross-lingual synthesis with respect to both naturalness and accent.
Separate representations tend to have an order of magnitude more tokens than unified ones, which may affect model capacity.
arXiv Detail & Related papers (2022-07-04T16:14:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique to address the code-switching problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
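Several of the entries above quote word error rates (e.g. 21.77 on Hindi-English, 30.73% on the multilingual subtask). For reference, WER is the word-level Levenshtein edit distance normalized by the reference length; a minimal sketch follows. This is an illustration, not any of the papers' evaluation scripts:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat on"))  # one insertion over 3 reference words
```

Published systems typically apply text normalization (casing, punctuation, and for Arabic often diacritic removal) before scoring, so raw WER on unnormalized transcripts is not directly comparable to the figures reported above.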
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.