Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
- URL: http://arxiv.org/abs/2108.12881v1
- Date: Sun, 29 Aug 2021 17:23:30 GMT
- Title: Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
- Authors: Injy Hamed, Pavel Denisov, Chia-Yu Li, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu
- Abstract summary: We present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR).
We build our ASR systems using DNN-based hybrid and Transformer-based end-to-end models.
We show that recognition can be improved by combining the outputs of both systems.
- Score: 32.426525641734344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-switching (CS), defined as the mixing of languages in conversations, has
become a worldwide phenomenon. The prevalence of CS has been recently met with
a growing demand and interest to build CS ASR systems. In this paper, we
present our work on code-switched Egyptian Arabic-English automatic speech
recognition (ASR). We first contribute to filling the large gap in resources by
collecting, analyzing, and publishing our spontaneous CS Egyptian Arabic-English
speech corpus. We build our ASR systems using DNN-based hybrid and
Transformer-based end-to-end models. In this paper, we present a thorough
comparison between both approaches under the setting of a low-resource,
orthographically unstandardized, and morphologically rich language pair. We
show that while both systems give comparable overall recognition results, each
system has a complementary set of strengths. We show that recognition
can be improved by combining the outputs of both systems. We propose several
effective system combination approaches, where hypotheses of both systems are
merged at the sentence and word levels. Our approaches yield an overall relative
WER improvement of 4.7% over a baseline of 32.1% WER. In the
case of intra-sentential CS sentences, we achieve a relative WER improvement of
4.8%. Our best performing system achieves 30.6% WER on the ArzEn test set.
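The abstract reports results as word error rate (WER) and as relative improvements. As a minimal sketch (not the authors' scoring pipeline, which this summary does not describe), corpus-level WER can be computed as the word-level edit distance divided by the number of reference words. The sentence pair in the check is a hypothetical example in which an English word is output in Arabic script, which word-level scoring counts as a substitution. The last check confirms the reported figures are consistent: going from 32.1% to 30.6% WER is a relative reduction of about 4.7%.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the reference prefix seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER: total edits divided by total number of reference words."""
    edits = sum(word_edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words


def relative_improvement(baseline, new):
    """Relative error reduction, e.g. 0.05 means the new WER is 5% lower than the baseline."""
    return (baseline - new) / baseline


# Hypothetical code-switched pair: "download" is recognized in Arabic script.
print(wer(["انا عايز download الملف"], ["انا عايز دونلود الملف"]))
print(round(relative_improvement(32.1, 30.6) * 100, 1))
```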
Related papers
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Speech collage: code-switched audio generation by collaging monolingual corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
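Speech Collage is summarized only as splicing audio segments taken from monolingual corpora. A minimal sketch of that idea follows, assuming word-level segments have already been cut and stored in a dictionary keyed by word. The `bank` structure, the `splice` function, and the optional linear crossfade are illustrative assumptions, not the method's published procedure.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: each word maps to a list of audio samples.
Segment = List[float]


def splice(words: Sequence[str], bank: Dict[str, Segment], fade: int = 0) -> Segment:
    """Concatenate per-word segments in the order of a code-switched sentence.

    If fade > 0, adjacent segments overlap by up to `fade` samples with a
    linear crossfade (an assumption added here to soften the joins).
    """
    out: Segment = []
    for word in words:
        seg = bank[word]  # raises KeyError if no segment exists for this word
        n = min(fade, len(out), len(seg))
        for k in range(n):
            # weight of the incoming segment rises toward 1 across the overlap
            w = (k + 1) / (n + 1)
            idx = len(out) - n + k
            out[idx] = (1 - w) * out[idx] + w * seg[k]
        out.extend(seg[n:])
    return out


# Toy bank mixing an Arabic segment (from an Arabic corpus) and an English one.
bank = {"انا": [1.0, 1.0], "download": [0.0, 0.0, 0.0]}
print(splice(["انا", "download"], bank, fade=1))
```

With `fade=0` the function is plain concatenation; a short crossfade is one simple way to reduce clicks at segment boundaries.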
arXiv Detail & Related papers (2023-09-27T14:17:53Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The proposed TTS architecture is designed for multiple code generation and monotonic alignment.
We show that the proposed architecture outperforms existing TTS systems in several objective and subjective measures.
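The summary says the TTS architecture generates multiple codes per frame. One common way to obtain several discrete codes for a single feature vector is grouped (product-style) vector quantization: split the vector into sub-vectors and map each to its nearest entry in a separate codebook. The sketch below shows only that lookup step; whether the cited paper uses grouped quantization, residual quantization, or another scheme is not stated in this summary, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def multi_codebook_quantize(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Split x into equal sub-vectors and quantize each with its own codebook.

    Returns one code index per codebook, i.e. several discrete codes per frame.
    """
    g = len(codebooks)
    if len(x) % g:
        raise ValueError("vector length must be divisible by the number of codebooks")
    d = len(x) // g
    return [nearest_code(x[i * d:(i + 1) * d], codebooks[i]) for i in range(g)]


# Two toy codebooks, each with two 2-dimensional codewords.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [5.0, 5.0]]]
print(multi_codebook_quantize([0.9, 1.2, 0.1, -0.2], codebooks))
```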
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
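Model reprogramming adapts a frozen pre-trained model by training a small number of new parameters around it instead of fine-tuning its weights. The toy sketch below assumes the simplest form of that idea: a trainable additive offset on the input features of a frozen linear stand-in model, fitted by gradient descent on squared error. The cited framework's auxiliary architectures are more elaborate, and `frozen_model` and `train_reprogram` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def frozen_model(x: Sequence[float], w: Sequence[float]) -> float:
    """Stand-in for a pre-trained model; its weights w are never updated."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reprogram(data: Sequence[Tuple[Sequence[float], float]],
                    w: Sequence[float], lr: float = 0.05, steps: int = 200) -> List[float]:
    """Learn only an input offset delta so that frozen_model(x + delta, w) fits y."""
    delta = [0.0] * len(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = frozen_model([xi + di for xi, di in zip(x, delta)], w) - y
            for j, wj in enumerate(w):
                # d(err^2)/d(delta_j) = 2 * err * w_j, averaged over the data
                grad[j] += 2 * err * wj / len(data)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


w = [1.0, 2.0]                                   # frozen weights
data = [([1.0, 0.0], 4.0), ([0.0, 1.0], 5.0)]    # targets are f(x) + 3
print(train_reprogram(data, w))
```

Only `delta` (one value per input dimension) is trained, which is what makes this family of methods parameter-efficient compared with updating `w`.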
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z)
- Arabic Code-Switching Speech Recognition using Monolingual Data [13.513655231184261]
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization.
Recent research in multilingual ASR shows potential improvement over monolingual systems.
We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments.
arXiv Detail & Related papers (2021-07-04T08:40:49Z)
- Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR using self-attention based conformer architecture.
We train the system on Arabic (Ar), English (En), and French (Fr).
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
arXiv Detail & Related papers (2021-05-31T08:20:38Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER).
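BPE-dropout stochastically skips merges while applying a learned BPE merge table, so the same word can be segmented into different subword units each time it is seen during training. A minimal pure-Python sketch of the segmentation step follows, assuming the merge list has already been learned and is ordered by priority; `bpe_dropout_segment` is a hypothetical helper, not the cited paper's code.

```python
import random
from typing import Dict, List, Optional, Tuple


def bpe_dropout_segment(word: str, merges: List[Tuple[str, str]], p: float,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Segment `word` with BPE merges, skipping each candidate merge with probability p.

    p = 0 reproduces deterministic BPE; p = 1 leaves the word as single characters.
    """
    if rng is None:
        rng = random.Random()
    rank: Dict[Tuple[str, str], int] = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # adjacent pairs that are known merges and survive dropout at this step
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in rank and rng.random() >= p
        ]
        if not candidates:
            break  # no surviving merge: stop, as in BPE-dropout
        _, i = min(candidates)  # apply the highest-priority (earliest-learned) merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_dropout_segment("lower", merges, p=0.0))
print(bpe_dropout_segment("lower", merges, p=0.5, rng=random.Random(0)))
```

In the usual setup, a fresh segmentation is sampled each time a training transcript is tokenized, while inference uses plain BPE (`p=0`).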
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation [59.31769998728787]
We build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model.
Our system establishes a competitive result for end-to-end ASR trained on LibriSpeech train-clean-100 set with WER 4.3% for test-clean and 13.5% for test-other.
arXiv Detail & Related papers (2020-05-14T17:24:57Z)
- Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages [19.569525304938033]
Two approaches are considered for under-resourced, code-switched speech in five South African languages.
The first constructs four separate bilingual automatic speech recognisers corresponding to four different language pairs.
The second uses a single, unified, five-lingual ASR system that represents all the languages.
arXiv Detail & Related papers (2020-03-06T11:08:38Z)