Dialectal Coverage And Generalization in Arabic Speech Recognition
- URL: http://arxiv.org/abs/2411.05872v1
- Date: Thu, 07 Nov 2024 22:23:30 GMT
- Title: Dialectal Coverage And Generalization in Arabic Speech Recognition
- Authors: Amirbek Djanibekov, Hawau Olamide Toyin, Raghad Alshalan, Abdullah Alitr, Hanan Aldarmaki,
- Abstract summary: This study explores the role of dialectal coverage in pre-training, the effectiveness of dialect-specific fine-tuning, and the ability to generalize to unseen dialects.
Through extensive experiments across different dialect combinations, our findings offer key insights towards advancing the development of ASR systems for pluricentric languages like Arabic.
- Score: 0.6757476692230007
- License:
- Abstract: Developing robust automatic speech recognition (ASR) systems for Arabic, a language characterized by its rich dialectal diversity and often considered a low-resource language in speech technology, demands effective strategies to manage its complexity. This study explores three critical factors influencing ASR performance: the role of dialectal coverage in pre-training, the effectiveness of dialect-specific fine-tuning compared to a multi-dialectal approach, and the ability to generalize to unseen dialects. Through extensive experiments across different dialect combinations, our findings offer key insights towards advancing the development of ASR systems for pluricentric languages like Arabic.
Related papers
- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation [68.81271028921647]
We introduce CORAL, a benchmark designed to assess RAG systems in realistic multi-turn conversational settings.
CORAL includes diverse information-seeking conversations automatically derived from Wikipedia.
It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling.
arXiv Detail & Related papers (2024-10-30T15:06:32Z) - Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z) - A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain [0.0]
This work is an attempt to introduce a comprehensive benchmark for Arabic speech recognition, specifically tailored to address the challenges of telephone conversations in Arabic language.
Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications.
arXiv Detail & Related papers (2024-03-07T07:24:32Z) - The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese [5.308321515594125]
This study is dedicated to a comprehensive exploration of the Whisper and MMS systems.
Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location.
We empirically show that oversampling techniques alleviate such stereotypical biases.
arXiv Detail & Related papers (2024-02-12T09:35:13Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Exploring the Integration of Speech Separation and Recognition with
Self-Supervised Learning Representation [83.36685075570232]
This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end.
We explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model.
A proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate in reverberant WHAMR! test set.
arXiv Detail & Related papers (2023-07-23T05:39:39Z) - MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech
Recognition [12.23416994447554]
We present a multi-lingual speech recognition network named Mixture-of-Language-Expert(MoLE)
MoLE analyzes linguistic expression from input speech in arbitrary languages, activating a language-specific expert with a lightweight language tokenizer.
Based on the reliability, the activated expert and the language-agnostic expert are aggregated to represent language-conditioned embedding.
arXiv Detail & Related papers (2023-02-27T13:26:17Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech
Recognition [80.87085897419982]
We propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM.
Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously.
The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
arXiv Detail & Related papers (2022-05-06T06:07:09Z) - Towards One Model to Rule All: Multilingual Strategy for Dialectal
Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR using self-attention based conformer architecture.
We trained the system using Arabic (Ar), English (En) and French (Fr) languages.
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
arXiv Detail & Related papers (2021-05-31T08:20:38Z) - Accented Speech Recognition: A Survey [0.0]
We present a survey of current promising approaches to accented speech recognition.
The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR.
arXiv Detail & Related papers (2021-04-21T20:21:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.