Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
- URL: http://arxiv.org/abs/2507.13977v1
- Date: Fri, 18 Jul 2025 14:42:18 GMT
- Title: Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
- Authors: Lilit Grigoryan, Nikolay Karpov, Enas Albasiri, Vitaly Lavrukhin, Boris Ginsburg
- Abstract summary: We introduce a universal methodology for Arabic speech and text processing designed to address unique challenges of the language. We train two novel models based on the FastConformer architecture: one designed specifically for Modern Standard Arabic (MSA) and the other, the first unified public model for both MSA and Classical Arabic (CA). The MSA model sets a new benchmark with state-of-the-art (SOTA) performance on related datasets, while the unified model achieves SOTA accuracy with diacritics for CA while maintaining strong performance for MSA.
- Score: 15.807843278492847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite Arabic being one of the most widely spoken languages, the development of Arabic Automatic Speech Recognition (ASR) systems faces significant challenges due to the language's complexity, and only a limited number of public Arabic ASR models exist. While much of the focus has been on Modern Standard Arabic (MSA), there is considerably less attention given to the variations within the language. This paper introduces a universal methodology for Arabic speech and text processing designed to address unique challenges of the language. Using this methodology, we train two novel models based on the FastConformer architecture: one designed specifically for MSA and the other, the first unified public model for both MSA and Classical Arabic (CA). The MSA model sets a new benchmark with state-of-the-art (SOTA) performance on related datasets, while the unified model achieves SOTA accuracy with diacritics for CA while maintaining strong performance for MSA. To promote reproducibility, we open-source the models and their training recipes.
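Since the open-sourced checkpoints are FastConformer models, the natural way to run them is through the NVIDIA NeMo toolkit. A minimal inference sketch, assuming NeMo distribution; the checkpoint identifier below is a placeholder, not the paper's actual release name:

```python
# Minimal Arabic ASR inference sketch with NVIDIA NeMo.
# The checkpoint name is hypothetical; substitute the identifier
# published with the paper's open-source release.
import nemo.collections.asr as nemo_asr

# Download and restore a pretrained FastConformer ASR model.
model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/stt_ar_fastconformer_placeholder"  # hypothetical
)

# Transcribe 16 kHz mono WAV files; one hypothesis per input file.
hypotheses = model.transcribe(["arabic_sample.wav"])
print(hypotheses[0])
```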
Related papers
- SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System [0.995313069446686]
This paper introduces SHAMI-MT, a bidirectional machine translation system specifically engineered to bridge the communication gap between Modern Standard Arabic (MSA) and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. Our MSA-to-Shami model achieved an outstanding average quality score of 4.01 out of 5.0 when judged by OpenAI's GPT-4.1 model.
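Because both directions are fine-tuned from AraT5v2-base-1024, inference is standard Hugging Face seq2seq generation. A sketch, assuming a hypothetical MSA-to-Shami checkpoint name:

```python
# Seq2seq translation sketch for a T5-style Arabic MT model.
# "example/shami-mt-msa2shami" is a hypothetical checkpoint name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "example/shami-mt-msa2shami"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

msa_sentence = "كيف حالك اليوم؟"  # MSA input
inputs = tokenizer(msa_sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```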
arXiv Detail & Related papers (2025-08-04T10:21:11Z)
- Efficient Multilingual ASR Finetuning via LoRA Language Experts [59.27778147311189]
This paper proposes an efficient finetuning framework for customized multilingual ASR via prepared LoRA language experts based on Whisper. Through LoRA expert fusion or knowledge distillation, our approach achieves better recognition performance on target languages than standard fine-tuning methods. Experimental results demonstrate that the proposed models yield approximately 10% and 15% relative performance gains in language-aware and language-agnostic scenarios.
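A minimal sketch of what one such LoRA "expert" looks like when attached to Whisper with the peft library; the rank, alpha, and target modules are illustrative defaults, not the paper's configuration:

```python
# Attach a per-language LoRA "expert" to a frozen Whisper backbone.
# Hyperparameters here are illustrative, not the paper's values.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
)
expert = get_peft_model(base, lora_cfg)  # base weights stay frozen
expert.print_trainable_parameters()      # only adapter weights train
```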
arXiv Detail & Related papers (2025-06-11T07:06:27Z)
- Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning [0.0]
We employ weakly supervised learning to train an Arabic ASR model using the Conformer architecture. Our model is trained from scratch on 15,000 hours of weakly annotated speech data covering both Modern Standard Arabic (MSA) and Dialectal Arabic (DA).
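Weakly supervised pipelines of this kind typically filter noisy transcripts before training, for example by checking agreement with a seed model's hypotheses. A generic filtering sketch using the jiwer package; this illustrates common practice, not the paper's exact recipe:

```python
# Generic weak-label filtering sketch: keep an utterance only when the
# noisy transcript roughly agrees with a seed model's hypothesis.
import jiwer

def keep_utterance(weak_label: str, seed_hypothesis: str,
                   max_wer: float = 0.3) -> bool:
    """Accept the pair when the word error rate between weak label
    and seed hypothesis falls below the threshold."""
    return jiwer.wer(weak_label, seed_hypothesis) <= max_wer

pairs = [
    ("السلام عليكم ورحمة الله", "السلام عليكم ورحمة الله"),  # clean pair
    ("نص غير مطابق تماما", "جملة مختلفة كليا عن النص"),      # noisy pair
]
kept = [p for p in pairs if keep_utterance(*p)]
print(f"kept {len(kept)} of {len(pairs)} utterances")
```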
arXiv Detail & Related papers (2025-04-16T17:05:14Z)
- Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLMs) in the Arab world. One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. Inspired by vocabulary learning during Second Language (Arabic) Acquisition in humans, the released AraLLaMA employs progressive vocabulary expansion.
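Mechanically, each expansion stage boils down to adding tokens and resizing the embedding matrix. A minimal Hugging Face sketch; the base model and token list are placeholders, and the paper's staged schedule is more elaborate than this single step:

```python
# One vocabulary-expansion step, sketched with Hugging Face.
# Base model and new-token list are placeholders; the paper grows
# the vocabulary progressively rather than in a single step.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

new_arabic_tokens = ["السلام", "عليكم", "اللغة"]  # placeholder subwords
num_added = tokenizer.add_tokens(new_arabic_tokens)

# Grow input/output embeddings; the new rows are randomly initialized
# and learned during continued pretraining.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```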
arXiv Detail & Related papers (2024-12-16T19:29:06Z)
- Dialectal Coverage And Generalization in Arabic Speech Recognition [0.6757476692230007]
Existing ASR systems fall short in coverage and generalization across the multitude of spoken variants. Code-switching with English and French is also common in different regions of the Arab world. We introduce a suite of ASR models optimized to effectively recognize multiple variants of spoken Arabic.
arXiv Detail & Related papers (2024-11-07T22:23:30Z)
- ALLaM: Large Language Models for Arabic and English [9.881560166505452]
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT).
Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining can steer a model towards a new language (Arabic) without any catastrophic forgetting in the original language (English).
We show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower-quality alignment.
arXiv Detail & Related papers (2024-07-22T05:35:17Z)
- ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic [51.922112625469836]
We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language.
Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region.
Our evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models.
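A common way to evaluate a causal LM on such multiple-choice questions is to score every answer option by its total token log-likelihood given the question. A generic scoring sketch, not the benchmark's official harness; the model is a placeholder:

```python
# Multiple-choice scoring sketch: pick the option with the highest
# total log-likelihood under a causal LM. Common practice, not the
# official ArabicMMLU evaluation harness.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "gpt2"  # placeholder; use an Arabic-capable model in practice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def option_logprob(question: str, option: str) -> float:
    """Sum log-probabilities of the option tokens given the question."""
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full = tokenizer(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts t+1
    targets = full[0, 1:]
    start = q_len - 1  # first predicted position inside the option
    return log_probs[start:].gather(1, targets[start:].unsqueeze(1)).sum().item()

question = "عاصمة المملكة العربية السعودية هي"
options = ["الرياض", "القاهرة", "بغداد", "عمان"]
print(max(options, key=lambda o: option_logprob(question, o)))
```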
arXiv Detail & Related papers (2024-02-20T09:07:41Z)
- Language Models as a Service: Overview of a New Paradigm and its Challenges [47.75762014254756]
Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or programming interfaces.
This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness of LM interfaces.
On the other hand, it serves as a comprehensive resource for existing knowledge on current major LMs, offering a synthesized overview of the licences and capabilities their interfaces offer.
arXiv Detail & Related papers (2023-09-28T16:29:52Z)
- AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts and Supervised Fine-Tuning (SFT) utilizing native Arabic instructions and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)
- Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models [57.76998376458017]
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs).
The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts.
We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models.
arXiv Detail & Related papers (2023-08-30T17:07:17Z)
- Adapting Multi-Lingual ASR Models for Handling Multiple Talkers [63.151811561972515]
State-of-the-art large-scale universal speech models (USMs) show decent automatic speech recognition (ASR) performance across multiple domains and languages.
We propose an approach to adapt USMs for multi-talker ASR.
We first develop an enhanced version of serialized output training to jointly perform multi-talker ASR and utterance timestamp prediction.
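In serialized output training, the transcripts of overlapping speakers are concatenated into one target sequence in first-in-first-out order, delimited by a speaker-change token. A small target-construction sketch; the token name is model-specific:

```python
# Build a serialized-output-training (SOT) target: utterances are
# ordered by start time and joined with a speaker-change token.
SC = "<sc>"  # speaker-change symbol; the actual token is model-specific

def sot_target(utterances):
    """utterances: list of (start_time_sec, transcript) tuples."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {SC} ".join(text for _, text in ordered)

mixture = [
    (1.2, "thanks I will send the report"),
    (0.0, "good morning everyone"),
    (2.7, "sounds good to me"),
]
print(sot_target(mixture))
# good morning everyone <sc> thanks I will send the report <sc> sounds good to me
```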
arXiv Detail & Related papers (2023-05-30T05:05:52Z)
- Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR system using a self-attention-based Conformer architecture.
We train the system on Arabic (Ar), English (En), and French (Fr) data.
Our findings demonstrate the strength of such a model: it outperforms state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR systems.
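A multilingual end-to-end model like this typically emits tokens from one subword vocabulary shared across all three languages. A sketch of training such a vocabulary with SentencePiece; file paths and the vocabulary size are illustrative:

```python
# Train one shared subword vocabulary over Arabic, English, and French
# text so a single end-to-end ASR model can emit all three languages.
# File paths and vocab size are illustrative.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="ar_corpus.txt,en_corpus.txt,fr_corpus.txt",
    model_prefix="ar_en_fr_bpe",
    vocab_size=4000,
    model_type="bpe",
    character_coverage=1.0,  # keep full Arabic and Latin character sets
)

sp = spm.SentencePieceProcessor(model_file="ar_en_fr_bpe.model")
print(sp.encode("hello مرحبا bonjour", out_type=str))
```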
arXiv Detail & Related papers (2021-05-31T08:20:38Z)