Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
- URL: http://arxiv.org/abs/2509.00094v1
- Date: Wed, 27 Aug 2025 15:28:46 GMT
- Title: Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
- Authors: Abdullah Abdelfattah, Mahmoud I. Khalil, Hazem Abbas
- Abstract summary: We build a 98% automated pipeline to produce high-quality Quranic datasets. We use our custom Quran Phonetic Script to encode Tajweed rules. We release all code, data, and models as open-source.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Assessing spoken language is challenging, and quantifying pronunciation metrics for machine learning models is even harder. However, for the Holy Quran, this task is simplified by the rigorous recitation rules (tajweed) established by Muslim scholars, enabling highly effective assessment. Despite this advantage, the scarcity of high-quality annotated data remains a significant barrier. In this work, we bridge these gaps by introducing: (1) A 98% automated pipeline to produce high-quality Quranic datasets -- encompassing: Collection of recitations from expert reciters, Segmentation at pause points (waqf) using our fine-tuned wav2vec2-BERT model, Transcription of segments, Transcript verification via our novel Tasmeea algorithm; (2) 850+ hours of audio (~300K annotated utterances); (3) A novel ASR-based approach for pronunciation error detection, utilizing our custom Quran Phonetic Script (QPS) to encode Tajweed rules (unlike the IPA standard for Modern Standard Arabic). QPS uses a two-level script: (Phoneme level): Encodes Arabic letters with short/long vowels. (Sifa level): Encodes articulation characteristics of every phoneme. We further include comprehensive modeling with our novel multi-level CTC Model which achieved 0.16% average Phoneme Error Rate (PER) on the test set. We release all code, data, and models as open-source: https://obadx.github.io/prepare-quran-dataset/
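The abstract's two-level QPS representation and its PER metric can be illustrated with a minimal sketch. PER is the Levenshtein edit distance between reference and hypothesis phoneme sequences, normalized by reference length; the phoneme symbols and sifa labels below are hypothetical placeholders for illustration, not the paper's actual QPS inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QPSUnit:
    """One token of a hypothetical two-level phonetic script:
    a phoneme (letter + vowel length) plus its sifa (articulation trait)."""
    phoneme: str  # e.g. "b", "aa" for a long vowel -- illustrative symbols only
    sifa: str     # e.g. "jahr" (voiced) -- illustrative label only


def phoneme_error_rate(ref, hyp):
    """Levenshtein edit distance between two phoneme sequences,
    normalized by the reference length (the standard PER definition)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n] / max(m, 1)


# A learner shortening a long vowel shows up as a single substitution:
ref = ["b", "i", "s", "m", "aa"]  # expert recitation (illustrative)
hyp = ["b", "i", "s", "m", "a"]   # learner shortened the final vowel
print(phoneme_error_rate(ref, hyp))  # -> 0.2
```

Decoding at the sifa level works the same way over `QPSUnit.sifa` sequences, which is why a multi-level model can report a separate error rate per level.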
Related papers
- Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran [1.3481884955361023]
Quran-MD is a comprehensive dataset of the Quran that integrates textual, linguistic, and audio dimensions at the verse and word levels. This dataset supports various applications, including natural language processing, speech recognition, text-to-speech synthesis, linguistic analysis, and digital Islamic studies.
arXiv Detail & Related papers (2026-01-25T15:23:37Z)
- AHELM: A Holistic Evaluation of Audio-Language Models [78.20477815156484]
Multimodal audio-language models (ALMs) take interleaved audio and text as input and output text. AHELM is a benchmark that aggregates various datasets, including 2 new synthetic audio-text datasets called PARADE and CoRe-Bench. We also standardize the prompts, inference parameters, and evaluation metrics to ensure equitable comparisons across models.
arXiv Detail & Related papers (2025-08-29T07:40:39Z)
- Few-Shot Prompting for Extractive Quranic QA with Instruction-Tuned LLMs [1.0124625066746595]
It addresses challenges related to complex language, unique terminology, and deep meaning in the text. The second uses few-shot prompting with instruction-tuned large language models such as Gemini and DeepSeek. A specialized Arabic prompt framework is developed for span extraction.
arXiv Detail & Related papers (2025-08-08T08:02:59Z) - Cross-Language Approach for Quranic QA [1.0124625066746595]
The Quranic QA system holds significant importance as it facilitates a deeper understanding of the Quran, a Holy text for over a billion people worldwide. These systems face unique challenges, including the linguistic disparity between questions written in Modern Standard Arabic and answers found in Quranic verses written in Classical Arabic. We adopt a cross-language approach by expanding and enriching the dataset through machine translation to convert Arabic questions into English, paraphrasing questions to create linguistic diversity, and retrieving answers from an English translation of the Quran to align with multilingual training requirements.
arXiv Detail & Related papers (2025-01-29T07:13:27Z) - Localizing Factual Inconsistencies in Attributable Text Generation [74.11403803488643]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We show that QASemConsistency yields factual consistency scores that correlate well with human judgments.
arXiv Detail & Related papers (2024-10-09T22:53:48Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system.
This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z) - Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers [1.2124551005857038]
This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers.
We use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets.
We have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries.
arXiv Detail & Related papers (2024-05-04T14:29:05Z) - Speech collage: code-switched audio generation by collaging monolingual
corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
arXiv Detail & Related papers (2023-09-27T14:17:53Z) - Mispronunciation Detection of Basic Quranic Recitation Rules using Deep
Learning [0.0]
In Islam, readers must apply a set of pronunciation rules called Tajweed rules to recite the Quran.
The number of Tajweed teachers is not enough nowadays for daily recitation practice for every Muslim.
We propose a solution that combines Mel-Frequency Cepstral Coefficient (MFCC) features with Long Short-Term Memory (LSTM) neural networks to model the time series.
arXiv Detail & Related papers (2023-05-10T19:31:25Z) - Quran Recitation Recognition using End-to-End Deep Learning [0.0]
The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion.
Recognizing the recitation of the Holy Quran automatically is a challenging task due to its unique rules.
We propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran.
arXiv Detail & Related papers (2023-05-10T18:40:01Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded speaker labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
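The SEND entry above mentions power-set encoded labels: instead of predicting per-speaker binary activity, each frame is classified into one label per subset of speakers, so silence and overlap become ordinary multi-class targets. A minimal sketch of that idea follows; the bitmask mapping is an illustrative assumption, not necessarily the paper's exact encoding.

```python
def encode_powerset(active_speakers, num_speakers):
    """Map the set of speakers active in a frame to a single class index:
    each subset of {0..num_speakers-1} gets one label (a bitmask here)."""
    label = 0
    for s in active_speakers:
        if not 0 <= s < num_speakers:
            raise ValueError(f"speaker {s} out of range")
        label |= 1 << s
    return label


def decode_powerset(label, num_speakers):
    """Recover the active-speaker set from a power-set class index."""
    return {s for s in range(num_speakers) if label & (1 << s)}


# With 3 speakers there are 2**3 = 8 classes, covering silence (label 0),
# every single speaker, and every overlap combination:
print(encode_powerset({0, 2}, 3))  # -> 5
print(decode_powerset(5, 3))       # -> {0, 2}
```

The appeal of this encoding is that overlapping speech needs no special handling at inference time: a standard softmax over the 2**N classes predicts one subset per frame.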
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.