Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran
- URL: http://arxiv.org/abs/2601.17880v1
- Date: Sun, 25 Jan 2026 15:23:37 GMT
- Title: Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran
- Authors: Muhammad Umar Salman, Mohammad Areeb Qazi, Mohammed Talha Alam,
- Abstract summary: Quran-MD is a comprehensive dataset of the Quran that integrates textual, linguistic, and audio dimensions at the verse and word levels. The dataset supports various applications, including natural language processing, speech recognition, text-to-speech synthesis, linguistic analysis, and digital Islamic studies.
- Score: 1.3481884955361023
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present Quran-MD, a comprehensive multimodal dataset of the Quran that integrates textual, linguistic, and audio dimensions at the verse and word levels. For each verse (ayah), the dataset provides its original Arabic text, English translation, and phonetic transliteration. To capture the rich oral tradition of Quranic recitation, we include verse-level audio from 32 distinct reciters, reflecting diverse recitation styles and dialectal nuances. At the word level, each token is paired with its corresponding Arabic script, English translation, transliteration, and an aligned audio recording, allowing fine-grained analysis of pronunciation, phonology, and semantic context. This dataset supports various applications, including natural language processing, speech recognition, text-to-speech synthesis, linguistic analysis, and digital Islamic studies. Bridging text and audio modalities across multiple reciters, this dataset provides a unique resource to advance computational approaches to Quranic recitation and study. Beyond enabling tasks such as ASR, tajweed detection, and Quranic TTS, it lays the foundation for multimodal embeddings, semantic retrieval, style transfer, and personalized tutoring systems that can support both research and community applications. The dataset is available at https://huggingface.co/datasets/Buraaq/quran-audio-text-dataset
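As a rough illustration (not part of the paper), the sketch below loads the dataset from the Hugging Face Hub with the datasets library. The split name and the field names in the comments are assumptions; the printed column_names is the authoritative guide, and verse-level, word-level, and per-reciter audio records may live in separate configurations.

```python
from datasets import load_dataset

# Load Quran-MD from the Hugging Face Hub (repository name taken from the abstract).
# NOTE: the split name is an assumption; the repo may expose different splits or configs.
ds = load_dataset("Buraaq/quran-audio-text-dataset", split="train")

print(ds.column_names)  # inspect the real schema before relying on any field names
example = ds[0]

# Hypothetical field names, for illustration only -- adjust to the printed schema:
#   example["arabic_text"]        # original Arabic verse text
#   example["translation_en"]     # English translation
#   example["transliteration"]    # phonetic transliteration
#   example["audio"]["array"]     # decoded waveform of a verse-level recitation
```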
Related papers
- Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition [0.0]
This study proposes a transformer-based multimodal framework for Arabic phoneme mispronunciation detection. The framework integrates UniSpeech-derived acoustic embeddings with BERT-based textual embeddings extracted from Whisper transcriptions (a toy fusion sketch appears after this list). The study contributes to the development of intelligent, speaker-independent, and multimodal Computer-Aided Language Learning (CALL) systems.
arXiv Detail & Related papers (2025-11-21T18:25:46Z)
- Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning [0.0]
We build a 98% automated pipeline to produce high-quality Quranic datasets. We use our custom Quran Phonetic Script to encode Tajweed rules. We release all code, data, and models as open-source.
arXiv Detail & Related papers (2025-08-27T15:28:46Z)
- A computational system to handle the orthographic layer of tajwid in contemporary Quranic Orthography [0.0]
We explore the systematicity of the rules of tajwid, as they are encountered in the Cairo Quran. We develop a Python module that can remove or add the orthographic layer of tajwid from a Quranic text in CQO (an illustrative stripping sketch appears after this list).
arXiv Detail & Related papers (2025-05-16T15:41:51Z)
- Kimi-Audio Technical Report [67.69331679172303]
Kimi-Audio is an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation.
arXiv Detail & Related papers (2025-04-25T15:31:46Z)
- Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations [65.59784436914548]
We introduce the Audio-Visual Speech Romanizer (AV-Romanizer), which learns language-agnostic speech representations by predicting Roman text. We convert the predicted Roman text into language-specific graphemes, forming the proposed Cascaded Zero-AVSR. To capture the wide spectrum of phonetic and linguistic diversity, we also introduce a Multilingual Audio-Visual Romanized Corpus (MARC).
arXiv Detail & Related papers (2025-03-08T16:40:13Z)
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
- Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers [1.2124551005857038]
This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers.
We use a volunteer-based crowdsourcing approach and implement a crowdsourcing API to gather audio assets.
We have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries.
arXiv Detail & Related papers (2024-05-04T14:29:05Z)
- Quran Recitation Recognition using End-to-End Deep Learning [0.0]
The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion.
Recognizing the recitation of the Holy Quran automatically is a challenging task due to its unique rules.
We propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran.
arXiv Detail & Related papers (2023-05-10T18:40:01Z)
- Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset [51.75617364782418]
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC.
The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz.
arXiv Detail & Related papers (2022-03-31T07:01:06Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness (a minimal triplet-loss sketch appears after this list).
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
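For the phoneme mispronunciation paper above, the following is a minimal sketch of late fusion of acoustic and textual embeddings. The tensors stand in for UniSpeech and BERT outputs; the dimensions and the classifier head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Placeholder embeddings standing in for UniSpeech acoustic features and
# BERT textual features of a Whisper transcription (batch of 8, 768-dim each).
acoustic_emb = torch.randn(8, 768)
textual_emb = torch.randn(8, 768)

# Simple late-fusion classifier over the concatenated embeddings; the actual
# framework is transformer-based -- this only illustrates the fusion step.
classifier = nn.Sequential(nn.Linear(768 * 2, 256), nn.ReLU(), nn.Linear(256, 2))
logits = classifier(torch.cat([acoustic_emb, textual_emb], dim=-1))
print(logits.shape)  # torch.Size([8, 2]): correct vs. mispronounced phoneme
```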
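For the tajwid orthography paper, the module described there is not reproduced here; the sketch below only illustrates the general idea of removing the optional orthographic layer (harakat and Quranic annotation signs) by dropping Unicode combining marks. Re-adding the layer requires the rule-based mapping that paper develops and is not attempted.

```python
import unicodedata

def strip_orthographic_layer(text: str) -> str:
    """Drop combining marks (vowel signs, shadda, sukun, superscript alef,
    Quranic annotation signs), keeping only the base letters.
    Illustrative approximation, not the paper's CQO-aware module."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

print(strip_orthographic_layer("بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ"))
# -> بسم الله الرحمن الرحيم
```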
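For the unsupervised cross-modal audio representation paper, the sketch below shows a generic triplet objective of the kind its summary describes. The toy encoder, feature dimensions, and the way related and unrelated tracks are sampled are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy audio encoder; 1024-dim input features and 128-dim embeddings are placeholders.
audio_encoder = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 128))
triplet_loss = nn.TripletMarginLoss(margin=0.2)

anchor = torch.randn(8, 1024)    # anchor tracks (dummy features)
positive = torch.randn(8, 1024)  # tracks judged related via shared text metadata
negative = torch.randn(8, 1024)  # unrelated tracks

# Pull related tracks together in the embedding space, push unrelated ones apart.
loss = triplet_loss(audio_encoder(anchor),
                    audio_encoder(positive),
                    audio_encoder(negative))
loss.backward()
print(loss.item())
```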
This list is automatically generated from the titles and abstracts of the papers in this site.