Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
- URL: http://arxiv.org/abs/2405.02675v1
- Date: Sat, 4 May 2024 14:29:05 GMT
- Title: Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
- Authors: Raghad Salameh, Mohamad Al Mdfaa, Nursultan Askarbekuly, Manuel Mazzara
- Abstract summary: This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers.
We use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets.
We have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89 between the labels assigned by the algorithm and the expert judgments.
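The abstract does not say which statistic produced the inter-rater agreement of 0.63. Purely as an illustration, the sketch below computes Cohen's kappa, one common choice for two-annotator agreement, over invented recitation labels; the category names are hypothetical, not the paper's six annotation categories.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at random
    # according to their own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented labels for six recitations (hypothetical categories).
a = ["correct", "tajweed_error", "correct", "hesitation", "correct", "correct"]
b = ["correct", "tajweed_error", "hesitation", "hesitation", "correct", "tajweed_error"]
print(round(cohens_kappa(a, b), 2))  # 0.5
```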
Related papers
- Quran Recitation Recognition using End-to-End Deep Learning
The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion.
Recognizing the recitation of the Holy Quran automatically is a challenging task due to its unique rules.
We propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran.
arXiv Detail & Related papers (2023-05-10T18:40:01Z)
- An ensemble-based framework for mispronunciation detection of Arabic phonemes
This work introduces an ensemble model for detecting the mispronunciation of Arabic phonemes.
Experimental results demonstrate that voting as the ensemble algorithm, combined with Mel-spectrogram feature extraction, achieves a classification accuracy of 95.9%.
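As a rough illustration of this setup, the sketch below trains a hard-voting ensemble on time-averaged log-Mel features. The three base classifiers and the random stand-in waveforms are assumptions, not the paper's exact configuration.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def mel_features(y, sr=16000, n_mels=40):
    # Log-Mel spectrogram, averaged over time into a fixed-length vector.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).mean(axis=1)

# Stand-in data: random 1-second waveforms with binary
# correct/mispronounced labels (real data would be phoneme recordings).
rng = np.random.default_rng(0)
X = np.stack([mel_features(rng.standard_normal(16000)) for _ in range(40)])
y = rng.integers(0, 2, size=40)

# Hard voting: each base classifier casts one vote per sample.
ensemble = VotingClassifier(
    estimators=[("svm", SVC()), ("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier())],
    voting="hard",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```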
arXiv Detail & Related papers (2023-01-03T22:17:08Z)
- ASR2K: Speech Recognition for Around 2000 Languages without Audio
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition systems for 1909 languages by combining the pipeline with Crubadan, a large n-gram database of endangered languages.
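A toy sketch of how those three components could compose at decode time; the lexicon, unigram counts, and mocked acoustic model below are invented stand-ins (Crubadan provides n-gram counts, not this API).

```python
# Invented pronunciation lexicon: phoneme strings -> candidate words.
lexicon = {"k ae t": ["cat"], "s ih t": ["sit", "cit"]}
# Invented unigram counts standing in for a Crubadan-derived language model.
unigram_counts = {"cat": 120, "sit": 300, "cit": 2}

def acoustic_model(audio):
    # Placeholder: the real pipeline predicts phonemes for the target
    # language without ever seeing target-language audio in training.
    return ["k ae t", "s ih t"]

def decode(audio):
    words = []
    for phones in acoustic_model(audio):
        candidates = lexicon.get(phones, ["<unk>"])
        # The language model breaks ties between homophone candidates.
        words.append(max(candidates, key=lambda w: unigram_counts.get(w, 0)))
    return " ".join(words)

print(decode(audio=None))  # -> "cat sit"
```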
arXiv Detail & Related papers (2022-09-06T22:48:29Z)
- TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models
Arabic is the language of the Holy Qur'an; the sacred text for 1.8 billion people across the world.
We propose an ensemble learning model based on Arabic variants of BERT models.
Our system achieves a Partial Reciprocal Rank (pRR) score of 56.6% on the official test set.
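For intuition about the metric, the sketch below implements plain exact-match Reciprocal Rank; the official pRR additionally awards partial credit for partially matching answers, which is omitted here.

```python
def reciprocal_rank(ranked_answers, gold):
    """Standard Reciprocal Rank: 1/rank of the first correct answer.

    The task's pRR generalises this by scoring partial overlap with the
    gold answer; this exact-match version is shown only to fix intuition.
    """
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer == gold:
            return 1.0 / rank
    return 0.0

print(reciprocal_rank(["wrong", "right", "also wrong"], gold="right"))  # 0.5
```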
arXiv Detail & Related papers (2022-06-03T13:00:48Z)
- Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition
Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages in East Africa.
The Amharic writing system, Abugida, has 282 syllables, 15 punctuation marks, and 20 numerals.
We present the first comprehensive public datasets, named HUST-ART, HUST-AST, ABE, and Tana, for Amharic script detection and recognition in natural scenes.
arXiv Detail & Related papers (2022-03-23T03:19:35Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
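A minimal sketch of the power-set label encoding named above: every subset of speakers becomes one class, so overlapping speech reduces to a single multi-class decision per frame. The speaker count and indices below are illustrative.

```python
from itertools import combinations

def powerset_classes(n_speakers):
    """Map every subset of speakers to one class index.

    Power-set encoding turns the multi-label question "who is speaking
    in this frame" into one multi-class prediction, covering silence,
    single speakers, and overlaps with a single output.
    """
    subsets = [()]  # the empty subset encodes silence
    for k in range(1, n_speakers + 1):
        subsets += list(combinations(range(n_speakers), k))
    return {s: i for i, s in enumerate(subsets)}

classes = powerset_classes(3)
print(len(classes))     # 8 classes for 3 speakers (2**3)
print(classes[(0, 2)])  # class index for "speakers 0 and 2 overlap"
```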
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus
We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain.
This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16 kHz, crawled from the Aljazeera news channel.
arXiv Detail & Related papers (2021-06-24T13:20:40Z)
- The Interspeech Zero Resource Speech Challenge 2021: Spoken language modelling
We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels.
The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text.
arXiv Detail & Related papers (2021-04-29T23:53:37Z)
- Spot the conversation: speaker diarisation in the wild
First, we propose an automatic audio-visual diarisation method for YouTube videos.
Second, we integrate our method into a semi-automatic dataset creation pipeline.
Third, we use this pipeline to create a large-scale diarisation dataset called VoxConverse.
arXiv Detail & Related papers (2020-07-02T15:55:54Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
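A minimal sketch of the triplet objective such an architecture trains with: the anchor is pulled towards a semantically related ("positive") track and pushed away from an unrelated ("negative") one. The margin, embedding size, and random stand-in embeddings are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin` (an assumed value here).
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a, p, n = rng.standard_normal((3, 128))  # three stand-in 128-d track embeddings
print(triplet_loss(a, p, n))
```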
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
On average, it achieves a phoneme error rate 7.7% better than that of a standard multilingual model.
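One plausible mechanism for scoring phonemes never seen in training is to describe every phoneme by universal articulatory features and compose per-feature posteriors; the sketch below illustrates that idea with an invented feature inventory and is not claimed to be the paper's exact model.

```python
import numpy as np

# Toy binary articulatory features, in the order: voiced, nasal, bilabial.
# The inventory and values are simplified assumptions for illustration.
PHONEMES = {
    "p": [0, 0, 1],
    "b": [1, 0, 1],
    "m": [1, 1, 1],  # "m" remains scorable even if absent from training
}

def phoneme_scores(feature_probs):
    """Combine per-feature posteriors into per-phoneme scores."""
    scores = {}
    for ph, feats in PHONEMES.items():
        # Multiply P(feature matches) independently across features.
        scores[ph] = float(np.prod([p if f else 1 - p
                                    for f, p in zip(feats, feature_probs)]))
    return scores

frame = [0.9, 0.8, 0.95]  # a frame's per-feature posteriors (made up)
scores = phoneme_scores(frame)
print(max(scores, key=scores.get))  # -> "m"
```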
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.